EDITORIAL


Putting the Genie Back in the Bottle


What had all the earmarks of a major-league food fight sure didn't take
long to dry up and blow away. Triggering the flap was an Internet posting of
source code that implemented the RC4 algorithm, an act that knocked on all
kinds of legal doors--trade secrets, Internet-host liabilities, reverse
engineering, shrink-wrap licensing, export control. You name it. 
The problem is that RC4, the stream-cipher encryption algorithm at the heart of
RSA Data Security cryptography, is protected as a trade secret. But some on
the net say the online posting let the genie out of the bottle--RC4 was made
public and available for anyone to use, RSA's claims notwithstanding. RSA
counters that the company used trade-secret law simply to protect its
intellectual property, that there's never really been any "secret" about the
algorithm. Anyone willing to sign a nondisclosure agreement acknowledging
RC4's trade-secret status could have ready access to the reference and source
code. Among the companies that have licensed RC4-based tools from RSA are
Microsoft, Novell, Apple, and Lotus, all of which distribute RC4-based binary
files in shrink-wrapped applications.
You can imagine the furor when an unidentified person (or persons) used an
anonymous remailer to post worldwide--first to a cryptographer mailing list,
then to a newsgroup--source code that was supposedly RC4. Subsequent testing
by programmers and cryptographers confirmed that the code was indeed
compatible with "real" RSA RC4 code. RSA Data Security responded by calling in
everyone from the U.S. Customs Service to the Federal Bureau of Investigation.
In a strongly worded warning on the net, RSA said it considered the posting "a
violation of law...[and]...a gross abuse of the Internet."
If the person(s) who posted the source code had in fact signed an RSA
nondisclosure agreement, the issue seems pretty clear-cut. They broke the law,
not to mention RSA's trust. If, as some claim and RSA disputes, the code was
reverse engineered from object files in off-the-shelf software, then the law
was probably broken--unless RSA and other vendors decide to test the strength
of highly questionable and likely unenforceable shrink-wrap licenses that try
to prohibit disassembly/decompilation. Of course, it just might be that some
cryptographer derived the algorithm after examining the key, plaintext, and
encrypted text. And there's even the chance, albeit unlikely, that a dumpster
diver ran across discarded copies of the code in RSA's corporate wastebasket. 
Questions concerning the legal status of copyrighted material that's made
freely available (illicitly or otherwise) on the Internet also have to be
tackled. Can Internet hosts be held accountable for anonymous postings of
protected material? And don't forget, RC4 isn't just any software--it's
encryption software. Is posting such software online worldwide the same as
exporting it? If so, the State Department might have a thing or two to say.
The end result is that RC4 code is available on ftp sites worldwide, ready and
waiting for you to use it. But if you grab it off the net, can you use it
without RSA's permission? For the time being, the answer probably depends on
which lawyer you ask.
Speculation aside, the RC4 controversy explains why many developers are
protecting their intellectual property with patents instead of trade secrets.
Gray areas like RC4 would be black and white if RC4 had been patented. But
then patenting would also mean that RC4 would have been public in the first
place. 
The immediate impact may be on RC5, the next-generation version of RC4, which
Ron Rivest describes in this issue. In part because of the RC4 controversy,
Ron and RSA Data Security are considering patenting RC5, a departure from
their original plans. At one point, RC5 code and reference materials were to be
distributed free of charge for noncommercial use. Small businesses could
license the material for $500, and large businesses, for $1000. All proceeds
were to go to RSA Labs--not RSA's bottom line--to fund further R&D. This could
still happen even if RSA patents RC5, but the licensing fees would be higher
to offset the patent costs.
Likewise, there could be some repercussions in terms of exporting RC4-based
systems. For the past couple of years, vendors have been allowed to export
software that uses RC4 short-key encryption. The State Department could change
this since RC4 is no longer secret. 
As for the multitude of legal questions, nothing concrete will immediately
come of the RC4 brouhaha, unless those responsible for posting the code are
identified. Existing RC4-based systems weren't compromised and may have
benefited, since we can now see that system backdoors don't exist. 
What we're left with are more questions, fewer answers, and the suspicion that
one of these days a big shoe is going to fall on software and
intellectual-property rights--one that won't make anyone completely happy. 
Cursor Sine Termino
Gee, could it really have been 20 years ago that the MITS Altair first
appeared on the cover of Popular Electronics, ushering in what we like to now
call the "personal-computer revolution"? And have 20 years passed since Dennis
Allison, Bob Albrecht, and folks at the People's Computer Company put out the
first issue of Dr. Dobb's Journal of Computer Calisthenics & Orthodontia:
Running Light Without Overbyte? It sure looks like it. If nothing else, the
past two decades have proved that time sure flies when you're having fun. (Of
course, some in the PC industry have had more fun than others--just ask Bill
Gates.)
In any event, it is with this issue that Dr. Dobb's Journal launches into its
20th year of publication, a remarkable accomplishment for any magazine and
particularly so for a computer publication. I'd like to say thanks to Dennis,
Bob, and the other pioneers who had the vision to see that something truly
important was on the horizon and the spirit to do something about it. But more
so, I'd like to thank all of you readers who have supported Dr. Dobb's Journal
over the years--we wouldn't be here without you. 
Coincidentally, Jim Warren, DDJ's first editor, was recently awarded the Hugh
M. Hefner First Amendment Award for his work in using computers for online
advocacy and network-assisted citizen action. In particular, Jim organized a
grass-roots campaign to provide a low-cost, computerized public-information
system for the citizens of California. 
Join us in celebrating both this 20th anniversary issue and congratulating
Jim. Here's to the next 20 years.
Jonathan Erickson, editor-in-chief


LETTERS


A True Test for Fuzzy Logic


Dear DDJ,
I get a big laugh every time the fuzzy-logic example of modeling the decisions
a driver makes when reaching an intersection is used. (See "Programming
Paradigms," November 1994.) I bet a clever programmer could code for that
situation and get it right as often as a wetware driver, but for different
reasons.
Now, a real test of fuzzy logic is the hungry-children-on-the-long-drive
scenario! Anyone with little kids knows exactly what I am talking about.
Perhaps more seriously, new logical operators need to be created. Perhaps "but"
and "maybe" would qualify. I have also thought that an event loop would better
accommodate fuzzy logic in which sensory data and stimuli adjust variables
that trigger actions only if they reach thresholds that would themselves be
controlled by other functions and stimuli. This would accommodate a
distraction effect that invariably screws up any algorithm, analytical or
fuzzy.
Barr Bauer
Foster City, California


On the Whole, I'd Rather Be in Pittsburgh


Dear DDJ,
I really enjoyed Jonathan Wilcox's article, "Object Databases" (DDJ, November
1994). He nicely lays out some of the complex design alternatives involved
with object- database-method location and invocation. One small correction: We
are in Pittsburgh, not Philadelphia. 
John Nestor
Persistent Data Systems
75 West Chapel Ridge Road
Pittsburgh, PA 15238
info@persist.com


PowerPC Address Munging


Dear DDJ,
Jim Gillig's article, "Endian-Neutral Software" (DDJ, October/November 1994)
is an excellent presentation of Endian issues. Still, I think the article
overlooked an important facet of the PowerPC handling of Little-endian data
and instructions--address munging. When it is in Little-endian mode, the
PowerPC processor performs an operation called "munging," a transformation of
the three low-order bits of the effective address of a CPU bus transaction.
The three low-order bits are XORed with a value, depending on transfer size,
to produce a munged effective address. 1-byte transfers are XORed with 111b.
2-byte transfers are XORed with 110b. 4-byte transfers are XORed with 100b.
8-byte transfers are not munged.
Only in Little-endian mode do the following restrictions apply to 60x PowerPC
addressing:
Data transfers cannot cross 8-byte (double-word) boundaries.
The CPU can only handle external transfer sizes of 1, 2, 4, or 8 bytes.
Data transfers must be aligned on a multiple of the transfer size.
Internally, the PowerPC CPU operates in Big-endian mode. Therefore, data and
instructions in system memory must be converted from Little-endian mode before
they can be correctly operated on by the CPU.
The purpose of the munge in the CPU is to align the data transfer with an
8-byte byte swap. Table 1 shows valid addresses for the possible transfer
sizes and the munged result. (Remember that an 8-byte transfer is not munged;
the only legal 8-byte address is 000b.)
The memory controller in a PowerPC system can be designed to work with the CPU
to correctly address Little-endian data and instructions by implementing a
byte swapper and an unmunge of the effective address from the CPU. The byte
swapper swaps bytes within an 8-byte double-word, as in Table 2.
As Figure 1 shows, address 000b for a 2-byte transfer is munged to 110b, and
data is input to the byte swapper at 110 and 111. The data emerges from 000
and 001 on the output side of the swapper with the bytes swapped.
The memory controller unmunges the munged address to convert 110 to 000 (XOR
110 with 110 to get 000), and the data is picked up on the correct byte lanes
of the output side of the swapper with the byte order reversed. This same
process applies to all the possible combinations of aligned memory accesses in
Little-endian mode, and the process correctly converts Little-endian memory
data and instructions to Big-endian internal CPU format, and vice versa. 
The combination of the address munge in the CPU and the byte swap and address
unmunge in the memory controller produces a perfectly transparent conversion
of aligned Little-endian data in system memory to and from the CPU.
Section 2.4.3 of the PowerPC 601 RISC Microprocessor User's Manual describes
the Endian processes in the PowerPC. Sections 2.4.4, 2.4.5, and 2.4.6 continue
the discussion of Endian-related issues. Section 2.4.7 describes a method that
can be used to orient Little-endian I/O data so that the CPU can correctly
access Little-endian media without using a memory controller that incorporates
the byte swapper/unmunge logic.
The method that the PowerPC microprocessors use to deal with Little-endian
data and instructions is a very elegant solution to a complex problem.
Jeffery Ferris
Austin, Texas


Undocumented OS/2


Dear DDJ,
In "Undocumented Corner," (DDJ, August 1994), Troy Folger discussed the
undocumented OS/2 function DosQProcStatus. From my investigations, it appears
that DosQueryTmr does exist--under another name. In my explorations around the
165 Mbytes of totally undocumented CD-ROM that is my copy of C-Set++ (the only
real documentation was the slip of paper in the CD cover), the third readme
file (out of 62) mentioned something called EXTRA (or IXTRA if you want to go
by the executable name). I found it not because it was documented, but because
the installation program left a Device=DDE4XTRA.SYS command in my CONFIG.SYS
file. It took me a month to work out how to use it, but now that I have, it
seems that during profiling the timer ticks occur at approximately
800-nanosecond intervals, which is considerably more frequent than those
offered by the ANSI-standard clock() function, which ticks about once a
millisecond.
Peter Verstappen
Kaleen, Australia


Poor Programmer's Security System



Dear DDJ,
Here's a quick tip for a poor man's security system. It works great for those
times when you want people to be able to use a machine (without the hassle of
a password scheme or some sort of watchdog program) yet not be able to disturb
a delicately tweaked CONFIG.SYS or AUTOEXEC.BAT file.
We all know the DOS ATTRIB command. We all know that files have the
attributes: archive, hidden, system, directory, volume label, read-only, and
so on. The attributes of a file are stored in an attribute byte which follows
the filename (you can see this if you use a utility for editing a disk
directly). If you manually (again using a disk-editing utility) set either bit
6 or bit 7 high, the ATTRIB command will not allow you to change any of the
six documented attributes of that file. Use ATTRIB to make the file read-only
and then set bit 7 (or 6) high. You can't delete the file because it's
read-only. You can't make it read/writable because ATTRIB won't change any of
the other attributes while bit 7 is high. Incidentally, I call the two high
bits the "Big Dog" bits, part of the "Big Dog Security System"--K-9
Protection, you know.
Philip M. Sbrogna
Kittery, Maine


Bloom Filters


Dear DDJ,
As a one-time developer and maintainer of OEM spell-checking software, I have
a short comment to make on using Bloom filters for spell checking as described
in "Algorithm Alley," by William Stallings (DDJ, August 1994): Don't. From
research we did and also by observing a certain product on the market that did
use them, we discovered that they allow too many nonwords to pass. "Too many"
means that the customers were very unhappy about it. In the final analysis,
that's all that matters.
Bill Wells
Mt. Laurel, New Jersey 


More Hause Calls


Dear DDJ,
In the July 1994 "Letters," William Hause writes about secure encryption by
XORing a plain text message with an equally long (or longer) string of random
bits. His letter was very interesting, and I agree that for a single message,
the encryption is unbreakable. However, I believe that this approach suffers
when the same encryption key is used more than once to send messages.
As an example, assume that Alice has sent two encrypted messages to Bob and
both messages use the same encryption key K. Unknown to either party, a copy
of the encrypted messages P' and Q' have been intercepted by a third party,
Charlie. If Charlie knows that the algorithm used is the XOR algorithm, but
does not know K, he can still attack the encryption with the following
approach:
Let P'i represent the ith bit of encrypted message P'.
Let Q'i represent the ith bit of encrypted message Q'.
Let Ki represent the ith bit of the common encryption key K.
Charlie knows that P'i=Pi xor Ki and Q'i=Qi xor Ki. Therefore, if Charlie does
a bitwise XOR of P'i with Q'i, he gets the result in Example 1. By performing
the XOR operation of the encrypted messages with each other, Charlie knows
that he has arrived at the same results as XORing the plain text messages with
each other. The encryption key K has been completely canceled out. Although
P'i and Q'i each have completely random distributions of 0 and 1 bits, when
taken together they do not.
Charlie can then proceed to search for likely substrings in the encrypted
messages by XORing them with each byte offset in (P xor Q). For example, by
XORing the string "the" with bytes 1 through 3, 2 through 4, 3 through 5, ...,
n-2 through n, Charlie can look for plausible substrings emerging by similar
cancellation. Charlie is starting to get windows into the contents of the
messages; the encryption is falling apart already.
This weakness in the simple XOR encryption algorithm renders it impractical
for most real-world secret communication because it requires a distinct
encryption key for each secured message. No matter how many encryption keys
Alice and Bob have agreed upon in advance, eventually Alice will have to
generate new random strings and securely send them to Bob, if we assume that
their communications are ongoing and sizable. It is not so much the large
size of the encryption keys that is objectionable; it is that this method
assumes Alice can send Bob a megabyte of encryption key over a secure data
channel for every megabyte of message she ever wants to send Bob over an
insecure channel. If such a secure channel were available, no encryption
procedure would be necessary at all. Instead, Alice could send Bob the
plaintext messages over the secure channel with no need for worry.
Don Cross
Oviedo, Florida


If I Had a Hammer


Dear DDJ,
Have you had anything to do with either architects or carpenters? ("C
Programming," DDJ, November 1994) We're redoing our house and have (among
other things) a bathroom that's 5 cm lower on the sides than in the middle. A
"wet cell" for the washing machine with no outlet, no hot water tap, no drain
in the floor, and water-sensitive sheet rock on the walls (the architect has
U.S. and German degrees). In addition, Zeno did the time plan. The job
should have been finished in July. Maybe I just believed too much of what I
read about German quality.
R.G. McKenzie
Schopfheim, Germany
Table 1: PowerPC munging. (a) 1-byte transfers are XORed with 111b; (b) 2-byte
transfers are XORed with 110b; (c) 4-byte transfers are XORed with 100b.
Original Address Munged Address 
 (a)
 000 111
 001 110
 010 101
 011 100
 100 011
 101 010
 110 001
 111 000

 (b)
 000 110
 010 100
 100 010
 110 000


 (c)
 000 100
 100 000
Table 2: Byte swapper.
Input Address Output Address 
 000 --> 111
 001 --> 110
 010 --> 101
 011 --> 100
 100 --> 011
 101 --> 010
 110 --> 001
 111 --> 000
Figure 1 PowerPC address munging.
Example 1: More Hause calls.
P'i xor Q'i
= (Pi xor Ki) xor (Qi xor Ki) {substitution}
= Pi xor Ki xor Qi xor Ki {associativity of XOR}
= Pi xor Qi xor Ki xor Ki {commutativity of XOR}
= (Pi xor Qi) xor (Ki xor Ki) {associativity}
= (Pi xor Qi) xor 0
= Pi xor Qi


Pentium Optimizations and Numeric Performance


Correctly produced Pentium code can run up to a factor of two faster than
486-optimized code




Stephen S. Fried


Stephen is president of Microway, P.O. Box 79, Research Park, Kingston, MA
02364; 508-746-7341.


The Pentium is the first member of the Intel x86 family that requires
RISC-style instruction scheduling to achieve its full potential. For the last
ten years, however, each succeeding Intel x86 generation has had less
efficient numerics. This means that increasingly mediocre compilers could get
by without addressing the issues that have been facing RISC vendors for years.
In this article, I'll discuss the issues that pertain to Pentium
floating-point performance and the tools needed to get full throughput from a
Pentium. 
Many 486 owners using hand-me-down 16-bit compilers copied from old systems
report that the Pentium does not provide the increase in speed they were
expecting, so we decided to investigate. We did this by using Microsoft
Fortran 5.0 to recompile the benchmarks we ran for my company's NDP Fortran
4.5 Pentium-aware compiler. MS Fortran 5.0 is a 1990-vintage compiler
representative of 16-bit technology which produces large- and huge-model x86
code. In performing these tests, we learned a number of interesting things
about Pentium performance. For example, 16-bit numeric programs speed up by
only 50 to 80 percent when moved from a 486 to a Pentium--even though the
Pentium is capable of running four to five times faster than a 486. We also
learned that programs translated with 16-bit compilers run two to eight times
slower on a Pentium than properly optimized 32-bit codes! The Pentium does not
hit full speed running 486-optimized 32-bit code either. Code optimized
specifically for the Pentium runs 10 to 100 percent faster than 486-optimized
code. In general, the smallest Pentium optimization speedups occur with scalar
programs, while the largest are found with vector codes like LINPACK. Pentium
systems are also sensitive to issues like alignment. Mistakes in data
placement can reduce speed by up to a factor of four. 


Pentium Features 


When it comes to numerics, the main interest in the Pentium ought to be in the
changes made to the 486 that enable the Pentium to do floating-point
calculations up to five times faster. The Intel i860 runs up to 20 times
faster than a 486, so there's plenty of numerics power on the cutting-room
floor in Santa Clara, where bits and pieces get glued together into new CPUs.
Yes, the Pentium does have an extra integer pipe, but you won't see much
benefit from it if you are doing engineering or science. The Pentium is
neither fish nor fowl, but rather a combination of the best of RISC and CISC.
In fact, the line between RISC and CISC is very thin for both the Pentium and
i860.
The Pentium numerics units contain an i860-like adder and multiplier, less the
interstage pipeline registers. This omission speeds up scalar operations
(three cycles on the i860 but only two on the Pentium), but eliminates single-
or half-cycle vector operations. Part of the reason for deleting vector
pipeline operations is that to take full advantage of them, you need to be
able to issue both an integer and floating-point instruction on every cycle.
This was done on the i860 using dual-instruction mode, which does not map at
all to the x86 instruction set. We suspect that Intel will reintroduce similar
methodology in the future that is either automatic or introduces the VLIW
(very long instruction word) technology that it purchased from Multiflow. The
Pentium's two-cycle scalar latency means that it gets reasonable speed when
driven by ordinary scalar compilers (like NDP Fortran) without special
vectorization tools or libraries. The scalar units' two-cycle latency also
means that, for code to hit full speed, stalls where instructions wait for
data or for prior instructions to complete have to be addressed. Scheduling
now becomes important.
Another i860 feature which found its way into the Pentium is the i860XP's
64-bit memory interface unit. Signal for signal, the Pentium's data bus looks
just like an XP. One of the differences between RISC and CISC shows up here.
When the hardware detects a problem in a RISC processor, a TRAP is generated
and a TRAP handler that "solves" the problem gets invoked. In the case of an
i860, if you use an 8-byte load instruction and the item you are loading does
not fall on an 8-byte boundary, the processor invokes your handler. The
handler branches to a section that does a pair of 8-byte loads to solve the
problem. RISC exception handlers take hundreds of cycles to execute! On the
Pentium, if you attempt to load a REAL*8 and it does not lie on an 8-byte
boundary, the 64-bit data-bus interface unit encounters the same problem. But
because it is a CISC device, it doesn't have to call a routine--it uses a
built-in circuit (and maybe some microcode) to divide the access into a pair
of 8-byte loads, taking four bytes from each. This costs time. We observed
four cycles, although Intel claims three cycles. However, four cycles is still
much less expensive than a TRAP. The Pentium's CISC heritage is a clear winner
here, and this is an example of the difference between RISC and CISC. CISC
came out of houses like Intel that were rich in die space and manpower, and
where problems get solved with circuits. RISC devices need to run
well-scheduled code on aligned data to make up for their lack of electronic
helpers. The Pentium hits its peak speed running RISC scheduled code. However,
it also has a handicap--it must be able to execute code originally written for
16- and 32-bit processors in a 64-bit environment. The bottom line is that the
Pentium needs a full bag of RISC tricks to help it overcome the 12-year
software legacy it must deal with. 
The two best-known features of RISC devices are single-cycle instructions and
a lot of registers. The Pentium is capable of one- and two-cycle performance
but has only six general-purpose registers, fewer than the 32 typical of RISC
devices. To make up for this deficiency, Intel engineered the on-chip caches
so that they could be accessed in a single cycle. This effectively created a
register file that can hold 2000 32-bit items, which is very large by RISC
standards. The same sort of ploy was used in the i860, which was designed to
mimic a Cray. Where the Cray had a Vector Register, the i860 employed a data
cache to emulate one. The i860 is a RISC device that also has many CISC-like
instructions. Again at Intel, the boundary between RISC and CISC theology
becomes blurred in favor of market-driven solutions.


Benchmarking the Pentium


To benchmark the Pentium, we performed a sequence of tests using several 60-
and 66-MHz systems. The results were virtually identical from system to
system. For all programs tested, the data sets stayed in the on-chip or
secondary caches, and we did not see the benefits of fast external memory. The
two exceptions to this case were Whetmat and LINPACK, which demonstrated that
dot products and DAXPY are sensitive to the speed of external memory.
First some comments. These results are preliminary, but reasonably accurate.
Some users have made claims about Pentiums which we can not duplicate. It is
very easy, when running a mark like LINPACK, to get results which are 20
percent off the mark because of the speed of the Pentium. If you don't adjust
the repeat counts in these marks, the times will be wrong. When you are
looking for effects as small as 10 percent, timing errors become important. In
addition, we discovered that resetting the machine and doing a benchmark can
improve things by as much as 10 percent--some Pentium systems seem to slow up
as time progresses. In the early days of the 386, we had similar problems,
which we attributed to code moving from fast 32-bit memory into slower 16-bit
memory located in the I/O channel. We don't understand how this effect comes
about on a Pentium system. 
The first set of marks run is a suite of programs we distribute with NDP
Fortran that date back to the 386 in 1986. This suite tests what we considered
then to be the three important characteristics of floating-point devices. Two
of the three marks were derived from the Whetstone, the third is a simple
matrix multiply. The first of our derived marks is named the "Whetscale." It
contains all of section 1 of the Whetstone and measures the speed of x87
floating-point stack operations. This mark provides an upper limit on the
scalar speed of a register machine, since interregister operations are always
the fastest. It is also scalar bound: It does not benefit from scheduling
because it cannot be rescheduled (a characteristic of scalar-limited code).
The Whetscale does not benefit from loop unrolling because it already contains
21 numeric instructions in its inner loop. Its units are megaflops and, true
to form, the benefits of optimization for this mark are minimal, yet it runs
at the full speed of the CPU. On the 486, this mark does not hit full speed if
the variables do not end up in the x87 stack. This is also the case on the
Pentium, but for double precision only. If all programs had these
characteristics, there would be no need for optimizing compilers. I think it
is possible to write a more-trivial mark for the Pentium that has fewer load
store dependencies, can be scheduled, and achieves higher interregister speed.

The second derived mark, the "Whettrans," measures the speed of
transcendentals such as sin, cos, atan, exp, log, and sqrt. Like the
Whetstone, it has no meaningful unit of measurement. The Whetmat does a simple
matrix multiply on 100x100 arrays, and its results are reported in megaflops.
We studied a fourth mark, LINPACK, which we think is very important because it
measures how caches and memory systems affect floating-point speed. Anyone who
has ever worked with a supercomputer knows that memory bandwidth can be more
important than floating-point speed. Memory bandwidth and data-cache
management becomes crucial when large arrays are being processed.
The Whetstone and the single- and double-precision variations of the three
marks are listed in Table 1. Each column represents the results of turning on
a different optimization switch. These switch settings range from no
optimizations to full Pentium optimizations, which we have labeled "ALL."
These marks were taken with an experimental compiler that used our old switch
nomenclature. On the Pentium compiler, you get the effect of ALL when you
select --OLM. Going across the columns, you discover that the Pentium runs
very well in its own situations, where everything fits in the cache and there
is no ancillary integer footwork. This is not surprising--most of the marks
here were designed to measure the speed of hardware, and things were kept
simple to make that possible.
The Whetscale was developed partly to determine the efficiency of vector
routines. If you compare the Whetmat with the Whetscale running on the 486
under Microsoft Fortran 5.0, you will discover that the matrix multiply runs
at 1.51 megaflops while the Whetscale hits 5.85 megaflops. Our interpretation
is that the vector mark was running at about 25 percent efficiency. This same
ratio can be computed from the NDP results, and varies from about 10 to 80 percent as
optimizations get turned on. These two marks actually tell us something about
the Pentium and the code that drives it: The cost of executing vector codes
can vary greatly with compiler, but probably is never greater than 80 percent.
The typical efficiency we see with 486s executing good vector code is 50
percent. The only device that we know of that hits 100 percent is the i860,
and this occurs not because the i860 is a RISC device, but because Intel
developed CISC-like load and store instructions which do two things at once.
In the inner loop of a dot product, an i860 can issue one of these
dual-purpose integer instructions along with a floating-point instruction on
every cycle. As a result, there is no indexing overhead, and the processor is
able to run as fast as if it were executing nonvector instructions.
The Whetmat demonstrates why correct code is crucial. The double-precision
Whetmat running on a 66-MHz Intel Express motherboard sees a 20-to-1 speed-up
going from no optimizations to full optimizations. This is the largest
variation in speed we have ever seen on any processor. The argument can be
made that the code in the "NONE" column must be awful, but we also note that
half the marks in this column beat the Microsoft Fortran 5.0 marks. If we had
more time, we could determine why this code runs so slowly. It is probably
related to the fact that small mistakes on the Pentium can make a big
difference in speed. It's more interesting that the Whetmat increases in speed
by a factor of 2 when we transition from the --OLM --n2 --n3 to the "ALL"
column. This is the effect of Pentium scheduling and loop unrolling combined,
and is the largest code-related speed-up that we've seen for the Pentium. The
loop unrolling is required to get this speed, and by itself, makes less of an
improvement in the Pentium than it does in a 486. Note that the only marks in
Table 1 which benefit from scheduling in a big way are the two Whetmats. All
the other marks keep variables in registers or the cache or are not involved
in flowing data through the processor. The less indexing and movement of data
that goes on, the faster the Pentium runs. Indexing costs cycles because extra
integer instructions have to be executed; fetches cost cycles because the item
may lie out of the cache or in a memory page not currently being accessed.
The Whetstone is now of little use, since it is dominated by transcendentals,
as can be seen by comparing it with the Whettrans to observe where the big
breakthroughs come from. The big "breakthrough" here is the use of inline x87
transcendental operations like fsin. Two other interesting effects are hidden
in these marks. The first involves x87 stack storage (--n3 switch). It seems
to pay off in double precision for the Whetmat and Whetscale. We expected no
payoff here, which is the case for the single-precision versions of these
marks. My guess is that this effect is associated with data-cache thrashing.
If your cache is thrashing, there still is a benefit to storing things in
registers. The second hidden effect is the use of inline transcendentals (n2),
which paid off big time for both single and double precision. 
The trend we see developing for the Pentium is that it runs at 20 megaflops at
60 MHz and 23--25 megaflops at 66 MHz when running good "Pentium" code.
However, it doesn't always take good code to run it fast, so depending on your
application, you may luck out with bad code. However, if your program massages
data stored in off-chip cache or memory, it pays to run Pentium-scheduled
code. 


Pentium Scheduling 


All RISC-like processors--including the Pentium--are extremely sensitive to
the code used to drive them. If you access data from external cache or memory
when it can be avoided or execute instructions which depend on values still
being computed by other units, you will add cycles (called "stalls") to your
overall cycle count and time. When things that were supposed to take two
cycles suddenly start taking four, your program slows down by a factor of 2!
CPUs such as the 486 take 10--12 cycles to do floating-point operations and
benefit much less from scheduling because they are relatively insensitive to
one- or two-cycle mistakes.
In the NDP compilers, floating-point operations are scheduled when the number
of stores and loads in a block make it worthwhile to determine a better
sequence or schedule of instructions. The compiler helps this along by loop
unrolling, which turns frequently used small loops into long ones which do
four times as much work. In heavily used inner loops, two or three numeric
operations are often performed in succession on pieces of data brought in
from memory. At the beginning of such a loop, the data is usually loaded into
registers, and at the end of the loop it gets stored. Normally, there are
several operations between the loads and stores, each of which must execute in
order for the program to translate properly. This often will force one numeric
unit to wait for another, which costs cycles. It is possible to start the
adder unit while the multiplier unit is doing an operation, as long as the
adder does not have to wait for the multiplier to finish its current operation
before it starts--that is, as long as the output of the multiplier is not
being used as the input of the adder. When the adder must wait, we say there
is a "load store" conflict or stall. If it is possible to execute the iterations of
a loop out of order (the definition of a vectorizable loop) and the loop gets
unrolled, then it's often possible to swap numeric instructions such that the
numeric units stay busy. This technique is called "scheduling."
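The interaction between unrolling and scheduling can be sketched in C (the article's own examples are in Fortran and x87 assembly; the function names here are ours, and a real compiler would do this transformation itself):

```c
#include <stddef.h>

/* Plain DAXPY loop: each iteration's add consumes the multiply just
   issued, so the adder stalls waiting on the multiplier. */
void daxpy(size_t n, double da, const double *x, double *y) {
    for (size_t i = 0; i < n; i++)
        y[i] = y[i] + da * x[i];
}

/* Unrolled by 4: the four multiplies are independent of each other, so a
   scheduler can interleave them with the adds and keep both units busy. */
void daxpy_unrolled(size_t n, double da, const double *x, double *y) {
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        double m0 = da * x[i];       /* independent multiplies can */
        double m1 = da * x[i + 1];   /* overlap one another */
        double m2 = da * x[i + 2];
        double m3 = da * x[i + 3];
        y[i]     += m0;              /* adds land after the multiplies */
        y[i + 1] += m1;              /* have had time to complete */
        y[i + 2] += m2;
        y[i + 3] += m3;
    }
    for (; i < n; i++)               /* cleanup when n is not a multiple of 4 */
        y[i] += da * x[i];
}
```

The unrolled body also shows the "rhythm" described below: loads float toward the top of the block, stores sink toward the bottom.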
The scheduler spends most of its time figuring out which instructions in a
sequence can be legally swapped. (In C, this is often impossible, because of
the aliasing of information by pointers.) Next, it does a bubble-like sort,
which examines several instructions at a time to see if the code can be
improved by swapping instructions locally. This process goes on until no more
improvements can be made. There is no unique way to do scheduling, and
everyone has their favorite approach. The basic goal is to reduce time due to
units waiting for each other. For example, if an add immediately follows a
multiply and one of the inputs is the output of the multiply, a 1-cycle stall
occurs. Similarly, if a multiply immediately follows a multiply, there is a
1-cycle loss because multiplies take two cycles. If the second multiply
depends on the output of the first, add another lost cycle. If you look at the
scheduled code in Example 1, you'll see that multiplies (fmuls) usually get
padded with loads (flds). Loads are 1-cycle instructions that get paired with
either adds or multiplies, so they make good filler. You also will see that by
the time the first add (fadd) gets used, the multiply that precedes it is
working on the second iteration, whose result will not be needed till the
second add is started--that is, a "load store" dependency between numeric
operations does not exist. A block of code that has been unrolled by four will
normally end up having a rhythm to it, in which most of the loads will float
up to the top and most of the stores will sink to the bottom. In the middle
are combinations of adds and multiplies interspersed with loads and stores.
Note that the NDP code in Example 1(d) is quite different from that
recommended by Intel, yet provides equivalent performance. Normally there are
dozens of equivalent ways to schedule a sequence.
The routine that we chose as our example is DAXPY (a primitive defined in
LINPACK). Its inner loop reads:
DO i = 1,n
Y(i) = Y(i) + DA*X(i)
END DO
This loop is one of the most studied pieces of code on the planet because it
is the key to Gauss Elimination. What's so special about it is that the
solution of linear systems used to account for 30 percent of the CPU time at a
typical scientific establishment. This algorithm is the key to these solutions
when the arrays used are dense. What makes DAXPY tricky is the amount of
I/O--for every iteration of DAXPY executed, the system has to read and write
between 16 and 24 bytes of data, depending on whether X can be cached or not.
On old mainframes with relatively slow numerics, the bottleneck was floating
point. As machines got faster, the bottleneck switched from numerics to memory
bandwidth. This is exactly what happens when you switch from a 486 to a
Pentium. Execute this line 10 million times and you will have to do between
160 and 240 megabytes of I/O. The 10 LINPACK megaflops we measured correspond
to a bandwidth of 120 Mbytes/sec. Intel thought that DAXPY was so important
that they chose it as the primary floating-point example in their Pentium
compiler notes, and I've written at length on how to optimize this code on the
i860. 
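The bandwidth arithmetic behind these numbers can be checked in a few lines of C (the 2 flops and 16--24 bytes per iteration come from the text above; the function name is ours):

```c
/* Back-of-the-envelope DAXPY bandwidth: each iteration performs 2 flops
   (one multiply, one add) and, in the worst (uncached) case, moves
   24 bytes: load X(i), load Y(i), store Y(i), each an 8-byte double. */
double daxpy_bandwidth_mbytes(double megaflops, double bytes_per_iter) {
    double iters_per_sec = megaflops * 1e6 / 2.0; /* 2 flops per iteration */
    return iters_per_sec * bytes_per_iter / 1e6;  /* Mbytes/sec */
}
```

At 10 megaflops and 24 bytes per iteration this gives the 120 Mbytes/sec quoted in the text.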
Example 1(a) is the pseudocode executed by a loop that has not been unrolled.
The corresponding assembly listing is shown in Example 1(b). The x87
floating-point instructions always reference the x87 stack top, st0. When I
say load DA, I mean place the constant DA on the stack, or into st0. This is
followed by multiply Y(i), which means take Y(i) out of memory, multiply it by
st0 (that is, DA), and leave the result in st0. This is followed by the add,
whose result ends up in st0 and a store which takes the result out of st0 and
places it back in the location of Y(i). There are seven other registers that
aren't used in Example 1(b), but are used in Examples 1(c) and 1(d). The store
instruction has a postfix P attached to it which results in the stack getting
popped or cleaned up. Examples 1(c) and (d) were taken from the Intel
Pentium Compiler Writer's Guide. Example 1(b) shows what transpires in a
single iteration of the algorithm. Note the characteristic ld, mul, add, and
stp operations just discussed. 
The pattern used in Example 1(b) gets repeated three times in Example
1(c)--that is, Intel is unrolling by 3. At the end of each loop, the indexes
are updated, and the process repeats. Example 1(d) shows what happens when
Pentium scheduling is applied. The three iterations unrolled by Intel suddenly
get mixed together. At the start is a sequence of load/multiplies. The
scheduler begins by multiplying DA by the first and second X(i)s. It then does
the first add followed by the last multiply and completes with the last two
adds. Between the numeric operations, you will find a load in the beginning,
three stores toward the end, and a number of fxchs. The fxchs are old x87
instructions that have been revamped on the Pentium so that instead of costing
three cycles, they cost zero cycles. In Example 1(c), note that each of the
three iterations clean up the stack before the next one starts. This doesn't
happen in Example 1(d) because the operations have now been rescheduled to
improve speed. Getting things back to the TOS (top of stack) requires fxchs,
which swap the specified register (st1..st7) with the TOS (st0). So, fxchs
make it possible to write scheduled code, but they also produce code that
resembles a "shell game." It is nearly impossible to simply examine the code
and determine its meaning unless you "play computer." This, of course, is what
code generators are designed to do. 
Example 1(e) is the code produced by NDP Fortran. It is very similar to that
produced by Intel, except that it is unrolled by 4 and uses a slightly simpler
addressing mode. This is an example of machine generated and scheduled code.
Notice how our scheduling algorithm broke the problem into pairs of multiplies
followed by pairs of adds. This arrangement does not affect performance, as
the one-cycle "cracks" between multiplies are filled with loads. It appears
that our code has one more cycle of overhead per loop than the Intel
sequence, but the extra increment is the exception to the earlier comment
about superscalar integer operations not helping floating point. When you
divide this benefit over four loops, it produces a 1/4-cycle improvement out
of 6, roughly a 4 percent increase in speed. It's nice to see the integer
pipes playing a role, but the role they play is a factor of 25 less than that
played by properly scheduling the floating-point units. We used cycle counts
produced by Intel for Example 1(b), (c), and (d). The best Pentium code beats
the 486 by a factor of 5 to 6. 
In Fortran programs, the proper scheduling of lower-level code is a feasible
task because it is possible to associate memory references with Fortran
arrays, and thus to figure out if it is legal to swap instructions. In C,
scheduling is often an impossible task because you can't always tell if arrays
are aliased or not. This is just one reason why Fortran is the preferred
language for numeric optimizations. Another is that algorithms expressed in
Fortran map better to most systems' hardware. In his article, "Optimizing
Matrix Math on the Pentium" (DDJ, May 1994), Harlan W. Stockman discusses the
Pentium scheduling of DAXPY and DDOT (routines in LINPACK). He examines three
different ways to write a matrix multiply in C to get it to perform optimally,
then takes the best and hand schedules it. His best results on a 60-MHz
Pentium are roughly 16 megaflops. The best value we got for the Whetmat doing
a simple compilation of a Fortran matrix multiply using our built-in Pentium
scheduler was 20 megaflops. Harlan also observed a 12-to-1 speed variation
across his dot-product routines. We observed a 20-to-1 speed variation over
our program compiling with different switches. Harlan then went on to improve
the code produced by a production-grade C compiler for DAXPY to demonstrate
how to rewrite code by hand to get good Pentium results for LINPACK. I join
Harlan in lamenting the lack of Pentium compilers that make coding by hand
unnecessary. He also points out that the code that results from translating
Fortran to C with a tool like f2c is quite bad. It executes three times more
slowly than the original because of translation-associated problems. If you
want fast numeric code, stick with Fortran. Most mainframes were built around
it, and it comes closest to the equations of science. If you want to do it
fancy, buy a copy of Fortran-90. If you compare Harlan's DAXPY code with the
DAXPY suggested by Intel and produced by NDP Fortran, you'll see that the
Stockman code did not do as many fxchs as the machine-generated code, partly
because it takes a machine to schedule instructions. Harlan's code is more
readable than the compiler-produced code, mainly because heuristics do not
concern themselves with readability. 
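The aliasing problem described above is easy to see in C. The `restrict` qualifier (standardized later, in C99, well after this article) is how C eventually let programmers hand the compiler the no-overlap guarantee that Fortran arrays carry implicitly; a minimal sketch:

```c
/* Without restrict, the compiler must assume x and y might overlap, so it
   cannot hoist loads of x above stores to y -- scheduling is blocked. */
void daxpy_c(int n, double da, const double *x, double *y) {
    for (int i = 0; i < n; i++)
        y[i] += da * x[i];
}

/* With restrict (C99), x and y are promised not to overlap, so the
   compiler may unroll, swap loads and stores, and schedule freely. */
void daxpy_c99(int n, double da,
               const double *restrict x, double *restrict y) {
    for (int i = 0; i < n; i++)
        y[i] += da * x[i];
}
```

Both functions compute the same result on disjoint arrays; only the optimizer's freedom differs.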



LINPACK Benchmarks


The real test of any primitive routine is not how well it runs in a concocted
test program, but how well it runs doing the real thing. The Whetmat benchmark
tests a dot product doing the real thing--a matrix multiply. Because it was
written in Fortran, new columns are fetched from memory in order and the
reused column lies in the cache. As a result, a well-scheduled Whetmat will
run close to the theoretical scalar speed of the Pentium, up to the point
where the column being reused no longer fits into the cache. Even then, it is
possible to hit full speed using a vectorization trick known as "strip
mining." Conditions for DAXPY are quite a bit more harsh when it gets used in
the real world to do Gauss Elimination. Whether or not it hits its peak
performance of 6 cycles per loop is a function of the size of the arrays, the
bandwidth of the external memory system, and the operating system's ability to
turn off the caching of Y(i). 
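The article doesn't show strip mining; here is a hypothetical C sketch of the idea for the column-oriented matrix-vector kernel underlying the Whetmat (the STRIP size and the function name are our assumptions):

```c
#include <stddef.h>

#define STRIP 256  /* assumed strip length, sized to fit the on-chip cache */

/* y = y + A*x with A stored column-major (Fortran order), n x n.
   The row range is mined in strips: each strip of y is updated by every
   column before the next strip starts, so that strip of y stays cached
   no matter how large n grows. */
void matvec_strip(size_t n, const double *a, const double *x, double *y) {
    for (size_t r = 0; r < n; r += STRIP) {
        size_t rend = (r + STRIP < n) ? r + STRIP : n;
        for (size_t j = 0; j < n; j++) {       /* one DAXPY per column */
            const double *col = a + j * n;
            double xj = x[j];
            for (size_t i = r; i < rend; i++)
                y[i] += xj * col[i];           /* y strip is cache-resident */
        }
    }
}
```

Without the outer strip loop, each full-length pass over y would flush the cache before the next column arrives.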
We measured DAXPY under the ideal condition that vectors X and Y were in the
Pentium's cache. This resulted in a 20-megaflop value quite close to the one
we obtained for the Whetmat. We plotted the results of running DAXPY and
LINPACK in Figure 1. The DAXPY test speed is similar to testing an engine on a
dynamometer: You learn facts about shaft horsepower and torque as a function
of engine RPM. However, you won't be able to predict its performance on a
track until you put it into a chassis and try it out. LINPACK could be thought
of as a Formula One track. To run it, a system requires not only good numerics
power, but also a very fast memory system and the ability to keep X in the cache
and Y out of it. All of these requirements were taken into account in the
design of Microway's ArrayPRO/XP card. This is quite apparent in Figure 1, where a
50-MHz XP beats a 60-MHz Pentium by factors that range from 3:1 to 6:1. 
The upper two arcs of data in Figure 1 are for LINPACK and DAXPY running on
the ArrayPRO/XP card, which features a single-cycle memory system that is
among the world's fastest running on a desktop. It has a 288-bit wide data bus
which drives four leaves of 64-bit memory at 400 Mbytes/sec. DAXPY is the
topmost curve, and we can see that it parallels LINPACK, running approximately
10 percent above it. DAXPY peaks on the plot at 31.5 megaflops while LINPACK
asymptotes up to 28. These two plots are typical of real vector systems
running vector codes. Vector algorithms running on vector machines always have
a break-even length somewhere in the vicinity of vector lengths 10--20. Above
the break-even length it pays to use vector code instead of scalar code. The
reason there is a break-even point is that properly coded vector primitives
have overhead associated with setting up their inner loops. As the vector
length gets much larger than the break-even length, the overhead becomes less
important. This explains the case of the XP. If we were to plot the XP results
out beyond lengths of 2000, we would see them fall off. That would correspond
to the XP's cache no longer being able to hold the X vector. The reason
LINPACK stays below the DAXPY mark is twofold. First, DAXPY is the upper
limit, since it is the rate-limiting activity. Second, the balance of LINPACK
has to do things like back substitution, which are not as vectorizable. 
The Pentium has the same data bus as the XP, and at 60 MHz it is capable of
480 Mbytes/sec if run with a single-cycle (no wait state) memory system.
Unfortunately, this adds expense and does not benefit Windows, so don't expect
to see a supercomputer-style memory running on a Pentium anytime soon. The
LINPACK and DAXPY results for the Pentium essentially demonstrate what happens
when things don't fit into the cache. Our DAXPY test program quickly rises to
20 megaflops for vector lengths of 300. It then rolls over as the vectors
climb out of the cache, an effect which begins at lengths of 330. LINPACK
shows an even faster rollover and only peaks at 10 megaflops, a factor of 2
below the DAXPY peak. This is a result of no longer working with vectors that
fit into the cache, but dealing instead with columns of a 200x200 array whose
locations in the array are always shifting. Since a 200x200 array of doubles
takes 320,000 bytes, the 8K on-chip cache is now useless. As a consequence,
both the Xs and Ys are now being mostly fetched from external cache or memory.
In addition, on the Pentium, you cannot count on X being cached (even though
it changes only rarely) because there is no way to prevent the Ys from
flushing the Xs out of the cache. To make matters worse, where the XP only had
to worry about reading Ys, the Pentium now has to worry about switching back
and forth between Xs and Ys, and that means that memory pages will switch
every time it accesses an X after accessing a Y. Since this will happen very
often and most systems today use paged DRAMs, which take a hit when pages
change, the memory system will take a further hit in performance. Adding up
this little house of access horrors, you can now see why our DAXPY engine
loses half of its steam at the peak and, for most lengths of interest, only
produces a LINPACK on the order of 6 megaflops. The peak in LINPACK occurs when X is
coming partly out of the on-chip cache. When this no longer occurs, we see a
slower linear drop-off that follows the gradual washout of Y from the external
cache. LAPACK is now starting to replace LINPACK because it is based on dot
products instead of DAXPY, making it less sensitive to the memory bottleneck
observed here. 
At the bottom of the plot is the DAXPY of Microsoft Fortran 5.0. We see 3.3
megaflops for MS Fortran 5.0 running on the Pentium and 2.04 running on the
486. Note that the MS Fortran results do not have an obvious peak, although we
did see a 0.5 percent increase in the same area where the NDP Pentium marks
exhibited a peak. The 16-bit codes run so poorly that the problems with memory
and cache usage never appear. 
The LINPACK in Figure 1 exhibits three different types of system behavior. At
one extreme, we have the 486 demonstrating constant speed independent of
vector length. This behavior is characteristic of old-style microprocessors
and mainframes. On the top, we see behavior typical of supercomputers and
devices that combine vector numerics, vector registers and very large memory
bandwidth. In the middle, we see the Pentium, which has a fast scalar unit but
no way of controlling what stays in its cache, and does not sport a
single-cycle memory system like the XP.


Conclusion


After writing this article, we revisited our benchmarks because we hadn't
really pinned down how to produce the best numeric Pentium code and didn't
understand all the side effects which appeared to determine Pentium speed. We
started off investigating some "mysterious" data, discovering that our scalar
anomalies could be reproduced--they weren't simply mistakes. We also
discovered that it's difficult to track down speed issues with complex
benchmarks--a benchmark-uncertainty principle exists which prohibits you from
reducing a complex benchmark to its components and getting agreement between
the components and complete mark. This is especially true for double-precision
marks compiled so that they do not allocate certain variables to x87
registers. We found that the speed of double-precision codes which did not use
register coloring could vary by a factor of 2! We suspect cache dependencies
on where doubles land in the stack and heap. These dependencies could also
make it difficult to get accurate times--benchmarks fluctuate significantly
between runs even though our repeat counts are high enough to guarantee
accurate timings. We discovered one inner loop that ran faster with one
Fortran run time than it did with another, even though no run time was being
called. 
Consequently, I set up an experiment to find out what was going on. First, I
looked at what happened to my matrix-multiply benchmark when we improved the
code. Using larger unrolled loops resulted in faster speeds. On a 60-MHz
Intel Express motherboard, I was able to hit 23 megaflops, unrolling by 4, and
24.8 megaflops, unrolling by 16. While a 7 percent improvement was okay, it
didn't match the 15 percent improvement our scheduler's statistics predicted,
and it was nowhere near the 30+ megaflops the benchmark ought to get. For the
problem in question, an i860XR hits 70 megaflops and an XP hits 94 megaflops,
so the problem wasn't with the bus-interface unit. The i860XR hits this mark
with half the data coming from the cache, and the XP will hit it with all the
data coming from memory. For our original Pentium mark, half the data should
be coming from memory and half from cache--but we didn't expect that at 25
megaflops the system would be memory bound. To explore just how bound it was,
I switched to a mark which did dot products with arrays that slightly
overflowed the Pentium's cache using arrays that were 4000 elements long in
single precision. The results are shown in Table 2. 
As Table 2 indicates, we were suddenly hitting 35 megaflops, which is under
two cycles per operation--marks that scaled nicely with our scheduler
predictions. In fact, the Pentium was beating the scheduler, which is
something we thought might happen and probably results from discrepancies in
our reading of the Intel Compiler Writer's Guide, not to mention the level of
effort we put into integer scheduling (which might have saved us one cycle at
best). After examining the code and data, we decided our code was nearly
identical to that produced by the highly touted Intel code generator. Also,
when running out of the on-chip cache, it is possible to write a simple
formula for the time that it takes to execute a dot-product inner loop:
#cycles = 4(+2) + n*3 (where the (+2) is a code-uncertainty factor that
appears in the last loop and might not exist at all). In general, it takes
three cycles to execute a single dot-product element after the unrolling
factor becomes large. In
fact, for the Pentium it is easy to show that the speed of a dot product
running out of the cache can asymptote to 40 megaflops. "Asymptote" implies
that we have unrolled the loop to the point where the four to six cycles of
overhead per loop are no longer important; the problem runs long enough to
average down the time required to switch from one dot product to the next. For
most RISC devices, the sum of these two overheads is around 10 percent, and
you only exceed 90 percent of a chip's theoretical capabilities at large
vector lengths. We've seen these effects with XPs running large dot products,
where the benchmark tells you 94 megaflops but a logic analyzer indicates the
CPU can cruise at 100 megaflops until the inner loop ends and the next dot
product starts.
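The cycle formula above converts directly into a megaflops estimate (a back-of-the-envelope sketch; the 3-cycles-per-element rate, the overhead term, and the 60-MHz clock come from the text, and the function name is ours):

```c
/* Predicted speed of a length-n dot product: cycles = overhead + 3*n,
   with 2 flops (one multiply, one add) per element. As n grows, this
   asymptotes to 2/3 flop per cycle -- 40 megaflops at 60 MHz. */
double dot_mflops(double n, double overhead_cycles, double clock_mhz) {
    double cycles = overhead_cycles + 3.0 * n;
    return 2.0 * n * clock_mhz / cycles;
}
```

For short vectors the overhead term dominates and the estimate falls well below the 40-megaflop asymptote, which is the break-even behavior described for vector primitives earlier.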
Next, we explored what happens when vector lengths increase. Figure 2 shows
single- and double-precision dot products run with arrays dimensioned to 4000
elements, while Figure 3 shows results of the same program with the arrays
dimensioned out to 128,000 elements. (The source code and data that generated
these plots are available electronically; see "Availability," page 3.) For each
of these points, we performed 150,000,000 floating-point operations, which
takes anywhere from 5 to 30 seconds per point. Note that the two plots are different:
There is a disadvantage to using large arrays to hold small amounts of data.
That disadvantage is related to the fact that while the cache is large enough
to hold both arrays, there aren't enough tag bits to keep the two arrays from
squabbling over the lower address bits. You see in Figure 2 that the Pentium
starts off around 30 megaflops, increases to 35 megaflops as the vector
overhead washes out, and falls away to 26.7 and 24.0 megaflops when the marks
leave the cache. These fall-offs occur for v1=1000 in single precision and
v1=500 in double precision. We actually see a small drop off at the saturation
point itself and a more precipitous fall for the points where v1=600 and
v1=1200. The rates after the fall-off correspond to the speed the Pentium can
muster running out of the second-level cache. Figure 2 includes our estimate
of the effective bandwidth of the L2 cache for each of these marks. The
double-precision one is larger than the single, which indicates that what is
holding the chip back is not absolute cache bandwidth, but the ability of the
cache to switch back and forth between reads and writes to different addresses
in the L2 cache.
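The "squabbling over the lower address bits" can be made concrete. The Pentium's 8K data cache is two-way set-associative with 32-byte lines, giving 128 sets selected by address bits 5--11; two arrays whose addresses differ by a multiple of 4096 bytes therefore compete for the same sets. A sketch (treat the exact geometry as an illustrative assumption):

```c
#include <stdint.h>

/* Set index for a two-way, 8K cache with 32-byte lines:
   8192 / (2 ways * 32 bytes) = 128 sets, chosen by bits 5..11. */
unsigned cache_set(uintptr_t addr) {
    return (unsigned)((addr >> 5) & 127);
}
```

With only two ways per set, alternating reads from two large arrays that collide this way can evict each other even though the cache is nominally big enough to hold both working strips.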
Figure 3 is more complicated. Instead of getting two asymptotic performance
levels, we get three. The Pentium just barely makes it to 35 megaflops in
single precision and then falls off to its L2 asymptote, which is lower than
the L2 asymptote in Figure 2. This is followed by another fall when L2 gets
filled up, corresponding to data coming from external memory. We discovered
that the transition between Figures 2 and 3 occurs on the Intel Express
motherboard when we went from 64,000-element arrays to 128,000 elements. The
external memory limits correspond to bandwidths of 75 (single precision) and
85 Mbytes/sec. Again, we get more bandwidth in double precision, but not that
much more, which indicates we are probably measuring the speed of a single
process. My guess is these effective bandwidths are well below the peak
bandwidth the motherboard can achieve, as evidenced by our LINPACK mark, which
required 156 Mbyte/sec from the system to hit 13.5. The real problem with
memory accesses is likely the architecture. True RISC devices have enough true
registers to load 64 bytes from one array before switching to the other. They
also make it possible to protect the contents of the cache. In the Pentium
there is no way to avoid driving the cache-memory system crazy with small
random requests that alternate back and forth between arrays and the pages in
memory which hold them. Ironically, a chip which can hit 35 megaflops running
out of its internal cache slows down by a factor of 5 when running in double
precision from external memory.
In conclusion, a 60-MHz Pentium is capable of delivering 9 to 35 megaflops
doing dot products and 19 to 25 megaflops doing matrix multiplies on matrices
of order 100. In a 100x100 matrix multiply, 100 elements come out of the cache
and 100 elements come out of memory. This is a hybrid situation midway between
the lower and upper asymptotes in Figure 3. With LINPACK, it hits 12.25
megaflops at v1=100 and 13.5 at v1=150, before falling rapidly off. The
Pentium depends heavily on its code and on the motherboard it runs on. The
best code allocates variables to x87 registers, unrolls to factors of 8 or
more, and uses Pentium scheduling. The Pentium is much more sensitive to code
quality than the 486; the same binaries that show a speedup of 12:1 on the
Pentium show a speedup of only 3.3:1 on a 486.
All the Pentium motherboards we examined had memory systems whose performance
fell below what the chip is capable of. At 60 MHz, a properly supported
Pentium can read/write memory at 480 Mbytes/sec (almost as fast as the latest
HP PA-RISC technology). However, Intel doesn't provide the technology to
interface the chip properly to memory systems which run very fast. For
example, the Intel Express motherboard we used to run our DDOT experiments
(which just happens to employ one of the best cache controllers in the
industry) appears to have an effective bandwidth that is a factor of 6 slower
than the Pentium's theoretical bandwidth! What we often discovered about
Pentium motherboards is that the higher the frequency, the poorer the memory
performance. The same LINPACK that runs at 12 megaflops on the 60-MHz Intel
Express falls off to 7 megaflops on a less-expensive 90-MHz motherboard that
used a well-known 486 chipset adapted to the Pentium.
The Pentium is not alone in this quandary. Numerous RISC devices capable of
incredible performance out of the cache fall off the mark quickly as you leave
their design points. High performance out of the cache is important if you are
running problems which have a small number of variables that fit into the
cache. However, these are the sorts of problems with which unrolling often
cannot be employed, because they have scalar dependencies that prevent loads
and stores from being swapped. In DDOT, we lost a factor of 2 in performance
running out of the cache when we went from unrolling by 4 to no unrolling.
This also means that the elusive peak performance of 35 megaflops that we
finally discovered by caching both vectors is not an issue with most scalar
problems. The typical performance we see for these problems ranges from 17 to
23 megaflops. Problems that do fit into the cache and are not too memory bound
include DSP applications involving FFTs, 3-D graphics, rendering, and
neural-net backpropagation. However, the i860 still beats the Pentium in these
areas, so don't expect any quick wins in the high-end data-processing markets,
where the fastest part usually wins. In the end, the Pentium will go down as a
very good try, whose legacy may end up becoming "don't expect a
general-purpose part designed around Windows to blow away the numerics world
of FFTs, DAXPY, and DDOT."


Acknowledgments


I'd like to thank Mahesh Srinivasan for adapting our i860 scheduler to the
Pentium and Mark Barrenechea and David Livshin for numerous intellectual
discussions.
Example 1: (a) Pseudocode; (b) Intel, no unrolling (cycles: 486, 38; Pentium,
12); (c) Intel, unrolled by 3 (cycles: 486, 102; Pentium, 32); (d) Intel,
scheduled by 3 (cycles: 486, 128; Pentium, 19); (e) NDP, scheduled by 4.
(a)
loop:
load DA
multiply X(i)
add Y(i)
store Y(i)
increment i
check end of loop
loop if necessary

(b)
loop:
fld [esp+8]
fmul [ebx+eax*4]
fadd [ecx+eax*4]
fstp [ecx+eax*4]
inc eax
cmp eax,ebp
jle loop

(c)
loop:
fld [esp+8]
fmul [ebx+eax*4]
fadd [ecx+eax*4]
fstp [ecx+eax*4]
fld [esp+8]
fmul [ebx+eax*4+4]
fadd [ecx+eax*4+4]

fstp [ecx+eax*4+4]
fld [esp+8]
fmul [ebx+eax*4+8]
fadd [ecx+eax*4+8]
fstp [ecx+eax*4+8]
add eax,3
cmp eax,ebp
jle loop

(d)
loop:
fld [esp+8]
fmul [ebx+eax*4]
fld [esp+8]
fmul [ebx+eax*4+4]
fxch st1
fadd [ecx+eax*4]
fld [esp+8]
fmul [ebx+eax*4+8]
fxch st2
fadd [ecx+eax*4+4]
fxch st1
fstp [ecx+eax*4]
fxch st1
fadd [ecx+eax*4+8]
fxch st1
fstp [ecx+eax*4+8]
add eax,3
cmp eax,ebp
jle loop

(e)
loop:
fld [esi]
fmul [ebx]
fld [esi+8]
fmul [ebx]
fxch st1
fadd [eax]
fld [esi+16]
fxch st2
fadd [eax+8]
fxch st2
fmul [ebx]
fld [esi+24]
fmul [ebx]
fxch st1
fadd [eax+16]
fxch st2
fstp [eax]
fadd [eax+24]
fxch st2
fstp [eax+8]
fstp [eax+16]
fstp [eax+24]
add esi,32
add eax,32
dec ecx
jne loop

Table 1
Table 2: Measured versus predicted single-precision, dot-product performance.

Optimization set     Measured speed   Measured cycles   Scheduler-predicted
(NDP Fortran 4.5)    (mflops)         per loop          cycles/stalls per loop
-OLM -onrc -sc       17.06
-OLM -on -ur=4       29.7/29.64       16                19/2
-OLM -on -ur=8       33.7/33.2        28                31/2
-OLM -on -ur=16      35.47/34.95      54                56/3
Figure 1: DAXPY and LINPACK versus vector length.
Figure 2: Single- and double-precision dot-product speed versus vector length.
Figure 3: Dot-product speed for vectors dimensioned to 128,000 elements.

















































Undocumented Features of PC Fortran Libraries


Multilanguage vendors sometimes give you more for your money




Kenneth G. Hamilton


Ken has a PhD in physics from the University of California, San Diego and has
used numerous Fortran compilers in the pursuit of solutions to problems in
solid-state theory, numerical hydrodynamics, signal processing, and
random-number generation. He can be contacted on CompuServe at 72727,177.


Language wars are bogus. When you think about it, computers don't really run
any high-level language; they run machine code. A compiler is just an
interface for the programmer, and once compilation gets beyond the first pass
or two, all languages start looking the same.
Developers at DEC recognized this a number of years ago. As a result, the VAX
came out with one common library (STARLET.OLB) supporting all of the
languages. Some vendors of PC compilers are now starting down this path, so
that language-support libraries often contain extra "goodies"--things for use
with toolsets other than the one that you bought.


Cylindrical Bessel Functions


For example, Microsoft and Watcom both sell compilers for multiple languages,
and at the moment, the Fortran customers are the lucky ones. At these
companies, developers decided to support that language by taking the C library
as a fundamental core, and then adding some modules to cover those
requirements unique to Fortran.
Consequently, those companies' Fortran customers already have several useful
features spinning around on their hard disks. Probably the most interesting,
from the perspective of a numerical analyst, are the cylindrical Bessel
functions. Explicit functions are available for J0(x), Y0(x), J1(x), Y1(x), as
well as for Jn(x) and Yn(x), where n is a nonnegative integer order. All of
these are provided only in DOUBLE PRECISION (REAL*8) form--there are no
single-precision versions.
Listing One is a program that exercises Microsoft's Bessel functions. Fortran
compilers normally convert the names of external references to upper case.
Therefore, I have used a series of INTERFACE statements to make the connection
to the library elements, which have lowercase entry-point names that start
with underscore characters. The declarations also allow the program to use
more descriptive names (containing the character string "BES") in the program
itself.
When executed, this program displays several values from the Bessel function
of the first kind (J), for orders n=0, 1, 2, and 3. The values for the first
two sets of numbers come from specific J0 and J1 functions, while the others
are produced by the general-purpose function for Jn.
A PAUSE statement is used to keep the data from scrolling off the screen, and
after a key press, the Bessel function of the second kind (Y) is displayed.
Again, Y0, Y1, Y2, and Y3 are output, the first two from individual functions,
and the second two from the generalized Yn routine.
Watcom has the same features hidden inside its library, and the same program
can be used, as long as the INTERFACE blocks are removed and replaced by the
c$pragma declarations in Listing Two. In this case, the library members have
entry-point names with trailing underscores, in addition to being in lower
case, so the pragmas also perform a name conversion.
Watcom users have these features available in both the 16- and 32-bit
compilers; Microsoft customers can find the capabilities in the Powerstation
compilers for MS-DOS and Windows NT.
Those of little faith can compare the output from this program to the tables
given in Chapter 9 of Handbook of Mathematical Functions (see "References").
It is a bit puzzling to me why these modules are standard in C libraries, but
not in Fortran libraries: Fortran users--scientists and engineers--are the
people most likely to have an application that involves Bessel functions. (For
those readers who don't have an application handy right at this moment, Bessel
functions are used in problems that involve oscillations with cylindrical
symmetry: The normal modes of vibration of a drumhead, for example, are
described in terms of the Jn functions.)


Floor, Ceiling, and Hypot


The definition of Fortran-90 includes a requirement for FLOOR and CEILING
functions that, given a REAL value, return the next-lower and next-higher
integers. Both Microsoft and Watcom have already provided FLOOR and CEILING
functions to their customers--they just didn't say so in the manuals.
Listing Three shows a FLOOR and CEILING demonstrator. The only hitch is that,
unlike the Fortran-90 definition, these two routines return DOUBLE PRECISION
values, and require arguments of the same type. The demo program walks a
variable from 4.2 to 6.1 in steps of 0.1, and shows what FLOOR and CEILING
return for each of these values of the argument. Watcom users should replace
the Microsoft INTERFACE declarations with the two c$pragma declarations in
Listing Four.
C libraries also usually include a function called hypot that computes the
hypotenuse of a triangle, presumably for those who have forgotten Pythagoras'
rule. This is readily accessible from both Fortrans, using the demonstration
program in Listing Five; the Watcom replacement for the INTERFACE block is
given in Listing Six.


Character Operations


Because Fortran has traditionally been a batch-oriented language, its default
I/O handling is built to process whole records. This is the most efficient
method of transferring large amounts of data, as the inevitable word-by-word
loop is pushed down as far as possible in the software, as close to the
hardware as it can be. When running interactively, however, this usually means
that it is necessary to press an Enter key before input data is turned over to
a running program.
This is a bit of a nuisance, and often programmers want an application to
accept a single keypress without waiting for the Enter key. When running
directly under DOS, it is always possible to execute an interrupt to perform
such an operation, and higher-level environments (DOS extenders, Windows,
Windows NT, OS/2) often claim to translate properly. (Then again, they may
not--it's not the normal way of asking a big operating system for something.)
Programs written in C can read and write single characters and can even push a
character back into the input buffer to be read again. Since we have a copy of
the C library in these Fortran packages, we can do this, too. We'll simply
connect to the same system service routines as the C programs.
In Listing Seven, you can see a program for the Microsoft Powerstation
compilers, in which a single character is read from the keyboard using the
getche function, which echoes back to the screen. The ungetch function is then
used to shove the character back into the keyboard input buffer, so that it
can be read a second time by getch. Since getch does not echo the input to the
screen, we then use putch to write it to the screen.
Listing Eight contains the c$pragma declarations that should replace the
INTERFACE blocks, for use with one of the Watcom Fortran compilers.
C still handles characters as integers and arrays of integers, a bad habit
picked up from old Fortran-66. Ever since the release of the ANSI-77
specification, however, Fortran programmers should have been using type
CHARACTER variables. While it is still possible to use INTEGERs for character
handling, I have inserted the CHAR() and ICHAR() explicit conversion functions
in Listing Seven, whenever appropriate.
If you are really just dealing with single characters, then it may be somewhat
simpler to stay with INTEGERs in the processing. On the other hand, if you
intend to use any of the CHARACTER-oriented routines in the Fortran library
(such as INDEX()) to concatenate strings or manipulate substrings, then it
makes sense to get these values converted as soon as possible.
The C library also has a set of is functions, which can distinguish digits
from letters, upper case from lower, and so on. These all have relatively
straightforward names: isalpha reports whether or not a given character is a
letter, for example.
In the second half of Listing Seven, I have used some of these routines and
written out messages with "T" or "F," denoting which category the input
character falls into. The is functions return zero if the condition is false
and a nonzero value if the condition is true. I have coded in '.NE.0'
comparisons to produce officially correct LOGICAL values.
I have used the INTERFACE (or c$pragma) declarations to give the Fortran side
slightly more readable names. Thus islower becomes IS_LOWER (rather than
implying anything negative about program speed) and isupper is converted to
IS_UPPER (instead of trying to make you hungry).
If a character is a lowercase letter, then there is a function that will
convert it to upper; if it is already in upper case, then there is another
that will shift it down. Depending upon the results of the upper- and
lowercase tests, the sample program displays the opposite-case letter, using
the tolower or toupper functions. These are also used in Listing Seven.
Microsoft has provided wrappers for a few of these keyboard routines, so that
their library function GETCHARQQ really connects to getche, and PEEKCHARQQ
allows you to check if there is anything there beforehand. They don't provide
wrappers for the nonechoing function getch, the ungetch reversal routine, or
any of the other character-manipulation functions.
It is theoretically possible to write output through the C function printf,
but since FORMAT is generally more powerful, I have not provided any examples.
The adventuresome reader may find numerous other interesting items buried in
the software.



Sorting


In its Powerstation compilers, Microsoft provides a wrapper for a quicksort
routine. Watcom does not, however, even though its libraries (both 16- and
32-bit) include a sort module. Watcom users, wait no longer: The wrapper you
need is in Listing Nine.
Again, I have used a c$pragma declaration to specify how each argument should
be passed to QSORT. The interface differs slightly between real- and
protected-mode worlds, and so the source code makes use of conditional
compilation, activated by the symbol __386__ (which is automatically defined
in Watcom's 32-bit compiler only).
The quicksort routine itself requires, as arguments, the name of the array to
sort, the number of elements in the array, the size of a single array member
(in bytes), and a pointer to a comparison function. To show the versatility of
the library routine, my sample program first sorts an INTEGER*2 array, and
then a REAL*4 array. For each type of data, you must set up a comparison
function (such as ICMPI or ICMPR) and declare it to be EXTERNAL. (This causes
its entry-point address to be passed.)
The actual calls to QSORT are in subroutines ISORT and RSORT, which insulate
the call process from the main program. If you put ISORT and ICMPI themselves
in a source file (with the c$pragma, of course), they can be treated as a
black box: An application program could just call ISORT, oblivious to the
whole ugly interface process. The same could be done with RSORT and ICMPR,
giving a convenient wrapper for floating-point arrays. It should be quite
straightforward for you to construct similar packages for other data types as
needed using these examples.


Conclusion 


I suspect that we will see more of the "common library" approach in PC
software. Users of compilers from multilingual companies may wish to use a
librarian utility to check for modules with interesting names. Here is a list
of suggestions:
Salford Software's FTN77 is delivered with a large, well-documented library
(including a sort). Apparently, the compiler is itself written in Fortran, so
the items that developers needed for that task have been written up for the
customers to use as well, leaving little to be discovered. Salford's new
FTN90, however, is mingled with a C++ compiler, so that many of the routines I
described might work with it. 
Programs compiled with Silicon Valley Software's Fortran compiler must be
linked with that company's Pascal library; this should provide some different
possibilities.
Borland customers may well find C things in the Pascal box and vice versa.
You never know, there may be free software already on your disk. Good luck and
happy spelunking!


References


Abramowitz, Milton and Irene A. Stegun. Handbook of Mathematical Functions.
Washington, D.C.: National Bureau of Standards, 1972. 

Listing One

 INTERFACE TO REAL*8 FUNCTION DBESJ0[C,ALIAS:"__j0"](X)
 real*8 x
 end
 INTERFACE TO REAL*8 FUNCTION DBESJ1[C,ALIAS:"__j1"](X)
 real*8 x
 end
 INTERFACE TO REAL*8 FUNCTION DBESY0[C,ALIAS:"__y0"](X)
 real*8 x
 end
 INTERFACE TO REAL*8 FUNCTION DBESY1[C,ALIAS:"__y1"](X)
 real*8 x
 end
 INTERFACE TO REAL*8 FUNCTION DBESJN[C,ALIAS:"__jn"](N,X)
 real*8 x
 integer*2 n
 end
 INTERFACE TO REAL*8 FUNCTION DBESYN[C,ALIAS:"__yn"](N,X)
 real*8 x
 integer*2 n
 end
 PROGRAM MBESSEL
 real*8 x,y,z,dbesj0,dbesj1,dbesy0,dbesy1,dbesjn,dbesyn
c
c Bessel function demo
c Kenneth G. Hamilton
c
 print 10
 10 format (1X/' Bessel functions of the first kind, orders 0 and 1')
 do i=0,10
 x=0.1D0*dfloat(i)
 y=dbesj0(x)
 z=dbesj1(x)
 print 20, x,y,z

 20 format (' X =',F5.2,', J0 =',1PD20.12,', J1 =',1PD20.12)
 enddo
c
 print 30
 30 format (1X/' Bessel functions of the first kind, orders 2 and 3')
 do i=0,10
 x=0.1D0*dfloat(i)
 y=dbesjn(2,x)
 z=dbesjn(3,x)
 print 40, x,y,z
 40 format (' X =',F5.2,', J2 =',1PD20.12,', J3 =',1PD20.12)
 enddo
 pause
c
 print 50
 50 format (1X/' Bessel functions of the second kind, orders 0 and 1')
 do i=1,10
 x=0.1D0*dfloat(i)
 y=dbesy0(x)
 z=dbesy1(x)
 print 60, x,y,z
 60 format (' X =',F5.2,', Y0 =',1PD20.12,', Y1 =',1PD20.12)
 enddo
c
 print 70
 70 format (1X/' Bessel functions of the second kind, orders 2 and 3')
 do i=1,10
 x=0.1D0*dfloat(i)
 y=dbesyn(2,x)
 z=dbesyn(3,x)
 print 80, x,y,z
 80 format (' X =',F5.2,', Y2 =',1PD20.12,', Y3 =',1PD20.12)
 enddo
c
 stop
 end




Listing Two

c$pragma aux dbesj0 "j0_" parm (value*8)
c$pragma aux dbesj1 "j1_" parm (value*8)
c$pragma aux dbesjn "jn_" parm (value*2, value*8)
c$pragma aux dbesy0 "y0_" parm (value*8)
c$pragma aux dbesy1 "y1_" parm (value*8)
c$pragma aux dbesyn "yn_" parm (value*2, value*8)





Listing Three

 INTERFACE TO REAL*8 FUNCTION FLOOR [C,ALIAS:"_floor"] (X)
 real*8 x
 end
 INTERFACE TO REAL*8 FUNCTION CEILING [C,ALIAS:"_ceil"] (X)

 real*8 x
 end
 PROGRAM MFLOOR
 real*8 floor, ceiling
 real*8 t, tbelow, tabove
c
 do i=0,20
 t= 4.2D0 + 0.1D0*dfloat(i)
 tbelow = floor(t)
 tabove = ceiling(t)
 print 20, t, tbelow, tabove
 20 format (' T =',F5.2,', Below = ',F5.2,', Above = ',F5.2)
 enddo
c
 stop
 end
 



Listing Four

c$pragma aux floor "floor_" parm (value*8)
c$pragma aux ceiling "ceil_" parm (value*8)




Listing Five

 INTERFACE TO REAL*8 FUNCTION HYPOT [C,ALIAS:"__hypot"] (X,Y)
 real*8 x, y
 end
 PROGRAM HYPE
 real*8 hypot
 real*8 a, b, c
c
 a = 3.D0
 b = 4.D0
 c = hypot(a,b)
 print *, 'a,b,c=',a,b,c
c
 stop
 end




Listing Six

c$pragma aux hypot "hypot_" parm (value*8, value*8)




Listing Seven

 INTERFACE TO INTEGER FUNCTION GETCH[C,ALIAS:"__getch"]()
 end

 INTERFACE TO INTEGER FUNCTION GETCHE[C,ALIAS:"__getche"]()
 end
 INTERFACE TO INTEGER FUNCTION PUTCH[C,ALIAS:"__putch"](IC)
 integer ic
 end
 INTERFACE TO INTEGER FUNCTION UNGETCH[C,ALIAS:"__ungetch"](IC)
 integer ic
 end
 INTERFACE TO INTEGER FUNCTION IS_ASCII[C,ALIAS:"___isascii"](IC)
 integer ic
 end
 INTERFACE TO INTEGER FUNCTION IS_ALNUM[C,ALIAS:"_isalnum"](IC)
 integer ic
 end
 INTERFACE TO INTEGER FUNCTION IS_ALPHA[C,ALIAS:"_isalpha"](IC)
 integer ic
 end
 INTERFACE TO INTEGER FUNCTION IS_CNTRL[C,ALIAS:"_iscntrl"](IC)
 integer ic
 end
 INTERFACE TO INTEGER FUNCTION IS_DIGIT[C,ALIAS:"_isdigit"](IC)
 integer ic
 end
 INTERFACE TO INTEGER FUNCTION IS_LOWER[C,ALIAS:"_islower"](IC)
 integer ic
 end
 INTERFACE TO INTEGER FUNCTION IS_PUNCT[C,ALIAS:"_ispunct"](IC)
 integer ic
 end
 INTERFACE TO INTEGER FUNCTION IS_SPACE[C,ALIAS:"_isspace"](IC)
 integer ic
 end
 INTERFACE TO INTEGER FUNCTION IS_UPPER[C,ALIAS:"_isupper"](IC)
 integer ic
 end
 INTERFACE TO INTEGER FUNCTION IS_XDIGIT[C,ALIAS:"_isxdigit"](IC)
 integer ic
 end
 INTERFACE TO INTEGER FUNCTION TO_LOWER[C,ALIAS:"_tolower"](IC)
 integer ic
 end
 INTERFACE TO INTEGER FUNCTION TO_UPPER[C,ALIAS:"_toupper"](IC)
 integer ic
 end
 PROGRAM CHARS
 implicit integer*4 (i-n)
 integer getch, getche, putch, ungetch
 integer is_ascii, is_alnum, is_alpha, is_cntrl, is_digit
 integer is_lower, is_punct, is_space, is_upper, is_xdigit
 integer to_lower, to_upper
 logical l_lower, l_upper
 character*1 c1, c2
c
c Demonstration of C routines available in the Fortran library
c Kenneth G. Hamilton
c
c Perform console I/O using a single character
c
 c1 = char(getche()) ! Read one character, with echo

 print 10, ichar(c1) ! Tell us what it is
 10 format (1X/' Character value is ',Z2,'(hex)')
 istat = ungetch(ichar(c1)) ! Put the character back
c
 c2 = char(getch()) ! Reread the "ungotten" character
 print 20 ! And it it ...
 20 format (1X/' Reread character is:',$)
 istat = putch(ichar(c2)) ! this character!
c
c What are the properties of the character?
c
 if (is_ascii(ichar(c2)).eq.0) then
 print 30 ! This is not a good character
 30 format (1X/' The character is non-ASCII')
 stop
 endif
 print 40
 40 format (1X/' The character is in the ASCII set')
c
 print 50, 'Alphanumeric', (is_alnum(ichar(c2)).ne.0)
 print 50, 'Control', (is_cntrl(ichar(c2)).ne.0)
 print 50, 'Digit', (is_digit(ichar(c2)).ne.0)
 print 50, 'Hex Digit', (is_xdigit(ichar(c2)).ne.0)
 print 50, 'Punctuation', (is_punct(ichar(c2)).ne.0)
 print 50, 'White Space', (is_space(ichar(c2)).ne.0)
 50 format (1X,A15,'?',2X,L5)
c
 print 50, 'Alphabetic', (is_alpha(ichar(c2)).ne.0)
c
 l_lower = (is_lower(ichar(c2)).ne.0) ! .TRUE. if lower case
 print 50, 'Lower Case', l_lower
 if (l_lower) print 60, char(to_upper(ichar(c2)))
 60 format (6X,'(Upper case equivalent is "',A1,'")')
c
 l_upper = (is_upper(ichar(c2)).ne.0) ! .TRUE. if upper case
 print 50, 'Upper Case', l_upper
 if (l_upper) print 70, char(to_lower(ichar(c2)))
 70 format (6X,'(Lower case equivalent is "',A1,'")')
c
 stop
 end




Listing Eight

c$pragma aux getch "getch_" parm ()
c$pragma aux getche "getche_" parm ()
c$pragma aux putch "putch_" parm (value*4)
c$pragma aux ungetch "ungetch_" parm (value*4)
c$pragma aux is_ascii "isascii_" parm (value*4)
c$pragma aux is_alnum "isalnum_" parm (value*4)
c$pragma aux is_alpha "isalpha_" parm (value*4)
c$pragma aux is_cntrl "iscntrl_" parm (value*4)
c$pragma aux is_digit "isdigit_" parm (value*4)
c$pragma aux is_lower "islower_" parm (value*4)
c$pragma aux is_punct "ispunct_" parm (value*4)
c$pragma aux is_space "isspace_" parm (value*4)

c$pragma aux is_upper "isupper_" parm (value*4)
c$pragma aux is_xdigit "isxdigit_" parm (value*4)
c$pragma aux to_lower "tolower_" parm (value*4)
c$pragma aux to_upper "toupper_" parm (value*4)



























































Using the Multiple Precision Library


Dealing with infinite-precision integers 




John Rogers


JR is a programmer in the Seattle area. He can be contacted on CompuServe at
72634,2402.


ANSI C is a fine language--as long as you don't have to count too high.
Although most implementations of C boast 32-bit integer arithmetic, many
applications are beginning to require higher precision. (Microsoft, for
instance, makes over a billion dollars a year, and that dollar amount barely
fits in 32 bits. What if Bill wants to count in pennies to keep the auditors
happy?) Even the IEEE double-precision floating-point format gives only about
15 decimal digits of precision.
What's needed is a way to deal with multiple-precision integers, independent
of the machine's word size. The multiple-precision (MP) integer library
available with UNIX V7, UNIX SVR4, 4.3BSD, and other versions of UNIX provides
"infinite-precision" signed integer operations for C programs. I've also
ported the GMP library to Windows NT. In this article, I'll describe how to
use the scantily documented MP routines, and I'll provide sample code,
portability information, some MP "helpers," and a few other hints.


Declaring Variables


To declare variables for use with the MP library, you generally declare them
as pointers to the MINT (multiple-precision integer) type. The MINT type is
often defined in <mp.h> as a structure. Even if the fields in the structure
are documented in your version of MP, I recommend avoiding direct references
to them in your programs. The field names and usage differ greatly between MP
implementations.
You should also be aware that the MINT type name (all uppercase letters) is
only available as mint (all lowercase letters) in a few versions. I recommend
you define NEED_MINT when using those versions, and write code like that shown
in Example 1; after that, you can use the MINT type. I do this in the header
file for my MP helpers (see MPHelp.h in Listing One). In this article, I will
only use the MINT type name.
Table 1 lists the MP functions supported by various versions of MP, along with
entries for the type name(s) supported by each, and whether or not the header
file <mp.h> is available. Limitations of the various versions are also noted.
The details in this table should help you write portable code using MP.
One rule is important when writing code with MINT variables: Make sure every
variable is initialized by calling itom or xtom, both of which return pointers
to MINT variables. Throughout this article, I use phrases like "refers to a
MINT variable" when I really mean "is passed a pointer to a MINT variable." In
short, none of the MP functions are passed MINT variables directly.
The itom (integer-to-MINT) routine is available in every version of MP. It is
by far the most common way to initialize MINT variables, as itom returns a
pointer to a MINT. Beware that itom really only accepts a value of type short
in most versions (despite what some man pages imply), and at least one version
will not accept the "most negative short integer" (SHRT_MIN, I assume). Being
allowed to initialize only with a short does not limit the final size of the
variable.
The xtom (hex-string-to-MINT) routine is not widely available, but it is the
only way besides itom to initialize a MINT variable. Like itom, xtom returns a
pointer to a MINT variable. The only argument to xtom is a pointer to a string
containing hex digits. For example, to declare and initialize some variables,
use code like that in Example 2. Once you have initialized a MINT variable,
you can update its value repeatedly with a variety of MP functions. The MP
library will automatically expand and shrink the memory allocated for that
MINT variable. In some versions, you can explicitly give back the memory
associated with a variable by using the mfree function. Note that mfree
effectively deinitializes the MINT variable.
You should be aware of one other aspect related to memory allocation: MP
routines do not inform your code in any way if they run out of memory. There's
no return code, no global-error variable, no callback, no signal, no
exception. The routines die in whatever manner they choose, probably writing a
message to stderr in English, and possibly leaving a core file if they are
running on a UNIX system. MP routines are fine for many applications, but
think twice before using them in something like an air-traffic control system.
Passing out-of-range arguments to the MP routines generally has results
similar to running out of memory. So, write robust code, and test it well.
Beware of passing references to the same MINT as input and output for the same
function call. (Using the same MINT for two different inputs seems to work
fine, however.)
The MP compare (mcmp) routine and the standard-C string-compare (strcmp)
routine are very similar. Both return type int when comparing two values. Both
return 0, when the values are equal; less than 0, when the first value is
smaller; and greater than 0, when the first value is larger. See MPHelp.c
(Listing Two) for samples of using the mcmp routine.


Add, Subtract, Multiply, Divide


Every version of MP has support for the basic math operations add, subtract,
multiply, and divide. The MINT add routine (madd) takes its first two
arguments as inputs, adds them, and stores the result in the third argument.
The madd routine, like almost every other MP routine, has no return value and
assumes that all of its arguments (even the output arguments) have already
been initialized. The MINT subtract routine, msub, is similar to madd in
Example 3. It subtracts the value of its second argument from its first
argument and sets its third argument accordingly. The MINT multiply routine
(mult) simply multiplies its first argument by its second argument and updates
the third argument with the product. All three arguments must refer to MINT
variables.
The MP library has two divide routines that differ only in name and the data
types of two arguments. For the MINT divide routine, mdiv, all arguments refer
to MINT variables. For the short divide routine, sdiv, the data type of the
divisor is short. The remainder in sdiv is represented as a pointer to a
short. Other than those two exceptions in sdiv, all other arguments to the
divide routines must refer to MINT variables. For both routines, the first two
arguments are inputs. The first argument is the dividend and the second is the
divisor. The divide routines also have two outputs each. The third argument
will be set to the quotient, and the fourth will be set to the remainder. Be
careful when passing negative numbers to the divide routines, because various
versions of mdiv and sdiv set the sign for the remainder using different
rules. As with itom, be aware that at least one version of sdiv will not
accept the "most negative short integer" (SHRT_MIN, I assume) as the divisor.
Listing Two (MPHelp.c) gives examples of calling both MP divide routines, mdiv
and sdiv.


I/O


The MP versions of which I am aware include eight different MINT I/O routines.
However, only two of those routines are available in all versions, and one of
those two is plagued by problems. The two common routines are MINT output
(mout) and MINT input (min). The mout and min routines do decimal I/O using
stdout. The min routine in some MP versions may return an int value of EOF
(end of file), but this return value is often not documented or missing
altogether. For portability, I recommend avoiding the min return value and
calling the stdio feof function instead. The feof routine and the EOF #define
are declared in <stdio.h>. The name min is doubly cursed; in some older
versions of Microsoft C an incompatible min macro is defined in <stdlib.h>. 
If you have access to them, I recommend the Berkeley-style m_in and m_out
routines, which support any file (not just stdin/stdout) and more number bases
(2 through 10 in 4.3BSD).
Table 2 gives more-detailed information on all eight of the MP I/O functions.
It documents the return values, argument lists, number bases, and files
supported by each function. IsPrime.c (Listing Three) provides a sample call
to mout, while PowTable.c (Listing Four) contains calls to m_out.


Square Roots


All MP versions have a MINT square-root routine, msqrt. The only input is the
first argument, which must refer to a MINT having a value greater than or
equal to 0. The outputs are the other two arguments. The second argument must
point to a MINT set with the square root, and the third argument must refer to
a MINT set with the remainder. MPHelp.c (Listing Two) illustrates a call to
msqrt.


GCD


Although the function name may differ, every version of MP has a greatest
common divisor (GCD) routine, usually named gcd. In one version it is mgcd,
probably to avoid conflicts with an incompatible math-library routine.

The gcd and mgcd routines each have two inputs and one output. All arguments
refer to MINT variables. The first and second arguments are inputs; the third
will be set to the greatest common divisor of the other two. I recommend that
you avoid calling either routine with inputs that have 0 or negative values.


Powers


Every version of MP has an rpow (regular power) routine. The first input
(referring to a MINT) is the base, the second is the exponent. The data type
of the exponent is often undocumented. I recommend only passing a short for
this argument. The third argument must refer to a MINT that will be set to the
first argument raised to the second argument. Try to avoid 0 and negative
powers.
All versions of MP have another power function, usually called pow. At least
one MP version calls it mpow (also because of a math-library-function
incompatibility). The arguments for pow and mpow are identical; all refer to
MINT values. The first three arguments are all inputs. The first argument is
the base, the second is the exponent, and the third is the modulus by which
the result is reduced. The fourth argument is the output; it will be
set with the result. Unfortunately, the "modulo" argument of this function
makes it more or less useless.
The code sample for the power routines produces a table of the powers of two,
in various number bases, up to any arbitrary number. (This is part of the
reason for my interest in the MP routines. I'm writing a reference book and I
want to include a table of powers of two, up to at least 64 bits, but my C
compiler quits at 32.) See PowTable.c (Listing Four) for an example of calling
rpow.


Miscellaneous Routines


One miscellaneous routine I've found useful is move, which copies one MINT
value to another. The first argument is the input; the second is the output.
This routine is often used because it is not wise to pass the same MINT value
to an MP function as both an input and an output. Instead, you introduce
temporary variables and use move to copy values between them as needed.
The Berkeley version of MP also provides a function named invert which gets
three arguments, all of which must refer to MINT variables. The first two
arguments are inputs; the third is the output. The 4.3BSD man page says
"invert computes c such that a*c mod b=1, for a and b relatively prime." I
don't know of any use for this function yet; I'm only including it for
completeness.
The last miscellaneous MP function is the MINT-to-hex-string (mtox) routine.
The only argument to mtox is an input, referring to the MINT variable to be
converted. The return value from mtox is a pointer to a newly allocated
string. The area allocated for the string may be released by calling the usual
function (free, declared in <stdlib.h>).


Conclusions


The MP library includes some powerful functions that let you work with
arbitrarily large integers in a fairly portable way. You now have enough
information to make effective use of them in your programs. Good luck!
Table 1: MP functions versus version of MP.

                     UNIX   Research UNIX   UNIX    4.3      GMP
                     V7     Tenth Edition   SVR4    BSD      1.3.2
 header file <mp.h>  no     yes             yes     yes      yes
 type MINT           no     no              yes     yes      yes
 type mint           no     yes             no      no       no
 fmin                no     yes             no      yes1     no
 fmout               no     yes             no      yes      no
 gcd                 yes2   no8             yes     yes      yes
 invert              no     no              no      yes      no
 itom                yes5   yes9            yes     yes5     yes
 madd                yes    yes             yes     yes      yes
 mcmp                no     yes7            yes7    yes7     yes
 mdiv                yes    yes             yes     yes      yes17,19
 mfree               no     yes             yes     no       yes
 mgcd                no15   yes             no15    no15     no15
 min                 yes1   yes             yes1    yes1     yes1,20
 m_in                no     no              no      yes1     no
 mout                yes    yes             yes     yes      yes19
 m_out               no     no              no      yes14    no
 move                no     yes             no      yes      yes
 mpow                no13   yes             no13    no13     no13
 msqrt               yes    yes             yes     yes      yes
 msub                yes    yes             yes     yes      yes
 mtox                no     no              yes16   no       yes18
 mult                yes    yes             yes     yes      yes
 omin                no     no              no      yes1     no
 omout               no     no              no      yes      no
 pow                 yes    no10            yes     yes      yes
 rpow                yes3   yes6            yes11   yes12    yes11
 sdiv                yes4   yes9            yes     yes      yes17
 xtom                no     no              yes16   no       yes
 1 fmin, min, m_in, omin: Return type and/or value not documented.
 2 gcd: Prototype is given in man page, but not documented.
 3 rpow: Second argument (exponent) is documented as "MINT *"; 
 perhaps this is wrong, as all other versions seem to use short or 
 int.
 4 sdiv: Second argument (divisor) type not documented; probably 
 short as in other versions.
 5 itom: Argument type is not documented; probably short.
 6 rpow: Second argument (exponent) type not given. Perhaps short or 
 int was meant.
 7 mcmp: Return type not documented; probably int.
 8 gcd: Probably conflicts with a libm/math.h routine. In this 
 version, use mgcd instead.
 9 itom, sdiv: Don't call with SHRT_MIN (the most negative short 
 value).
 10 pow: Probably conflicts with pow in libm/math.h; use mpow 
 instead.
 11 rpow: Second argument (power) documented as short.
 12 rpow: Second argument (power) documented as int; use short for 
 portability.
 13 mpow: Use pow instead.
 14 m_out: Only supports bases 2--10.
 15 mgcd: Use gcd instead.
 16 mtox, xtom: It is undocumented whether hex uses upper- and/or 
 lowercase letters.
 17 mdiv, sdiv: Remainder is same sign as the dividend.
 18 mtox: Generates lowercase hex letters; negative numbers have 
 leading minus sign.
 19 mdiv, mout: Buggy on systems with 16-bit int.
 20 min: Ignores leading spaces and tabs.
Example 1: Define NEED_MINT when using the MINT type name.
#include <mp.h>
#ifdef NEED_MINT
typedef mint MINT;
#endif
Example 2: Declaring and initializing variables.
MINT * One;
MINT * Three;
MINT * Sum;
 ...
One = itom(1);
Three = itom(3);
Sum = itom(7); /* dummy value */
Table 2: MP I/O routines.
Input Routines 
Base File Prototype 
10 any int? fmin( MINT * Number, FILE * File )
10 stdin int? min( MINT * Number )
2--10 any int? m_in( MINT * Number, int Base, FILE * File )
8 stdin int? omin( MINT * Number )
Output Routines 
Base File Prototype 
10 any void fmout( MINT * Number, FILE * File )
10 stdout void mout( MINT * Number )
2--10 any void m_out( MINT * Number, int Base, FILE * File )
8 stdout void omout( MINT * Number )
Example 3: The madd function.
madd(
 One, /* first input */
 Three, /* second input */
 Sum); /* result - set */

Listing One 


/* Copyright (c) 1994 by John Rogers. All rights reserved.
 * FUNCTION - MPHelp.h declares routines which are "helpers" to users of the 
 * multiple precision (MP) library. The routines declared here are:
 * miseven(number), misodd(number), misprime(number), miszero(number)
 * All of these functions return "boolean" values in "int" types, where 0 
 * means false and 1 means true.
 * AUTHOR - JR (John Rogers), CompuServe: 72634,2402
 * Internet: 72634.2402@CompuServe.com
 */

#ifndef MPHELP_H
#define MPHELP_H

#include <mp.h> /* MINT or mint typedef. */

/* Define NEED_MINT in makefile if local mp.h does not provide the typedef. 
 * In that case, we define the MINT type here.
 */
#ifdef NEED_MINT
typedef mint MINT;
#endif

/* ROUTINES, in alphabetical order: */

int /* Note: return values are 0=false and 1=true. */
miseven(
 const MINT * Number);
int /* Note: return values are 0=false and 1=true. */
misodd(
 const MINT * Number);
int /* Note: return values are 0=false and 1=true. */
misprime(
 const MINT * Number);
int /* Note: return values are 0=false and 1=true. */
miszero(
 const MINT * Number);
#endif /* MPHELP_H */



Listing Two

/* Copyright (c) 1994 by John Rogers. All rights reserved.
 * FUNCTION - MPHelp.c contains the MP "helpers": miseven(number), 
 * misodd(number), misprime(number), miszero(number). All of these functions 
 * return "boolean" values in "int" types, where 0 means false and 1 means true.
 * AUTHOR - JR (John Rogers), CompuServe: 72634,2402
 * Internet: 72634.2402@CompuServe.com
 */

#include <assert.h> /* assert(). */
#include <mp.h> /* MINT typedef. */
#include "mphelp.h" /* My prototypes. */

/* ROUTINES, in alphabetical order: */
int /* Note: return values are 0=false and 1=true. */
miseven(
 const MINT * Number)

{
 MINT * Quotient;
 short Remainder;
 int ReturnValue; /* 0=false, 1=true. */

 /* Initialize MINT variable (any value will do), so MP routines 
 * allocate space. */
 Quotient = itom(7); /* Dummy value. */

 /* Divide Number by two and look at remainder. */
 sdiv(
 Number, /* dividend - input */
 2, /* divisor - input */
 Quotient, /* quotient - output */
 & Remainder); /* remainder - output */
 /* We'll return "true" if-and-only-if Remainder is zero. */
 ReturnValue = ( Remainder == 0 );

 mfree( Quotient ); /* Free space used by temp. */

 return (ReturnValue);

} /* miseven */

int /* Note: return values are 0=false and 1=true. */
misodd(
 const MINT * Number)
{
 return ( !miseven( Number ) );
}
int /* Note: return values are 0=false and 1=true. */
misprime(
 const MINT * Candidate)
{
 int CompareResult;
 MINT * Constant_Two = itom(2);
 MINT * Divisor = NULL;
 MINT * MaxDivisor = NULL;
 MINT * Quotient = NULL;
 MINT * Remainder = NULL;
 int ReturnValue; /* 0=false, 1=true. */

 /* Check for easy cases:
 * -infinity <= x <= 1 not prime
 * x = 2 prime
 * x > 2, x is even not prime
 */
 CompareResult = mcmp(Candidate, Constant_Two);
 if (CompareResult < 0) {
 /* Anything less than 2 isn't prime. */
 ReturnValue = 0; /* false */
 goto Cleanup;
 } else if (CompareResult == 0) {
 /* Exactly two, yes that is a prime. */
 ReturnValue = 1; /* true */
 goto Cleanup;
 }
 assert( CompareResult > 0 );
 if (miseven(Candidate)) {

 ReturnValue = 0; /* false */
 goto Cleanup;
 }
 /* Well, all that's left is the hard stuff. Try all of the odd divisors
 * from 3 up to the square root of the candidate. */
 assert( misodd( Candidate ) );

 Divisor = itom( 3 );
 MaxDivisor = itom( 1 );
 Quotient = itom( 1 ); /* don't care value */
 Remainder = itom( 1 ); /* don't care value */

 msqrt(
 Candidate, /* input value */
 MaxDivisor, /* square root */
 Remainder ); /* remainder */
 for ( ; ; ) { /* loop forever */
 /* Try dividing this one. */
 mdiv(
 Candidate, /* dividend */
 Divisor,
 Quotient,
 Remainder);
 /* Does this divisor divide evenly? */
 if ( miszero( Remainder ) ) {
 /* If we were dividing by Candidate, then Candidate must be prime. */
 if (mcmp( Candidate, Divisor ) == 0) {
 ReturnValue = 1; /* true */
 goto Cleanup;
 } else {
 /* Otherwise, if this divisor divides evenly, it factors the 
 * candidate, which therefore cannot be prime. */
 ReturnValue = 0; /* false */
 goto Cleanup;
 }
 }
 /* Have we gone as far as we can? If so, this must be prime! */
 if (mcmp( Divisor, MaxDivisor ) >= 0) {
 ReturnValue = 1; /* true */
 goto Cleanup;
 }
 /* Bump to next odd divisor. We shouldn't use MP routines to
 * update Divisor in place, so use a temporary for result of madd(). */
#define RandomTemp Quotient
 madd(
 Divisor, /* a value */
 Constant_Two, /* 2nd value */
 RandomTemp ); /* the sum */
 move(
 RandomTemp, /* source */
 Divisor ); /* dest */
 assert( misodd( Divisor ) );
 } /* loop forever */
Cleanup:
 /* Free memory used by temp vars. */
 if (Constant_Two != NULL) {
 mfree( Constant_Two );
 }
 if (Divisor != NULL) {

 mfree( Divisor );
 }
 if (MaxDivisor != NULL) {
 mfree( MaxDivisor );
 }
 if (Quotient != NULL) {
 mfree( Quotient );
 }
 if (Remainder != NULL) {
 mfree( Remainder );
 }
 return (ReturnValue);
} /* misprime */
int /* Note: return values are 0=false and 1=true. */
miszero(
 const MINT * Number)
{
 MINT * Constant_Zero = itom(0);
 int ReturnValue; /* 0=false, 1=true. */
 /* We'll use the standard mcmp (MP compare) function for this. 
 * mcmp(a,b) returns <0 if a<b
 * =0 if a=b
 * >0 if a>b
 */
 if ( mcmp( Number, Constant_Zero ) == 0 ) {
 ReturnValue = 1; /* true */
 } else {
 ReturnValue = 0; /* false */
 }
 mfree( Constant_Zero );
 return (ReturnValue);
} /* miszero */



Listing Three

/* Copyright (c) 1994 by JR (John Rogers). All rights reserved.
 * FUNCTION - This program takes a given number and
 * determines whether or not it is prime.
 * SYNTAX - IsPrime -n number
 * AUTHOR - JR (John Rogers) CompuServe: 72634,2402
 * Internet: 72634.2402@CompuServe.com
 */

#include <assert.h> /* assert(). */
#include <mp.h> /* MINT typedef, itom(). */
#include "mphelp.h" /* misprime(). */
#include "sample.h" /* Die(), StringToM(). */
#include <stdio.h> /* printf(). */
#include <stdlib.h> /* EXIT_SUCCESS. */
#include <unistd.h> /* getopt(), optind, optarg. */

#define USAGE \
 "This program determines if a number is prime.\n" \
 "Usage: IsPrime -n number\n" \
 "Or: IsPrime -?\n\n" \
 "where:\n" \
 " -n number gives number to check\n" \

 " -? displays this message\n\n" \
 "Author: JR (John Rogers).\n"

int main(
 int argc,
 char * argv[])
{
 const char * MyOpts = "n:N:";
 MINT * Number = NULL;
 int ThisOpt;
 int ValueIsPrime; /* 0=false, 1=true */

 /* Do initial setup and argument handling. */
 while ((ThisOpt=getopt(argc,argv,MyOpts)) != EOF) {
 switch (ThisOpt) {
 case 'n':
 case 'N':
 Number = StringToM( optarg );
 assert(Number != NULL);
 break;
 case '?':
 Die( USAGE );
 /*NOTREACHED*/
 default:
 Die( "bad return value from getopt");
 /*NOTREACHED*/
 }
 }
 /* Handle missing number. */
 if (Number == NULL) {
 Die( USAGE );
 /*NOTREACHED*/
 }
 /* Call an MP helper to do the hard work. */
 ValueIsPrime = misprime( Number );
 /* Tell user what we found out. */
 mout( Number );
 if (ValueIsPrime) {
 printf("NUMBER IS PRIME!\n");
 } else {
 printf("NUMBER IS NOT PRIME!\n");
 }
 /* Free temp storage. */
 if (Number != NULL) {
 mfree( Number );
 }
 /* All done; set exit code. */
 if (ValueIsPrime) {
 return(EXIT_SUCCESS);
 } else {
 return(EXIT_FAILURE);
 }
} /* main */



Listing Four

/* Copyright (c) 1994 by JR (John Rogers). All rights reserved.

 * FUNCTION - This program generates a table of powers
 * of 2, using the MP library routines.
 * SYNTAX - PowTable -n number
 * AUTHOR - JR (John Rogers) CompuServe: 72634,2402
 * Internet: 72634.2402@CompuServe.com
 */

#include <assert.h> /* assert(). */
#include <mp.h> /* MINT, itom, rpow, mfree. */
#include "sample.h" /* Die(), StringToShort(). */
#include <stdio.h> /* stdout. */
#include <stdlib.h> /* EXIT_SUCCESS. */
#include <unistd.h> /* getopt(), optind, optarg. */

/* Only define this if m_out is supported at all. */
/* #define MP_SUPPORTS_M_OUT */

/* Only define this if m_out supports bases > 10. */
/* #define MP_OUTPUT_SUPPORTS_LARGE_BASES */

#define MY_USAGE \
 "Usage: PowTable -n number\n" \
 "Author: JR (John Rogers).\n\n"
static void
DisplayOneTableEntry(
 short PowerToCompute )
{
 MINT * Two = itom(2);
 MINT * PowerOf2 = itom(42); /* Dummy value. */

 /* Compute regular power (2 ** N). */
 rpow(
 Two, /* number to raise */
 PowerToCompute, /* exponent (short) */
 PowerOf2); /* result */
 /* Write the power of two for this one. */
 printf( "\n\n2 ** %d is:\n", (int) PowerToCompute );
#ifdef MP_SUPPORTS_M_OUT
 /* Write 2**N in binary, octal, decimal, hex. */
 m_out( PowerOf2, 2, stdout );
 m_out( PowerOf2, 8, stdout );
 m_out( PowerOf2, 10, stdout );
#ifdef MP_OUTPUT_SUPPORTS_LARGE_BASES
 m_out( PowerOf2, 16, stdout );
#endif

#else
 /* Output in decimal (only base supported). */
 mout( PowerOf2 );
#endif
 mfree( Two ); /* Free temps */
 mfree( PowerOf2 );
} /* DisplayOneTableEntry */
static void
GeneratePowerTable(
 short MaxPower ) /* >= 1 */
{
 short CurrentPower = 1;
 for (;;) {

 DisplayOneTableEntry( CurrentPower );
 ++CurrentPower;
 /* Have we gone far enough? */
 if (CurrentPower > MaxPower) {
 break; /* done; go cleanup and return */
 }
 }
} /* GeneratePowerTable */
int
main(
 int argc,
 char * argv[])
{
 short MaxPower = 0;
 const char * MyOpts = "n:N:";
 int ThisOpt;
 /* Do initial setup and argument handling. */
 while ((ThisOpt=getopt(argc,argv,MyOpts)) != EOF) {
 switch (ThisOpt) {
 case 'n':
 case 'N':
 MaxPower = StringToShort( optarg );
 assert(MaxPower != 0);
 break;
 case '?':
 Die( MY_USAGE );
 /*NOTREACHED*/
 default:
 Die( "bad return value from getopt");
 /*NOTREACHED*/
 }
 }
 /* Handle missing argument. */
 if (MaxPower == 0) {
 Die( MY_USAGE );
 /*NOTREACHED*/
 }
 /* Generate the table. */
 GeneratePowerTable( MaxPower );
 return(EXIT_SUCCESS);
} /* main */






















Basic Arithmetic with Infinite Integers


Working with large numbers without losing digits




Jeffrey W. Hamilton


Jeff is a researcher at IBM's T.J. Watson Research Center. He can be contacted
at jeffh@watson.ibm.com.


Sometimes you probably wish you had a little more flexibility with integer
arithmetic. While 32-bit integers permit you to express a large range of
numbers, there are problems that require larger values. For example, did you
realize you can't balance the U.S. budget with 32-bit integers? Of course, you
could switch to floating-point arithmetic, but then you sacrifice the accuracy
of your results. In this article, I'll describe how to implement an efficient
method for representing infinite integers and algorithms for doing simple
arithmetic with infinite integers.
I first ran across the need for infinite integers while implementing a small
LISP interpreter. Common LISP supports standard fixed-length integers, but it
also contains a data type called "big numbers." In theory, a big number can
represent any integer (assuming you have enough memory to contain the bits).
There are two common methods for representing infinite integers. 
The first involves storing the numbers in a byte array, which is also known as
"unpacked binary-coded decimal notation." Each digit of the number occupies
one byte. You start the byte array with a two-byte integer that contains the
length of the number. To represent the sign of the number, you use the
most-significant bit of the length: 0 means the number is positive, and 1
means the number is negative. With this representation, you can store a number
that is up to 32,767 digits long. If you need a larger number, you can always
reserve four bytes at the start of the array. Notice in Example 1 that the
digits are stored in reverse order. This makes expanding and contracting the
number easier as we manipulate its value. Adding two numbers simply involves
adding the digits pairwise. If the result of an addition is greater than 9,
subtract 10 and carry 1 to the next more-significant digit. If there is a
carry out of the last digit, you adjust the length field by adding 1. The
advantages of this storage format are as follows:
It is easy to understand: Manipulations are just like the arithmetic you
learned in grade school. 
Many processors contain built-in functions to perform arithmetic on byte
arrays, such as the AAA, AAS, AAM, and AAD instructions in the Intel 80x86
microprocessors. 
You can print a number by simply adding the value of the character '0' to each
digit.
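The digit-at-a-time addition just described can be sketched in a few lines of
C. This is only an illustration of the technique, not code from the article's
listings; the name bcd_add and its interface are mine.

```c
/* Add two magnitudes stored as unpacked decimal digits, one digit per
 * byte, least-significant digit first. Returns the digit count of the
 * sum. The sum buffer must have room for one extra digit. */
int bcd_add(const unsigned char *a, int alen,
            const unsigned char *b, int blen,
            unsigned char *sum)
{
    int len = (alen > blen) ? alen : blen;
    int carry = 0;
    int i;

    for (i = 0; i < len; i++) {
        int d = carry
              + (i < alen ? a[i] : 0)   /* pad shorter number with 0s */
              + (i < blen ? b[i] : 0);
        if (d > 9) {                    /* overflow: subtract 10, carry 1 */
            d -= 10;
            carry = 1;
        } else {
            carry = 0;
        }
        sum[i] = (unsigned char) d;
    }
    if (carry)                          /* final carry lengthens the number */
        sum[len++] = 1;
    return len;
}
```

Printing the result is then just a matter of walking the array from the most
significant digit down and adding '0' to each byte.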
The main disadvantage is the amount of space the number occupies. Each byte
could hold 256 unique values, but you are only using ten of them.
To minimize this problem, the packed binary-decimal format stores a digit in
each nibble (4 bits) of memory. Notice in Example 2 that you have reduced the
amount of memory required to hold the number by half while increasing the
number of digits that can be represented. You can now represent a number that
is up to 65,534 digits long, but you have lost some advantages. While still
storing each byte of the number in reverse order, you are storing the digits
within the byte with the most-significant nibble first. You do this because
many processors have built-in instructions for handling packed decimal
numbers. The Intel 80x86 DAA and DAS instructions, for instance, expect the
most-significant nibble to be on the left side of the byte. While the packed
form is more difficult to read in a memory dump, most implementors still
prefer it to the unpacked binary-decimal format. The packed binary-decimal format
introduces additional problems. Intel processors can add, subtract, multiply,
and divide unpacked binary-decimal numbers, but they can only add and subtract
packed binary-decimal numbers. This is not much of a hindrance; you simply
must unpack each byte as you multiply two numbers and repack the results.
Since you can do this byte-by-byte as you work through the number, you don't
need additional space in memory to hold the unpacked versions of the two
numbers. Of course, for printing, each digit must be unpacked as each number
is printed.
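The nibble layout can be illustrated with a few one-line helpers. These are
hypothetical names for illustration only, not routines from the article.

```c
/* Pack two decimal digits into one byte, with the more-significant
 * digit in the high nibble (the layout DAA/DAS expect). */
unsigned char pack_digits(unsigned char hi, unsigned char lo)
{
    return (unsigned char) ((hi << 4) | lo);
}

/* Unpack the two digits again. */
unsigned char high_digit(unsigned char packed) { return packed >> 4; }
unsigned char low_digit(unsigned char packed)  { return packed & 0x0F; }
```

Multiplication on a machine without packed-decimal support works by applying
high_digit/low_digit to each byte, operating on the digits, and re-packing.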


A Better Way


Still, it bothered me that each packed byte was using only 100 of its 256
possible values. There were 156 values just sitting idle in memory, not being
productive! In Seminumerical Algorithms (Addison-Wesley, 1981), Knuth presents
algorithms that work in any base, not just decimal. I began to wonder what
would happen if I set my base to the size of a byte, or better yet, an
integer? Why, then nothing would
be wasted! As an added bonus, I could use the common instructions that are
available in every compiler to manipulate the numbers and it would still be
efficient. Using this format with a two-byte length field means I can
represent numbers with over 160,000 digits in their printed form.
While my implementation of these numbers is in C, you could use any language
to accomplish the same thing. The only requirement is that the language and
target machine can represent integers in two different forms where the larger
form is at least twice the size of the smaller form. In my C implementation, I
am assuming a short int is 16 bits long and a long int is 32 bits.


Creating Big Numbers


The file bignum.h (Listing One) contains the basic definition of an infinite
integer type called bignum--a pointer to a structure containing a length field
(that also holds the sign bit) and a variable-length array of elements. Every
bignum will contain at least one element. The least-significant element is
stored in the lowest index in the array. The value in the elements is kept as
a positive integer. Only the sign bit determines whether the number is
positive or negative.
Besides defining the type, bignum.h also contains ANSI-C prototypes for all
the functions, a set of convenience macros for decomposing bignums, and
definitions for some necessary constants.
The routine new_bignum in convert.c (Listing Two) is the basic allocation
routine. It takes one argument--the number of elements the bignum will
contain. It returns a bignum with the length field set to the number of
elements in the number and the sign bit set to POSITIVE. The actual value of a
newly created bignum is undefined. The creating routine is required to fill in
the values of each element.
Two things can go wrong when you try to create a bignum: You can run out of
memory or the user can ask for too many elements. In the case of an error,
NULL is returned instead of a bignum, and errno is set to the reason the call
failed. This places the burden of checking for errors and handling them on the
callers of the function.
If you can make a bignum, you also need a way to discard numbers that are no
longer needed. The routine destroy_bignum releases the memory occupied by a
bignum, accepting a previously created bignum or a NULL. Accepting NULL allows
for simpler handling of exceptions when dealing with bignums. If you set all
temporary bignums to NULL before attempting to create them, you can destroy
all temporary bignums on an exception without having to keep track of which
bignums were actually allocated.
Another useful routine is copy_bignum, which creates a temporary number that
can be altered within a routine. Without temporary copies, a subroutine would
end up changing the values of bignums passed to it.


Converting Integers to Big Numbers


In an ideal world, users of the infinite-integer library should never have to
see the internals of a bignum. Instead, they would start with known data types
and call functions to convert the known data type into and out of bignum
format. Most conversions will require math functions that work on bignums, so
for the moment, I'll restrict the discussion to signed and unsigned integers.
The ltobig routine converts a signed long integer into a bignum. Because C
automatically promotes smaller integer formats to the long format, you can
avoid creating routines for converting char, short, and int data types. The
first thing the ltobig routine does is check for the sign of the long integer
being passed in. If it is negative, the number is complemented so that its
absolute value can be stored. We also take the time to make sure only the
minimum number of elements needed to hold a bignum are allocated. By
guaranteeing that all bignums are in their smallest format, you simplify the
comparison of two bignums.
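The element-splitting step can be sketched as follows. This is a simplified
stand-in for the core of ltobig, not the listing itself; long_to_elements is
my name, and the code assumes, as the article does, a 16-bit short and a
32-bit long.

```c
/* Store |value| in 16-bit elements, least significant first, using the
 * minimum element count. Sets *negative to record the sign. Assumes a
 * 32-bit long, as in the article. */
int long_to_elements(long value, unsigned short element[2], int *negative)
{
    unsigned long magnitude;

    *negative = (value < 0);
    /* Negate via unsigned arithmetic so the most negative long is safe. */
    magnitude = *negative ? 0UL - (unsigned long) value
                          : (unsigned long) value;
    element[0] = (unsigned short) (magnitude & 0xFFFFUL);
    if ((magnitude >> 16) == 0)
        return 1;               /* fits in one element */
    element[1] = (unsigned short) ((magnitude >> 16) & 0xFFFFUL);
    return 2;
}
```

Allocating exactly the count this returns is what keeps every bignum in its
minimal format.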
The ultobig routine converts unsigned long integers into bignum format.
Converting a bignum to an integer is more awkward. Since a bignum can contain
more significant bits than a long, you must have a way to notify the caller
that the bignum couldn't be converted. The bigtol and bigtoul routines handle
signed and unsigned long conversions, but a complete library would need
functions for signed and unsigned chars, shorts, and ints. The caller passes
where to store the converted value instead of directly returning the value.
The direct return value is used to indicate success or failure. These
functions return 0 if the value could be converted and -1 if there was a
problem. The errno value is set to BIGNUM_NOSPACE when there are too many
significant bits.


Negation


Negating a bignum is trivial. All you have to do is flip the sign bit. To
maintain the rule that no function alters the value of its parameters,
neg_bignum first duplicates its argument and then flips the sign bit in the
copy.



Comparing Big Numbers


A fundamental operation in any number system is deciding whether a number is
bigger, smaller, or equal to another number. The cmp_bignum routine in the
logic.c (Listing Three) returns 1 if the first number is larger than the
second, 0 if they are equal, and -1 if the second number is larger than the
first.
The natural way to implement a comparison is to subtract the two numbers and
check for negative, positive, or zero results. However, this leads to
inefficiencies, since the subtraction function creates a new number that will
have to be discarded. Another problem is that the implementation of sub_bignum
needs a comparison function to determine the sign of the result. Using
sub_bignum in the comparison function would have led to infinite
recursion.
Several optimizations are done in cmp_bignum to return the answer as quickly
as possible. The first check is the sign of the two numbers. If they are
different, a positive number is always bigger than a negative one, so there is
no need to go any further. After noting the sign of the two numbers, the
second shortcut is to check how many elements are in each number. Longer
numbers are bigger than shorter numbers (or, if we are working with negative
numbers, longer numbers are smaller than shorter numbers). In the worst case,
you begin checking each element, starting with the most significant, stopping
as soon as you find a difference. If there is no difference, then the numbers
are the same, and we return 0.
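Those shortcuts can be sketched for the magnitude part of the comparison.
This is an illustration only; cmp_magnitude is my name, it handles only
non-negative magnitudes, and it relies on both numbers being in minimal
format, as the next section requires.

```c
/* Compare two non-negative magnitudes stored as 16-bit elements,
 * least significant first. Returns 1, 0, or -1, in the style of the
 * article's cmp_bignum. */
int cmp_magnitude(const unsigned short *a, int alen,
                  const unsigned short *b, int blen)
{
    int i;

    if (alen != blen)                   /* longer magnitude is larger */
        return (alen > blen) ? 1 : -1;
    for (i = alen - 1; i >= 0; i--)     /* most significant element first */
        if (a[i] != b[i])
            return (a[i] > b[i]) ? 1 : -1;
    return 0;                           /* every element matched */
}
```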


Minimal Representation


For this function to work, you must guarantee that two bignums with the same
value are stored in the same format. For example, you could create a number 0
that contained two elements, both of which had the value zero. If that were
allowed, the comparison would become more complex and considerably slower.
You avoid this by making a rule stating that all bignums are stored in the
smallest way that still holds all significant bits. The reduce_bignum routine
in math.c (Listing Four) implements this rule by scanning for leading 0
elements in a bignum and removing them.
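The scanning rule is simple enough to show directly. This is a sketch of the
idea behind reduce_bignum, not the listing; reduce_length is a hypothetical
helper name.

```c
/* Drop most-significant zero elements from a magnitude, but always
 * keep at least one element so that zero itself remains representable. */
int reduce_length(const unsigned short *element, int len)
{
    while (len > 1 && element[len - 1] == 0)
        len--;
    return len;
}
```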


Addition


The add_bignum function in the file math.c sums two bignums and returns the
result in a newly created bignum. Because these functions do not alter the
values passed to them, the caller is responsible for releasing numbers that
are no longer used.
The algorithm for adding bignums is the same one used when adding numbers on a
piece of paper. To add 59,283 to 3,876,390, you were taught to start with the
least-significant digits, add them together, and write down the result. If the
result had more than one digit, you only wrote the least-significant digit
down and added the more-significant to the next pair of digits. (In other
words, you carried the overflow.) 
Adding bignums works the same way, but instead of ten unique digits, you have
65,536 unique values for each digit. Remember that the largest carry when
adding two digits is still the value 1, and the size of the result is at most
one digit more than the longer number.
In the add_bignum routine, first determine constant values so you don't have
to recompute them in the addition loop. Then allocate space for the resulting
bignum using the largest potential size. If the results happen to be less than
what's allocated, you can recover the space with the reduce_bignum function. 
The for loop is where the real work is accomplished. Add one digit (or
element) at a time with each pass of the loop. Since the addition can be
bigger than an element by one bit, do the addition in a bigger number format
than that in which the element is stored. The carry variable is the temporary
storage location. After the addition, the lower half of the carry contains the
number to store in the result, and the upper half contains the carry for the
next pass of the loop. When the two numbers have different lengths, treat the
missing high-order digits of the shorter one as zeros, and watch the sign of the number
being added. Since the digits are being stored in absolute-value form, you
must add the digit if it is positive, or subtract the digit if it is negative.
This also means that the upper value of carry can take on three values: 1, 0,
or -1. By defining carry as a signed number, shifting the upper half down to
the bottom half ensures that the value remains the same.
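That carry mechanism looks like this in C. It is a sketch of the loop's idea
only: add_magnitude is a hypothetical name, and it handles just the
non-negative case, without the sign logic the article describes.

```c
/* Add two non-negative magnitudes stored as 16-bit elements, least
 * significant first. The sum accumulates in an unsigned long (at least
 * 32 bits): its lower half is the output digit, its upper half the
 * carry for the next pass. Returns the element count of the sum; the
 * sum buffer needs room for one extra element. */
int add_magnitude(const unsigned short *a, int alen,
                  const unsigned short *b, int blen,
                  unsigned short *sum)
{
    int len = (alen > blen) ? alen : blen;
    unsigned long carry = 0;
    int i;

    for (i = 0; i < len; i++) {
        carry += (unsigned long) (i < alen ? a[i] : 0)
               + (unsigned long) (i < blen ? b[i] : 0);
        sum[i] = (unsigned short) (carry & 0xFFFFUL); /* lower half */
        carry >>= 16;                                 /* upper half */
    }
    if (carry)                  /* result grows by at most one digit */
        sum[len++] = (unsigned short) carry;
    return len;
}
```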
Most machines have add-with-carry and subtract-with-borrow instructions. These
would have been useful, but most high-level languages do not give you direct
access to these instructions or the carry flag. If you want to optimize the
addition loop, you can always replace it with a machine-level version tailored
for your particular machine. However, the code that I present is fairly
machine independent and will execute rapidly on most machines.
Once the two numbers have been added, you're faced with the difficult problem
of deciding whether the result is positive or negative. The key is that the
sign is the same as the number with the largest absolute value, unless the two
numbers have the same absolute value but different signs. In the exception
case, the answer is zero, which is always positive. 
Since the largest potential size for the result was allocated, the result must
be passed through the reduce_bignum routine to remove any leading zeros.


Subtraction


Subtraction is handled the same as addition. Since addition handles negative
numbers, you just flip the sign bit of the second number before entering the
addition loop. I could have avoided duplicating the add_bignum code in
sub_bignum by calling neg_bignum on the second number and then calling
add_bignum, but this would have caused a temporary number to be created and
destroyed. Since most memory-manager functions are expensive to call, I felt
it was better to duplicate the code.
Example 1: The number 598341038459653 as unpacked binary decimal.
0100: 00 0F 03 05 06 09 03 04
0108: 08 03 00 01 04 03 08 09
0110: 05
Example 2: The number 598341038459653 as packed binary decimal.
0100: 00 08 53 96 45 38 10 34
0108: 98 05

Listing One

/* Infinite Integer Definitions -- by Jeffrey W. Hamilton, 1993
** NOTE: This code assumes that a 'short int' is two bytes
** and a 'long int' is four bytes.
*/
#ifndef BIGNUM_DEFINES
#define BIGNUM_DEFINES
typedef struct BigNum {
 unsigned short int length; /* Leftmost bit holds the sign of the number */
 unsigned short int element[1]; /* Minimum of one element */
} *bignum;

/* Convenience Routines */
#define NEGATIVE 0x8000
#define POSITIVE 0
#define BIGNUM(sign, value) { (sign) | 1, (value) }
#define BIGNUM_SIZE(b) ((b)->length & ~NEGATIVE)
#define BIGNUM_SIGN(b) ((b)->length & NEGATIVE)

#define BIGNUM_SIZEOF(len) (sizeof(struct BigNum) + (((len)-1) * \
 sizeof(unsigned short)))
#define FORCE_POSITIVE(b) ((b)->length &= ~NEGATIVE)
#define FORCE_NEGATIVE(b) ((b)->length |= NEGATIVE)
#define FORCE_INVERT(b) ((b)->length = (BIGNUM_SIGN(b) ^ NEGATIVE) | \
 BIGNUM_SIZE(b))

/* Error Conditions */
#define BIGNUM_BADARG 22
#define BIGNUM_NOSPACE 12
#define BIGNUM_ZERO 1

#ifndef max
#define max(x,y) (((x) < (y)) ? (y) : (x))
#endif

bignum new_bignum(short int size);
void destroy_bignum(bignum);
bignum copy_bignum(bignum);
bignum ltobig(long int number);
bignum ultobig(unsigned long int number);
int bigtol(bignum, long int *result);
int bigtoul(bignum, unsigned long int *result);
double bigtod(bignum);
bignum dtobig(double);
bignum strtobig(char *string, char **endPoint, int radix);
char * bigtostr(bignum number, char *target, int maxLength, int radix);
bignum reduce_bignum(bignum);
bignum add_bignum(bignum, bignum);
bignum sub_bignum(bignum, bignum);
bignum neg_bignum(bignum);
bignum mult_bignum(bignum, bignum);
bignum div_bignum(bignum, bignum, bignum *);
bignum shiftl_bignum(bignum, unsigned long);
bignum shiftr_bignum(bignum, unsigned long);
bignum or_bignum(bignum, bignum);
bignum xor_bignum(bignum, bignum);
bignum and_bignum(bignum, bignum);
bignum not_bignum(bignum);
bignum set_bit_bignum(bignum, unsigned long, int value);
int test_bit_bignum(bignum, unsigned long);
#endif



Listing Two

/* Infinite Integer Conversion Facility -- by Jeffrey W. Hamilton, 1993 */
#include <stdlib.h>
#include <stdio.h>
#include <ctype.h>
#include <string.h>
#include <math.h>
#include <errno.h>
#include "bignum.h"

/* Allocates space for a bignum but does not initialize it.
** Input: Number of elements to allocate. Must be greater than 0.
** Output: The allocated space
** Error: Returns NULL with errno set to one of the following:
** BIGNUM_BADARG - Size is less than 1
** BIGNUM_NOSPACE - Not enough memory available to allocate the number
*/
bignum new_bignum(short int size)
{
 bignum temp;

 if (size <= 0) {
 errno = BIGNUM_BADARG;
 return NULL;
 }
 if ((temp = malloc(BIGNUM_SIZEOF(size))) == NULL) {
 errno = BIGNUM_NOSPACE;
 return NULL;
 }
 temp->length = size;
 return temp;
}
/* Release the space occupied by a bignum 
** Input: Number to be released.
*/
void destroy_bignum(bignum temp)
{
 free(temp);
}
/* Create a duplicate copy of a bignum
** Input: Number to be copied.
** Output: Copy of the number
** Error: Returns NULL with errno set to one of the following:
** BIGNUM_NOSPACE - Not enough memory available to allocate the number
*/
bignum copy_bignum(bignum temp)
{
 int size;
 bignum result;
 size = BIGNUM_SIZE(temp);
 if ((result = new_bignum(size)) == NULL) return NULL;
 memcpy(result, temp, BIGNUM_SIZEOF(size));
 return result;
}
/* Convert a long int to a bignum
** Input: A long integer number to be converted.
** Output: The equivalent value in bignum format
** Error: Returns NULL and errno will be set to:
** BIGNUM_NOSPACE - Not enough space to hold the number in memory.
*/
bignum ltobig(long temp)
{
 int sign;
 bignum result;

 /* Determine the sign of the number and put the value in absolute form */
 if (temp < 0) {
 sign = NEGATIVE;
 temp = -temp;
 } else {
 sign = POSITIVE;
 }
 /* Try to allocate the minimum space needed */
 if ((temp & 0xFFFF0000L) == 0) {
 /* Number will fit in one element */
 if ((result = new_bignum(1)) == NULL) return NULL;

 result->element[0] = (unsigned short) temp;
 } else {
 /* Number needs two elements */
 if ((result = new_bignum(2)) == NULL) return NULL;
 result->element[0] = (unsigned short) temp;
 result->element[1] = (unsigned short) (temp >> 16);
 }
 /* Record the correct sign (new_bignum already stored the length) */
 result->length |= sign;
 return result;
}
/* Convert an unsigned long int to a bignum
** Input: A long integer number to be converted.
** Output: The equivalent value in bignum format
** Error: Returns NULL and errno will be set to:
** BIGNUM_NOSPACE - Not enough space to hold the number in memory.
*/
bignum ultobig(unsigned long temp)
{
 bignum result;

 /* Try to allocate the minimum space needed */
 if ((temp & 0xFFFF0000L) == 0) {
 /* Number will fit in one element */
 if ((result = new_bignum(1)) == NULL) return NULL;
 result->element[0] = (unsigned short) temp;
 } else {
 /* Number needs two elements */
 if ((result = new_bignum(2)) == NULL) return NULL;
 result->element[0] = (unsigned short) temp;
 result->element[1] = (unsigned short) (temp >> 16);
 }
 return result;
}
/* Convert a bignum to a long
** Input: A bignum to be converted.
** Output: The equivalent value as a long.
** Error: Returns -1 and errno will be set to:
** BIGNUM_NOSPACE - Not enough space to hold the number in a long.
*/
int bigtol(bignum num, long *temp)
{
 if (BIGNUM_SIZE(num) > 2) {
 /* Too many significant bits */
 errno = BIGNUM_NOSPACE;
 return -1;
 } else if (BIGNUM_SIZE(num) == 2) {
 if (num->element[1] & NEGATIVE) {
 /* The number contains more than 31 significant bits */
 errno = BIGNUM_NOSPACE;
 return -1;
 }
 *temp = ((long) num->element[1] << 16) | num->element[0];
 } else {
 *temp = num->element[0];
 }
 if (BIGNUM_SIGN(num) == NEGATIVE) *temp = -*temp;
 return 0;
}

/* Convert a bignum to an unsigned long
** Input: A bignum to be converted.
** Output: The equivalent value as an unsigned long.
** Error: Returns -1 and errno will be set to:
** BIGNUM_NOSPACE - Not enough space to hold the number in an unsigned long.
*/
int bigtoul(bignum num, unsigned long *temp)
{
 if (BIGNUM_SIZE(num) > 2) {
 /* Too many significant bits */
 errno = BIGNUM_NOSPACE;
 return -1;
 } else if (BIGNUM_SIZE(num) == 2) {
 *temp = ((unsigned long) num->element[1] << 16) | num->element[0];
 } else {
 *temp = num->element[0];
 }
 if (BIGNUM_SIGN(num) == NEGATIVE) *temp = -*temp;
 return 0;
}



Listing Three 
/* Infinite Integers Logic Facility -- by Jeffrey W. Hamilton, 1993 */
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include "bignum.h"

/* Compare first bignum to the second
** Input: Two bignums
** Output: Returns -1 if first is less than second
** 0 if first equals second
** +1 if first is greater than second
*/
int cmp_bignum(bignum num1, bignum num2)
{
 int isNegative;
 int i;

 /* Quick check based on sign */
 if (BIGNUM_SIGN(num1) != BIGNUM_SIGN(num2)) {
 if (BIGNUM_SIGN(num1) == NEGATIVE) return -1;
 return 1;
 }
 /* Use a flag to compensate for positive / negative numbers */
 isNegative = (BIGNUM_SIGN(num1) == NEGATIVE) ? 1 : 0;

 /* Quick check based on length */
 if (BIGNUM_SIZE(num1) < BIGNUM_SIZE(num2)) {
 return (isNegative) ? 1 : -1;
 } else if (BIGNUM_SIZE(num1) > BIGNUM_SIZE(num2)) {
 return (isNegative) ? -1 : 1;
 }
 /* It looks like we have to be more thorough */
 for (i = BIGNUM_SIZE(num1) - 1; i >=0; i--) {
 if (num1->element[i] < num2->element[i]) {

 return (isNegative) ? 1 : -1;
 } else if (num1->element[i] > num2->element[i]) {
 return (isNegative) ? -1 : 1;
 }
 }
 /* They are equal */
 return 0;
}



Listing Four

/* Infinite Integers Math Facility -- by Jeffrey W. Hamilton, 1993 */
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include "bignum.h"

/* Reduce a bignum to occupy the minimum space required
** Input: bignum to be reduced
** Output: reduced bignum
*/
bignum reduce_bignum(bignum number)
{
 register int j;

 /* Remove leading zero values from the high end of the number */
 for (j = BIGNUM_SIZE(number); (j > 1) && (number->element[j-1] == 0); j--);

 /* Special case: We don't allow a negative zero */
 if ((j == 1) && (number->element[0] == 0)) {
 FORCE_POSITIVE(number);
 }
 /* If the number is already at a minimal size, return it */
 if (j == BIGNUM_SIZE(number)) return number;

 /* Reallocate the number at a smaller size. In theory realloc could fail, 
 ** but since we are always reducing the space we are occupying, a failure
 ** would be a sign of a VERY poor memory manager.
 */
 number->length = BIGNUM_SIGN(number) | j;
 return realloc(number, BIGNUM_SIZEOF(j));
}
/* Add two bignums
** Input: numbers to add
** Output: a bignum that is the sum
** Errors: Returns NULL with errno set to one of the following:
** BIGNUM_BADARG - Size is less than 1
** BIGNUM_NOSPACE - Not enough memory available to allocate the result
*/
bignum add_bignum(bignum num1, bignum num2)
{
 long carry;
 int size, size1, size2;
 int sign1, sign2;
 register int i;
 bignum result;


 carry = 0;

 /* Ensure the result can hold the sum */
 size1 = BIGNUM_SIZE(num1);
 size2 = BIGNUM_SIZE(num2);
 size = max(size1, size2) + 1;
 if ((result = new_bignum(size)) == NULL) return NULL;

 sign1 = BIGNUM_SIGN(num1);
 sign2 = BIGNUM_SIGN(num2);

 /* Add the numbers */
 for (i = 0; i < size; i++) {
 if (i < size1) {
 /* We still have elements of num1 to process */
 if ((unsigned) sign1 == NEGATIVE) {
 carry -= num1->element[i];
 } else {
 carry += num1->element[i];
 }
 }
 if (i < size2) {
 /* We still have elements of num2 to process */
 if ((unsigned)sign2 == NEGATIVE) {
 carry -= num2->element[i];
 } else {
 carry += num2->element[i];
 }
 }
 result->element[i] = (unsigned short) carry;
 carry >>= 16;
 }
 /* Adjust the sign of the results */
 if (carry < 0) {
 /* Complement the answer */
 carry = 0;
 for (i = 0; i < size; i++) {
 carry -= result->element[i];
 result->element[i] = (unsigned short) carry;
 carry >>= 16;
 }
 FORCE_NEGATIVE(result);
 }
 return reduce_bignum(result);
}
/* Subtract the second bignum from the first.
** Input: numbers to subtract.
** Output: a bignum that is the difference
** Errors: Returns NULL with errno set to one of the following:
** BIGNUM_BADARG - Size is less than 1
** BIGNUM_NOSPACE - Not enough memory available to allocate the result
*/
bignum sub_bignum(bignum num1, bignum num2)
{
 long carry;
 int size, size1, size2;
 int sign1, sign2;
 register int i;

 bignum result;

 carry = 0;

 /* Ensure the result can hold the difference */
 size1 = BIGNUM_SIZE(num1);
 size2 = BIGNUM_SIZE(num2);
 size = max(size1, size2) + 1;
 if ((result = new_bignum(size)) == NULL) return NULL;

 sign1 = BIGNUM_SIGN(num1);
 /* Invert the sign for subtraction */
 sign2 = (BIGNUM_SIGN(num2) == NEGATIVE) ? POSITIVE : NEGATIVE;

 /* Do a normal add with the inverted second number */
 for (i = 0; i < size; i++) {
 if (i < size1) {
 /* We still have elements of num1 to process */
 if ((unsigned)sign1 == NEGATIVE) {
 carry -= num1->element[i];
 } else {
 carry += num1->element[i];
 }
 }
 if (i < size2) {
 /* We still have elements of num2 to process */
 if ((unsigned)sign2 == NEGATIVE) {
 carry -= num2->element[i];
 } else {
 carry += num2->element[i];
 }
 }
 result->element[i] = (unsigned short int) carry;
 carry >>= 16;
 }
 /* Adjust the sign of the results */
 if (carry < 0) {
 /* Complement the answer */
 carry = 0;
 for (i = 0; i < size; i++) {
 carry -= result->element[i];
 result->element[i] = (unsigned short) carry;
 carry >>= 16;
 }
 FORCE_NEGATIVE(result);
 }
 return reduce_bignum(result);
}
/* Compute the negative of a bignum
** Input: number to negate
** Output: a negated copy of the number
** Errors: Returns NULL with errno set to one of the following:
** BIGNUM_BADARG - Size is less than 1
** BIGNUM_NOSPACE - Not enough memory available to allocate the result
*/
bignum neg_bignum(bignum number)
{
 bignum result;


 if ((result = copy_bignum(number)) == NULL) return NULL;
 FORCE_INVERT(result);
 return result;
}



























































Data Attribute Notation Relationships


An object-oriented approach to analysis and design methodologies




Reginald B. Charney


Reg is president of Charney & Day Inc. and is a voting member of ANSI's X3J16
Committee on the C++ Language. He can be reached on CompuServe at 70272,3427
or on the Internet at rbcharney@delphi.com.


Data Attribute Notation (DAN) is an object-oriented coding style that
emphasizes data abstraction. DAN, which I described in the article "Data
Attribute Notation and C++" (DDJ, August 1994), binds the abstract concepts
defined in a project's analysis and design stages with the actual
implementation stage. 
In this article, I'll cover how DAN can represent relationships that occur in
most problems. I'll also discuss functions as attributes and how DAN can
represent iterator classes.


Static and Dynamic Relationships


Relationships can be static or dynamic. A static relationship is always true,
even if there are no instances of the class to represent it. For example,
class definitions declare a static relationship between the components of the
class, even if the class is never instantiated. This is the "definition
relationship." The truth value of dynamic relationships is determined during
execution. For example, Ted is married to Alice as long as they are not
divorced. This example shows that analysis and design determines whether a
relationship is static or dynamic. If the system design does not support
divorce, then Ted being married to Alice is a static relationship.
In a system that allows for divorce, marriage is a dynamic relationship. That
is, the relationship must be checked at run time to see if Ted is married to
Alice. Note that even here, a static relationship is needed to relate Ted to
Alice. For instance, if each person has a Spouse attribute, you can ask if
Ted's Spouse is Alice and Alice's Spouse is Ted. If they are, then Ted is
married to Alice. Thus, there needs to be a static relationship such as spouse
to evaluate a dynamic relationship such as marital status.
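The Spouse example above can be sketched in C++. This is a hypothetical illustration (the Person class, marry(), and isMarriedTo() are my names, not from the article's listings): the spouse pointer is the static relationship, while the marriage test is the dynamic relationship evaluated at run time.

```cpp
#include <cstddef>
#include <string>

// Hypothetical sketch: a static Spouse attribute used to evaluate
// the dynamic "married" relationship at run time.
class Person {
    Person *spouse;            // static relationship: every Person has a Spouse slot
    std::string name;
public:
    Person(const std::string &n) : spouse(NULL), name(n) { }
    void marry(Person &p)      { spouse = &p; p.spouse = this; }
    void divorce()             { if (spouse) { spouse->spouse = NULL; spouse = NULL; } }
    // dynamic relationship: its truth value is checked at run time
    bool isMarriedTo(const Person &p) const
        { return spouse == &p && p.spouse == this; }
};
```

If the system design did not support divorce, the spouse link could be fixed at construction and the run-time check would collapse into a static relationship.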


Representing Relationships


For the purposes of this discussion, I'll use declarative code to represent
relationships. Declarative code is easy to write and easy to check for
correctness, and it can be nonprocedural since most declarations may be
reordered. The rest of this article uses people and car owners as examples.


Relationships as Functions


Listing One shows the function owner() that returns a nonzero value if the
given person owns the given car model. The value returned by the owner() is
determined at run time. 


Nonmember versus Member Functions


Nonmember functions have the following benefits:
- They handle derived arguments fairly well.
- They can be extended to n-ary relationships.
- They express dynamic relationships well.
- They handle noncommutative relationships well. (That is, they can
distinguish between a@b and b@a, where @ is a relationship between objects a
and b.)
- They can be overloaded to handle similar functions for different types; for
instance, home owners or car owners.
A nonmember function can be overloaded, but cannot serve as a base for another
function in the same sense that one class can serve as a base for another
class. A member function of a base class, on the other hand, can be overridden
by a derived-class member function. Further, if the member function is
virtual, invoking the correct function depends on the type of instance for
which the function is invoked. Thus, relationships represented by functions
can be extended when using virtual-member functions. Another benefit of using
member functions is that the implied first argument (that is, the this
pointer) is not implicitly converted. This eases the problem of a function
accepting either base- or derived-class instances as arguments.


Relationships as Classes


Classes can represent static relationships. For example, an Owner relationship
between a Person and a Car is shown in Listing Two . Listing Three , however,
shows that a FordOwner relationship can be defined as a class using
composition and inheritance. Listing Four goes all the way and defines a
ChevOwner only in terms of inheritance. In Listing Four, if a Person can be
taxed and a Chev can get fixed, then you can get a ChevOwner fixed and taxed
all at the same time. This shows that a class shares all the attributes of any
one of its inherited parts. If the class represents a relationship and it
inherits some of its parts, then what is true for any one of its inherited
parts applies to the relationship as a whole. When a relationship like Owner
inherits some of its attributes, those parts should make sense in toto. Thus,
the previous example shows poor design. In contrast, FilledCircle is a useful
derived class composed only of inherited base classes. It inherits all
attributes from its two base classes: FillPattern and Circle.


Functions are Attributes



DAN states that a class is defined by its attributes. Consistent with this is
the fact that member and friend functions of a class are also attributes of
the class. 
For example, if an Owner has to renew his car license every year, an attribute
Renewed can be defined. To check if this attribute is set, a member function
isRenewed can be invoked, as in Listing Five . The function isRenewed could
also be defined as an attribute class, IsRenewed. Listing Six shows that a
dynamic relationship can be converted into a static relationship of some kind
as represented by an attribute class.


Multiple Relationships


Relationships can be one-to-one, one-to-many, many-to-one, and many-to-many.
Having illustrated that a one-to-one relationship can be represented by a
function or an attribute class, I'll now extend this approach to the other
relationships. It seems fairly obvious that a one-to-many relationship can be
represented by a function member, where the
one-to-many relationship can be represented by a function member, where the
class instance is the one and the argument list is the many. In the case of a
nonmember function, the first argument is the one and the rest of the argument
list is the many. (Neither of these two implementations implies that the order
of evaluation of the arguments is the same as the order of the arguments.) In
the case of the many-to-one and many-to-many relationships, things get more
interesting.
The many could be represented by a single class containing the many. This many
class can have any form mentioned earlier in the car-and-owner example. That
is, it could be either a complete composite of all the many or a mixture of
inherited parts and parts composing the many class. It could also be a
completely inherited class where all the parts of the many are inherited. The
form of the class is problem dependent.
In a many-to-one relationship, a member function can be used if the class
instances represent the many and the one represents the single function
argument.
In a many-to-many relationship, a function member can have its class instance
represent the first many, and the single arguments represent the second many.
Nonmember functions have two arguments, each representing a many.
In implementing one-to-many, many-to-one, and many-to-many relationships, the
most important concept is that the many can be represented by a single class,
so complete relationships can be represented by one class per relationship.
Listing Seven shows classes representing each relationship. Pure composition
was used in all these classes. A mixture of composition and inheritance might
be more appropriate, depending on the problem.
It is important to note that in any relationship, you must be able to
encapsulate each side of the relationship into a class. For example, if a
number of people own a number of cars, you have a group called People and a
group called Fleet. FleetOwners is the resulting relationship. In the case of
the one-to-one relationship called CarOwner, one side was encapsulated into
one Person and the other side was encapsulated into a Car.


Iterators


The relationship classes I've just discussed often contain collections.
Iterator classes are used to iterate over collections of objects. The rest of
this article uses iterator classes to show how code normally thought to be
procedural in nature can be written in a declarative fashion using DAN.
Normally, an iterator class has next(), prev(), and reset() function members
or their equivalent. For example, consider iterating over the collection
FleetOwner so that a report is produced showing the list of cars owned in the
collection Fleet. A classic C++ program using iterators would look something
like Listing Eight ; Listing Nine describes a more DAN-like approach for the
same example.


Declarative Code


In Listing Nine, I have written executable code when my intention was to
illustrate writing declarative statements. To express this fragment of code in
declarative format, we need to agree that all fragments have a basic form. The
form consists of an initialization stage (Init), a main stage (Body), and a
termination stage (Term)--any or all of which can be empty. Further, when two
fragments are placed together, the result is also a fragment. As such, you can
declare any fragment to be a class of the simplified form. Listing Ten , which
illustrates this technique, intentionally contains no definition for the
insert operator << function. Combining code may be problem and language
dependent. In a sequential machine, the termination stage of one fragment can
be concatenated with the initialization stage of the next fragment. In a
parallel machine, all initialization stages may be executed in parallel. In a
C++ program, there is a difference between static and dynamic data. Thus,
initialization code would need to be subdivided in those two types of data
before initialization could be performed. Regardless of these variations,
Listing Eleven is true. Both f1 and p1 are statically defined fragments. This
is consistent with languages like C++ that are static in nature. Languages
like Lisp exhibit behavior like f2 and p2 that can only be evaluated at run
time.
Listing Twelve is a program in which strings of tokens serve as arguments to
the Init, Body, and Term constructors. Listing Twelve is purely declarative in
nature. The only sequencing rule is that everything must be defined before
use.

Listing One 

Person ted(Ford);
int owner(Person& p, Model m);
// ... 
if (owner(ted,Chevy)) // ...



Listing Two
class Person { };
class Car { };
class Owner {
 Person p; // part 1
 Car c; // part 2
};



Listing Three
class Ford : public Car { };
class FordOwner : public Person
{
 Ford f;
};



Listing Four
class Person { };

class Chev { };
class ChevOwner : 
 public Person, 
 public Chev
{
};



Listing Five
int isRenewed() 
 { return Renewed(*this)==1; }
// . . .
Owner o;
// . . .
if (o.isRenewed()) // . . .



Listing Six
#include <iostream.h>
#include <string.h>
class Renewed {
 int r;
public:
 Renewed(const int rr=0)
 { r = rr; }
 operator int() const
 { return r; }
};
class Car { };
class Person {
 char *n;
public:
 Person(const char *nn="")
 { n = new char[strlen(nn)+1]; strcpy(n,nn); }
 friend ostream& operator <<
 (ostream& os, Person& p)
 { return os << p.n; }
};
 class Owner : public Person
{
 Car c;
 Renewed r;
public:
 Owner(const char *n) :
 Person(n) { }
 operator Renewed() const
 { return r; }
 Owner& operator <<
 (const Renewed& rr)
 { r = rr; return *this; }
};
class IsRenewed {
 Renewed r;
public:
 IsRenewed(const Owner& o)
 { r = Renewed(o); }
 operator int() const

 { return r; }
};
int main()
{
 Owner owner("Ted");
 owner << Renewed(1);
 if (IsRenewed(owner))
 cout << Person(owner)
 << " has renewed\n";
 return 0;
}



Listing Seven
class Person { };
class Car { };
class People // any # of people
{
 Person **pp;
};
class Fleet // any # of cars
{
 Car **cp; 
};
// 1:1 relationship of CarOwner
class CarOwner
{
 Person p;
 Car c;
};
// 1:m relationship of one person 
// owning any number of cars
 class FleetOwner
{
 Person p; // one person
 Fleet f; // many cars
};
// m:1 relationship of a group of people
// owning one Car
class TaxiCoop
{
 People p; // many people
 Car cp; // one car
};
// m:m relationship of many cars
// owned by many people
class FleetOwners
{
 People p; // many people
 Fleet f; // many cars
};



Listing Eight
class FleetOwner
{
 friend class FOIter;

 Person p;
 Fleet f;
};
class FOIter
{
 int status; // =0 if empty
 CurrElem c; // save cur elem
public:
 FOIter(FleetOwner& fo);
 int next(); // =0 at end
 int prev(); // =0 at start
 void reset();
 friend ostream& operator <<
 (ostream& os, FOIter& foi)
 { return os << foi.c; }
};
int main()
{
 FleetOwner fo;
 FOIter foI(fo);
 foI.reset();
 while(foI.next())
 cout << foI << endl;
 return 0;
}



Listing Nine
class FleetOwner
{
 friend class FOIter;
 Person p;
 Fleet f;
};
class Next { };
class Prev { };
class Reset { };
class CurrElem
{
public:
 friend ostream& operator <<
 (ostream& os, CurrElem& c);
};
class FOIter{
 int status; // =0 if empty
 CurrElem c; // save cur elem
public:
 FOIter(FleetOwner& fo)
 { status = 0; }
 operator Next();
 operator Prev();
 operator Reset();
 operator int()
 { return status; }
 friend ostream& operator <<
 (ostream& os, FOIter& foi)
 { return os << foi.c; }
};

int main()
{
 FleetOwner fo;
 FOIter foI(fo);
 Reset(foI);
 while(Next(foI))
 cout << foI << endl;
 return 0;
}




Listing Ten
class Code { };
class Init : public Code { };
class Body : public Code { };
class Term : public Code { };
class Fragment // order of parts
{ // in this class
 Init ii; // determines the
 Body bb; // order of
 Term tt; // initialization
public:
 Fragment(Init i,Body b,Term t)
 : ii(i), bb(b), tt(t) { }
 Fragment() { }
 Fragment& operator <<
 (Fragment& f);
};



Listing Eleven
Init i1, i2;
Body b1, b2;
Term t1, t2;
Fragment f1(i1,b1,t1);
Fragment f2;

f2 << f1;
Fragment p1(i2,b2,t2);
Fragment p2;


p2 << f1 << f2;



Listing Twelve
Init i1("FleetOwner fo;"
 " FOIter foI(fo);");
Init i2("foI << r;");
Body b2("while(Next(foI))"
 " cout << foI << endl;");
Term t1("return 0;");
Fragment f1(i2,b2,Term(""));
Fragment program(i1,f1,t1);















































The RC5 Encryption Algorithm


A fast, symmetric block cipher that may replace DES




Ronald L. Rivest


Ron is associate director of the MIT Laboratory for Computer Science, a
coinventor of the RSA public-key cryptosystem, and a cofounder of RSA Data
Security Inc. He can be contacted at rivest@theory.lcs.mit.edu. RC5 and
RSA-RC5 are trademarks of RSA Data Security Inc. Patent pending.


The RC5 encryption algorithm is a fast, symmetric block cipher suitable for
hardware or software implementations. A novel feature of RC5 is the heavy use
of data-dependent rotations. RC5 has a variable-length secret key, providing
flexibility in its security level.
RC5 is a parameterized algorithm, and a particular RC5 algorithm is designated
as RC5-w/r/b. The parameters are as follows:
w is the word size, in bits. The standard value is 32 bits; allowable values
are 16, 32, and 64. RC5 encrypts two-word blocks: plaintext and ciphertext
blocks are each 2w bits long.
r is the number of rounds. Allowable values are 0, 1, ..., 255.
b is the number of bytes in the secret key K. Allowable values are 0, 1, ...,
255.
RC5 uses an "expanded key table," S, derived from the user's supplied secret
key K. The size t of table S depends on the number r of rounds: S has t=2(r+1)
words.
RC5 is not intended to be secure for all possible parameter values. On the
other hand, choosing the maximum parameter values would be overkill for most
applications.
We provide a variety of parameter settings so that users may select an
encryption algorithm whose security and speed are optimized for their
application, while providing an evolutionary path for adjusting their
parameters as necessary in the future.
For example, RC5-32/16/7 is an RC5 algorithm with the number of rounds and the
length of key equivalent to DES. Unlike unparameterized DES, however, an RC5
user can upgrade the choice for a DES replacement to an 80-bit key by moving
to RC5-32/16/10. 
As technology improves, and as the true strength of RC5 algorithms becomes
better understood through analysis, the most appropriate parameters can be
chosen. We propose RC5-32/12/16 as providing a "nominal" choice of parameters.
Further analysis is needed to analyze the security of this choice.


Overview of the Algorithm


RC5 consists of three algorithms, one each for key expansion, encryption, and
decryption. These algorithms use the following three primitive operations (and
their inverses).
Two's complement addition of words, denoted by "+". This is addition modulo
2^w. 
Bit-wise exclusive-OR of words, denoted by ^. 
A left-rotation (or "left-spin") of words: the rotation of word x left by y
bits is denoted x <<< y. Only the lg(w) low-order bits of y are used to
determine the rotation amount, so that y is interpreted modulo w. 
The key-expansion routine expands the user's key K to fill the expanded key
array S, so S resembles an array of t random binary words determined by the
user's secret key K. The array S is first initialized using a linear
congruential generator modulo 2^w determined by some "magic constants." Then, S
is mixed with the secret key K in three passes by both the + and <<<
operations.
The key-expansion function has a certain amount of "one-wayness": It is not so
easy to determine K from S. 
For encryption, we assume that the input block is given in two w-bit
registers, A and B, and the output is also placed in registers A and B.
Example 1 is a pseudocode version of the encryption algorithm. The decryption
routine is easily derived from the encryption routine.
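The three routines just described can be sketched in C++ for the nominal
RC5-32/12/16 parameters. This is an unofficial study sketch following the
description above (the P32/Q32 magic constants, little-endian key loading,
and three-pass mixing are from the published algorithm; the function names
are mine, and the code is illustrative rather than a vetted implementation):

```cpp
#include <cstdint>
#include <cstring>

// Illustrative RC5-32/12/16 sketch: w=32 bits, r=12 rounds, b=16 key bytes.
static const int R = 12;                 // rounds
static const int T = 2 * (R + 1);        // expanded-key words, t = 2(r+1)
static const int KEYBYTES = 16;          // b
static const int KEYWORDS = KEYBYTES / 4;
static uint32_t S[T];                    // expanded key table

static uint32_t rotl(uint32_t x, uint32_t y)   // x <<< y, y taken mod w
{ y &= 31; return (x << y) | (x >> ((32 - y) & 31)); }
static uint32_t rotr(uint32_t x, uint32_t y)   // inverse rotation
{ y &= 31; return (x >> y) | (x << ((32 - y) & 31)); }

void rc5_setup(const unsigned char key[KEYBYTES])
{
    uint32_t L[KEYWORDS];                // key bytes packed little-endian
    std::memset(L, 0, sizeof L);
    for (int i = KEYBYTES - 1; i >= 0; i--)
        L[i / 4] = (L[i / 4] << 8) + key[i];
    S[0] = 0xB7E15163u;                  // magic constants P32 and Q32
    for (int i = 1; i < T; i++)
        S[i] = S[i - 1] + 0x9E3779B9u;
    uint32_t A = 0, B = 0;               // mix key into S in three passes:
    for (int k = 0, i = 0, j = 0; k < 3 * T; k++) {   // 3*max(t,c), t > c here
        A = S[i] = rotl(S[i] + A + B, 3);
        B = L[j] = rotl(L[j] + A + B, A + B);
        i = (i + 1) % T;  j = (j + 1) % KEYWORDS;
    }
}

void rc5_encrypt(uint32_t &A, uint32_t &B)   // one 2w-bit block in A, B
{
    A += S[0];  B += S[1];
    for (int i = 1; i <= R; i++) {
        A = rotl(A ^ B, B) + S[2 * i];       // data-dependent rotation by B
        B = rotl(B ^ A, A) + S[2 * i + 1];   // data-dependent rotation by A
    }
}

void rc5_decrypt(uint32_t &A, uint32_t &B)   // run the rounds in reverse
{
    for (int i = R; i >= 1; i--) {
        B = rotr(B - S[2 * i + 1], A) ^ A;
        A = rotr(A - S[2 * i], B) ^ B;
    }
    B -= S[1];  A -= S[0];
}
```

A caller would invoke rc5_setup() once per key, then rc5_encrypt() or
rc5_decrypt() per two-word block.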


Speed and Security


The encryption algorithm is very compact, and can be coded efficiently in
assembly language on most processors. The table S is accessed sequentially,
minimizing issues of cache size. The RC5 encryption speeds obtainable are yet
to be fully determined. For RC5-32/12/16 on a 90-MHz Pentium, a preliminary
C++ implementation compiled with the Borland C++ compiler (in 16-bit mode)
performs a key setup in 220 microseconds and an encryption in 22 microseconds
(equivalent to 360,000 bytes/sec). These timings can presumably be improved by
more than an order of magnitude using a 32-bit compiler and/or assembly
language--an assembly-language routine for the 486 can perform each round in
eight instructions.
A distinguishing feature of RC5 is its heavy use of data-dependent
rotations--the amount of rotation performed is dependent on the input data,
and is not predetermined.
The use of variable rotations should help defeat differential and linear
cryptanalysis since bits are rotated to "random" positions in each round. 
I invite the reader to help determine the strength of RC5.


Acknowledgments


I'd like to thank Burt Kaliski, Lisa Yin, Paul Kocher, and everyone else at
RSA Laboratories for their comments and constructive criticism.


References


Biham, E. and A. Shamir. A Differential Cryptanalysis of the Data Encryption
Standard. Berlin: Springer-Verlag, 1993.

Matsui, Mitsuru. "The First Experimental Cryptanalysis of the Data Encryption
Standard" in Proceedings CRYPTO '93. Berlin: Springer, 1994.
Example 1 Pseudocode of RC5 encryption algorithm.





























































Time for the 68332


Connecting timekeeping devices to microcontrollers




Eric McRae


Eric, an independent embedded-systems consultant from Redmond, WA, won the
1993 Motorola 68HC16 design contest and was a finalist in the 1994 Motorola
TPU design contest. He can be contacted at eric@digex.wa.com or by phone at
206-885-4107.


I was recently involved in a Motorola 68332-based design project that required
an external, battery-powered timekeeping device and a small amount of
nonvolatile memory. Initially, we chose the Dallas Semiconductor 1202 serial
timekeeping chip as the external device because it provides a 3-wire
synchronous serial interface, a real-time clock/calendar function that
produces data nearly identical to the ANSI C function localtime(), and 24
bytes of static RAM (SRAM) retained by the battery which powers the clock.
Even though their synchronous interfaces aren't directly compatible and they
expect data in opposite bit order, tying the 68332 to the 1202 appeared
straightforward enough because of the flexibility of Motorola's Queued Serial
Peripheral Interface (QSPI). On the 1202 side, the critical connections are
reset, clock, and bidirectional data, as described in Figure 1. The key 68332
connections, on the other hand, are clock, data out, data in, and chip
selects. 
Since the interface protocol was fixed by the 1202, we made accommodations
both in the external hardware and QSPI configuration. For instance, the 1202
monitors its own reset line: When its reset line goes high, the 1202 begins
latching data on the rising edges of the clock. The first bit in is the
least-significant bit of a command byte. After receiving the eighth bit, the
1202 interprets the command. If a read command was issued, the 1202 drives the
data line following the falling edge of the clock; see Figure 2. If a
write-data command was issued, the 1202 continues latching data bits on each
rising edge of the clock.
Motorola's 68332 QSPI is flexible in terms of data rates, clock polarity, and
clocking edge. The chip selects can be programmed to assume any desired
pattern before, during, and after a serial transaction. You can
programmatically configure the setup time between the assertion of the chip
selects and the first edge of the clock, as well as the hold time during which
the chip selects are held asserted after the last data bit has been clocked.
The flexibility of the interface is useful but requires careful setup of the
low-level driver routines. We also used the QSPI to serially load a sizable
field-programmable gate array (FPGA) from program ROM at 4 Mbits/sec. 
The first problem we confronted in the design was how to split the single
bidirectional data line at the 1202 into the two unidirectional data lines at
the 68332. We did this by implementing the simple gating circuit controlled by
PCS1; see Figure 3. If PCS1 is high, data output from the MOSI output of the
68332 is transmitted to the 1202. If PCS1 is low, data transmitted by the 1202
is directed to the MISO input of the 68332. The reset line of the 1202 is
driven directly from the PCS2 chip select. The serial-clock output from the
68332 is connected to the serial-clock input on the 1202. The QSPI is a
full-duplex interface. Data is normally both transmitted and received on every
clock. In this system, the data received while transmitting commands is
ignored.
The command/data protocol of the 1202 commences with a command byte that's
transmitted to the 1202. This byte contains a read/write bit which determines
the direction of data flow for bits following the command byte. You can
request the transfer of a single byte or a whole block (burst) of bytes. We
used both modes in our design. The clock-burst command causes the transfer of
the entire set of clock/calendar registers. Since the QSPI can transfer up to
16 packets of 1 to 16 bits each, it worked well to transfer the command byte
and then the eight bytes associated with the clock/calendar.
Listing One is the include file which is referenced by all listings presented
in this article. Listing Two is the software that drives the QSPI. The
set_clkW() and time() routines--the entry points for upper-level application
code--always reconfigure the QSPI for use in communicating with the 1202. This
is necessary because the interface is used to talk to the FPGA mentioned
earlier. Note that these routines bit reverse the data sent to the clock
because the QSPI uses MSB-first transactions while the 1202 is just the
opposite. Also note that the set_clkW() function verifies the operation by
calling time() after setting the time via set_clkV(). The command and
configuration registers need be set up only once using qspi_initV(). 
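The bit reversal those routines perform can be sketched as follows. This is a
hypothetical helper (the name bit_reverse and its use are my illustration, not
code from the article's listings), needed because the QSPI shifts the
most-significant bit first while the 1202 expects the least-significant bit
first:

```cpp
// Hypothetical sketch: reverse the bit order of one byte before sending it
// to (or after receiving it from) the 1202, since the QSPI is MSB-first
// and the 1202 is LSB-first.
static unsigned char bit_reverse(unsigned char b)
{
    unsigned char r = 0;
    for (int i = 0; i < 8; i++) {
        r = (unsigned char)((r << 1) | (b & 1)); // shift next low bit into place
        b >>= 1;
    }
    return r;
}
```

Applying the same function twice returns the original byte, so the one helper
serves both directions of the transfer.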


Curses


Two control bits in the 1202 must be manipulated: the clock-halt bit (which
must be set to 0 for the clock function to work) and the write-protect bit
(which must be 0 before the clock or SRAM can be written to). This
write-protect bit feature led me to curse the 1202 databook.
Initially, I thought I could do a burst read or write of the clock registers
and access only the seven bytes I was interested in; see Figure 4. I thought
it would be proper to create a separate routine to set or clear the
write-protect bit; see qset_writeproV() in Listing Two. Then I could also use
this function when accessing SRAM. To set the clock, I would clear the
write-protect bit, do a burst transfer of the seven clock registers, and then
set the write-protect bit. 
After prototyping and testing the code, I discovered that no matter how hard I
tried, I couldn't set the clock. Finally, I noticed a small box tucked away at
the bottom of the transfer chart that was apparently associated with the
burst-mode transfer diagram; see Figure 2. According to the box, all registers
associated with the clock have to be transferred. After setting the transfer
for eight bytes, I could suddenly set the clock. I had been led astray by the
fact that the prototype burst-mode read transferred only seven bytes with no
ill effect. I'll say here what the databook should have said about burst-mode
transfers: You must transfer all eight bytes of the clock/calendar function or
all 24 bytes of the RAM registers when doing burst writes. If you don't
transfer the complete set, the entire operation is discarded.
Individual routines (not included in Listing Two) handled transactions with
the SRAM in the clock chip. These routines differed from time() and set_clkW()
in that no bit reversing was needed, the address in the command byte was
different, and the number of bytes transferred was different. Finally,
qset_writeproV(ON) is called after writes to SRAM.
Having discovered the solution to the clock-set problem, I reviewed the
databook to see if I'd missed anything else. I had: The timing diagram shows
that the SCLK input must be low when reset is de-asserted and high when reset
is reasserted at the end of the transfer. I hadn't noticed this before, and
the setup I created for the QSPI didn't operate that way. In general, the QSPI
serial clock starts and ends on the same phase. Since everything seemed to
work fine, I ignored this discrepancy. 


Big Time


As the specifications were firmed up for the project, it became necessary to
provide some sort of unique serialization for every unit. One project engineer
noted that the Dallas Semiconductor 2404 provides a real-time clock, SRAM, and
a 64-bit ROM containing a unique bit pattern. Examination of the databook
revealed that the device used the same 3-wire interface as the 1202, had a
real-time clock/calendar in binary format, and provided substantially more
SRAM. Since we were due to get actual hardware "real soon now," I didn't
bother to wire the 2404 in the prototype. A month later, I wished I had.
The 2404 contains three main functions: the serial-number ROM, SRAM, and
timekeeping. The ROM can be accessed only through a "1-wire" interface--a real
nuisance because this forced us to add another connection in the circuit and
more driver software to the project. I had expected that all functionality
would be available from either interface. Fortunately the 68332 has plenty of
I/O, so the extra bit was readily available.
The 1-wire interface uses a protocol whereby the CPU initiates every bit
transfer by driving the wire low for one µsec; see Figure 5. If the CPU is
transmitting, it either keeps the line low for 15 or more µsecs, or drives it
high. Thus if the 2404 sees a low pulse less than 15 µsecs long, it interprets
it as a 1. If the CPU is reading data, it still drives the line low for one
µsec. It then tri-states its driver and delays about 13 µsecs before sampling
the line. The 2404 will hold the line low for a 0 bit or leave the line alone
for a 1. The line is pulled high by an external 4.7-kΩ resistor.
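The 2404's side of this protocol boils down to a pulse-width decision. As a minimal sketch (onewire_decode_slot() and its microsecond argument are hypothetical, not part of the listings):

```c
/* Classify one 1-wire bit slot the way the DS2404 does: every slot begins
** with the CPU pulling the line low; a low time shorter than roughly
** 15 µsec reads as a 1, anything longer reads as a 0. */
static int onewire_decode_slot(int low_time_us)
{
    return (low_time_us < 15) ? 1 : 0;
}
```

A brief start pulse that is immediately released decodes as a 1; holding the line low through the slot decodes as a 0.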
This interface has several critical timing requirements. First, there is a
required reset sequence; see Figure 6. The 480-µsec recovery period after the
end of the initial reset pulse must be observed; the CPU must not attempt
further communication during this time. I tried it, sending a command byte
right after the end of the "presence pulse" sent by the 2404. Response to the
command was correct in every way except that the 2404 could only pull the
interface down to about 1.8 volts for a logic 0. (If you see middle-level
signals like this, take a good look at your timing.)
Listing Three is the C code necessary to read the serial number from the
1-wire port. Note that I used delays longer than necessary to meet the
specifications of the 2404. I wanted the routine to function properly if the
CPU was ever upgraded from 16 to 20 MHz. Also note that this routine will
probably fail if interrupts are enabled. This forced me to read the 1-wire
interface once at start-up to get the serial number, then never use it again.
All other transactions are handled on the 3-wire interface.
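Listing Three folds the CRC calculation into get_byteB(); pulled out on its own, the Dallas 1-wire CRC (polynomial X**8 + X**5 + X**4 + 1, bits processed LSB first with the reflected constant 0x8c) can be sketched like this. crc8_update() is a hypothetical standalone helper, not from the listings:

```c
#include <assert.h>

/* Update a Dallas 1-wire CRC with one data byte, LSB first. This matches
** the per-bit update in get_byteB(): XOR the incoming bit with bit 0 of
** the CRC, shift right, and fold in 0x8c when that XOR was 1. */
static unsigned char crc8_update(unsigned char crc, unsigned char data)
{
    int i;
    for (i = 0; i < 8; i++) {
        unsigned char mix = (unsigned char)((crc ^ data) & 1);
        crc >>= 1;
        if (mix)
            crc ^= 0x8c;   /* reflected form of x^8 + x^5 + x^4 + 1 */
        data >>= 1;
    }
    return crc;
}
```

Because the eighth ROM byte is the CRC of the first seven (family code plus six serial-number bytes), folding crc8_update() over all eight bytes of a good read leaves the running CRC at zero--the same check get_idW() performs by comparing my_crcB against the received CRC byte.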
One problem with the 1-wire interface stemmed from our choice of the RMC pin
on the MC68332 to drive the interface. We made this decision because we didn't
need the read-modify-write status it provided as a default function. I
configured the pin for discrete I/O, but under very unusual circumstances, the
pin seemed to have a mind of its own. The CPU32 core has a reset instruction
that resets all peripheral functions but not the actual CPU core. I
implemented a software reset by executing a reset instruction and then
fetching the reset vector and stack pointer. I believe (but can't confirm)
that the reset instruction was not fully resetting all peripheral functions
because RMC output, even though it was programmed as discrete I/O, would track
bus grant activity (every refresh cycle). The pulsing on the I/O line
completely disrupted all "1-wire" transactions--but only when the discrete
output was set low. I circumvented this by changing the reset mechanism to
cause a double-bus fault followed by a halt-monitor reset that actually drives
the hardware-reset line low.


Not Quite the Same Interface


The specification for both the 2404 and the 1202 says that for write
transactions, the clock must start low and end high. I was able to ignore this
for the 1202 but the 2404 would not function properly unless this requirement
was met. The QSPI is very flexible, but you cannot easily configure separate
starting and ending states for the clock. You can, however, implement what is
needed. The key is that while the QSPI is running, the pins behave according
to the QSPI configuration and control registers. When the QSPI is inactive,
the pins assume the states defined by the normal port-D data and
data-direction registers.
Listing Four shows how I accomplish the needed change in the clock state.
Basically, I start the QSPI with the default state of the clock pin low.
Immediately after transmission starts, I change the default state of the clock
pin to high and that of the chip-select signal controlling reset to the 2404
to low. After the QSPI finishes transmission, the clock pin is left high and
the reset line to the 2404 is not yet reasserted. In a sense, the 2404 is
still waiting for more bits. At this point, the code reasserts the reset pin,
then lowers the clock output. Since the QSPI drives the clock low before
turning it over to the default pin control, I am actually transmitting an
extra bit. This extra bit could cause trouble if ignored because it writes
into the next byte in the 2404. However, this was not a problem in this
application.
The 2404 contains a port selector which arbitrates control between the 1- and
3-wire interfaces. If both ports are in the reset state, the first one to come
out of reset will have control. The 1-wire port, however, will keep control
unless a second reset/presence pulse pair is generated at the end of all
1-wire transactions. If you forget this, you won't be able to get the 3-wire
interface to respond. Also, be sure that there are no glitches on the 1-wire
line that could simulate a reset pulse. This could easily happen if the I/O
pin were improperly configured by start-up code.


Wrapping It Up


Having handled all the aforementioned issues, I was frustrated to find that
the 2404 still did not respond properly to the interface. The project leader
proposed eliminating the 2404 from the design because we couldn't figure out
the problem and we needed to move forward. However, I wanted to give the 2404
one last shot. During one final late-night call, a Dallas Semiconductor
engineer correctly suspected ringing on the clock line, so I inserted a
series-softener (33-Ω) resistor at the SCK output from the 68332. This cleaned
up the ringing, but there were still random failures.
In this design, the FPGA controls DRAM refresh. I was enabling the FPGA,
reading the serial number from the 2404, and then reading configuration
information from the 2404 SRAM, which was used to set the refresh rate in the
FPGA. It turned out that the power-up refresh rate of the FPGA was so fast
that it severely impacted the timing of the 1-wire interface code and would
often--but not always--lock up the 2404. Reading the 1-wire interface before
enabling the FPGA circumvented this problem. The 2404 stayed in the design,
and the interface has been rock solid ever since.



References


Woehr, Jack J. "Programming the Motorola 68332." Dr. Dobb's Journal (August
1993). 
Figure 1 68332 and 1202 synchronous serial interfaces.
Figure 2 1202 protocol.
Figure 3 68332-to-1202 interface.
Figure 4 1202 command/register map.
Figure 5 2404 1-wire data formats.
Figure 6 2404 reset sequence.

Listing One
/* qspi.h. Contains MC68332 specific definitions for use with the QSPI */
#define UBYTE unsigned char /* An 8 bit value */
#define UWORD unsigned short /* A 16 bit value */
#define ON 1
#define OFF 0

/* Pointers to internal registers are used for ease in debugging. The pointers
** are const, the registers they point to are volatile. */
typedef volatile UBYTE * const RegPtrB; /* for internal control registers */
typedef volatile UWORD * const RegPtrW; /* for internal control registers */

/* 68332 specific defines and pointers used in this file */
#define REG_BASE 0xfff000 /* Base address for internal registers */
RegPtrB r32_qpdrPB = (RegPtrB)(REG_BASE + 0xc15); /* QSPI default data Reg */
RegPtrB r32_qparPB = (RegPtrB)(REG_BASE + 0xc16); /* QSPI Pin Assignment Reg */
RegPtrB r32_qddrPB = (RegPtrB)(REG_BASE + 0xc17); /* QSPI Data Direction Reg */

/* Bit defines for *r32_qparPB & *r32_qddrPB */
#define TXD B7 /* transmit data */
#define PCS3 B6 /* peripheral chip select 3 */
#define PCS2 B5 /* peripheral chip select 2 */
#define PCS1 B4 /* peripheral chip select 1 */
#define PCS0 B3 /* peripheral chip select 0 */
#define SCK B2 /* serial clock */
#define MOSI B1 /* master out slave in */
#define MISO B0 /* master in slave out */

RegPtrW r32_qcr0PW = (RegPtrW)(REG_BASE + 0xc18); /* QSPI Control Register 0 */
#define MSTR B15 /* master/slave mode select */
#define WOMQ B14 /* wired-or mode for QSPI pins */
#define CPOL B9 /* clock polarity */
#define CPHA B8 /* clock phase */

RegPtrW r32_qcr1PW = (RegPtrW)(REG_BASE + 0xc1a); /* QSPI Control Register 1 */
#define SPE B15 /* QSPI enable */

RegPtrW r32_qcr2PW = (RegPtrW)(REG_BASE + 0xc1c); /* QSPI Control Register 2 */
#define SPIFIE B15 /* SPI finished interrupt enable */
#define WREN B14 /* wrap enable */
#define WRTO B13 /* wrap to */
#define NEWQP B0 /* new queue pointer */
#define ENDQP B8 /* ending queue pointer */

RegPtrB r32_qcr3PB = (RegPtrB)(REG_BASE + 0xc1e); /* QSPI Control Register 3 */
#define LOOPQ B2 /* QSPI loop mode */

#define HMIE B1 /* HALTA and MODF interrupt enable */
#define HALT B0 /* halt */

RegPtrB r32_qsrPB = (RegPtrB)(REG_BASE + 0xc1f); /* QSPI Status Register */
#define SPIF B7 /* QSPI finished flag */
#define MODF B6 /* mode fault flag */
#define HALTA B5 /* halt acknowledge flag */
#define CPTQP B0 /* completed queue pointer */

RegPtrW r32_qrxPW = (RegPtrW)(REG_BASE + 0xd00); /* QSPI receive RAM */
RegPtrW r32_qtxPW = (RegPtrW)(REG_BASE + 0xd20); /* QSPI transmit RAM */
RegPtrB r32_qccPB = (RegPtrB)(REG_BASE + 0xd40); /* QSPI control RAM */
#define CONT B7 /* continue */
#define BITSE B6 /* bits per transfer enable */
#define DT B5 /* delay after transfer */
#define DSCK B4 /* PCS to SCK delay */
#define PCS3C B3 /* peripheral chip select 3 control */
#define PCS2C B2 /* peripheral chip select 2 control */
#define PCS1C B1 /* peripheral chip select 1 control */
#define PCS0C B0 /* peripheral chip select 0 control */


Listing Two
/* C code for DS1202 3-wire interface. This file contains handlers for the 
** MC68332 QSPI interface to the Dallas 1202 clock/calendar time functions. */

#include <string.h> /* for memcmp def */
#include <time.h> /* for time_t and struct tm defs */
#include "qspi.h" /* MC68332 QSPI specific definitions */

/* Private declarations */
static UWORD qspi_lockW; /* Lock semaphore, 1 if locked, 0 if not */
static UBYTE bitrevB( UBYTE ); /* bit reverses the arg */
static UBYTE spif_validB; /* becomes valid after 1st xmit */
static UBYTE *qbufPB; /* pointer into QSPI transmit buffer */

/* qspi_initV -- Initial config of QSPI. Called at power-up. */
void qspi_initV( void )
{
 /* Port Q pin assignments: use all but PCS3 */
 *r32_qparPB = PCS2+PCS1+PCS0+MOSI+MISO; /* Pin Assign: use all but PCS3 */
 /* Default Pin state: all high except MISO */
 *r32_qpdrPB = TXD+PCS3+PCS2+PCS1+PCS0+SCK+MOSI;
 /* Pin dir: all out except MISO */
 *r32_qddrPB = TXD+PCS3+PCS2+PCS1+PCS0+SCK+MOSI;
 qspi_lockW = 0; /* initialize the lock on QSPI */
 /* QSPI must be further separately initialized by qcfg4fpgaV() or
 ** qcfg4clkV() for their separate and incompatible purposes. */
}
/* Function: qcfg4clkV. Description: configures the QSPI to talk to the clock 
** chip. Arguments: none. Returns: void */
static void qcfg4clkV( void )
{
 while( ( *r32_qsrPB & SPIF ) == 0 ) ; /* Wait until QSPI is finished */
 *r32_qpdrPB = 0xff; /* Default Pin state: all high */
 /* Master mode, no wired OR, 16 bit default, SCK inactive low, data change
 ** on falling edge, captured on rising edge, 1.05 MHz clock rate */
 *r32_qcr0PW = MSTR + 0x008;
}

/* Function: tobinN
** Description: Converts BCD in lower byte in UWORD to int
** Arguments: UWORD containing BCD in lower byte
** Returns: integer equivalent to BCD value.
** Caveats: No check is done on validity of argument */
int tobinN( UWORD valW )
{
 return( ( ( (valW >> 4) & 0x0f ) * 10 ) + ( valW & 0x0f ) );
}
/* Function: tobcdB
** Description: Converts byte argument to BCD
** Arguments: UBYTE value to be converted to BCD ( 0 - 99 )
** Returns: UBYTE BCD equivalent of argument
** Caveats: No check is done on validity of argument */
UBYTE tobcdB( UBYTE valB )
{
 UBYTE tmpB;
 tmpB = valB/10;
 tmpB <<= 4;
 return( tmpB += valB % 10 );
}
/* Function: qset_writeproV
** Description: Sets or clears the write protect bit in the DS1202
** Arguments: int: 0 to set write protect off, 1 to set it on
** Returns: void
** Caveats: Routine does not verify that the WP bit was properly set */
static void qset_writeproV( int on_offN )
{
 while( ( *r32_qsrPB & SPIF ) == 0 ) ; /* Wait until QSPI is finished */
 /* Set up the clock command and data starting at slot 6 */
 /* Xfer 16 bits, PCS1 low, stop, Send command byte and WP byte */
 *(r32_qccPB + 6) = CONT+BITSE+DT+DSCK+PCS2C+PCS0C;

 /* Setup the transmit RAM. Need to encode & reverse the bits to be sent */
 /* Bits are sent MSB first */
 if( on_offN ) /* if wants write protect set */
 *(r32_qtxPW+0) = 0x7101; /* Control byte 8e, set write protect */
 else
 *(r32_qtxPW+0) = 0x7100; /* Control byte 8e, clr write protect */
 *r32_qsrPB = 0; /* Clear SPIF flag */

 /* send the command and data to the clock */
 *r32_qcr2PW = 0x0606; /* Start = slot 6, end = 6, no Wrap or loop */
 /* Enable xmit, selects lead clock by 1.8 uSec, delay 8 Usec at end */
 *r32_qcr1PW = SPE + 0x1f04;
 while( ( *r32_qsrPB & SPIF ) == 0 ) ; /* Wait until QSPI is finished */
 /* write protect should now be configured */
}
/* Function: set_clkV
** Description: Sets the clock from a struct tm value.
** Arguments: pointer to struct tm
** Returns: void
** Caveats: This routine does not verify its actions */
static void set_clkV( struct tm *timePH )
{
 UWORD tmpW;
 while ( lockW( &qspi_lockW ) ) ; /* wait here until we own the QSPI */
 qcfg4clkV(); /* set up pin config for DS1202 */
 /* first Unset write protect */

 qset_writeproV( OFF );

 /* Set up the clock data and commands. */
 /* Xfer 8 bits, PCS1 low, continue, Send command byte */
 *(r32_qccPB + 0) = CONT+DSCK+PCS2C+PCS0C;

 /* Xfer 16 bits, PCS1 low, continue, Send seconds and minutes bytes */
 *(r32_qccPB + 1) = CONT+BITSE+PCS2C+PCS0C;

 /* Xfer 16 bits, PCS1 low, continue, Send hours and day of month */
 *(r32_qccPB + 2) = CONT+BITSE+PCS2C+PCS0C;

 /* Xfer 16 bits, PCS1 low, continue, send month and day of week */
 *(r32_qccPB + 3) = CONT+BITSE+PCS2C+PCS0C;

 /* Xfer 16, PCS1 low, delay, stop, send year and write protect */
 *(r32_qccPB + 4) = DT+BITSE+PCS2C+PCS0C;

 /* Setup the transmit RAM. Need to encode & reverse the bits to be sent */
 *(r32_qtxPW+0) = 0x007d; /* Control byte 0xbe, clock burst write */

 tmpW = bitrevB( tobcdB( (UBYTE)timePH->tm_sec ) ) << 8;
 tmpW |= bitrevB( tobcdB( (UBYTE)timePH->tm_min ) );
 *(r32_qtxPW+1) = tmpW; /* seconds and minutes bytes */

 tmpW = bitrevB( tobcdB( (UBYTE)timePH->tm_hour ) ) << 8;
 tmpW |= bitrevB( tobcdB( (UBYTE)timePH->tm_mday ) );
 *(r32_qtxPW+2) = tmpW; /* hours and day of month */

 tmpW = bitrevB( tobcdB( (UBYTE)(timePH->tm_mon + 1) ) ) << 8;
 tmpW |= bitrevB( tobcdB( (UBYTE)(timePH->tm_wday + 1) ) );
 *(r32_qtxPW+3) = tmpW; /* month and day of week */

 tmpW = bitrevB( tobcdB( (UBYTE)timePH->tm_year ) ) << 8;
 tmpW |= 0x0001; /* add in write protect bit */
 *(r32_qtxPW+4) = tmpW; /* year and write protect */

 *r32_qsrPB = 0; /* Clear SPIF flag */

 /* send the data to the clock */
 *r32_qcr2PW = 0x0400; /* Start = slot 0, end = 4, no Wrap or int. */
 /* Enable xmit, selects 1.8 uSec lead, delay 8 Usec at end */
 *r32_qcr1PW = SPE + 0x1f04;

 while( ( *r32_qsrPB & SPIF ) == 0 ) ; /* Wait until QSPI is finished */
 /* Clock should now be set */
 qspi_lockW = 0; /* release the lock on QSPI */
}
/* Function: time
** Description: Gets the current time from the DS1202.
** Arguments: pointer to time_t or NULL
** Returns: ULONG time value */
time_t time( time_t *timePT )
{
 struct tm mytimeH;
 time_t timeL;
 while ( lockW( &qspi_lockW ) ) ; /* wait here until we own the QSPI */
 /* set up pin config */
 qcfg4clkV();


 /* Set up the clock data and commands */
 /* Transfer 8 bits with PCS1 low (xmit), continue; xmits command byte */
 *(r32_qccPB + 8) = CONT+DSCK+PCS2C+PCS0C; /* Also allow 1 usec setup */

 /* Transfer 16 bits with PCS1&2 low (rcv), continue; get secs & mins */
 *(r32_qccPB + 9) = CONT+BITSE+PCS0C;

 /* Transfer 16 bits with PCS1&2 low (rcv), continue; get hrs and DoM */
 *(r32_qccPB + 0xa) = CONT+BITSE+PCS0C;

 /* Transfer 16 bits with PCS1&2 low (rcv), continue; get month & DoW */
 *(r32_qccPB + 0xb) = CONT+BITSE+PCS0C;

 /* Transfer 8 bits with PCS1&2 low (rcv), delay and stop; get year */
 *(r32_qccPB + 0xc) = DT+PCS0C;

 *(r32_qtxPW+8) = 0x00fd; /* Control byte, read clock */

 /* set up the start and end */
 *r32_qcr2PW = 0x0c08; /* Start = cmd 8, end = 0xc, no Wrap or int. */
 /* Enable xmit, selects 1.8 uSec lead, delay 8 Usec at end */
 *r32_qcr1PW = SPE + 0x1f04;

 while( ( *r32_qsrPB & SPIF ) == 0 ) ; /* Wait until QSPI is finished */

 /* OK, now convert clock/cal data to time_t */
 mytimeH.tm_sec = tobinN( *(r32_qrxPW+9) ); /* get seconds */
 mytimeH.tm_min = tobinN( *(r32_qrxPW+9) >> 8 ); /* get minutes */
 mytimeH.tm_hour = tobinN( *(r32_qrxPW+0xa) ); /* get hours */
 mytimeH.tm_mday = tobinN( *(r32_qrxPW+0xa) >> 8 ); /* get day of month */
 mytimeH.tm_mon = tobinN( *(r32_qrxPW+0xb) ) - 1; /* get month */
 mytimeH.tm_wday = tobinN( *(r32_qrxPW+0xb) >> 8 ) - 1; /* get day of week */
 mytimeH.tm_year = tobinN( *(r32_qrxPW+0xc) ); /* get year */
 mytimeH.tm_isdst = 0; /* no DST correction */

 qspi_lockW = 0; /* release the lock on QSPI */
 timeL = mktime( &mytimeH ); /* convert struct to time_t */
 if (timePT)
 return( *timePT = (time_t)timeL );
 else
 return(timeL);
}
/* Function: bitrevB
** Description: Reverses the bits in the argument. This
** routine is used when sending data between the QSPI and the
** clock. QSPI expects MSB first, the clock expects LSB first.
** For data RAM in the clock we just let the data get stored backwards.
** Arguments: UBYTE
** Returns: UBYTE contains mirror reflection of argument */
static UBYTE bitrevB( register UBYTE dataB ) /* bit reverses the arg */
{
 register UWORD rdataW; /* These reg type better be used! */
 rdataW = 0x0100; /* done when this bit has shifted out */
 while( rdataW & 0xff00 ) /* While the bit is still in there */
 {
 rdataW <<= 1; /* Move outgoing left */
 rdataW += dataB & 1; /* Transfer incoming to outgoing */
 dataB >>= 1; /* Move incoming right */
 }

 return (UBYTE) rdataW;
}
/* Function: set_clkN
** Description: Sets the clock to the time specified by the argument
** Arguments: time_t time (seconds since 00:00 1 Jan 1970)
** Returns: int 0 on success, else verification failure */
int set_clkN( time_t stimeL )
{
 time_t checktimeL;
 set_clkV( localtime( &stimeL ) ); /* set the clock */
 checktimeL = time( NULL ); /* Now go read it back*/
 if( checktimeL != stimeL ) /* if not same as what we requested */
 return( 1 ); /* Failed to set */
 else
 return( 0 ); /* success */
}


Listing Three

/* DS2404 one-wire interface code. This file contains the code for reading the 
** serial number from the clock chip. The routines contain hardcoded delays 
** based on CPU execution speed. I have weighted the delays to the heavy side 
** so that if the CPU is converted to 20 MHz, things should still work here. The
** timings and algorithms will make more sense if you've read the Dallas book.
*/
#include "qspi.h" /* for generic typedefs and defines only */

/* local defines */
#define SNCLKBIT 0x08 /* This bit corresponds to the I/O pin used for the 
 ** single wire interface to the clock chip. The name
 ** stands for Serial Number Clock Bit :-) */
#define Delay( time ) for(ii = time; ii; ii--) /* Simple uSec delay */
#define SetInput *r32_peddrPB &= ~SNCLKBIT /* make pin an input */
#define SetOutput *r32_peddrPB = SNCLKBIT /* make pin an output */
#define SetHigh *r32_pedrPB = SNCLKBIT /* Set output high */
#define SetLow *r32_pedrPB &= ~SNCLKBIT /* Set output low */

/* globals */
UBYTE serial_numberAB[6]; /* Unique ID from DS2404 */
UWORD id_statusW; /* zero if ID is valid */

/* private variables */
static UBYTE my_crcB;

/* function pre_declarations */
static void send_oneV( void );
static void send_zeroV( void );
static void send_byteV( UBYTE );
static UBYTE get_byteB( void );
static UWORD wait_highW( void );
static UWORD wait_lowW( void );

/* Functions */
/* Function: get_idW
** Description: Reads the Serial number from the clock chip using the 1 wire
** interface protocol. If anything goes wrong with the read, the
** serial number is set to all zeros. There must be no interrupts
** during the operation of this routine.

** arguments: None
** returns: UWORD 0 on success, else failure */
UWORD get_idW( void )
{
 register UWORD ii;
 UWORD resultW;
 int i;
 UBYTE tmpB, fam_codeB, crcB;
 for( i = 0; i < 6; i++ )
 serial_numberAB[i] = 0; /* Initialize serial number */
 SetOutput; /* Do the special initialization sequence */
 SetHigh; /* Make sure line is high for a little while */
 SetHigh; /* delay several uSec */
 SetLow; /* Ok, start reset pulse here */
 Delay( 326 ); /* delay 800 uSec */
 SetInput; /* End of driving reset pulse */
 if( wait_highW() ) return 1; /* wait for line to go back high */
 if( wait_lowW() ) return 2; /* wait for clock to drive line low */
 if( wait_highW() ) return 3; /* wait for clock to release line back high */

 Delay( 326 ); /* delay 800 uSec */
 my_crcB = 0; /* Start with a fresh CRC */
 SetHigh; /* Make sure line is high */
 SetOutput;
 send_byteV( 0x0f ); /* Send Read ROM command */

 fam_codeB = get_byteB(); /* Get family code, (Not used) */

 for( i = 0; i < 6; i++ ) /* Get serial number */
 serial_numberAB[i] = get_byteB();
 tmpB = my_crcB; /* save computed CRC */
 crcB = get_byteB(); /* Get expected CRC */
 if( tmpB != crcB )
 resultW = 4;
 else
 resultW = 0;
 Delay( 100 ); /* Delay a bit before issuing the reset pulse */
 SetLow; /* reset the chip so we can use 3 wire I/F */
 SetOutput;
 Delay( 326 );
 SetInput;
 Delay( 326 ); /* delay 800 uSec */
 return resultW;
}
/* Function: send_byteV
** Description: sends one byte to the clock chip using the 1-wire
** interface protocol. Bits are sent LSB first
** arguments: UBYTE value to be sent
** returns: Void */
static void send_byteV( UBYTE valB )
{
 int i;
 for( i = 0; i < 8; i++ )
 {
 if( valB & 1 ) /* send the bit */
 send_oneV();
 else
 send_zeroV();
 valB >>= 1; /* shift to next bit */

 }
}
/* Function: get_byteB
** Description: receives one byte from the clock chip using the 1 wire
** interface protocol. Bits are received LSB first. CRC using 
** X**8 + X**5 + X**4 + 1 is calculated over the bits. Global
** my_crcB is updated.
** arguments: None
** returns: UBYTE value received. */
static UBYTE get_byteB( void )
{
 register UWORD ii;
 int i;
 UBYTE valB, tmpB;
 for( i = 0; i < 8; i++ )
 {
 SetLow; /* Send start of signal pulse */
 SetOutput;
 SetLow; /* delay to give chip a chance to see it */
 SetInput; /* Turn pin into input */
 if( *r32_pedrPB & SNCLKBIT ) /* read within 15 uSec of going low */
 tmpB = 1; /* if bit is high */
 else
 tmpB = 0;
 valB >>= 1; /* bits come in LSB first, get ready for nxt */
 if( tmpB ) valB = 0x80; /* add in one bit */
 tmpB = (tmpB ^ my_crcB) & 1; /* tmpB = XOR of input and CRC bit 0 */
 my_crcB >>= 1; /* Always do shift */
 if ( tmpB ) /* if result of XOR was 1 */
 my_crcB ^= 0x8c; /* add in new bits */
 wait_highW(); /* make sure chip has released line */
 Delay( 24 ); /* wait a while to meet clock's time slot reqmnts */
 } /* end of for each bit in byte */
 return valB;
}
/* Function: send_oneV
** Description: sends a one bit to the clock chip using the 1 wire
** interface protocol. The function waits the appropriate time
** after transmitting so the inter-bit timing requirements are met.
** arguments: None
** returns: Void */
static void send_oneV( void )
{
 register UWORD ii;
 SetLow; /* Send "start of bit pulse" */
 SetLow; /* Keep it there long enough for clock chip */
 SetHigh; /* deassert pin because we're sending a '1' */
 Delay( 29 ); /* Delay 100 uSec */
}
/* Function: send_zeroV
** Description: sends a zero bit to the clock chip using the 1 wire
** interface protocol. The function waits the appropriate time
** after transmitting so the inter-bit timing requirements are met.
** arguments: None
** returns: Void */
static void send_zeroV( void )
{
 register UWORD ii;
 SetLow; /* Send "start of bit pulse" */

 Delay( 29 ); /* Keep pin low for 100 uSec so chip reads a zero */
 SetHigh; /* deassert pin */
}
/* Function: wait_highW
** Description: waits for data pin to go high. Will time out after a while.
** Assumes that we're in input mode.
** arguments: none
** returns: UWORD 0 if got signal in time, 1 if timed out */
static UWORD wait_highW( void )
{
 register UWORD i;
 for( i = 100; i; i-- ) /* This timeout should be excessive */
 { /* for any normal situation */
 if( *r32_pedrPB & SNCLKBIT ) /* if bit is high */
 return(0); /* return success */
 }
 return(1); /* return failure if timed out */
}
/* Function: wait_lowW
** Description: waits for data pin to go low. Will time out after a while
** Assumes that we're in input mode.
** arguments: none
** returns: UWORD 0 if got signal in time, 1 if timed out */
static UWORD wait_lowW( void )
{
 register UWORD i;
 for( i = 100; i; i-- ) /* This timeout should be excessive */
 { /* for any normal situation */
 if( ! (*r32_pedrPB & SNCLKBIT) ) /* if bit is low */
 return(0); /* return success */
 }
 return(1); /* return failure if timed out */
}



Listing Four
/* DS2404 three-wire interface code. This file contains handlers for the 
** MC68332 QSPI interface to the Dallas DS2404 clock/calendar time functions. */
#include <string.h> /* for memcmp def */
#include <time.h> /* for time_t and struct tm defs */
#include "qspi.h" /* For MC68332 QSPI specific defs */

/* The following functions were defined in Listing Two */
extern UBYTE bitrevB( UBYTE ); /* bit reverses the arg */
extern void qcfg4clkV( void ); /* Config QSPI for clock work */

/* Private declarations */
extern UWORD qspi_lockW; /* Lock semaphore, 1 if locked, 0 if not */
static UBYTE spif_validB; /* becomes valid after 1st xmit */
static UBYTE *qbufPB; /* pointer into QSPI transmit buffer */

/* Function: set_clkW
** Description: Sets the clock from a time value. Note: the Dallas 2404
** requires the clock to be low when reset is deasserted and high
** when reset is asserted (writes only). Therefore, the sequences
** below that write contain some tricks with the default
** values of the chip-select and SCLK outputs.
** Arguments: time_t time value
** Returns: UWORD 0 on success, 1 if readback failure */
UWORD set_clkW( time_t stimeL )
{
 UBYTE *qxbufPB;
 time_t timeL;
 while ( lockW( &qspi_lockW ) ) ; /* wait here until we own the QSPI */
 qcfg4clkV(); /* set up pin config */

 /* Set up the clock data and commands. */
 timeL = stimeL; /* For debug loop (preserves timeL) */

 /* Xfer 8 bits, PCS1 low, continue, Send write scratchpad cmd */
 *(r32_qccPB + 0) = CONT+DSCK+PCS2C+PCS0C;

 /* Xfer 16 bits, PCS1 low, continue, Send TA1 and TA2 */
 *(r32_qccPB + 1) = CONT+BITSE+PCS2C+PCS0C;

 /* Xfer 16 bits, PCS1 low, continue, Send Ctrl reg and second fractions */
 *(r32_qccPB + 2) = CONT+BITSE+PCS2C+PCS0C;

 /* Xfer 16 bits, PCS1 low, continue, send low word seconds */
 *(r32_qccPB + 3) = CONT+BITSE+PCS2C+PCS0C;

 /* Xfer 16, PCS1 low, delay, stop, send high word seconds */
 *(r32_qccPB + 4) = DT+BITSE+PCS2C+PCS0C;

 /* Setup the transmit RAM. Need to unravel & reverse the bits to be sent */
 *(r32_qtxPW+0) = 0x00f0; /* Control byte, write scratchpad */
 *(r32_qtxPW+1) = 0x8040; /* address 201 */
 *(r32_qtxPW+2) = 0x0a00; /* CTRL REG=enable osc, no interval, 0 fracs */
 qxbufPB = (UBYTE *) (r32_qtxPW+3); /* get byte ptr to xmit buf */

 /* Now move the time into transmit RAM. */
 *qxbufPB++ = bitrevB( (UBYTE)timeL );
 timeL >>= 8;
 *qxbufPB++ = bitrevB( (UBYTE)timeL );
 timeL >>= 8;
 *qxbufPB++ = bitrevB( (UBYTE)timeL );
 timeL >>= 8;
 *qxbufPB = bitrevB( (UBYTE)timeL );
 *r32_qsrPB = 0; /* Clear SPIF flag */

 /* prepare to send the command and data to the clock */
 *r32_qcr2PW = 0x0400; /* Start = cmd 0, end = 4, no Wrap or int. */

 /* We disable interrupts because we must change the default pin states
 ** before the QSPI finishes its transaction. We only have a few tens
 ** of microseconds to work with. */
 asm(" move.w #$2700,sr"); /* Disable interrupts */

 /* Enable xmit, selects 1.4 uSec lead, delay 8 Usec at end */
 *r32_qcr1PW = SPE + 0x1404;
 *r32_qpdrPB = 0xef; /* immediately set default CLK high, PCS1 low */

 asm(" move.w #$2000,sr"); /* Re-enable interrupts */

 while( ( *r32_qsrPB & SPIF ) == 0 ) ; /* Wait until QSPI is finished */

 *r32_qpdrPB = 0xff; /* set everything High */

 *r32_qpdrPB = 0xfb; /* set CLK low for next xmit */ 
 *r32_qsrPB = 0; /* Clear SPIF flag */

 /* OK, let's verify the scratchpad */
 /* Xfer 8 bits, PCS1 low, continue, send read scratchpad command */
 *(r32_qccPB + 0) = CONT+DSCK+PCS2C+PCS0C;

 /* Xfer 16 bits, PCS1+PCS2 low, continue, Read TA1 & TA2 */
 *(r32_qccPB + 1) = CONT+BITSE+PCS0C;

 /* Xfer 8 bits, PCS1+PCS2 low, continue, Read ES */
 *(r32_qccPB + 2) = CONT+PCS0C;

 /* Xfer 16 bits, PCS1+PCS2 low, continue, Read CTRL REG & fraction secs */
 *(r32_qccPB + 3) = CONT+BITSE+PCS0C;

 /* Xfer 16 bits, PCS1+PCS2 low, continue, Read low word of seconds */
 *(r32_qccPB + 4) = CONT+BITSE+PCS0C;

 /* Xfer 16, PCS1+PCS2 low, delay, stop, Read high word of seconds */
 *(r32_qccPB + 5) = DT+BITSE+PCS0C;

 *r32_qcr2PW = 0x0500; /* Start = slot 0, end = 5, no Wrap or int. */

 *(r32_qtxPW+0) = 0x0055; /* Control byte 0xaa, read scratchpad */

 /* Enable xmit, selects 1.4 uSec lead, delay 8 Usec at end */
 *r32_qcr1PW = SPE + 0x1404;
 while( ( *r32_qsrPB & SPIF ) == 0 ) ; /* Wait until QSPI is finished */

 /* verify that what we received is what we sent */
 if( memcmp( r32_qtxPW + 2, r32_qrxPW + 3, 6 ) )
 {
 qspi_lockW = 0; /* release the lock on QSPI */
 return( 1 ); /* Bomb out if not */
 }
 /* OK, time data is OK in scratchpad RAM. Need to copy to time regs
 ** DO this by sending copy scratchpad command and security keys. */
 *r32_qsrPB = 0; /* Clear SPIF flag */
 /* Xfer 16 bits, PCS1 low, continue, Send TA1 and TA2 */
 *(r32_qccPB + 1) = CONT+BITSE+PCS2C+PCS0C;
 /* Xfer 8, PCS1 low, delay, stop. Send ES, only sending 4 bytes total */
 *(r32_qccPB + 2) = DT+PCS2C+PCS0C;
 *(r32_qtxPW+0) = 0x00aa; /* Control byte, copy scratchpad */
 *(r32_qtxPW+1) = 0x8040; /* address 201 */

 *(r32_qtxPW+2) = 0x00e4; /* E/S byte showing load end address of 7.
 ** We actually only write to 201 - 206, but the
 ** funky clock deal causes one bit of 207 to be
 ** written. PF flag set too. */
 /* send the data to the clock */
 *r32_qcr2PW = 0x0200; /* Start = cmd 0, end = 2, no Wrap or int. */

 /* We disable interrupts because we must change the default pin states
 ** before the QSPI finishes its transaction. We only have a few tens
 ** of microseconds to work with. */
 asm(" move.w #$2700,sr"); /* Disable interrupts */

 *r32_qcr1PW = SPE + 0x1404; /* Enable transmission, delay 8 Usec at end */

 *r32_qpdrPB = 0xef; /* set default CLK high, PCS1 low */

 asm(" move.w #$2000,sr"); /* Enable interrupts */
 while( ( *r32_qsrPB & SPIF ) == 0 ) ; /* Wait until QSPI is finished */
 *r32_qpdrPB = 0xff; /* set everything High */
 *r32_qpdrPB = 0xfb; /* set CLK low */

 /* Clock should now be set */
 qspi_lockW = 0; /* release the lock on QSPI */
 return 0;
}
/* Function: time
** Description: Gets the current time from the clock. Scratchpad is not used.
** Time is read directly from the registers.
** Arguments: none
** Returns: time_t time value */
time_t time( time_t *timePT )
{
 int i;
 UBYTE *qrbufPB;
 time_t timeL = 0; /* assembled byte by byte below */
 while ( lockW( &qspi_lockW ) ) ; /* wait here until we own the QSPI */
 /* set up pin config */
 qcfg4clkV();

 /* Set up the clock data and commands */
 /* Transfer 8 bits with PCS1 low (xmit), continue; xmits command byte */
 *(r32_qccPB + 8) = CONT+DSCK+PCS2C+PCS0C; /* Also allow 1 usec setup */

 /* Xfer 16 bits, PCS1 low (xmit), continue; xmits address */
 *(r32_qccPB + 9) = CONT+BITSE+PCS2C+PCS0C;

 /* Xfer 16 bits, PCS2&1 low (recv), continue; receives low word of time */
 *(r32_qccPB + 0xa) = CONT+BITSE+PCS0C;

 /* Xfer 16 bits, PCS2&1 low (recv), delay, stop; rcvs high word of time */
 *(r32_qccPB + 0xb) = DT+BITSE+PCS0C;

 /* Tx Data RAM gets command byte */
 *(r32_qtxPW + 8) = 0x000f; /* Command byte 0xf0, read memory */
 *(r32_qtxPW + 9) = 0xc040; /* memory address is 0x0203 */
 *r32_qsrPB = 0; /* Clear SPIF flag */

 /* send the data to the clock */
 *r32_qcr2PW = 0x0b08; /* Start = cmd 8, end = 0xb, no Wrap or int. */
 *r32_qcr1PW = SPE + 0x1404; /* Enable transmission, delay 8 Usec at end */
 while( ( *r32_qsrPB & SPIF ) == 0 ) ; /* Wait until QSPI is finished */

 /* Now unscramble the receive RAM. Need to reverse the bits and the byte 
 ** order */
 qrbufPB = (UBYTE *) (r32_qrxPW+0xc); /* get ptr just past recv buf */

 for( i = 0; i < 4; i++ )
 {
 timeL <<= 8;
 timeL += bitrevB( *(--qrbufPB) );
 }
 qspi_lockW = 0; /* release the lock on QSPI */
 if (timePT)
 return( *timePT = (time_t)timeL );
 else
 return( timeL );
}
}
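Both routines lean on a bitrevB() helper that is not shown in this excerpt; the comments only say the bits must be "unraveled and reversed," presumably because the clock shifts data LSB first while the QSPI shifts MSB first. Assuming UBYTE is an unsigned char, a minimal sketch of such a byte-wise bit reversal might look like this:

```c
#include <assert.h>

typedef unsigned char UBYTE;

/* Reverse the bit order of a byte: bit 0 becomes bit 7, bit 1
** becomes bit 6, and so on. Each byte is mirrored this way before
** transmission to (or after reception from) the clock chip. */
UBYTE bitrevB(UBYTE b)
{
    UBYTE r = 0;
    int i;

    for (i = 0; i < 8; i++) {
        r = (UBYTE)((r << 1) | (b & 1)); /* shift lowest bit of b into r */
        b >>= 1;
    }
    return r;
}
```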



























































Remote Network Printing


Using the Windows Sockets API to create a UNIX-like daemon




Zongnan H. Lu


Henry is a systems analyst for the Mental Health Research Institute at the
University of Michigan. He can be contacted at henry.lu@med.umich.edu.


In today's world of TCP/IP-based heterogeneous networks, it is increasingly
common to find UNIX workstations and PCs linked and working together. Just as
common in many of these network models are multiple printers and plotters
residing on PC-based Novell NetWare or Banyan VINES networks, while printers
directly attached to UNIX workstations are few and far between. This is
particularly ironic since many distributed applications that use client/server
databases reside on the UNIX side of the net because of its better
performance, greater storage capacity, and multiuser/multitasking features.
Consequently, various daily, monthly, and yearly reports created on the server
typically wait for system administrators to print them out. On the other hand,
PC networks which let you print files through shared printer queues are rarely
fully utilized, particularly in the late night or early morning hours.
Clearly, a solution to the printing bottleneck is to move the files from the
UNIX to the PC side of the network to make the most efficient use of printing
capabilities.
One way of doing this is to use FTP to download files to a local PC, then send
them to a printer on the PC's network. Applications that implement standard
UNIX network utilities such as lpd (line-printer daemon) make it possible for
all files on UNIX 4.3BSD-based workstations to be automatically sent to
printers on a PC network system at any time. In this article, I'll present an
lpd server implemented for the PC that does just this. This PC lpd runs under
Windows and is based on the Windows Sockets API.


Using the Windows Sockets API


Because of the differences between UNIX 4.3BSD and Windows internals and file
structures, there are two basic ways to build a PC-based lpd server. In the
first approach, you write two programs--one running on a UNIX workstation as a
client that collects and sends files to a PC--and the other running as a
server, sitting on the PC, receiving files from UNIX, and sending them to a
shared printer. In this scheme, the client program has to be installed on all
UNIX workstations which need remote printers, and somewhat duplicates existing
UNIX-standard network programs. 
A second approach, which I implement here, is to use the standard lpd program
as a client program on UNIX workstations, then write a server program for the
PC. This approach mirrors a common usage of lpd whereby the program runs as
both a printer server and client to receive files from a user's print command
(lp or lpr). The program then sends files to an attached or remote printer
connected to another UNIX workstation. The advantage of this approach is its
portability and simplicity. The program I wrote that does this is called
"winlpd" and is written in Microsoft Visual C/C++ using the Windows Sockets
API, WINSOCK.DLL.
The winlpd program handles four messages from the UNIX lpd: receiving a job;
listing the queue, short form; listing the queue, long form; and canceling
jobs from the queue. Since all files received by winlpd will be sent to a
printer and then immediately deleted, winlpd does nothing but acknowledge when
it gets the listing and canceling messages. On the UNIX side, the remote host
must be set in the file /etc/hosts (or /etc/hosts.equiv or /etc/hosts.lpd;
refer to UNIX Network Programming, by W. Richard Stevens, Prentice Hall, 1990)
and in the file /etc/printcap. This file defines a mapping of symbolic printer
names into physical device names, along with a complete specification of the
printer's capabilities. If a remote PC's printer name is different from the
UNIX printer name, the remote PC's printer name must be specified in the file
/etc/printcap. In the local PC PRINTCAP file, I only specify symbolic printer
names and their spooler directories. Listing One is winlpd.cpp, and Listing
Two is mainfrm.cpp. Other required files, including winlpvw.cpp, the include
files, and other Visual C++ standard files are provided electronically; see
"Availability," page 3.
To call functions in the WINSOCK.DLL, winlpd first loads the DLL and all
necessary functions by calling openLib() in the constructor of class
CWinlpdApp. If there is a failure, the program terminates. The library is
closed in the destructor of class CWinlpdApp when the program is terminated;
see Listing One. 
The winlpd menu is implemented as a toggle. A global flag, stop_lpd, is set by
Close in the system menu and by Start lpd and End lpd in the lpd menu. As soon
as Start lpd is selected, the server is active and accepts connections by
calling the functions socket(), bind(), listen(), and select(). While stop_lpd
is set to STOP_LOCK, the menu items Start lpd and Close are grayed out. To
stop the server, the menu item End lpd must be selected, which sets stop_lpd
to STOP_UNLOCK. The program gives select() a five-second timeout while waiting
for incoming connections; after five seconds, it checks the value of the
stop_lpd flag. The server is stopped if stop_lpd is no longer set to
STOP_LOCK. Whether select() is called again or a connection is accepted
depends on the value returned from select(). This process guarantees that the
server completes any job in progress and closes all sockets and the DLL; see
Listing Two.
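The five-second polling scheme can be sketched in ordinary BSD-socket C (winlpd itself reaches select() through the lpfn_select function pointer loaded from WINSOCK.DLL in Listing One; this standalone version is only illustrative):

```c
#include <assert.h>
#include <sys/select.h>
#include <unistd.h>

/* Wait up to five seconds for a connection attempt. Returns 1 if the
** listening socket is readable (a client is waiting), 0 on timeout,
** and -1 on error. The caller checks its stop flag after every
** timeout, so a stop request is honored within five seconds. */
int wait_for_client(int listen_fd)
{
    fd_set readfds;
    struct timeval tv;

    FD_ZERO(&readfds);
    FD_SET(listen_fd, &readfds);
    tv.tv_sec  = 5;   /* five-second poll interval, as in winlpd */
    tv.tv_usec = 0;

    return select(listen_fd + 1, &readfds, NULL, NULL, &tv);
}
```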
When a connection is established, the doit() function is called (see the file
winlpvw.cpp, available electronically). If remote files are ready to be
received, the following occurs: 
1. The Winlpd selection finds a printer as required by the remote host in the
file PRINTCAP. It does this by calling pgetent() in recvjob() and then
readjob().
2. The chksize() function is called to ensure that disk space is available to
hold the incoming file. The readfile() function is called to download the
file. 
3. The mapfilename() function (in readfile()) is invoked to get a PC's
filename because the remote filename is often longer than eight characters. 
4. The printjob() function (in recvjob()) is called when all files have been
received for the current job. 
5. Files specified in printit() are sent to print_lp() if a line printer is
required, or to print_lsr() if a laser printer is required. Before printing,
the NetWare Capture command is called in printjob() to capture the specified
printer queue. 
6. If there is a failure on either side during the transaction, all of the
job's unprinted files are deleted by rcleanup() (in readjob()), while the
remaining files continue to be received.
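Step 3 above maps a long UNIX filename onto a DOS-style 8.3 name. One straightforward way to do this, a hypothetical sketch rather than winlpd's actual mapfilename() algorithm, is to keep up to eight characters of the base name and three of the extension:

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Map a UNIX filename to a DOS-style 8.3 name: strip the directory,
** keep at most eight alphanumeric characters of the base name and
** three of the extension, uppercased. Illustrative only -- not the
** actual winlpd algorithm. dos_name must hold at least 13 bytes. */
void map_filename(const char *unix_name, char *dos_name)
{
    const char *base = strrchr(unix_name, '/');
    const char *dot;
    int i, j;

    base = base ? base + 1 : unix_name;   /* strip directory part */
    dot = strrchr(base, '.');

    for (i = 0, j = 0; base[i] && base + i != dot && j < 8; i++)
        if (isalnum((unsigned char)base[i]))
            dos_name[j++] = (char)toupper((unsigned char)base[i]);
    if (dot && dot[1]) {
        dos_name[j++] = '.';
        for (i = 1; dot[i] && i <= 3; i++)
            dos_name[j++] = (char)toupper((unsigned char)dot[i]);
    }
    dos_name[j] = '\0';
}
```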
The moveprintf() and strfwrite() functions are called by print_lsr() for laser
printers. If fancy characters, different sizes, or special-text formats are
required, these functions will need to be modified accordingly. 
To send files from a UNIX workstation, use the commands lpr or lp; see Example
1.


Putting It All Together: An Example


To illustrate how to use winlpd, suppose that a database server is running on
a UNIX workstation which has the host name "dbhost" and an IP address of
141.211.222.222. Further suppose that a report-creation process scheduled in a
crontab produces two different report files--f1.rpt, sent to a line printer
called "netlp," and f2.rpt, to a laser printer called "netlsr." Both printers
are connected to a NetWare network with LAN WorkPlace for DOS installed. A PC
on the network with IP address 141.211.111.11 and host name "henry" has the
winlpd program running. With all this in place, there are three files you need
to set properly: /etc/printcap, /etc/hosts, and /etc/services. On the UNIX
side, two printer names are added to the file /etc/printcap; see Example 2,
where lp1 and lp2 are symbolic printer names on the UNIX; netlp and netlsr are
printer names on the PC server side, rm declares a remote host name or IP
address, and rp is assigned a remote printer name (netlp or netlsr). For the
network communication, the PC's IP address should be added to the file
/etc/hosts as 141.211.111.11 pclpsrv, where pclpsrv is the PC's host name (we
assume that the UNIX host name is already in /etc/hosts); an entry "printer
515/tcp spooler" must appear in the file /etc/services.
On the NetWare side, both the UNIX and PC host names (dbhost and pclpsrv,
respectively) and their IP addresses, 141.211.222.222 and 141.211.111.11,
should be set in the file HOSTS, which is located in the network TCP
directory, ..\TCP\HOSTS. Additionally, entry "printer 515/tcp spooler" must be
in the ..\TCP\SERVICES file. For printers on the Novell network, the
..\TCP\PRINTCAP file must be created; see Example 3. Note that the backslash
(\) is treated as a symbol of continuation in a regular UNIX printcap file.
With PCs, however, a backslash is a root directory or a separator of
subdirectories. Winlpd eliminates this ambiguity by setting a syntax that each
line has only one item; a backslash is viewed as a continuation character if
the backslash is at the end of the line; otherwise it is viewed as a normal
character.
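That continuation rule (one item per line; a backslash continues the line only when it is the last character, and is literal anywhere else) reduces to a small test, sketched here:

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if the line ends with a continuation backslash under
** winlpd's one-item-per-line PRINTCAP syntax, 0 otherwise. A
** backslash anywhere else in the line is treated as a literal
** path character (e.g., C:\TMP\PRINTERS\NETLP). */
int is_continuation(const char *line)
{
    size_t len = strlen(line);

    /* ignore a trailing newline, if present */
    if (len > 0 && line[len - 1] == '\n')
        len--;
    return len > 0 && line[len - 1] == '\\';
}
```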
When all settings are complete, winlpd is executed on the PC. As soon as a
report file is ready on the UNIX workstation, the command lpr -Plp1 f1.rpt or
lp -dlp1 f1.rpt is invoked to send the f1.rpt file to the remote printer
netlp. Similarly, lpr -Plp2 f2.rpt or lp -dlp2 f2.rpt sends f2.rpt to the
printer netlsr on the PC Novell network. If your crontab file looks like
Example 4(a), the reportwriter.sch file should include the commands in Example
4(b).


Conclusion


Winlpd can be used not only for remote printing, but also for other remote
activities such as sending faxes, transferring files, and the like. All you
have to do is put files into appropriate directories shared with other
programs for different tasks. For performance and flexibility, winlpd can be
implemented as a server that only receives files. If you decide to use it this
way, you'll probably need to rewrite displayq() and rmjob(), and can disregard
printjob(), printit(), print_lp(), print_lsr(), moveprintf(), and strfwrite().
Example 1: Sending files using the UNIX lpr and lp commands.
lp -dlp1 filename1 filename2 ...
lpr -Plp1 filename1 filename2 ...
Example 2: Setting printer names on the UNIX side of the net.
(a)
lp1|network line printer:\
 :rm=141.211.111.11:rp=netlp:
lp2|network laser printer:\
 :rm=141.211.111.11:rp=netlsr:
(b)
lp1|network line printer:\
 :rm=pclpsrv:rp=netlp:
lp2|network laser printer:\
 :rm=pclpsrv:rp=netlsr:
Example 3: Setting printers on the PC side of the net.
netlp:\
 :sd=C:\TMP\PRINTERS\NETLP:
netlsr:\
 :sd=C:\TMP\PRINTERS\NETLSR:
Example 4: (a) Sample crontab-file contents; (b) commands for sample crontab
file.
(a)
0 4 * * 1,2,3,4,5 /usr/henry/reportwriter.sch
(b)
# run report creator program
/usr/henry/report_writer
# send file to PC server
lpr -Plp1 /usr/henry/f1.rpt
lpr -Plp2 /usr/henry/f2.rpt

Listing One 
// winlpd.cpp : Defines the class behaviors for the application.

#include "stdafx.h"
#include "winlpd.h"

#include "mainfrm.h"
#include "winlpdoc.h"
#include "winlpvw.h"

#ifdef _DEBUG
#undef THIS_FILE
static char BASED_CODE THIS_FILE[] = __FILE__;
#endif

// CWinlpdApp
BEGIN_MESSAGE_MAP(CWinlpdApp, CWinApp)
 //{{AFX_MSG_MAP(CWinlpdApp)
 ON_COMMAND(ID_APP_ABOUT, OnAppAbout)
 // NOTE - the ClassWizard will add and remove mapping macros here.
 // DO NOT EDIT what you see in these blocks of generated code !
 //}}AFX_MSG_MAP
 // Standard file based document commands
 ON_COMMAND(ID_FILE_NEW, CWinApp::OnFileNew)
 ON_COMMAND(ID_FILE_OPEN, CWinApp::OnFileOpen)
END_MESSAGE_MAP()

// CWinlpdApp construction
HINSTANCE m_hLibrary;
SOCKET (FAR PASCAL *lpfn_accept)(SOCKET, LPSOCKADDR, LPINT);
int (FAR PASCAL *lpfn_bind)(SOCKET, LPCSOCKADDR, int);
int (FAR PASCAL *lpfn_closesocket)(SOCKET);
int (FAR PASCAL *lpfn_err)(void);

LPHOSTENT (FAR PASCAL *lpfn_gethostbyaddr)(LPCSTR, int, int);
LPSERVENT (FAR PASCAL *lpfn_getservbyname)(LPCSTR, LPCSTR);


int (FAR PASCAL *lpfn_listen)(SOCKET, int);
int (FAR PASCAL *lpfn_recv)(SOCKET, LPCSTR, int, int);
int (FAR PASCAL *lpfn_select)(int, LPFD_SET, LPFD_SET, LPFD_SET, LPTIMEVAL);
int (FAR PASCAL *lpfn_send)(SOCKET, LPCSTR, int, int);
int (FAR PASCAL *lpfn_shutdown)(SOCKET, int);
SOCKET (FAR PASCAL *lpfn_socket)(int, int, int);
int (FAR PASCAL *lpfn_WSACleanup)(void);
int (FAR PASCAL *lpfn_WSACancelBlockingCall)(void);
int (FAR PASCAL *lpfn_WSAStartup)(WORD, LPWSADATA);
SOCKET finet;
CWinlpdApp::CWinlpdApp()
{
 // TODO: add construction code here,
 // Place all significant initialization in InitInstance
 finet = INVALID_SOCKET;
 if (!openLib())
 exit(0);
}
CWinlpdApp::~CWinlpdApp()
{
 // TODO: add destruction code here,
 closeLib();
}
// The one and only CWinlpdApp object
CWinlpdApp NEAR theApp;
// CWinlpdApp initialization
BOOL CWinlpdApp::InitInstance()
{
 // Standard initialization
 SetDialogBkColor(); // set dialog background color to gray
 LoadStdProfileSettings(); // Load standard INI file options
 // Register the application's document templates. Document templates
 // serve as the connection between documents, frame windows and views.
 AddDocTemplate(new CSingleDocTemplate(IDR_MAINFRAME,
 RUNTIME_CLASS(CWinlpdDoc),
 RUNTIME_CLASS(CMainFrame), // main SDI frame window
 RUNTIME_CLASS(CWinlpdView)));
 // create a new (empty) document
 OnFileNew();
 return TRUE;
}
// CAboutDlg dialog used for App About
class CAboutDlg : public CDialog
{
public:
 CAboutDlg();
// Dialog Data
 //{{AFX_DATA(CAboutDlg)
 enum { IDD = IDD_ABOUTBOX };
 //}}AFX_DATA
// Implementation
protected:
 virtual void DoDataExchange(CDataExchange* pDX); // DDX/DDV support
 //{{AFX_MSG(CAboutDlg)
 // No message handlers
 //}}AFX_MSG
 DECLARE_MESSAGE_MAP()
};
CAboutDlg::CAboutDlg() : CDialog(CAboutDlg::IDD)

{
 //{{AFX_DATA_INIT(CAboutDlg)
 //}}AFX_DATA_INIT
}
void CAboutDlg::DoDataExchange(CDataExchange* pDX)
{
 CDialog::DoDataExchange(pDX);
 //{{AFX_DATA_MAP(CAboutDlg)
 //}}AFX_DATA_MAP
}
BEGIN_MESSAGE_MAP(CAboutDlg, CDialog)
 //{{AFX_MSG_MAP(CAboutDlg)
 // No message handlers
 //}}AFX_MSG_MAP
END_MESSAGE_MAP()
// App command to run the dialog
void CWinlpdApp::OnAppAbout()
{
 CAboutDlg aboutDlg;
 aboutDlg.DoModal();
}
// CWinlpdApp commands
#include <io.h>
/* closeLib() - closes sockets and DLL library */
void CWinlpdApp::closeLib()
{ 
 int ret;
 if (finet != INVALID_SOCKET) {
 ret=(*lpfn_shutdown)(finet, 2);
 if (ret == SOCKET_ERROR) {
 ret=(*lpfn_err)();
 sprintf(m_err,"shutdown failed, ret=%d", ret);
 AfxMessageBox(m_err);
 }
 }
 ret=(*lpfn_WSACleanup)();
 if (ret == SOCKET_ERROR) {
 ret=(*lpfn_err)();
 sprintf(m_err,"WSACleanup failed, ret=%d", ret);
 AfxMessageBox(m_err);
 }
 FreeLibrary(m_hLibrary);
}
/* openLib() - opens DLL library and socket functions */
int CWinlpdApp::openLib()
{
 int ret;
 WORD wvr;
 WSADATA wsad;

 if (_access("WINSOCK.DLL",0)) {
 AfxMessageBox("Need WINSOCK.DLL.");
 return 0;
 }
 /* LOADING WINSOCK.DLL and FUNCTIONS */
 if ((m_hLibrary = LoadLibrary("WINSOCK.DLL")) <= HINSTANCE_ERROR)
 {
 AfxMessageBox("Can not load lib WINSOCK.DLL");
 return 0;

 }
 lpfn_WSACleanup=(int (FAR PASCAL*)(void))
 GetProcAddress(m_hLibrary, "WSACleanup");
 if (lpfn_WSACleanup == NULL) {
 AfxMessageBox("GetProcAddress-WSACleanup failed");
 return 0;
 }
 wvr = (WORD)MAKEWORD(1,1);
 lpfn_WSAStartup=(int (FAR PASCAL*)(WORD, LPWSADATA))
 GetProcAddress(m_hLibrary, "WSAStartup");
 if (lpfn_WSAStartup == NULL) {
 AfxMessageBox("GetProcAddress-WSAStartup failed");
 return 0;
 }
 lpfn_err=(int (FAR PASCAL*)(void))
 GetProcAddress(m_hLibrary, "WSAGetLastError");
 if (lpfn_err == NULL) {
 AfxMessageBox("GetProcAddress-err failed");
 return 0 ;
 }
 lpfn_getservbyname=
 (struct servent FAR* (FAR PASCAL*)(LPCSTR,LPCSTR))
 GetProcAddress(m_hLibrary, "getservbyname");
 if (lpfn_getservbyname == NULL) {
 AfxMessageBox("GetProcAddress-getservbyname failed");
 return 0;
 }
 lpfn_gethostbyaddr=
 (LPHOSTENT (FAR PASCAL*)(LPCSTR, int, int))
 GetProcAddress(m_hLibrary, "gethostbyaddr");
 if (lpfn_gethostbyaddr == NULL) {
 AfxMessageBox("GetProcAddress-gethostbyaddr failed");
 return 0;
 }
 lpfn_socket=(SOCKET (FAR PASCAL*)(int, int, int))
 GetProcAddress(m_hLibrary, "socket");
 if (lpfn_socket == NULL) {
 AfxMessageBox("GetProcAddress-socket failed");
 return 0;
 }
 lpfn_bind=(int (FAR PASCAL*)(SOCKET, LPCSOCKADDR, int))
 GetProcAddress(m_hLibrary, "bind");
 if (lpfn_bind == NULL) {
 AfxMessageBox("GetProcAddress-bind failed");
 return 0;
 }
 lpfn_select=
 (int (FAR PASCAL*)(int, LPFD_SET, LPFD_SET, LPFD_SET, LPTIMEVAL))
 GetProcAddress(m_hLibrary, "select");
 if (lpfn_select == NULL) {
 AfxMessageBox("GetProcAddress-select failed");
 return 0;
 }
 lpfn_listen=(int (FAR PASCAL*)(SOCKET, int))
 GetProcAddress(m_hLibrary, "listen");
 if (lpfn_listen == NULL) {
 AfxMessageBox("GetProcAddress-listen failed");
 return 0;
 }

 lpfn_accept=
 (SOCKET (FAR PASCAL*)(SOCKET, LPSOCKADDR, LPINT))
 GetProcAddress(m_hLibrary, "accept");
 if (lpfn_accept == NULL) {
 AfxMessageBox("GetProcAddress-accept failed");
 return 0;
 }
 lpfn_closesocket=(int (FAR PASCAL*)(SOCKET))
 GetProcAddress(m_hLibrary, "closesocket");
 if (lpfn_closesocket == NULL) {
 AfxMessageBox("GetProcAddress-closesocket failed");
 return 0;
 }
 lpfn_shutdown=(int (FAR PASCAL*)(SOCKET, int))
 GetProcAddress(m_hLibrary, "shutdown");
 if (lpfn_shutdown == NULL) {
 AfxMessageBox("GetProcAddress-shutdown failed");
 return 0;
 }
 lpfn_send=(int (FAR PASCAL*)(SOCKET, LPCSTR, int, int))
 GetProcAddress(m_hLibrary, "send");
 if (lpfn_send == NULL) {
 AfxMessageBox("GetProcAddress-send failed");
 return 0;
 }
 lpfn_recv=(int (FAR PASCAL*)(SOCKET, LPCSTR, int, int))
 GetProcAddress(m_hLibrary, "recv");
 if (lpfn_recv == NULL) {
 AfxMessageBox("GetProcAddress-recv failed");
 return 0;
 }
 ret=(*lpfn_WSAStartup)(wvr, &wsad);
 if (ret != 0) {
 sprintf(m_err,"WSAStartup failed, ret=%d", ret);
 AfxMessageBox(m_err);
 return 0;
 }
 return 1;
}



Listing Two

// mainfrm.cpp : implementation of the CMainFrame class

#include "stdafx.h"
#include "winlpd.h"
#include "mainfrm.h"
#ifdef _DEBUG
#undef THIS_FILE
static char BASED_CODE THIS_FILE[] = __FILE__;
#endif
// CMainFrame
IMPLEMENT_DYNCREATE(CMainFrame, CFrameWnd)
BEGIN_MESSAGE_MAP(CMainFrame, CFrameWnd)
 //{{AFX_MSG_MAP(CMainFrame)
 ON_WM_INITMENUPOPUP()
 //}}AFX_MSG_MAP

END_MESSAGE_MAP()
// CMainFrame construction/destruction
int stop_lpd;
CMainFrame::CMainFrame()
{
 // TODO: add member initialization code here
 stop_lpd = STOP_OK;
}
CMainFrame::~CMainFrame()
{
}
// CMainFrame message handlers
void CMainFrame::OnInitMenuPopup(CMenu* pPopupMenu, UINT nIndex, BOOL
bSysMenu)
{
 CFrameWnd::OnInitMenuPopup(pPopupMenu, nIndex, bSysMenu);
 // TODO: Add your message handler code here
 if (bSysMenu == TRUE) {
 if (stop_lpd != STOP_OK)
 pPopupMenu->EnableMenuItem(6, MF_BYPOSITION | MF_GRAYED);
 else
 pPopupMenu->EnableMenuItem(6, MF_BYPOSITION | MF_ENABLED);
 }
}








































Complying with Fortran 90


How does the current crop of Fortran 90 compilers measure up to the standard?




Steven Baker


Steven works for the Oregon Department of Energy coaxing energy conservation
out of new state buildings. He is the "Networking" columnist for Unix Review,
former editor of Programmer's Journal, and co-author of Extending DOS. Steven
can be reached at msbaker@cs.uoregon.edu.


Originally written at IBM in the 1950s, FORTRAN is by computer standards an
ancient language. Like its sibling COBOL, however, FORTRAN remains an
important software-development tool. One obvious reason is that a huge amount
of software in use today was written in FORTRAN. This code base also includes
some very substantial programming libraries that are fast, fully tested, and
debugged. 
FORTRAN is also fast at number crunching. While Pascal, C, and more recently
C++ have displaced earlier-generation languages for general-purpose
applications, numerics remains the realm of FORTRAN. Compared with C and C++,
FORTRAN supports a native complex data type along with a rich set of intrinsic
math functions that are easily inlined for fast execution speed. FORTRAN lacks
the loose variable aliasing of C and C++ pointers, so FORTRAN compilers can
better optimize code. With Fortran 90--the most recent version of the FORTRAN
standard--array syntax has been added to the language, allowing you to
manipulate arrays as simply as any other variable type. FORTRAN compiler
vendors are free to speed array operations by inlining or making calls to
optimized library routines. While C++ can offer some of these benefits with
suitable math class libraries (Rogue Wave's LAPACK++ and Dyad Software's M++
come to mind), execution speed inevitably favors FORTRAN. When you are
executing a simulation that may take hours or days, differences in run time
become a major consideration. 
While the previous FORTRAN standards (FORTRAN-66 and FORTRAN-77) dealt
primarily with codifying existing practice, Fortran 90 breaks new ground by
extending the FORTRAN language, adding features found in other modern computer
dialects (pointers, modules, interface blocks, and user-defined data types
equivalent to C's structures, for example) along with some special facilities
of its own for handling numerical accuracy. Because of this richness, the
Fortran 90 language is large and complex. Although Fortran 90 was adopted in
1991, it has taken time for compiler and tool vendors to comply with the
standard. The first wave of Fortran 90 tools were translators that converted
from Fortran 90 to an intermediate language (FORTRAN-77 or C) for compilation.
Traditional FORTRAN vendors are now releasing Fortran 90 tools based both on
these existing translators and on native compilers; see Table 1. 


Lingua Fortrana


With the arrival of Fortran 90 tools, it is timely to consider compliance with
the Fortran 90 standard. Using tools that comply with standards can have
obvious benefits, the least of which is portability between different computer
environments. Checking for compliance can have other positive effects,
including hastening improvements in tools, exposing gray areas in the
standard, and identifying problems to be addressed in future work. 
Historically, compliance with computer standards has been handled by the
adoption of official test suites by government agencies. In the U.S., the
National Institute of Standards and Technology (NIST) administers test suites
for compliance with FORTRAN-77, COBOL, Pascal, Ada, SQL, and the like. NIST
publishes a quarterly report, the Validated Processor List, listing those
vendor products which have passed and are currently in compliance. NIST has
yet to develop a procedure for validating Fortran 90 compilers.
Compiler vendors inevitably need test suites for quality assurance and testing
to support their own development efforts. In the case of Fortran 90, this
meant that the first Fortran 90 vendors were also forced to develop test
suites as part of the process of tool development. Based on its early entry
and experience in the FORTRAN market, the Numerical Algorithms Group (NAG) has
been successful in selling its test suite to many other vendors. Other test
suites have been developed by Lahey, Parasoft, and Cray--FORTRAN vendors on
PC, UNIX, and supercomputers, respectively. SHAPE, a special suite for testing
the array-processing areas of Fortran 90, was developed by Spackman and
Hendrickson (two members active in the Fortran 90 standards process). 
The companies in Table 2 were gracious enough to provide me copies of their
respective test suites as long as I reported the results in general
categories, not as specific test results. Since each test suite was developed
initially with a particular Fortran 90 compiler or translator, these test
suites all have a bias toward a particular product. 
Aside from these specific test suites, FORTRAN tool vendors often gather
collections of test code from customers to use for regression testing before
release of a new product or version. An advantage of such miscellaneous test
code is that it is authored by many programmers, typically with very different
coding styles. While test suites may contain a few thousand trivial 40-line
programs, "real" FORTRAN code is often more likely to stress the FORTRAN
compilers and discover problems. Basing testing on such code is difficult
because the user may not have any idea what the code does or how to properly
prepare input and check output, and may not have the time and patience to hand
review the code for standards compliance. 


Compliance and Conformance 


Before reviewing the specific test results, let's first examine what the
Fortran 90 standard specifies about compliance. The standard distinguishes
between the compiler/translator and a Fortran 90 program. A "processor" is
defined as "the combination of a computing system and the mechanism by which
programs are transformed for use on the computing system." For conformance,
the standard specifies the requirements, prohibitions, and options for
permissible forms and relationships in a program rather than in a processor.
Any requirements for a compiler or translator (a "processor" in standards
jargon) must be inferred from those given for a conforming program.
Realistically, the only reasonable way to test for a conforming program is to
compile the test program with a processor that complains when the Fortran 90
standard requirements are not met (a Fortran 90 catch-22). 
Table 3 lists the general conditions that must be met by a conforming
processor. Although no deleted features were included in the standard, a
number of features were marked as obsolescent; see Table 4. For the first
time, this allows features (often relics of ancient FORTRAN practices) to be
removed from subsequent FORTRAN standards. At current rates, however, this
won't be until the next century. 
A few additional caveats free the processor from meeting the aforementioned
requirements in format specifications that are not part of a FORMAT statement.
This particular exception reflects the practice of passing a format specifier
as a character variable, forcing format parsing and interpretation to occur
only during run time rather than during compilation. In this regard, FORTRAN's
formatted I/O is similar to the C stdio library, which also requires
interpretation at run time. 
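The analogy holds directly in C: when the format is held in a variable rather than written as a literal, the stdio library has no choice but to parse it at run time. A small illustration:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* The format string is chosen at run time, so snprintf() must parse
** and interpret it then -- just as a FORTRAN processor must interpret
** a format specifier passed in as a character variable. */
int format_value(char *buf, size_t n, int wide, double x)
{
    const char *fmt = wide ? "%12.4f" : "%.2f";
    return snprintf(buf, n, fmt, x);
}
```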
A Fortran 90 compiler is free to support other language extensions as long as
they don't conflict with the standard. For example, Fortran 90 limits
variables to 31-character names. Several FORTRAN compilers offer longer
variable names as an extension, but a conforming Fortran 90 program must
comply fully with the standard and not use any extensions.
While Fortran 90 was designed to be backward compatible with FORTRAN-77,
interpretations differ in five cases. The standard also excludes certain
processor-dependent issues, including the physical representation of numbers,
rounding, input/output records and files, and storage. 
The Fortran 90 standard document spells out in gory detail the various
requirements, prohibitions, and options that must be met. A number of
prohibitions are specifically highlighted as "constraints" in the document.
For example, section 3.3.2 explicitly states: "Constraint: The maximum length
of a name is 31 characters." 
However, many of the requirements are implicit in the document and must be
inferred by the reader from the text. For example, the syntax rules require a
name to begin with one of the 26 alphabetic letters; the name may then consist
of these letters, the ten digits, and the underscore (_) character. An implied
requirement is the prohibition of names beginning with a digit. A test case
might try the following code fragment:
PROGRAM 123TEST
PRINT *, 'Fail'
END
Creating test cases for the explicit constraints is reasonably
straightforward: Devise at least one test for each official constraint in the
standard document. Although this may be time consuming, the procedure is
reasonably well defined. In contrast, the process of testing for other rules
and implied requirements is much more challenging. The document must be
painstakingly scoured for these implicit restrictions. 


A Bumper Crop of Compilers


To gauge the current state of Fortran 90, I worked with most of the available
Fortran 90 compilers, translators, and test suites; see Tables 1 and 2. The
Apogee, Parasoft, and Pacific-Sierra Research (PSR) tools translate from
Fortran 90 to FORTRAN-77 and then invoke a native FORTRAN-77 compiler. VAST-90
from PSR is much more than a simple translator since it contains a powerful
vectorizer and facilities to translate from FORTRAN-77 to Fortran 90. Parasoft
f90 also translates CM Fortran (Thinking Machines) to FORTRAN-77 and has been
widely used by the High Performance Fortran (HPF) community. NAGWare f90
translates Fortran 90 to C and then invokes a native C compiler. The Apogee
compiler uses technology from Kuck and Associates for translation to
FORTRAN-77 using its native Apogee Fortran compiler for SPARC as a back end.
Microway's NDP compiler uses technology from PSR with its existing FORTRAN
compiler as a back end. The remaining vendors--CraySoft, Digital, EPC,
Fujitsu, IBM, and Lahey--have true Fortran 90 compilers. 
The products based on translators come with a driver program and libraries
that automate the process of translating Fortran 90 code and compiling the
resulting files. You must supply a suitable FORTRAN-77 or C compiler with the
Parasoft, PSR, and NAG products. These three translators are offered on
various UNIX workstations (Sun, HP, IBM, and so on) and have been around for
some time. EPC Fortran 90 was first introduced in the summer of 1993. The
others are relatively new, having been released in 1994. 
I tested these tools on a variety of platforms, including a Sun SPARCstation
2+ and a SPARCserver 1000 running Solaris 2.3, an IBM RS/6000 Model 250
(PowerPC) running AIX 3.2.5, and a DEC Alpha AXP Model 3000-50 running OSF/1
3.0. The SunPro Fortran 3.0 and C compilers were used as back ends with the
translators from NAG, PSR, and Parasoft. The IBM and Digital Fortran compilers
were used as back ends for the translators on their respective platforms. 


Problems Being Portable


Writing a good test suite is a daunting task. For completeness, this
necessitates poring through dense standard documents and culling out the
requirements, prohibitions, and other options that must be met. Then the
actual test cases must be written and debugged to verify proper or erroneous
behavior. For automating the process, some sort of driver program or scripts
must be written that compile the test cases, compare the results against
expected behavior, and report the outcome in a usable format. The difficulty
is exacerbated since the motivation for most of these tests (with the
exception of SHAPE) was in support of Fortran 90 compiler development, not an
end unto itself. As long as the test suite worked well with the compiler under
development, it was deemed satisfactory. My own experience with multiple
Fortran 90 compilers taxed all of the test suites and uncovered significant
problems. 

There can be basic problems with the ways individual tests are written. In the
four general-purpose suites, the largest share of test cases checks for
invalid constructs with negative tests that are expected to fail at compile
time. But the test code might fail for an improper reason, or a compiler might
support some Fortran 90 extensions that allow it to pass. To be certain that
a negative test failed for the right reason, the compile results need to be
saved and hand-reviewed for appropriate behavior--a very time-consuming task.
One attraction of the NAG test suite is that it is relatively easy to set up
initially, but it does a poor job on negative tests: As long as the compile
fails, it considers the test to have passed. The test scripts don't even save
the compiler output messages for review.
The alternative is to customize negative tests to anticipate specific error
messages from the compiler. This provides the greatest amount of reliability,
but at a cost. With customized checks, the test suite for a single compiler
can take weeks to set up. If compiler error messages are changed, these
customized checks will need to be modified. The CraySoft test suite from Cray
is the worst in this regard, since it tests for matches with specific error
messages that indicate the source line that failed. Lahey's test suite allows
for this behavior in a more flexible way, using custom compiler-specific
blocks in an information file (.inf) for each test. But the Lahey test driver
also supports a default behavior if an info file is not specified, as well as
a default block in a test-specific info file. The Parasoft scripts test for
exact matches for the translation from Fortran 90 to FORTRAN-77 fed to the
native FORTRAN compiler and the run-time output. Fortunately, in the Parasoft
case, scripts are provided to generate the custom check files, assuming that
the translator output is correct--a major assumption. 
Most of the positive tests in all the suites suffer from using list-directed
I/O (print *, i, r, a(10)). The Fortran 90 standard allows complete latitude
by the compiler writer on the format produced with such print statements. The
number of blanks inserted before each number or string, the number of decimal
digits displayed, and the number of variables printed on a single line (the
assumed line length) can all vary. Thus, creating output files for checking
becomes completely compiler dependent and, consequently, time consuming. 
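As a small illustration of this latitude (a hypothetical fragment, not taken from any of the suites), the following program is standard conforming, yet its output format is left almost entirely to the implementation:

```fortran
PROGRAM LISTIO
  INTEGER :: I = 42
  REAL    :: R = 2.5
  ! With list-directed output, the leading blanks, the number of
  ! decimal digits, and the line breaks are all processor dependent.
  PRINT *, I, R
END PROGRAM LISTIO
```

One compiler might print both values with a single leading blank and six decimal digits, another with wider columns and fewer digits, so an output file captured from one compiler rarely matches another byte for byte.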
The SHAPE tests are better since they check the results internally against
known quantities. The NAG test suite addresses the problem with a
general-purpose "check" module, linked into many of the tests, that performs
the comparison after blanks and commas are stripped. Unfortunately, the test
cases and the check module were written in a manner that assumes certain
behavior, making them compiler dependent. Only after rewriting a large number
of test cases did the NAG tests become portable. Lahey requires that custom output
files be created along with modifying the test info file for each test case if
the output from a compiler does not match the Lahey-supplied comparison files.
This is a tedious process and forced me to limit my use to only portions of
the Lahey test suite. On the other hand, the Lahey test driver does by far the
best job of saving all the input and output files from a test for later
review. 
Another issue is how the entire test suite or portions are invoked and the
results saved for later perusal. The ideal would be an automated scheme that
allowed running all or only portions of a test and created simple yet
comprehensive report files. SHAPE takes the simplest approach and doesn't
provide an automatic method to run all the test cases. Since it consists of
only 26 files, it would be time-consuming but manageable to execute 26 test
compiles followed by test runs by hand (I wrote my own shell scripts). 
The Lahey test suite comes with source code to a sophisticated driver program
written in C to manage the entire process, saving the results on failures
completely. The other test suites use a series of UNIX shell scripts to invoke
the tests. NAG uses a complex set of nested shell scripts spread over a dozen
different test directories. The Cray test suite requires GNU awk (gawk) for
its shell scripts but doesn't supply source or binaries to this freely
available utility. I spent considerable time removing bugs from the Lahey
driver program and cleaning up and revising the nested shell scripts from NAG.
For use with DOS, I benefitted from the DOS version of the MKS Toolkit (from
Mortice Kern Systems, Waterloo, ON) with its Korn shell and UNIX-like
utilities that allowed me to rewrite the NAG and CraySoft shell scripts.
Finally, the individual test cases may include code that does not meet the
Fortran 90 standard or assumes certain behavior by the compiler. For example,
a number of the NAG test cases had embedded KIND numbers specific to the NAG
translator. Only after these were removed did the test cases become portable.
SHAPE assumed that preconnected units were supported and issued READ and WRITE
statements without prior OPEN or INQUIRE statements. 
The bottom line is that results from these validation suites should be taken
with a grain of salt. These tests consist mostly of very short, narrowly
focused bits of code that might well not stress a FORTRAN compiler as much as
a sizable "real" program. I, for instance, learned more from trying to get
NAG's rather complex check module to compile than from running many of the
tests.
Since many of the UNIX vendors (DEC, EPC, Fujitsu, IBM, PSR, and of course
NAG) have licensed the NAG test suite, you'd expect a high degree of
compliance with the NAG tests. However, testing these tools with a different
test code may reveal some problems that NAG misses. The SHAPE array tests were
a good case in point. 


Picking Test Cases


Setting up and using a test suite can require an enormous amount of work. The
time this takes is related to how well the driver program and test cases were
written in the first place. One benefit of configuring and running a test
suite is the considerable education you receive on the Fortran 90 language
during the process. 
For the results presented here, I selected what I consider the better parts of
several test suites. The NAG suite is most rigorous when testing for
operations with different KIND numbers. Fortran 90 data types can be declared
with a KIND number to support greater or lesser range in a portable manner.
However, the standard does not specify what these KIND numbers (integer
values) should be; see Table 5. Intrinsic functions will return the
appropriate KIND value for integer and real data types based on the range and
precision desired. NAG uses a compiler.f90 module that must be customized with
appropriate KIND values and I/O constants for each compiler. NAG uses template
files to generate test programs with these compiler-specific values for
testing array handling, keywords, and I/O. The actual number of test cases
generated depends on the KIND numbers supported by a compiler. The Cray tests
have a similar facility requiring the creation of a configuration file for
each compiler. 
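As a sketch of how conforming code sidesteps the unspecified numbering, the SELECTED_INT_KIND and SELECTED_REAL_KIND intrinsics request a kind by decimal range and precision rather than by a hard-coded KIND value (the names I9 and R10 below are arbitrary):

```fortran
PROGRAM KINDS
  ! Request an integer kind spanning at least 10**9 and a real kind
  ! with at least 10 decimal digits of precision; the KIND numbers
  ! actually returned are compiler dependent.
  INTEGER, PARAMETER :: I9  = SELECTED_INT_KIND(9)
  INTEGER, PARAMETER :: R10 = SELECTED_REAL_KIND(10)
  INTEGER(KIND=I9) :: BIG
  REAL(KIND=R10)   :: X
  BIG = 123456789_I9
  X = 1.0_R10 / 3.0_R10
  PRINT *, KIND(BIG), KIND(X)
END PROGRAM KINDS
```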
The NAG suite includes a large number of invalid-constraint tests along with
separate tests for detecting obsolete features. NAG also includes a very
rigorous test that uses keywords as variables in almost every possible situation.
While using keywords as variables isn't the best practice, this was the only
test suite to address this issue comprehensively. 
SHAPE is very strong on array handling, the sole focus of its tests. Lahey's
LTEST contains a sizable number of test cases for invalid and implied
constraints. Lahey also has separate directories to test some of the intrinsic
functions (over 70) required by Fortran 90 and memory-allocation routines.
Only the CraySoft test suite had separate tests for interpretations issued
since the Fortran 90 document was published. These tests gauge whether a tool
is keeping abreast of changes. Not surprisingly, the CraySoft tests were
strong on array syntax, given their supercomputing heritage. I would have used
more of the Lahey, CraySoft, and Parasoft tests except that the time to write
the necessary customized output files for comparisons was too demanding.


Comparing Compilers


Table 6 presents the results of the selected tests for these Fortran 90
processors. The new native UNIX compilers do quite a good job of complying
with the standard. NAG was responsible for a major test suite, hence its high
level of compliance. On the other hand, some of the products based on
translators still have a considerable way to go. 
Several translator-based products (Apogee, Microway, and Parasoft) suffered in
the NAG suite because many of the test cases depend on a complex check module
linked in with each test. If the translator had difficulty with this one
module, every test that depends on it failed. Although Microway uses
technology licensed from PSR, the translator component is an earlier version
than that currently in PSR's Vast90. This accounts for the difference in
compliance between Microway and PSR. When Microway merges in the latest PSR
updates, compliance results should be similar. I should stress that both the
Apogee and Fujitsu products were still in beta and will undoubtedly improve
before commercial release. 
The compliance results based on LTEST were generally lower than the equivalent
NAG test outcomes. This likely reflects the fact that many of the vendors use
NAG for regression testing. The Lahey and NAG suites also weigh tests
differently, emphasizing a particular constraint or rule in separate test
cases. 
The number of failed constraint tests should be viewed with caution. For
example, the NAG suite includes 66 separate test cases that use variable names
longer than the 31-character limit of the Fortran 90 standard. If a compiler
accepts these names without reporting them as extensions to Fortran 90, that
alone accounts for 3 percent of the failures. Similar numbers of repeated tests
exist
for other common extensions (binary and hexadecimal constants, for example).
All constraint tests are weighted equally. Yet many of these tests are
nitpicking and don't affect real programs. For example, the standard specifies
that only a single PRIVATE or SEQUENCE keyword may appear in a derived type.
Extra (extraneous) PRIVATEs will likely have no effect on your current
programming efforts. 
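For example, a constraint test for the SEQUENCE rule might look like the hypothetical fragment below; the code is deliberately invalid, and a compiler that quietly accepts the second SEQUENCE statement fails the test without this ever mattering to a realistic program:

```fortran
MODULE M
  TYPE POINT
    SEQUENCE        ! one SEQUENCE statement is legal
    SEQUENCE        ! a second violates the constraint, but many
                    ! compilers accept it without complaint
    REAL :: X, Y, Z
  END TYPE POINT
END MODULE M
```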
Surprisingly, the Cray, Fujitsu, and Microway FORTRAN compilers failed to
recognize and report the obsolescent features as required by the Fortran 90
standard. 
While for most tests the DEC compiler exhibited a high degree of compliance,
it failed a substantial number of keyword tests. A minor change to the NAG
test suite found an internal error in the compiler that accounts for most of
these failures. The SHAPE test suite was more problematic for many of the
products. It consists of 22 source files ranging in size from 20K to 1.5
Mbytes, each containing from a dozen to 800 test cases. If a file failed to
compile, I was forced to record all of its results as failures. 
Fortran 90 provides for the use of interface blocks to document how parameters
to a subroutine or function are to be used. The intent of parameter use can be
designated as IN (input only), OUT (output only), or INOUT (both). Aside from
its documentation value, this feature can be very useful in catching bugs such
as using a variable before it is initialized. Although not required by the
standard, PSR's Vast90 was the only product that flagged incorrect interface
blocks. The Fujitsu compiler supports a useful run-time option to check for
proper subroutine and function parameter types at run time. This found one of
the subtle bugs in the NAG check module. 
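A minimal sketch of such an interface block (SOLVE is a hypothetical routine invented for illustration): the INTENT attributes tell the compiler, and the reader, how each dummy argument is used, so a call that modifies an IN argument or reads an OUT argument before setting it can be flagged:

```fortran
INTERFACE
  SUBROUTINE SOLVE(N, A, B)
    INTEGER, INTENT(IN)    :: N     ! input only; SOLVE may not change it
    REAL,    INTENT(INOUT) :: A(N)  ! read and updated by SOLVE
    REAL,    INTENT(OUT)   :: B(N)  ! undefined on entry; set by SOLVE
  END SUBROUTINE SOLVE
END INTERFACE
```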
Along with the NAG translator, UNIX compilers from EPC, IBM, and Fujitsu lead
the pack. If I had to choose an official winner, it would probably be EPC.
Under DOS, the Lahey compiler exhibited a much higher level of compliance than
Microway. 
An obvious concern is the quality of code produced and the effect of
optimization on language compliance. All of the test results in Table 6 were
based on executing the compilers with optimization disabled. As a simple test,
I also reran the NAG suite using the IBM compiler with default optimizations
enabled (-O switch). Language compliance was reduced by about 5 percent. 


Other Considerations


Of course, official language compliance isn't everything. Most of these tools
also support a range of extensions to allow easy use with older FORTRAN. The
Digital and IBM tools are designed to swallow older FORTRAN written to their
earlier mainframe and minicomputer compilers. The IBM XL compiler also
supports Cray, VAX, and Sun extensions. EPC and Lahey also support a number of
VAX FORTRAN extensions. Apogee and Fujitsu offer language support for a number
of Sun FORTRAN extensions. Parasoft and PSR support CM Fortran (Thinking
Machines) extensions commonly used by the high-performance FORTRAN community.
DEC supports the HPF extensions (FORALL, for example) and offers options for
its Parallel Software Environment. 
There are other considerations aside from language compliance. For many
FORTRAN users, what matters most is the execution speed of the generated code
or the ease of use. Several of these products (EPC, Lahey, and Microway) come
bundled with debuggers designed to support Fortran 90. Apogee includes a
version of the GNU gdb debugger that has been modified to support FORTRAN-77
syntax. Fujitsu offers its compiler separately and also bundles it with a
Fortran 90 workbench environment supporting source-code editing and debugging.
DEC and IBM offer a separate workbench with which their Fortran 90 products
may be integrated. The DEC FUSE workbench currently lacks full support for
Fortran 90. The IBM Fortran POWERbench is an integrated GUI environment for
UNIX that fully supports the Fortran 90 language. The translators from NAG,
PSR, and Parasoft make debugging more difficult since the user is dealing with
translated code rather than with the original Fortran 90 source. NAG provides
some limited guidance in its manual to assist debugging the translated C code.
Of primary consideration is the quality of the code generated and the speed of
execution. Comparing the translators, the Apogee, Parasoft, and PSR products
that use FORTRAN-77 as an intermediate language produce noticeably faster code
than the NAG C translations. Much of this is likely inherent in the difference
between C and FORTRAN for numerics. Contrary to common assumptions, native
Fortran 90 compilers don't necessarily produce the fastest code. Because PSR's
Vast90 and Apogee's Fortran 90 include sophisticated vectorizers, their code
can often outperform that of native compilers. Using Vast90, this was the case
across all of the platforms tested (Sun, IBM, and DEC). IBM even offers a
special version of the PSR vectorizer (VAST2) at additional cost for regular
IBM FORTRAN users. The DEC compiler currently creates scalar code and lacks
the ability to generate faster vectorized output for the DEC Alpha. 


Back to the Future


Fortran 90 holds a lot of promise as a means to modernize the FORTRAN language
and FORTRAN user community. The ideal development environment would offer full
compliance with the Fortran 90 standard and yet support all of the old FORTRAN
code still being used today. Fortunately, many of the compilers are
approaching this goal. For some of the translator-based tools, complying with
the Fortran 90 standard remains a challenge. Compared with their UNIX
counterparts, the DOS compilers offer poor language compliance. While it has
taken longer than expected, programmers now have a healthy range of choices
for quality Fortran 90 development tools. Based on these tools and those still
to come, FORTRAN as a language may well be around for another half a century. 


For More Information


Apogee-Fortran 90 
Apogee Software
1901 South Bascom Avenue Suite 325
Campbell, CA 95008-2207
800-854-6705, 408-369-9010
CraySoft Fortran 90 SPARC 
Cray Research Inc.
1440 Northland Drive

Mendota Heights, MN 55120
800-289-2729, 612-683-3030
DEC Fortran 90 
DEC
146 Main Street
Maynard, MA 01754
800-332-7923
EPC Fortran 90 
Edinburgh Portable Compilers
20 Victor Square
Scotts Valley, CA 95066 
800-EPC-1110, 408-438-1851
Fujitsu Fortran 90 
Fujitsu Open Systems Solutions
3055 Orchard Drive
San Jose, CA 95134
800-545-6774, 408-456-7853
AIX XL Fortran Compiler/6000 
IBM 
1133 Westchester Avenue
White Plains, NY 10604
800-426-2255, 914-642-3000
LTEST Test Suite for Fortran 90
Lahey Fortran 90 
Lahey Computer Systems
865 Tahoe Boulevard
Incline Village, NV 89450
800-548-4778, 702-831-2500
NDP Fortran 90 
Microway
P.O. Box 79, Research Park
Kingston, MA 02364
508-746-7341
NAG Fortran 90 Test Suite
NAGWare f90 
Numerical Algorithms Group
1400 Opus Place, Suite 200
Downers Grove, IL 60515-5702
708-971-2337
Portable Fortran 90 
ParaSoft
2031 South Myrtle Avenue
Monrovia, CA 91016
818-305-0041
SHAPE Test Suite
Spackman & Hendrickson
13708 Krestwood Drive
Burnsville, MN 55337
612-892-5847
VAST-90 
Pacific-Sierra Research
2901 28th Street
Santa Monica, CA 90405
310-314-2300
Table 1: Fortran 90 compilers/translators tested.
Tool Version Vendor Available Platform(s) 
Apogee Fortran 90 3.0 Apogee Solaris 2 (SPARC), SunOS
CraySoft Fortran 90 1.0 Cray Solaris 2 (SPARC)
DEC Fortran 90 1.1 Digital DEC OSF/1 (Alpha)

Fortran 90 1.1.2 EPC RS/6000, SGI (Mips 3000/4000),
 Solaris 2 (SPARC), SunOS
Fortran 90 2.0beta Fujitsu Solaris 2 (SPARC)
XL Fortran 1.1 IBM AIX (RS/6000)
Lahey Fortran 90 1.00D Lahey DOS (on 386, 486, or Pentium)
NDP Fortran 90 4.5 Microway DOS (on 386, 486, or Pentium)
NAGWare f90 2.0b NAG HP-UX, RS/6000, SCO, SGI, Solaris
 2 (SPARC), SunOS
Portable Fortran 90 2.0 Parasoft Convex, HP-UX, IBM 3090, RS/6000, SGI,
 Solaris 2 (SPARC), SunOS
Vast90 2.6 PSR HP-UX, RS/6000, Solaris 2
 (SPARC), SunOS
Table 2: Fortran 90 test suites.
Tool Vendor 
Cray Fortran 90 Test Suite CraySoft
LTEST Lahey
NAG Fortran 90 Test Suite Numerical Algorithms Group (NAG)
Parasoft Fortran Test Suite Parasoft
SHAPE Spackman & Hendrickson
Table 3: General Fortran 90 requirements. (The term "detect" represents the
phrase "can detect and report" in the standard.)
Execute any standard-conforming program consistent with the standard (with
caveats on program size and complexity). 
Detect the use of deleted or obsolescent features. 
Detect the use of code not permitted by the Fortran 90 syntax rules. 
Detect the use of kind type parameter values not supported by the processor. 
Detect the use of source form or characters not permitted. 
Detect name usage not consistent with Fortran 90 scope rules for names,
labels, operators, and assignment symbols. 
Table 4: Fortran 90 obsolescent features.
Alternate return
Arithmetic IF
ASSIGN and assigned GOTO statements
Assigned FORMAT specifiers
Branching to an ENDIF statement from outside the IF block
H edit descriptors
PAUSE statement
Real and double precision DO loop variables
Shared DO termination on a statement other than ENDDO or CONTINUE
Table 5: Fortran 90 KIND numbers. (All tools support a single character KIND
of 1, with the Fujitsu compiler adding a KIND number of 2 for Japanese
characters.)
Table 6: Fortran 90 compliance results.

























Virtual Reality and the WorldToolKit for Windows


A C library for constructing high-end virtual worlds in a 32-bit environment




Ron Fosner


Ron is a principal software developer at Lotus Development, where he
researches and develops graphical and interactive techniques for data analysis
and exploration. Ron can be contacted at ron@lotus.com.


One maxim in software engineering is that if it needs to be faster, do it in
hardware. And when it comes to virtual reality, software engineers appear to
have taken this maxim to heart. Serious virtual-reality (VR) packages often
require huge amounts of disk space, memory, and computing power, as well as
specialized hardware such as data gloves and goggles. Indeed, the leading
workstation platform for VR applications is a Silicon Graphics workstation
with a RealityEngine dedicated graphics processor. 
At the same time, Windows is sorely underpowered when it comes to graphics,
particularly those needed for fast rendering of three-dimensional shaded
objects. However, the WorldToolKit for Windows (WTKWIN) from Sense8 overcomes
such obstacles without resorting to high-performance hardware. The
WorldToolKit is a library of over 400 C routines designed to provide optimized
performance of interactive 3-D programs under any 32-bit version of Microsoft
Windows (Win 3.1 with Win32s to Windows NT 3.5). The toolkit takes care of
most of the requirements of VR, providing 3-D rendering, object interactivity,
drivers for a number of input devices, and other items necessary for
simulating a virtual world. 
In this article, I'll create a VR application that tracks down a contamination
problem in a hypothetical town. The test problem I've designed lends itself
particularly well to modeling. In the process, I'll examine how the
WorldToolKit is used to construct a virtual world, cover some of the
limitations you can expect when using the toolkit, and discuss the overall
system requirements when creating a virtual world. 


The WorldToolKit


The WorldToolKit is a real-time graphics development environment for building
3-D simulations and virtual-reality applications. Although in this article
I'll focus on the Windows version of WorldToolKit, it is also available for
platforms ranging from UNIX-based workstations (SGI Irix, Sun Solaris, Kubota
Kenai, to name a few) to PCs (running Windows or DOS). This is important
because WorldToolKit's hardware independence makes cross-platform development
possible. A WorldToolKit application written for DOS, for instance, will
compile on high-end workstations. 
WTKWIN contains routines to control, view, interact with, and change objects in a
3-D view. The user operates any of a number of input devices to move around or
manipulate objects in the world view. WTKWIN automatically displays a view
into the virtual world and takes care of all aspects of the display, including
the view perspective, shading, texture mapping, display updating, and querying
input devices. All the user needs to do is design the virtual world's objects
and their behavior, hook up the input devices, and run the simulation.
Additionally, WTKWIN can read 3-D objects created with tools such as Autocad,
3D Studio, Swivel 3D, or other modelers that generate .DXF or .3DS files. 
A WTKWIN simulation contains a number of items. The first, called a
"universe," holds all things to be simulated. Things can be dynamically added
to or deleted from the universe, and the universe can be started or stopped.
The universe can also contain a "portal," a polygon that loads another
universe or executes some user-defined function after the user has passed
through it. The basic graphical entity in a universe is an "object." An object
is anything that resides in the universe and is usually represented
graphically. There are both static and dynamic objects. Static objects are
usually the background of the universe. They neither move nor change during
the simulation. In a simulation of a house, the walls, floor, and ceiling
would be static. A dynamic object can change or interact in the simulation. A
dynamic object can also have a task associated with it. For example, if an
object is affected by gravity, the task associated with the object moves it
according to the laws of gravity. Objects can be collected into hierarchies
that affect each other. If you have a box with a hinged lid, the lid can swing
open or closed. But if you drop the box, the lid goes with it. In addition,
you can create objects on the fly as the simulation runs.
An object is described as a collection of polygons, each of which, in turn, is
described by a collection of vertices (corners). For example, the simplest
polygon is a triangle, which consists of three vertices. In WTKWIN, a polygon
can have a maximum of 256 vertices. An object is also described by its
appearance: It's given attributes such as color, texture, and size. All
attributes and vertices can be changed dynamically.
Light sources can also be created for the universe. You can specify ambient
light (the amount of brightness that illuminates everything in the universe)
and actual light sources, which have direction and brightness. During each
frame of the simulation, WTKWIN automatically handles the shading of each
polygon in the universe, thus presenting the user with a realistically shaded
view. 
Viewpoint objects are the frame of reference from which a view is rendered.
Typically, a viewpoint object is attached to a sensor object (like a mouse,
forceball, or joystick). Thus, when the sensor is moved, so is the viewpoint.
A simulation can have multiple viewpoints, sensors, and lights.
A WTKWIN program spends most of its time in a simulation loop. The static and
dynamic objects are created, the sensors are initialized, the universe is
started, and the simulation loop begins; see Figure 1. Inside the loop, the
sensors are read, tasks are performed, positions are updated, and the universe
is rendered. Windows events are also queried, along with sensor input. Most
impressive, however, is that WTKWIN insulates you from Windows, while
providing versatility. You can take a program written for WTKWIN, and if you
haven't added any Windows-specific items, you can port it with little change
to an SGI workstation. On the other hand, you can add a regular Windows menu
onto the VR window and make calls to the Windows API, including GDI calls to
draw into the VR window.


VR for Problem Analysis


Imagine that a serious problem has surfaced in the town of Virtual Falls--a
resort community with little native industry other than tourism. The local
government has noticed the lake in the area (a major attraction for
vacationers) showing increasing levels of a contaminant that is causing algae
to grow progressively faster during the summer months. If this contaminant is
not removed, the lake will eventually be choked, killing most of the fish and
ruining the aesthetic appeal of the lake. The pollutant has been determined to
be the result of a chemical reaction between the runoff from an old mine in
the mountains and a component in an ash typically found as a byproduct from a
paper-products factory. The ash builds up over the dry summer and winter
months, then reacts with the runoff when the spring rainy season starts. Only
two nearby paper-products factories could possibly be the source of the ash,
but they each place the blame on the other. 
Each company's ash contains a chemical that will react with the mine runoff.
Each ash will also react with the other in a chemical reaction that depletes
the reactive chemical before it can combine with the runoff. This prevents the
local government from determining the extent of each company's responsibility.
The available evidence includes measurements taken throughout the area of just
three chemical levels: the mine runoff, Company A's ash, and Company B's ash.
Since these measurements were taken before the start of the rainy season,
there is no information on how the two ashes react prior to meeting up with
the mine runoff. All the authorities have is the raw data and the knowledge
that the two ashes consume each other in equal parts, and the minimum levels
of each chemical required to react with the mine runoff.


Construction of the Virtual World


The first step in creating the virtual world is to generate the underlying
topographical data. Most VR systems accept some form of a geometry file
describing various 3-D objects. However, 3-D terrain is usually fairly data
intensive. For example, the French SPOT satellite data typically has 20-meter
resolution, which translates to about 1,300,000 pixels for a 10-square-mile
area. Trying to run over a million polygons through a couple of transformation
matrices and lighting calculations at a rate of 15 times per second (or
faster) is problematic at best. Consequently, I took advantage of WTKWIN's
ability to read in an ASCII format file called a "neutral file format"
(NFF)--a collection of vertex and polygon definitions. Thus, I was able to
create a mathematical model of the major terrain features and generate an NFF
file at any resolution desired.
The next step is adding features. For this example, I want to use a visual
representation of the data, voice command and response, audio cues, and
interactivity. Additionally, I want the user to be able to travel in, around,
or through the data, and to query the data. For example, to query the program
about the ash concentrations at a particular point, you select that point with
the mouse. WTKWIN reduces the mouse pick to a particular polygon in the
virtual world, from which we can extract the coordinates in the nonvirtual
world. Using these coordinates, we can get the chemical concentrations at that
location and then pass them to the output routines.
The next step is processing the user's request to change some aspect of the VR
world. The request can be via voice command, direct manipulation of objects,
or a menu interface placed upon the VR view. Some VR toolkits have reasonably
complete menu systems that operate in a manner similar to that of Windows; in
the VR view the menu system remains fixed in front of the user's view much
like you'd expect to see in a head-mounted display. WTKWIN doesn't have a
formal menuing system. However, one of the demo programs creates an array of
buttons that remain in front of the user, with each button having a bitmap
texture applied to it. Alternatively, you could build a true 3-D menu of your
own using WTKWIN's 3-D text objects, or simply use Windows' menu system and
put a menu across the top of the window, just like any other Windows program. 


The Virtual World


The various parts that make up the VR test program have been assembled. The
program will read in the terrain model, initializing the user viewpoint to be
over the southwest corner, looking inwards; see Listing One, page 102. The
user has, by default, a keyboard interface and mouse interface. If a
six-dimensional device is specified on the command line, it's used as the
motion sensor. If no 6-D device is specified or found, then the mouse is used
as the motion sensor. Once the world is initialized and the simulation loop
has started, the user can fly around or change it. The user can bring up a
menu in the VR window that contains a number of options.
The first two buttons that pop up in the VR display provide an exit from the
program and a method to reset the viewpoint, respectively. The next button
toggles the voice annotation, while the last one brings up three other
buttons, which modify the landscape by indicating levels of the various
chemicals we're interested in. As each chemical button is toggled, an
indicator appears in the upper-left corner and its particular concentrations
are calculated. Chemical levels sufficient to react are then displayed. When
more than one chemical is active, the chemical reactions are accounted for and
the remaining chemical levels that can still react are displayed. Each
chemical has a different color, so the mixtures of chemicals plus the
resulting pollutant can easily be seen. Thus the user can toggle the runoff
and each ash type and discover the dispersion areas. When all three are
toggled, you can fly around the landscape, tracing the pollution back to its
source. If you trace the pollutant up a river, you can see where the pollutant
enters the river. Tracing it back to its source, you can fix on the origin of
the pollution--in this case, Company A. The complete program, including the
full source code, is available electronically; see "Availability," page 3.


Hardware and Software Requirements for PC VR


I started off working on a Compaq 486/50, using a mouse, a 32K-color video
board, and a MediaVision ProAudio 16 sound board. With the addition of a 6-D
force ball, this platform was acceptable for using WTKWIN for programs of up
to medium complexity (about 1000 polygons) at a low refresh rate (less than
six frames/sec). The important parts are the computer and the video
board. You'll need a fast computer with a math coprocessor and as fast a video
board as you can get, preferably one with a VL or PCI bus. 

If you decide to go all out, you may want to consider a faster processor,
possibly a Pentium-based PC with a PCI bus, or a 6-D input device. The Pentium
is a big jump up from a 486 due to its redesigned math coprocessor, and the
PCI bus makes for a fast graphics pathway. 6-D input devices make interaction
with the virtual world much more natural. Once you experience using one,
you'll discover how intuitive 6-D motion can be. Throw in a MIDI sound board,
a video board, and monitor capable of at least 800x600x32K colors, and you
have quite a respectable PC VR platform. In my case, I used a Compaq Deskpro
5/50M (60-MHz Pentium), coupled with a Matrox MGA Impression video board with
3 Mbytes of RAM, a Media Vision ProAudio Studio 16 sound board, and an IDEK
21-inch monitor.
For information on 6-D input devices, contact Logitech (510-795-8500),
Spaceball Technologies (508-970-0330), or CIS Graphics (508-692-9599). Prices
range from about $300 to $1500.
Other options include the Mattel PowerGlove (if you can find one) or the Sega
Visor (due out sometime soon). WTKWIN will support both. The toolkit also
supports 3-D stereo video monitors from StereoGraphics, which can produce some
dazzling effects without a head-mounted display. However, you still have to
wear LCD shutter glasses that look like thick, nerdy sunglasses. In total,
you're looking at between $5000 and $7000 for a VR development system,
including the computer.
In addition to basic Windows 3.1 with Win32s installed, WTKWIN will run under
Windows NT and Windows for Workgroups. For development, you'll need a 32-bit
C/C++ compiler such as Watcom C/C++ 9.5, Borland C++ 4.x, Symantec C++
Professional 6.1, or Microsoft Visual C/C++ 32-bit 1.1. I used the Microsoft
compiler. You'll also want a 3-D modeling program to design the parts of your
virtual environment, and a paint program to create and edit textures with
which to paint the polygons.


Summary


Generally, I found the process easier than I expected. The toolkit performed
well and made development of a real application possible. In the end, I came
out with a fairly detailed virtual reality, complete with interactivity and
voice annotation. The speed at which you can fly around is good, as are the
refresh rates (around 5--6 times/sec). Although I was using somewhat high-end
equipment, this is the target that most PC-based VR will be designed for. And
as better video hardware and rendering boards become available, expect to see
some dramatic improvements in rendering times.


For More Information


WorldToolKit for Windows Toolkit
Sense8 Corp.
4000 Bridgeway, Suite 101
Sausalito, CA 94965
415-331-6318
$795.00
Figure 1: The simulation loop.

Listing One 
/***********************************************************************
Function: WTuser. After the WTK initialization routines are done, they call
WTuser, which can be compared to the standard C function main. WTuser is
passed argc & argv, just like main is. The universe and sensors are created
and initialized here, and actions are connected to the universe or its
objects.
***********************************************************************/

int WTSTD WTuser(int argc, char *argv[])
{
 printf("Geological Terrain/Pollution VR Demo\n");
 printf("using the Sense8 WorldToolKit\n");
 printf("Programmed by Ron Fosner, 1994\n");
 printf("Parts of this program are Copyright 1994 Sense8 Corporation\n");
 // read command line arguments
 ScanCommandLineArgs(argc, argv);
 // initialize the static universe
 printf ("Creating new universe\n");
 WTuniverse_new(WTDISPLAY_DEFAULT, WTWINDOW_DEFAULT);
 uview = WTuniverse_getviewpoint();
 // prepare to read keyboard
 WTkeyboard_open();
 // Load in the terrain NFF file
 LoadTheUniverse();
 // load in the texturemaps for the buttons
 LoadTheButtons();
 // create the industrial sites and place them in the terrain
 LoadTheIndustrialSites();
 // load some default lights (these are locations & directions in a file)
 printf ("Loading lights\n");
 if ( !WTlight_load("lights") )
 printf("Couldn't read lights\n");
 // set universe action function
 WTuniverse_setactions(UniverseActions);
 printf("Universe ready\n");
 WTuniverse_ready();

 // OK, the universe is set, now hook it up to the outside world
 InitTheSensors();
 // enter main loop
 printf("Universe go\n");
 WTuniverse_go(); // we'll remain in this function till the user quits
 // all done - clean up
 WTuniverse_delete();
 return 0;
} 
/* Function: LoadTheUniverse. Loads NFF file that describes the terrain model.
It also allocates memory to hold the initial colors of the polygons
that make up the universe for later replacement of modified colors. */
void LoadTheUniverse()
{
 WTpq modelpq;
 printf ("Loading stationary model: '%s'\n",universe_model);
 if ( !WTuniverse_load(universe_model,&modelpq,1.0) )
 {
 // Use the supplied WTerror function that will simply
 // write an error message to the text window
 WTerror("Couldn't load file '%s'", universe_model);
 }
 WTviewpoint_setposition(uview, modelpq.p);
 WTviewpoint_setorientation(uview, modelpq.q);
 WTviewpoint_zoomall(uview); // make sure we can see it all
 // now save the initial viewpoint so the user can get back
 // to the original position in case they get lost
 WTviewpoint_getposition(uview, initial_pq.p); 
 WTviewpoint_getorientation(uview, initial_pq.q);
 printf("There are %ld polygons in the stationary universe\n",
 WTuniverse_npolygons());
 // Now save the original colors
 poly_array_pointer = malloc( sizeof(unsigned) * WTuniverse_npolygons() );
 if ( NULL != poly_array_pointer )
 {
 SavePolyColors();
 }
 else
 {
 printf ("Not enough memory to save the polygon colors\n");
 }
}
/* Function: LoadTheButtons. Create the VR view's UI--an alternate UI that
hovers directly in front of the user's VR view. We create "buttons" that the
user can press (using the right mouse button) to trigger various tasks.
We specify a bitmap to paint each button. We also create a second-level menu
system that is displayed when a first-level button is toggled. */
void LoadTheButtons(void)
{
 Impart("Loading button images");
 // associate a bitmap with a task
 NewButton("sunset", 0.0, 0.95, QuitTask);
 NewButton("buteye", 0.2, 0.95, ResetViewpointTask);
 NewButton("uvula", 0.4, 0.95, VoiceAnnotateTask);
 NewButton("ash1", 0.6, 0.95, Ash1Task);
 NewButton("info", 0.8, 0.95, InfoTask);
 NewButton("butbulb1", 1.0, 0.95, ModifyLightingTask);
 // These are the 2nd tier menu buttons
 // Lighting tasks
 UpButton = NewButton("butarrwr", 1.0, 0.75, BrightenLightTask);

 DownButton = NewButton("butarrwl", 1.0, 0.55, DimLightTask);
}
/* Function: NewButton. Create a 3D button object in the universe. The button
is given a task (ButtonTask) which ensures that the button object follows the
viewpoint, so that the interface will always be in front of the user. The
button is also given an action, which occurs when the user presses it. */
WTobject * NewButton(
 char * texturename, // file containing button image
 float x, // x screen coord of button. between 0.0 and 1.0
 float y, // y screen coord of button. between 0.0 and 1.0
 void (* button_action)() // what the button does when activated
 )
{
 Buttoninfo *info;
 WTobject *o;
 Buttonlist *blist;
 WTpq modelpq;
 // load in button object template, a simple rectangle
 // (We'll differentiate them by specifying a unique bitmap texture)
 o = WTobject_new("button.nff", &modelpq, 1.0, FALSE, TRUE);
 if ( !o )
 {
 WTerror("Couldn't load file 'button.nff'");
 }
 if ( !WTobject_settexture(o, texturename, FALSE, FALSE) )
 {
 printf("Couldn't apply texture %s\n",texturename);
 printf("Perhaps the textures in images\\"
 "buttons are not on your VIM path?\n");
 }
 info = malloc(sizeof(Buttoninfo));
 info->x = x;
 info->y = y;
 info->action = button_action;
 WTobject_setdata(o, (void *)info);
 // Assign task that'll keep all buttons aligned
 WTobject_settask(o, ButtonTask);
 // chain button into global buttonlist
 blist = malloc(sizeof(Buttonlist));
 blist->button = o;
 blist->next = buttons;
 buttons = blist;
 // interface is initially off
 WTobject_remove(o);
 return o;
}
/* Function: ButtonTask. A fairly complicated function I stole from Sense8 
to keep the user buttons right in front of the viewpoint. Thus, whenever the
user swings the viewpoint around, this ensures the buttons swing right along. */
void ButtonTask( WTobject *obj )
{
 WTq q;
 WTp3 p;
 Buttoninfo *info;
 long x0, y0, x1, y1;
 WTwindow *curr;
 float horiz_angle, vert_angle, height, width;
 WTviewpoint_getorientation(uview, q);
 WTviewpoint_getposition(uview, p);

 WTobject_setorientation(obj, q);
 WTobject_setposition(obj, p);
 curr = WTuniverse_getcurrwindow();
 WTwindow_getposition(curr, &x0, &y0, &x1, &y1);
 width = x1;
 height = y1;
 // fetch the data stored with the button
 info = (Buttoninfo *)WTobject_getdata(obj);
 // Fetch the current viewpoint's half angle
 horiz_angle = 2.0 * WTviewpoint_getviewangle(uview);
 // and vertical viewing angle 
 vert_angle = horiz_angle * ( height / width );
 // calculate the new location of the buttons so that they remain in the same
 // relative place with respect to our viewing position; this has the effect
 // of making them "float" in space directly in front of our viewpoint.
 p[Z] = 5.6;
 p[X] = (info->x - 0.5) * p[Z] * horiz_angle;
 p[Y] = (info->y - 0.5) * p[Z] * vert_angle;
 // OK, so translate the button's position
 WTobject_translate(obj, p, WTFRAME_VPOINT);
}
/* Function: ButtonsToggle. Adds/Removes the buttons from the universe */
void ButtonsToggle( void )
{
 static FLAG button_control_on = FALSE;
 button_control_on = !button_control_on;
 if ( button_control_on )
 {
 ButtonsAdd();
 }
 else
 {
 ButtonsRemove();
 }
}
/* Function: ButtonsAdd. Adds nontransient buttons to the universe using 
WTobject_add. They have a location, and ButtonTask will ensure they're
correctly drawn when the next screen refresh occurs. */
void ButtonsAdd()
{
 Buttonlist *blist;
 for ( blist=buttons ; blist ; blist=blist->next )
 {
 // skip buttons that are transient
 if ( blist->button != UpButton
 && blist->button != DownButton
 )
 {
 WTobject_add(blist->button);
 }
 }
}
/* Function: ButtonsRemove. Removes buttons from the universe */
void ButtonsRemove()
{
 Buttonlist *blist;
 for ( blist=buttons ; blist ; blist=blist->next )
 {
 WTobject_remove(blist->button);

 }
}
/* Function: ButtonsAction. */
FLAG ButtonsAction( WTobject *obj )
{
 Buttonlist *blist;
 Buttoninfo *info;
 for ( blist=buttons ; blist ; blist=blist->next )
 {
 if ( blist->button==obj )
 {
 info = (Buttoninfo *)WTobject_getdata(obj);
 info->action();
 return TRUE;
 }
 }
 return FALSE; /* no button picked */
}
/* Function: LoadTheIndustrialSites. Creates block objects for the mine and
the two ash sites, colors them near black, and places them in the terrain
at their surveyed positions. */
void LoadTheIndustrialSites( void )
{
 WTobject * mine_obj, * ash1_obj, * ash2_obj;
 WTp3 pos;
 /* first place the mine */
 mine_obj = WTobject_newblock(1,1,1,FALSE,TRUE);
 ash1_obj = WTobject_newblock(5,5,5,FALSE,TRUE);
 ash2_obj = WTobject_newblock(5,5,5,FALSE,TRUE);
 if ( NULL == mine_obj ||
 NULL == ash1_obj ||
 NULL == ash2_obj )
 {
 Impart("Industrial sites could not be created");
 return;
 }
 /* set color to near black */
 WTobject_setcolor( mine_obj, 0X111 );
 WTobject_setcolor( ash1_obj, 0X111 );
 WTobject_setcolor( ash2_obj, 0X111 );
 /* mine position */
 pos[X] = -27.0; pos[Z] = +39.0; pos[Y] = -23.2;
 WTobject_setposition( mine_obj, pos );
 /* ash sites */
 pos[X] = -53.0; pos[Z] = -39.0; pos[Y] = -16.3;
 WTobject_setposition( ash1_obj, pos );
 pos[X] = +36.0; pos[Z] = +6.0; pos[Y] = -19.7;
 WTobject_setposition( ash2_obj, pos );
}
/* Function: InitTheSensors. Attempt to initialize the specified sensor and
mouse; this is where we take care of WTK fundamentals like attaching a sensor
to a viewpoint. While deceptively simple, this is an important point about
using WTK: you can get a lot of mileage out of a few simple calls. */
void InitTheSensors(void)
{
 char answer[10];
 printf ("Setup sensors\n");
 // set up the sensors as requested on command line

 if ( use_geoball )
 {
 sensor = geoball = WTgeoball_new(com[geoball_on]);
 }
 if ( use_spaceball )
 {
 /* Since I'm not that great juggling 6 dimensions, you can use the
 Spaceball mode that limits the user to one dimension at a time; 
 just use the next line instead of the one below it */
// sensor = spaceball = WTspaceball_newdominant(com[spaceball_on]);
 sensor = spaceball = WTspaceball_new(com[spaceball_on]);
 }
 // if not using any other sensors, use mouse
 if ( !use_spaceball && !use_geoball )
 {
 mouse = WTsensor_new(WTmouse_open, WTmouse_close, 
 WTmouse_moveview2, NULL, 1, WTSENSOR_DEFAULT);
 if (mouse)
 {
 movemouse = TRUE;
 sensor = mouse;
 }
 else
 {
 printf("Unable to open mouse!\n");
 }
 }
 else
 {
 movemouse = FALSE;
 /* we need mouse for polygon picking. */
 mouse = WTsensor_new(WTmouse_open, WTmouse_close,
 WTmouse_drawcursor, NULL, 1, WTSENSOR_DEFAULT);
 } 
 // Use a WTK function to scale sensor speed with size of universe
 // This is one of the "magic" things that make life easier
 WTsensor_setsensitivity(sensor, 0.1 * WTuniverse_getradius());
 normalspeed = WTsensor_getsensitivity(sensor);
 /* OK, now here is an important part of using WTK. Attach the selected
 sensor to the viewpoint (you can attach a sensor to just about anything, 
 like a box, but here we want to move the viewpoint, not an object). This
 allows you to easily connect the viewpoint to a manipulation device. WTK 
 takes care of the universe from here! */
 if (use_geoball)
 {
 WTviewpoint_addsensor(uview, geoball);
 }
 if (use_spaceball)
 {
 WTviewpoint_addsensor(uview, spaceball);
 }
 if (mouse)
 {
 WTviewpoint_addsensor(uview, mouse);
 }
}
/* Function: UniverseActions. Called by WTK each cycle; registered via
WTuniverse_setactions() in WTuser. It's the equivalent of a message queue:
this is where we poll the input devices and act upon any changes from the
user. If we make any changes to the universe (like changing the color of a
polygon), WTK will take care of it in the next cycle; we don't have to take
any other action. */
void UniverseActions()
{
 int key;
 WTobject *obj;
 // These are the defined actions if we are using the Spaceball...
 if ( use_spaceball )
 {
 /* stop by pressing 8 on the spaceball */
 if ( WTsensor_getmiscdata(spaceball) & WTSPACEBALL_BUTTON8 )
 {
 QuitTask();
 }
 /* teleport to initial view by pressing button 7 */
 else if ( WTsensor_getmiscdata(spaceball) & WTSPACEBALL_BUTTON7 )
 {
 ResetViewpointTask();
 }
 /* turn on/off the buttons by pressing button 1 */
 else if ( WTsensor_getmiscdata(spaceball) & WTSPACEBALL_BUTTON1 )
 {
 ButtonsToggle();
 }
 }
 /* These are the defined actions if we are using the Geometry Ball... */
 if ( use_geoball )
 {
 /* stop by pressing both buttons on the Geoball */
 if ( WTsensor_getmiscdata(geoball) ==
 (WTGEOBALL_LEFTBUTTON | WTGEOBALL_RIGHTBUTTON) )
 {
 QuitTask();
 }
 /* toggle buttons by pressing the left button on the Geoball */
 else if ( WTsensor_getmiscdata(geoball) == WTGEOBALL_LEFTBUTTON )
 {
 ButtonsToggle();
 }
 }
 /* Mouse actions. This is only active if the mouse is not being used for 
 movement. If users have no other device for movement, then they must 
 toggle between mouse movement and mouse picking. */
 // left mouse button to pick polygons (like terrain)
 if ( !movemouse && (WTsensor_getmiscdata(mouse) & WTMOUSE_LEFTBUTTON))
 {
 poly = WTuniverse_pickpolygon(*(WTp2*)WTsensor_getrawdata(mouse));
 if ( poly )
 {
 WTp3 c_g;
 WTpoly_getcg(poly,c_g);
 // Simplified version, just calculate the ash 1 concentration rather
 // than the total concentration depending upon which chemical
 // composition buttons are active. We only care about ash 1 conc.
 ImpartAsh1ConcentrationAtLocation(c_g);
 }
 }
 // right mouse button to pick objects (like buttons)
 if ( !movemouse && (WTsensor_getmiscdata(mouse) & WTMOUSE_RIGHTBUTTON))

 {
 // get the location of the object under the mouse cursor
 obj = WTuniverse_pickobject(*(WTp2*)WTsensor_getrawdata(mouse));
 if ( obj ) // is there one under the cursor?
 {
 ButtonsAction(obj); // see if it's a button; if so, do its task
 }
 }
 // process the keyboard (Notice that I don't use any Windows calls...)
 key = WTkeyboard_getkey();
 if (key)
 {
 HandleKeyPress(key); // pass it off to the key handler
 }
}
/* Function: HandleKeyPress. Key handler. If the user is 6D-input-device
impaired, this provides an alternate input path to the functionality of both
the on-screen buttons and the 6D input device buttons. */ 
void HandleKeyPress(int key)
{
 // interpret keypresses, and if we recognize one, process it
 switch ( key )
 {
 case 'b': // Toggle the button interface
 ButtonsToggle();
 break;
 case 'f': // Flip mouse move <-> mousepick
 movemouse ^=1;
 /* switch between using mouse to move and using it to point */
 if (movemouse)
 {
 Impart("Use mouse to move around world");
 WTsensor_setupdatefn (mouse, WTmouse_moveview2);
 }
 else
 {
 Impart("Use mouse to select objects");
 WTsensor_setupdatefn (mouse, WTmouse_drawcursor);
 }
 break;
 case 'i': // Display status information
 InfoTask();
 break;
 case 'q': // Quit
 QuitTask();
 break;
 case '!': // '!' resets view back to initial view
 ResetViewpointTask();
 break;
 // special resolution modification keys that are driver dependent
#if DVI || SPEA
 case '2':
 printf("Set LOW resolution\n");
 WTuniverse_setresolution(WTRESOLUTION_LOW);
 break;
#if SPEA
 case '3':
 printf("Set Medium resolution\n");
 WTuniverse_setresolution(WTRESOLUTION_MEDIUMRGB);

 break;
#elif DVI
 case '4':
 printf("Set Adaptive resolution\n");
 WTuniverse_setresolution(WTRESOLUTION_ADAPTIVE);
 break;
#endif
 case '5':
 printf("Set HIGH resolution\n");
 WTuniverse_setresolution(WTRESOLUTION_HIGH);
 break;
#endif /* DVI || SPEA */
 default: // unrecognized key press? - then display help text
 DisplayHelpTask();
 printf("\nEnter command..\n");
 break;
 }// end o' switch
}
/* Function: ScanCommandLineArgs. Process command line args to see if user 
specified an alternate input device. We accept -[GS][12] to specify device
and the serial port it's connected to. */
void ScanCommandLineArgs(int argc, char *argv[] )
{
 while (--argc > 0)
 {
 if ('-' == (*++argv)[0])
 {
 switch ((*argv)[1])
 {
 case 'g': // GeoBall
 case 'G':
 use_geoball = TRUE;
 geoball_on = (*argv)[2] - '1'; /* Convert from ASCII */
 break;
 case 's': // SpaceBall
 case 'S':
 use_spaceball = TRUE;
 spaceball_on = (*argv)[2] - '1';
 break;
 default:
 WTerror("Unrecognized argument -%c",(*argv)[1]);
 } // switch
 } // if
 } // while
}
/* Function: SavePolyColors. Saves the colors of all the polygons in the
static universe. It demonstrates use of the WTuniverse_getpolys() and
WTpoly_next() functions, which enable you to visit all of the polygons
in the universe. The value of each poly's color is saved in an array. */
void SavePolyColors( void )
{
 unsigned * c;
 for ( poly = WTuniverse_getpolys(), c = poly_array_pointer ;
 poly != NULL ;
 poly = WTpoly_next(poly)
 )
 {
 *c++ = WTpoly_getcolor(poly);
 }

}
/* Function: RestorePolyColors. Replaces the original polygon colors. Similar
to SavePolyColors(), it demonstrates how you can use the polygon order to 
know which polygon you're operating on. */
void RestorePolyColors( void )
{
 unsigned * c;
 if ( NULL == poly_array_pointer )
 return;
 for ( poly = WTuniverse_getpolys(), c = poly_array_pointer ;
 poly != NULL ;
 poly = WTpoly_next(poly)
 )
 {
 WTpoly_setcolor(poly,*c++);
 }
}
/* Function: LoopThroughPolys. Demonstrates how to call a function to operate 
on each polygon in the static universe. In this case, the function to call
takes the center of gravity of each polygon, which is used to calculate the
pollution concentration of the whole polygon and then set the color. */
void LoopThroughPolys( void *(func)(WTp3) )
{
 WTp3 c_g; // center of gravity
 for ( poly = WTuniverse_getpolys() ;
 poly != NULL ;
 poly = WTpoly_next(poly)
 )
 {
 WTpoly_getcg(poly,c_g); // get the center of the polygon
 (*func)(c_g); // call da function
 }
}
/* Function: DisplayHelpTask. Display how to use the various UIs */
void DisplayHelpTask( void )
{
 printf("\n-----------------------------------------------\n");
 printf("Right mouse button to toggle object selection\n");
 printf("Left mouse button to select polygon\n");
 printf("'b' Toggle user interface buttons\n");
 printf("'f' Flip between mouse move and mouse pick\n");
 printf("'i' status Information\n");
 printf("'q' Quit immediately\n");
 printf("'!' reset view back to initial position\n");
 printf("To change: 'w' Convergence 's' Sensor sensitivity\n");
#if DVI
 printf("Resolution: 2 - LOW 4 - Adaptive 5 - HIGH\n");
#elif SPEA
 printf("Resolution: 2 - LOW 3 - Medium 5 - HIGH\n");
#endif
 if ( use_spaceball )
 {
 printf("Spaceball commands...\n");
 printf(" Button 1 - toggle the 3D viewport buttons\n");
 printf(" Button 7 - reset view back to initial position\n");
 printf(" Button 8 - Quit immediately\n");
 }
 if ( use_geoball )
 {

 printf("Geoball commands...\n");
 printf(" Both Buttons - Quit immediately\n");
 }
 printf("\n-----------------------------------------------\n");
}
#define DISTANCE(x1,y1,x2,y2) (sqrt( (x1-x2)*(x1-x2) + (y1-y2)*(y1-y2) ))
#define INSIDE_DISTANCE(x1,y1,x2,y2,d) ( DISTANCE(x1,y1,x2,y2) < d )

/* Function: ComputeAsh1ConcentrationAtLocation. Pass in the center of gravity
of a polygon, then sum up the contributions from all overlapping circles 
used to distribute the chemical over the area. Rather than using a graduated
scale of chemical data, this quick version yields a boolean result. */
void ComputeAsh1ConcentrationAtLocation(WTp3 c_g)
{
 float x = c_g[X], z = c_g[Z];
 int count;
 // Ash 1 is at pos[X] = -53.0; pos[Z] = -39.0
 for ( count= 0 ; count < sizeof( Ash1 )/sizeof( Ash1[0] ) ; ++count )
 {
 if ( INSIDE_DISTANCE(x,z,Ash1[count].x,Ash1[count].z,Ash1[count].d) )
 {
 WTpoly_setcolor(poly,0xF0F);
 return;
 }
 }
} 
/* Function: ImpartAsh1ConcentrationAtLocation. Pass in the center of gravity 
of a polygon, then calculate whether any ash 1 is found at that location.
This is a simplified version that simply returns found/not-found. */
void ImpartAsh1ConcentrationAtLocation(WTp3 c_g)
{
 float x = c_g[X], z = c_g[Z];
 int count;
 char text[50];
 sprintf(text,"Location %+4.f,%+4.f ",c_g[X],c_g[Z]);
 Impart(text);
 // Ash 1 is at pos[X] = -53.0; pos[Z] = -39.0
 for ( count= 0 ; count < sizeof( Ash1 )/sizeof( Ash1[0] ) ; ++count )
 {
 if ( INSIDE_DISTANCE(x,z,Ash1[count].x,Ash1[count].z,Ash1[count].d) )
 {
 Impart("Ash 1 found");
 return;
 }
 }
 Impart("None found");
} 
/* Function: Ash1Task. Loops through polygons in the terrain model, computing 
ash 1 concentrations and setting polygon colors, or resetting polygon
colors.*/
void Ash1Task( void )
{
 static FLAG ash1_on = FALSE;
 ash1_on = !ash1_on;
 if ( ash1_on ) 
 {
 LoopThroughPolys(ComputeAsh1ConcentrationAtLocation);
 }
 else
 {

 RestorePolyColors();
 }
} 
void QuitTask( void )
{
 Impart("Quitting");
 WTuniverse_stop();
}
void ResetViewpointTask( void )
{
 Impart("Resetting Viewpoint");
 WTviewpoint_moveto(uview, &initial_pq);
}
void InfoTask( void )
{
 WTp3 p;
 WTq q;
 printf("Polygons: %6d, Frame rate: %8.2f fps\n",
 WTuniverse_npolygons(), WTuniverse_framerate());
 WTviewpoint_getposition(uview, p);
 printf("Viewpoint: x=%8.3f, y=%8.3f, z=%8.3f\n", p[X], p[Y], p[Z]);
 WTviewpoint_getorientation(uview, q);
 printf("Orientation: qx=%8.4f, qy=%8.4f, qz=%8.4f, qw=%8.4f\n",
 q[X], q[Y], q[Z], q[W]);
}
/* Function: VoiceAnnotateTask. The voice annotation feature is our only
connection to Windows, and it's there just because we need to start a DDE
conversation with the external voice server, Monologue, by First Byte. Note
the phonetic spelling, which can dramatically clarify what you want it to
say. */
void VoiceAnnotateTask( void )
{
 static FLAG control_on = FALSE;
 control_on = !control_on;
 if ( NULL == hConvTalk ) // Have we initiated the DDE conversation yet?
 {
 // No, so do it.
 InitiateDDEConversation( ); // This will set hConvTalk if successful
 }
 if ( NULL == hConvTalk ) // Did we fail?
 {
 control_on = FALSE;
 Talking = FALSE;
 MessageBox (NULL, "Cannot connect with Monologue!",
 "VR Demo", MB_ICONEXCLAMATION | MB_OK) ;
 return;
 }
 if ( TRUE == control_on )
 {
 // Tell user that Voice Annotation is activated
 Talking = TRUE;
 Say("<<~V4S7>>"); // reset volume & speed
 Say("<<~'AEkt-IXv-EY-IX-ted>>"); //Say "activated" phonetically
 }
 else // turn off control
 {
 // Tell user that Voice Annotation is deactivated
 Say("<<~d-IY'AEkt-IXv-EY-IX-ted>>"); //Say "deactivated" phonetically
 Talking = FALSE;
 }

}
/* Function: Impart. If we're talking, then say the text; else print it. */
void Impart(char * text)
{
 if ( !Talking )
 {
 printf("%s\n", text); /* avoid using the text as a format string */
 }
 else
 {
 Say( text );
 }
}
/* Function: Say. If voice annotation is running, then do the Windows DDE
thing and pass the string off to the voice server. */
void Say(char * text)
{
 HSZ hszItem;
 if ( !Talking )
 {
 return;
 }
 hszItem = DdeCreateStringHandle (idInst,text, 0) ;
 DdeClientTransaction (
 text, strlen(text)+1, hConvTalk, hszItem, 
 CF_TEXT, XTYP_POKE, 1500000L, NULL) ;
 DdeFreeStringHandle (idInst, hszItem) ;
} 
/* Function: InitiateDDEConversation. Just what it says. */
void InitiateDDEConversation( void )
{
 HSZ hszService, hszTopic, hszItem ;
 // Initialize for using DDEML
 if ( DMLERR_NO_ERROR != 
 DdeInitialize( &idInst,
 (PFNCALLBACK) MakeProcInstance ((FARPROC) DdeCallback, hInst),
 APPCLASS_STANDARD | APPCMD_CLIENTONLY, 0L))
 {
 MessageBox (NULL, "Could not initiate DDE conversation!",
 "VR Demo", MB_ICONEXCLAMATION | MB_OK) ;
 }
 else
 {
 printf("DDE connect with voice annotation\n");
 }
 // Try connecting to MONOLOG.EXE
 hszService = DdeCreateStringHandle (idInst, "MONOLOG", CP_WINANSI) ;
 hszTopic = DdeCreateStringHandle (idInst, "TALK", CP_WINANSI) ;
 hConvTalk = DdeConnect (idInst, hszService, hszTopic, NULL) ;
 // Free the string handles
 DdeFreeStringHandle (idInst, hszService) ;
 DdeFreeStringHandle (idInst, hszTopic) ;
}
/* Function: DdeCallback. Just to make Windows happy... */
HDDEDATA FAR PASCAL DdeCallback (UINT iType, UINT iFmt, HCONV hConv,
 HSZ hsz1, HSZ hsz2, HDDEDATA hData,
 DWORD dwData1, DWORD dwData2)
{

 return NULL ; // we don't need to do anything...(yet)
}
/* Function: ModifyLightingTask. Add/remove transient buttons objects. */
void ModifyLightingTask( void )
{
 static FLAG control_on = FALSE;
 control_on = !control_on; /* toggle the flag */
 /* control turned off; get rid of buttons */
 if ( !control_on )
 {
 WTobject_remove(UpButton);
 WTobject_remove(DownButton);
 }
 else
 {
 WTobject_add(UpButton);
 WTobject_add(DownButton);
 }
}
/* Function: DimLightTask. Dim the ambient lighting. */
void DimLightTask( void )
{
 short bgcolor = WTuniverse_getbgcolor();
 if ( --bgcolor<0 )
 {
 bgcolor = 0;
 }
 WTlight_setambient(0.9*WTlight_getambient());
 WTuniverse_setbgcolor(bgcolor);
}
/* Function: BrightenLightTask. Brighten the ambient lighting. */
void BrightenLightTask( void )
{
 short bgcolor = WTuniverse_getbgcolor();
 if ( bgcolor<15 )
 {
 bgcolor++;
 }
 if ( bgcolor>15 )
 {
 bgcolor = 15;
 }
 WTlight_setambient(1.11*WTlight_getambient());
 WTuniverse_setbgcolor(bgcolor);
}


















PROGRAMMING PARADIGMS


The Programmer Paradigm




Michael Swaine


This month we commemorate that fateful day just 20 years ago when Harvard
freshman Bill Gates walked into the historic Aitken Computation Laboratory to
take his first college course in programming. Years later, the director of the
lab would remember Bill this way:
He was a hell of a good programmer. In terms of being a pain in the ass, he
was second in my whole career here. He's an obnoxious human being.... He'd put
people down when it was not necessary, and just generally not be a pleasant
fellow to have around the place. (Thomas Cheatham, in Gates, by Stephen Manes
and Paul Andrews, Doubleday, 1994.)
On this anniversary of Bill Gates's official entry into programmerhood, it
seems appropriate to reflect on what it means to program. And on what kind of
person you have to be to be good at it.
I'm going to make two assumptions about you, which I'll confess right here:
First, I assume that you are a programmer. Unless your reading habits run to
the masochistic, I believe I am on safe ground here; second, I assume that you
are now, or have been at some time, involved in some way in the design or
development of a programming tool. This is a riskier assumption, but the odds
on it are good. The phenomenon of tool user as tool maker is not at all
unusual in the profession of programming.
Well, it may not be unusual in programming, but it does make programming
unusual among professions. Most workers do not create their own tools and work
environments; carpenters don't make saws, doctors don't make X-ray machines,
and bus drivers don't make buses. For that, they are at the mercy of other
professions or trades. But programmers do make programming tools.
This distinction is, I suggest, crucially important, because it makes
programming uniquely capable of self-definition. Or self-redefinition. The
ability to change the tools and environment of programming is the ability to
change fundamentally the nature of the enterprise. Programmers can redefine
what it means to be a programmer. Programming has, as it were, the power to
rewrite its own genetic code.
Surely that is why programming has changed so radically over the 20 years
since Bill Gates enrolled at Harvard. Back then, the typical edit-and-test
cycle involved wrestling with a keypunch machine; assembling your deck of
punched cards in a box, rubber-banding it, and handing it to an operator; and
waiting in front of a wall of bins for your deck and printout to come back.
We've come a long way, and we've done it by tugging on our own bootstraps.
Well, you have. By your bootstraps. Technically, my profession is writing. And
as a writer, I now inform you that we need a note of dramatic tension here.
Here it is: This ability to lift ourselves (okay, yourselves) by the old
bootstraps may disappear one day.


The Threat of Specialization


If that happens, the villain would be specialization.
Of course, programming already encompasses specialties. Corporate-database
programming and commercial-application development and embedded-systems design
all have their own goals and methods and views of the world. These are
examples of horizontal specialization, but there is also vertical
specialization. The most complex applications today could not be built with
the lowest-level programming tools. We are already at the point where higher
strata of programming use tools developed at lower strata.
It's not hard to imagine this vertical specialization increasing to the point
where computer-science and engineering students would train for a specific
stratum of software development, each stratum having its own courses of study
and its own paradigms, methods, vocabulary, and tools. In such a scenario,
tools would be black boxes supplied by incomprehensible wizards working in
what would effectively be a different discipline; moving from one stratum to
another would be about as easy and as likely as a physicist going back to
school to study biochemistry.
Actually, you might consider this scenario highly desirable; but it does have
the drawback of making the various specialties of programming as dependent on
other specialties for their tools as lawyers and dentists are today.
Well, it's just a scenario. We're not there yet. We may never get there. I've
heard it argued that this is pretty much the goal of object-oriented
programming, in which case it's probably not going to happen any time soon. In
any case, we still can define our own tools, and we should appreciate this
ability while we've got it. We should be ready to challenge our basic
assumptions.
It's not easy. Programmers are as susceptible as anyone else to the blinders
of the task at hand. The unconscious assumption is that the job is really all
about fixing the problems that we've created for ourselves; that the way to
the other side of this wall must surely be through the wall.
All professions are susceptible to these blinders, but it matters more in
programming because the potential to advance the state of the programming art
is so much greater than, say, that of the bricklaying art.
I am writing myself into a corner here. Where I'm heading, obviously, is to a
stirring exhortation to question all your assumptions, to look with an
innocent eye at what you do and how you do it.
And I do make that exhortation. It's important to question the basic
assumptions about what programming is.
Having made that exhortation, though, where do I go from here? After telling
an audience to question assumptions and think for themselves, the only
sensible thing for the speaker to do is shut up. But I don't think Jon will
let me cut off the column here.
The only alternative is to relate an amusing anecdote.


An Amusing Anecdote


You may have heard the story of Steve Dompier and the Altair's first recital.
It has been variously related.
It was in the spring of 1975, a year after Bill Gates began his formal
academic study of programming. By this time Bill was calling himself the
"President of Micro-soft" and was claiming to have an implementation of Basic
for the Altair. This Altair was a computer, or so said its manufacturer, a
hobby electronics company in Albuquerque, New Mexico named "MITS." There was a
mystery or a miracle surrounding this Altair, because it included an 8080
microprocessor and, according to an article in the January Popular
Electronics, sold for roughly Intel's quantity-one price for the 8080: $397.
Clearly, these guys must have got a very good deal on the chips.
Steve Dompier was living in Berkeley, California at the time. What he was
doing doesn't matter, since the minute he read the Popular Electronics
article, he became a man obsessed.
Dompier sent off a check for $397 and waited. He did not wait patiently. When
his Altair didn't arrive soon enough to satisfy him, he bought an airplane
ticket and flew to Albuquerque to pick it up. The folks at MITS were surprised
to see him. He didn't get his computer then, though it arrived in the mail
shortly thereafter. Or more correctly, a box of parts arrived. "I received my
Altair 8800 in the mail at 10 a.m.," Dompier said, "and 30 hours later it was
up and running with only one bug in the memory!"
He then faced a problem: what to do with the thing. For I/O, all it had were
toggle switches and blinking lights. There was no software (Micro-soft hadn't
delivered yet).
There are people who will buy a tool without knowing what they are going to do
with it. Who will fly 2000 miles to pick up a $397 toy. And there are people
who won't. Dompier was the first kind.
In April, Dompier showed up at the Peninsula School in Menlo Park, where the
Homebrew Computer Club held its meetings. The club hadn't had many meetings:
Dompier's was the second Altair that any of them had seen. And none of them
had seen anything else you could call a homebrew computer. Turnout for the
meeting was terrific. Dompier was going to demonstrate his Altair.
It took some time to set up the machine, and then Dompier started programming
it. The audience waited as he flipped toggle switches to enter his program
into RAM. All went well, until halfway through the process, someone tripped
over the extension cord. The blinking lights went out, the Altair went dead,
and Dompier sighed and started over. He had set a portable radio on the table,
but didn't think to tune it to a station to entertain the crowd while he
flipped switches.
Finally he finished loading and ran the code. Immediately, the radio, sitting
next to the Altair, began to buzz with static. The unshielded Altair was
putting out so much RFI that the radio was buzzing in time with the loops in
Dompier's program. It buzzed Lennon and McCartney's "Fool on the Hill."
According to legend, when it finished that it buzzed "Daisy," also known as
"Bicycle Built for Two," as an encore.
The crowd went wild. Dompier had written a set of empty loops whose only
purpose was to play music in the static the Altair generated on a portable
radio left within RFI range.
All right, you may have heard this story. But you probably haven't heard the
moral. Here it is.
When he got his Altair home and put it together, Dompier confronted an odd
problem: the absence of a problem. He couldn't think of anything useful for
his Altair to do. And he dealt with this problem in the most direct way
imaginable: He wrote a program that did nothing. His program was nothing but a
set of empty loops, doing quite literally nothing.
It was all side effect.
Now that is an interesting programming paradigm.
Is it a useful one? Ask Alain Colmerauer. He invented a programming tool, the
language Prolog, that is all side effect, a language in which you can inform
the computer that two plus two equals four but you can't tell it to add.
As I said, an interesting paradigm.

I do not say that Colmerauer knew about the Altair's first recital when he
invented Prolog, particularly since he invented it in 1972, three years before
Dompier got his Altair.
I do say that the story shows that you can sometimes accomplish things you
might have thought were impossible by throwing out some of your most cherished
assumptions--such as the assumption that a program must do something. Or do
something useful. Science-fiction writer Algis Budrys points out that the
purpose of feet is to walk and run and the purpose of painting is to make
signs. Ballet and watercolors are obviously useless.
All art springs from the nonobvious use of tools.
The story also gives us one data point regarding what sort of person you have
to be to be successful in software: obsessive, imaginative, and impractical.
Maybe we should get another data point. And have another anecdote.


The First Programmer


The claim can be made that programmers have been around longer than computers.
Exactly the opposite claim can be made, too, but the claim that programmers
have been around longer than computers is more interesting than its opposite.
I want to make this claim, and I will defend it, but first I need to tell you
about Byron's daughter.
Augusta Ada Byron (1815--1852) was the daughter of a romantic poet and a
(woman) mathematician. All of her short life she worried about resolving the
poetic and mathematical sides of her nature. Then there were the dark rumors
about her father's sexual proclivities, a certain coolness in the way of her
mother's love, her frail health. She doesn't seem to have put a lot of effort
into being a mother or a wife. She did have a remarkable social circle,
including Charles Darwin, Augustus De Morgan, Charles Dickens, Michael
Faraday, and Charles Wheatstone.
And, of course, Charles Babbage.
Babbage was designing a computer. With the hindsight of history we know that
that is exactly what he was doing and that he had it right. At the time,
though, it was hard for him to get across just what it was that he was doing.
It was not that his contemporaries failed to grasp the idea of a machine to
perform mental operations; calculators and other devices were the intellectual
vogue. Babbage's contemporaries, including those close to him, probably
thought they got it. That was his problem. A computer is not just a
calculator, but that's a hard thing to explain to novices even today. And in
Babbage's time, everyone was a novice.
Babbage had got the idea across to a few people; one was an Italian engineer
named L.F. Menabrea, who had written a paper explaining the device, publishing
it in French in a Swiss journal in 1842. This didn't help Babbage, who was
trying to impress potential capital investors, all English.
Another person who got it was Ada. She not only understood it, she translated
Menabrea's paper, adding greatly to it in the process.
And she wrote programs for the Analytical Engine. Babbage and Ada agreed that,
to get across what the Analytical Engine could do, they had to supply some
examples of its operation. Sample programs, we would say today. Ada wrote
them: Chiefly, they calculated tables of numbers, as would the first real
computers, built nearly a century later.
So the argument that Ada was the first programmer is simple: She was a
programmer, and no computer ever existed during her lifetime. Babbage never
built his Analytical Engine.
Ada programmed to the spec, of course. She hand-coded on paper. This is
nothing; we've all done it. Hand-compiling is often recommended as a good
discipline for programmers. But Ada was doing something harder than that.
Put yourself in her place. Ada had never seen a computer. Nobody had ever seen
such a thing. She had, however, imagined one. That was the machine she
programmed.
What an intellectual challenge! To envision what a computer would be like, and
to write programs for it! Of course, that could never be done today; now that
computers exist, such a feat of disciplined imagination could never be
undertaken. Or could it?
Okay, okay, so what is the moral here? That we can break new ground by
unimagining the computer? Or that programming success comes from brilliant,
imaginative, obsessive social misfits?
Sigh.
Our two (or, counting Bill, three) data points seem to suggest some things
that we might not want to believe regarding the kind of person who becomes a
successful programmer. Let's try one more trip to the well of history.
John Backus headed up the team that developed Fortran and is the "B" in "BNF."
One of the giants, in other words. According to Backus,
Programming in...the 1950s had a vital frontier enthusiasm virtually untainted
by either the scholarship or the stuffiness of academia....
Recognition in the small programming fraternity was more likely to be accorded
for a colorful personality...or the ability to hold a lot of liquor well than
it was for an intellectual insight. Ideas flowed freely along with the
liquor.... An idea was the property of anyone who could use it....
Aha! You can almost see the tattoos, the eye patches, the glint of the gold
tooth, can't you? Yes, we need this alternative paradigm: the programmer as
antigeek.



































C PROGRAMMING


Of Text Seeks and Perplexed Geeks




Al Stevens


In his November 1994 "Swaine's Flames," Michael Swaine said these words:
"Magician is the second geekiest entertainment profession, right behind
ventriloquist. If you think back to high-school talent shows, you'll see that
I'm right."
Well. My first reaction was to go on the defensive. In high school, I was a
ventriloquist and a magician and demonstrated both skills in talent shows. I
did not know then that I was a geek. My eyes have been opened.
Michael's revelation gave me pause and cause to contemplate. Where did it all
start? Perhaps in early childhood in the small Baltimore suburb where I was
born. An older boy named Noel Stookey would gather the neighborhood children
at his house and put on shows where he performed magic tricks, used puppets
and ventriloquism, and played the guitar and sang. A remarkable and talented
boy, Noel achieved fame as the tall guy in the folk group Peter, Paul and
Mary. Until reading Michael's column, I did not know that Noel Stookey was a
geek and that through his influence I was destined to be one too.
There must be other geeks, in and out of the closet. I went in search of geeks
that everyone knows about, so that I might find kindred empathy and learn to
live with this stigma. Where to look? Johnny Carson's hobbies include magic
and ventriloquism. He also flies airplanes and is a competent jazz drummer.
Everyone knows what a geek Johnny Carson is. Edgar Bergen was a respected
humorist, famous radio and movie star, sired Murphy Brown, and was
undisputedly the world's most famous geek, er, ventriloquist. Paul Winchell
dropped out of show business as a popular ventriloquist to bring innovation to
medical cybernetics. He invented a mechanical heart that saved thousands of
lives. Who but a geek would do that? Anthony Hopkins, geek-in-hiding, learned
ventriloquism to play the demented protagonist in Magic. Not only did he enjoy
a romp with Ann-Margret in that role, but went on to win an Oscar as
geekmeister-apparent Hannibal the Cannibal in the movie version of Silence of
the Lambs. Anyone who has witnessed David Copperfield's sensuous levitation of
a gorgeous, willowy assistant can detect latent geekist tendencies straining
to get out from under all that macho magic.
Finding geeks is easy, as it turns out, but how can you tell when someone is
definitely not a geek--who is the Anti-Geek, the standard bearer, the example
for the rest of us who would escape geekism? Who, indeed? Turn now to the last
page in this magazine and look at Michael Swaine, whose total absence of
geekism warrants a full-color portrait next to his column. Observe those
steely eyes, the confident, self-assured gaze, the powerful, in-charge bearing
and countenance. Look at the firm set to that jaw. (Use your imagination, it's
in there somewhere.) See the Fitzgeraldian total repose, the casual gesture of
the left hand. Somehow you know that the right hand, the one you can't see,
sports a macho tattoo and grips a Marlboro between the first two fingers. This
picture could be of a seasoned 747 captain, a rugged outdoorsman, a captain of
industry, or even a HyperCard programmer. The antithesis of geekdom. The one
who gets to say who the geeks are.
We are not worthy.


Static Text Databases


In last month's column I introduced a static text-search engine that will be
the subject of the next several columns. This month describes the system's
purpose, analysis, and architecture. As with all "C Programming" column
projects, the source code and, in this case, the database are available to all
readers.


Purpose


The purpose of the database engine is to permit online retrieval of small
documents within a large, static text database. The retrievals should support
simple navigation and complex Boolean queries. The navigation should allow
retrievals based on a user-specified document identification as well as
forward and backward scans of the database, one document at a time.
These are common requirements. Many organizations need to archive their
textual resources and provide orderly and structured access to them. The
purpose of this project, then, is to provide a foundation from which
programmers can launch custom text-retrieval systems where the feature set or
royalty policies of commercial packages are restrictive.
The engine in this project supports a front end that provides the user
interface. The example application uses a DOS text-mode user interface to test
the engine and a Visual Basic UI to implement the application. The engine
itself is written in C and implemented both as a stand-alone DOS application
for testing and a Windows DLL for the running application.


Analysis


To build a text database, you first perform an analysis to determine the best
way to identify, store, index, and retrieve documents. You must develop the
database organization as a function of the organization of the text itself.
This first step will drive the details of the text engine as well as the
user-interface front end. Furthermore, the results of this analysis will
differ from project to project, and the differences will influence the
architectures of the database and engine software.
Here are some of the questions you must answer: What is the hierarchical
organization, if any, of the text database? What is the smallest identifiable
entity of text? The largest? The levels between? Magazine archives are one
example, and their answers are easy. The database hierarchy for a collection
of periodicals could include: publication, issue year, issue month, article,
paragraph. The Bible, the text database that this project uses as an example,
is organized into testaments, books, chapters, and verses. Each individual
text collection implies its own organization, usually derived from the
volumes, chapters, headings, and so on of the source documents.
What is a document? In this project, a "document" is defined as the smallest
identifiable text entity in the database--for this example, a verse. This
works well in a database organization because verses in the Bible are
numbered, and these numbers are the standard method of reference for quotes,
stories, and so on. Furthermore, the largest verse is less than 600 characters
long, representing a manageable document size for retrievals, searches, and
display. Unlike most other text collections, there are no chapter and
paragraph headings. The books have names--Genesis, Exodus, and so on--but
nothing below that level has identifying text. Chapters are numbered within
books, and verses are numbered within chapters. This organization eliminates
tables of contents other than for a list of the books within the two
testaments. The example project, therefore, has no support for tables of
contents. Other text applications will need them. If you use this project to
launch your own, you will need to augment the software accordingly.
The organization of the text determines how it is stored and how the retrieval
tables work. At the lowest level, the documents are retrieved by document
numbers, which are an ascending series starting at document #1 for the first
logical document in the database and proceeding serially through to the
highest document number. The example application uses document numbers 1
through 31102. Note that these document numbers are static. You determine what
they will be in the analysis phase of the project. Inasmuch as the engine
supports a static text database, the numbers never change once you have
assigned them.
The user-interface front end does not deal with these document numbers. Users
specify documents using the application-specific external identifying
nomenclature. The text engine must be able to convert between its internal
document number and the user's language, which, in the example, consists of
book name and chapter and verse numbers (John 3:16 is document number 26137,
for example). Initial analysis must determine how to do this run-time
conversion. The example application uses a static table that records the
number of books in each testament, the number of chapters in each book, and
the number of verses in each chapter. I built the table by writing a C program
that scans the text and counts verses and chapters within books. The program
emits the table in C source code that the application includes. I'll describe
the format of the table in a subsequent column.
The conversion must go in both directions. If the user asks for a particular
document by its user nomenclature, the engine converts the specification to a
document number. If the user steps or pages through the database a document at
a time or retrieves a document with a query, the engine must convert the
retrieved document numbers to their correct user nomenclature.
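The forward conversion can be sketched in C. The table layout, the names (Book, doc_number), and the tiny two-book sample data below are illustrative assumptions of mine, not the project's actual generated table:

```c
/* Sketch: convert (book, chapter, verse) to a 1-based document number.
   The table format and sample data are hypothetical, for illustration. */
typedef struct {
    const char *name;   /* book name */
    int nchapters;      /* chapters in this book */
    const int *verses;  /* verses[c] = verse count of chapter c+1 */
} Book;

/* A tiny two-book table for illustration only. */
static const int book0_verses[] = { 3, 2 };   /* 2 chapters */
static const int book1_verses[] = { 4 };      /* 1 chapter  */
static const Book books[] = {
    { "Alpha", 2, book0_verses },
    { "Beta",  1, book1_verses },
};

/* Return the document number of book b (0-based), chapter c, verse v,
   or -1 if the reference is out of range. */
int doc_number(int b, int c, int v)
{
    int doc = 0;
    for (int i = 0; i < b; i++)                 /* all earlier books  */
        for (int j = 0; j < books[i].nchapters; j++)
            doc += books[i].verses[j];
    if (c < 1 || c > books[b].nchapters)
        return -1;
    for (int j = 0; j < c - 1; j++)             /* earlier chapters   */
        doc += books[b].verses[j];
    if (v < 1 || v > books[b].verses[c - 1])
        return -1;
    return doc + v;                             /* verse in chapter   */
}
```

The reverse conversion walks the same table the other way, subtracting verse counts from the document number until the remainder falls inside one chapter.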


Indexes


A text engine must support retrievals. The nature of the application will
determine what kinds of retrievals you must support. Typical retrievals use
ranges of publication dates, author, subject matter, keywords, and Boolean
word and phrase queries to find documents in the database. Each of these
retrieval techniques implies an indexing function that, given a retrieval
specification, returns a list of document numbers.
The example application uses only the phrase and word-search queries. The
other techniques are either inapplicable to this text and application, or
impractical given the size of the database and the limits of the project
team's time and subject-matter scholarship. Authors are either
implied by the book names or unknown. The original dates of publication can
only be approximated. Indexes of keywords and subject matter involve an
extensive analysis of the subject matter with manual decisions being made
about what to include. Other text databases will require that these analyses
be performed. First you determine that the requirement exists. Next you
develop a technique for recording the index. Then you employ specialists in
the subject matter to extract and record the indexing keywords.


Word Index


The word and phrase queries use an index of text words. The index returns a
list of document numbers where the argument word appears. An early part of the
text analysis builds a word-frequency table. This table does not become a part
of the database or its index. Its purpose is to identify the words that will
not be included in the index. Some words appear so often that to include them
in the index would generate extremely long document lists and result in an
unnecessarily long index file. You want to eliminate words that have the
highest frequency--this involves a word-frequency list, which requires some
custom programs. The first program simply reads all the database text,
extracts each word, and writes a file of word records, where each record
contains a word. This program is specific to the application because it
depends on the format of the source-text data. Assuming a common output format
from the first program, the others can use common code across projects.
The second program sorts the word file into word sequence using a
case-insensitive sort. The third program reads the sorted word file, counts
each word, and writes a file with one record for each unique word. The record
contains the word and its frequency count. The fourth program sorts the
word-count file by the frequency count. Now you can look at the sorted output
from that program and see which words occur with the greatest frequency. You
can decide, based on the distribution of words, how far down into the
frequency list to go when selecting words to omit from the index. The omitted
words go into a table that the database-building program uses to decide which
words to include in the index and the retrieval program uses to decide which
words to omit from a search.
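As a sketch, the sort-and-count passes can be collapsed into one in-memory routine; word_frequencies and its helpers are names I've made up for illustration, not the project's actual file-based programs:

```c
/* Sketch of the word-frequency steps: case-insensitive sort, collapse
   equal runs into (word, count) records, sort by descending count. */
#include <ctype.h>
#include <stdlib.h>

typedef struct { const char *word; int count; } Freq;

static int icmp(const char *a, const char *b)   /* case-insensitive compare */
{
    while (*a && *b) {
        int d = tolower((unsigned char)*a) - tolower((unsigned char)*b);
        if (d) return d;
        a++; b++;
    }
    return tolower((unsigned char)*a) - tolower((unsigned char)*b);
}

static int byword(const void *a, const void *b)
{ return icmp(*(const char *const *)a, *(const char *const *)b); }

static int bycount(const void *a, const void *b)
{ return ((const Freq *)b)->count - ((const Freq *)a)->count; }

/* Returns the number of unique words written into freqs[]. */
int word_frequencies(const char *words[], int n, Freq freqs[])
{
    qsort(words, n, sizeof words[0], byword);
    int u = 0;
    for (int i = 0; i < n; ) {
        int j = i;
        while (j < n && icmp(words[i], words[j]) == 0)
            j++;                      /* run of one unique word */
        freqs[u].word  = words[i];
        freqs[u].count = j - i;
        u++;
        i = j;
    }
    qsort(freqs, u, sizeof freqs[0], bycount);
    return u;
}
```

Scanning freqs[] from the front then shows the highest-frequency words, the candidates for omission from the index.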
What about those omitted words? How does the system find documents that
include the high-frequency words? Quite simply, it assumes that those words
appear in every document in the database. For example, a query that searches
for the phrase "this is it" will return an initial list consisting of every
document in the database because those three words are probably in every
English language high-frequency list. To work effectively, a query must
contain other criteria with which to narrow the search. The system scans the
documents selected by those other criteria to see if the high-frequency words
also appear before adding the documents to the list of retrieved documents.



Boolean Word and Phrase Queries


Boolean word queries use the Boolean operators AND, OR, and NOT to construct
query expressions based on the presence and/or absence of specified words. A
phrase search looks for the combination of words in the phrase in the order in
which they appear in the expression.
A query returns a list of documents that match the expression. The user can
use this list to view the selected documents. The Boolean query expression
"Jesus AND Mary" returns 11 verses. The expression "Jesus OR Mary" returns 977
verses. The UI should use a threshold of hits to determine whether or not to
continue. In the first case, it might go ahead and display the list. But since
building the list involves a certain amount of overhead for each entry, the
user interface could permit the user to decide whether to proceed. Some
application-specific number of hits is the built-in threshold that determines
whether to involve the user in the decision.
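Assuming the per-word document lists are kept in ascending order (the article doesn't fix the layout, so this is an assumption), AND and OR reduce to linear merges of sorted lists; the function names here are illustrative:

```c
/* AND: intersection of two ascending document-number lists.
   Returns the number of documents written to out[]. */
int and_lists(const int *a, int na, const int *b, int nb, int *out)
{
    int i = 0, j = 0, n = 0;
    while (i < na && j < nb) {
        if (a[i] < b[j]) i++;
        else if (a[i] > b[j]) j++;
        else { out[n++] = a[i]; i++; j++; }   /* in both lists */
    }
    return n;
}

/* OR: union of two ascending document-number lists. */
int or_lists(const int *a, int na, const int *b, int nb, int *out)
{
    int i = 0, j = 0, n = 0;
    while (i < na || j < nb) {
        if (j >= nb || (i < na && a[i] < b[j])) out[n++] = a[i++];
        else if (i >= na || a[i] > b[j])        out[n++] = b[j++];
        else { out[n++] = a[i]; i++; j++; }   /* emit duplicates once */
    }
    return n;
}
```

NOT can be handled the same way, as a merge that keeps the documents of the first list absent from the second.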


Two-Pass Searches


Because the index assumes a hit on all high-frequency words and because the
engine supports phrase queries, each query potentially involves two search
passes. The first pass develops a list of potential hit documents. The second
pass examines the text of those selected documents to see if they really do
match the search criteria. The first pass of a phrase search constructs an
internal Boolean query expression with every word in the phrase joined with
the AND operator. That search returns all the documents that contain all the
words in the phrase. The second search examines those documents with a serial,
case-insensitive text scan to see if the actual phrase exists in the document.
The scan ignores excess white space and punctuation in the expression and in
the documents to develop the match.


Hypertext Links


Your project might include hypertext links. Most text-search projects do. A
hypertext link highlights specific words and phrases in the displayed text and
associates the highlighted data with another document in the database. Every
reference to Stan Musial, for example, would be linked to his biography.
Building the software is not difficult. The database includes embedded tokens
that identify a hypertext phrase, which displays in a highlight, and a hidden
document number, which is the hypertext link. The front end of the application
provides a way for the user to select a link and jump to the linked document.
That's the easy part. Setting up the links, however, involves an extensive
manual effort by your applications specialists, who read all the text,
identify the link phrases and the linked documents, and insert the link tokens
into the text. The example application in this project does not support
hypertext links, primarily because of the effort involved to establish the
links. The implementation would be done primarily in the front end, which
would manage the display of the link and the retrieval from the engine of the
associated document. The database builder would need to recognize the link's
semantics when building the index. The text engine itself does not need to
know that it is supporting hypertext links.


Engine and Database Architecture


The text engine is a C program that can be compiled as a stand-alone DOS
program or as a Windows DLL. In DOS mode, it includes a simulated user
interface that exercises all of its retrieval functions. As a DLL, it provides
entry functions that do the same thing.
The database consists of one file that is a concatenation of five files
developed during the database-build task. The first three files implement the
index, which associates words with document numbers. A fourth file is used to
develop the position in the text of the start of each document. The last
logical file is the text itself, compressed with a Huffman table. Figure 1
shows the relationship of the five files.
The index is implemented as a sequential list of words. The words are, of
course, of variable length, so the first part of the index is a table of
fixed-length offsets into the word list. The search uses the offsets to
perform a binary search of the words to see if the word is in the database.
The word list is followed by a file of document lists with one list for each
word in the index. Each word in the word list points to a document list. Each
document list contains a list of document numbers that contain the word.
Following the index is a table of fixed-length offsets into the text part of
the database. Each offset represents a document number and contains a byte/bit
offset value into the text.
The text itself uses Huffman compression, which is why the document offsets
include byte and bit values. The lengths of the five files are a function of
several application-specific variables. The number of unique indexed words,
the total character length of those words, the word distribution among
documents, and the number of documents all contribute to the lengths of the
files. The database-build task develops these files individually, and their
lengths are determined by viewing them in a directory listing. The actual
lengths are built into the text engine with macros at compile time. The files
are concatenated with the DOS COPY command. 
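The binary search through the offset table can be sketched as follows. The layout assumed here (NUL-terminated words packed back to back, with long offsets) and the name find_word are illustrative, and a production search would fold case before comparing:

```c
#include <string.h>

/* Binary-search the packed word list via its fixed-length offset table.
   Returns the word's index (which also selects its document list),
   or -1 if the word is not in the database. */
int find_word(const char *wordlist, const long *offsets, int nwords,
              const char *word)
{
    int lo = 0, hi = nwords - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        int d = strcmp(word, wordlist + offsets[mid]);
        if (d == 0) return mid;
        if (d < 0)  hi = mid - 1;
        else        lo = mid + 1;
    }
    return -1;
}
```

The fixed-length offsets are what make the binary search possible at all; the variable-length words themselves cannot be indexed directly.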


Next Month...


Having analyzed the text and developed the format for the database and the
index, the project will now focus on building the database from the raw text.
That effort involves more custom software, which will be the subject of next
month's column.
By the way, it is not my intention that the "C Programming" column be
perceived as showing a reverent bias. Many people distrust discussions that
focus on religious subjects when the particular forum is unrelated to matters
of religion. Each of us has private and personal feelings about such things,
knows that efforts by others to modify those feelings are usually without
effect, and would prefer to be spared the trouble. A prominent contemporary PC
C/C++ compiler in the past included spiritual messages scattered frequently
throughout the documentation. I'll leave the assessment of their greater
relevance to others, but the small, well-intentioned testaments added nothing
to and in fact distracted from my understanding of the compiler. (The vendor
has since dropped the practice.) I don't want the subject of this project to
similarly deter anyone's continued patronage of this column. In short, you'll
find no preaching here.
Figure 1 The text-engine database format.


























ALGORITHM ALLEY


The GOST Encryption Algorithm




Bruce Schneier


As a cryptographer, I sometimes get to see some really exciting technology.
Last year, for example, I was introduced to "GOST," my first encryption
algorithm from the former Soviet Union. The algorithm, whose formal name is
"Cryptographic Transformation Algorithm--GOST 28147-89," was published in 1989
by the National Soviet Bureau of Standards (also known as "GOST"). The
algorithm was not classified, which implies it was for protection of civilian,
not military, data. Don't ask me how it leaked out of the country.
In this month's column, I'll describe GOST and its relation to DES, and
examine its security. GOST is still new to Western cryptographers, and
more-detailed information about its security is sure to emerge as more people
get the chance to study it.


A Description of GOST


GOST is a secret-key algorithm, similar in construction to DES. I believe it
was invented by Soviet cryptographers who had examined DES. Its publication in
1989 as a Soviet standard occurred well after DES became public, but there is
no way of knowing when it was developed and for how long it was used within
the Soviet government.
Also like DES, GOST is a Feistel network; the algorithm iterates a simple
encryption algorithm for multiple rounds. The text is first broken up into a
left half, L, and a right half, R. K is the subkey for that particular round.
A round, i, of either algorithm looks like this:
L[i] = R[i-1]
R[i] = L[i-1] XOR f(R[i-1], K[i])
DES has 16 rounds; GOST has 32. Both algorithms can be reversed easily:
Decryption is the same as encryption with the order of the K reversed.
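The round equations can be made concrete with a small sketch. Here demo_f is a placeholder round function invented for illustration, not GOST's real f; the inverse round shows why decryption is just encryption with the subkeys reversed:

```c
#include <stdint.h>

/* Placeholder round function; GOST's real f() involves key addition,
 * S-boxes, and a rotation. */
static uint32_t demo_f(uint32_t r, uint32_t k)
{
    return (r + k) ^ (r << 3);
}

/* One Feistel round, per the equations above. */
static void feistel_round(uint32_t *l, uint32_t *r, uint32_t k)
{
    uint32_t new_l = *r;
    uint32_t new_r = *l ^ demo_f(*r, k);
    *l = new_l;
    *r = new_r;
}

/* The inverse round: because the XOR cancels, the old halves are
 * recovered exactly, whatever f is. */
static void feistel_round_inv(uint32_t *l, uint32_t *r, uint32_t k)
{
    uint32_t old_r = *l;
    uint32_t old_l = *r ^ demo_f(*l, k);
    *l = old_l;
    *r = old_r;
}
```

Note that invertibility never requires f itself to be invertible; that is the point of the Feistel construction.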
Figure 1 is a single round of GOST. First, the right half and the ith subkey
are added modulo 2^32 (that's just 32-bit addition, ignoring any carry). The
result is then broken into eight 4-bit chunks, each of which becomes the input
to a different S-box. There are eight different S-boxes in GOST; the first
four bits go in the first S-box, the second four bits go in the second S-box,
and so on. Each S-box is a permutation of the numbers 0 through 15. For
example, an S-box might be: 7, 10, 2, 4, 15, 9, 0, 3, 6, 12, 5, 13, 1, 8, 11, 14.
In this case, if the input to the S-box is 0, the output is 7. If the input is
1, the output is 10, and so on. All eight S-boxes are different, and they are
considered additional key material. The S-boxes are to be kept secret.
The outputs of the eight S-boxes are recombined into a 32-bit word, and then
the entire word undergoes an 11-bit circular shift left, towards the
higher-order bits. Finally, the result is added modulo 2^32 to the left half to
become the new right half, and the right half becomes the new left half. Do
this 32 times and you're done.
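The whole round function can be sketched in a few lines. For simplicity this demo applies one S-box to every nibble position (here the k1 box from Listing One); a real implementation uses eight different, secret S-boxes:

```c
#include <stdint.h>

/* Demo S-box (the k1 permutation from Listing One), used here for all
 * eight nibble positions. Real GOST uses eight distinct secret boxes. */
static const uint8_t demo_sbox[16] =
    { 4, 10, 9, 2, 13, 8, 0, 14, 6, 11, 1, 12, 7, 15, 5, 3 };

uint32_t gost_f_demo(uint32_t right, uint32_t subkey)
{
    uint32_t x = right + subkey;        /* addition modulo 2^32 */
    uint32_t y = 0;
    for (int i = 0; i < 8; i++)         /* eight 4-bit chunks */
        y |= (uint32_t)demo_sbox[(x >> (4 * i)) & 15] << (4 * i);
    return (y << 11) | (y >> 21);       /* 11-bit left circular shift */
}
```

The result would then be added modulo 2^32 to the left half, exactly as the text describes.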
The subkeys are generated simply. The 256-bit key is divided into eight 32-bit
blocks. The first block is used in the first round, the second block is used
in the second round, and so on. At the ninth round and again at the 17th
round, the cycle starts again: The first block is used in the ninth and 17th
rounds, the second block is used in the 10th and 18th rounds, and so on. For
the 25th through 32nd rounds, the order is reversed: The eighth block is used
in the 25th round, the seventh block is used in the 26th round, the sixth
block is used in the 27th round, and so on.
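With rounds numbered 0 through 31, the schedule just described reduces to a one-liner (matching the order in which key[] is applied in Listing One):

```c
/* Subkey selection: key blocks 0..7 cycle forward three times
 * (rounds 0..23), then run backward for the final eight rounds. */
int gost_subkey_index(int round)
{
    return (round < 24) ? (round % 8) : (7 - round % 8);
}
```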


Comparison with DES


For comparison, Figure 2 is a single round of DES. (DES also has an initial
and final permutation; these add nothing to the algorithm's security, and are
omitted for this discussion.) First, the right half is expanded from 32 to 48
bits by a fixed permutation. The result is XORed with the ith subkey, and then
broken into eight 6-bit chunks. Each chunk becomes the input to a different
S-box. DES has eight different S-boxes; the first six bits go in the first
S-box, the second six bits go in the second S-box, and so on. Each S-box is
four permutations of the numbers 0 through 15. In DES, the S-boxes are fixed
and public; they are part of the standard and are known by everyone.
The outputs of the eight S-boxes are recombined into a 32-bit word, and then
the entire word is permuted by another fixed permutation. Finally, the result
is XORed with the left half to become the new right half, and the
right half becomes the new left half. This is repeated 16 times.
DES's subkey-generation process is complicated. The 56-bit key is first
divided in half. Then, each 28-bit half is circularly shifted to the left by
either one or two bits, depending on the round. After the shift, 48 bits are
selected by a fixed permutation.
Here again are the major differences between DES and GOST:
DES has a complicated procedure for generating the subkeys from the keys. GOST
has a very simple procedure.
DES has a 56-bit key; GOST has a 256-bit key. If you add in the secret S-box
permutations, GOST has a total of about 610 bits of secret information.
The S-boxes in DES have 6-bit inputs and 4-bit outputs; the S-boxes in GOST
have 4-bit inputs and outputs. Both algorithms have eight S-boxes, but an
S-box in GOST is one fourth the size of an S-box in DES.
DES has an irregular permutation, called a "P-box;" GOST uses an 11-bit left
circular shift.
DES uses XOR to add the key to the right half and to add the right half to the
left half; GOST uses addition modulo 2^32 for both of those operations.
DES has 16 rounds; GOST has 32.


Generating GOST's S-Boxes


The GOST standard does not discuss how to generate the S-boxes, only that they
are somehow supplied. Recent discoveries of differential and linear
cryptanalysis show that the choice of S-boxes greatly affects security; there
are good S-boxes and bad S-boxes. The same holds true for GOST, leading some
to speculate that the erstwhile Soviet government would supply good S-boxes to
organizations it liked and bad S-boxes to organizations it wished to
eavesdrop on. This may very well be true, but further conversations with a
GOST chip manufacturer within Russia offered another alternative. He generated
the S-box permutations himself, using a random-number generator. The S-boxes
in the accompanying source code were used in a Soviet banking application. 


Is it Secure?


Is GOST secure? The short answer is that no one knows. Here is a longer
answer: 
Assuming there is no better way to break GOST than brute force, it is
certainly much more secure than DES. GOST has a 256-bit key--longer if you
count the secret S-boxes. Against differential and linear cryptanalysis, GOST
is probably stronger. Although the random S-boxes in GOST are probably weaker
than the fixed S-boxes in DES, they are part of the key. The secrecy of GOST's
S-boxes adds to its resistance against differential and linear attacks. Also,
both of these attacks are very much dependent on the number of rounds in the
algorithm: The more rounds, the harder the attack. GOST has twice as many
rounds as DES, and this probably makes both differential and linear
cryptanalysis infeasible.
The other building blocks are either on par or worse. GOST doesn't have the
same expansion permutation that DES has. It is known that deleting this
permutation weakens DES by reducing the avalanche effect; it is reasonable to
believe that GOST is weaker for not having it. GOST's use of addition instead
of exclusive-OR is probably no different.
The most serious change seems to be after the S-boxes: GOST's cyclic shift
instead of a permutation. The DES permutation is designed to increase the
avalanche effect: the rate at which a single-bit change in the input affects
bits in the output. In GOST a change in one input bit affects one S-box in one
round, which then affects two S-boxes in the next round, three the round after
that, and so on. GOST requires eight rounds before a single change in an input
affects every output bit; DES only requires five rounds. This is certainly a
weakness. But remember that GOST has 32 rounds to DES's 16.
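One way to observe the avalanche effect empirically is to flip a single input bit and count how many output bits change after each round, using a Hamming-distance helper (a generic utility, not part of the GOST standard):

```c
#include <stdint.h>

/* Count how many bit positions differ between two 32-bit words. */
int hamming32(uint32_t a, uint32_t b)
{
    uint32_t d = a ^ b;
    int n = 0;
    while (d) {
        n += (int)(d & 1);
        d >>= 1;
    }
    return n;
}
```

Tracking this count round by round for GOST versus DES makes the eight-round versus five-round diffusion difference directly visible.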



Conclusions


GOST's designers tried to achieve a balance between efficiency and security.
They followed the same basic design of DES, but made some modifications to the
algorithm. As Listing One shows, the result is an algorithm that is better
suited for software implementation (no irritating bit twiddling). They seem to
have been less sure of their algorithm's security and have tried to compensate
by making the key length very large, keeping the S-boxes secret, and doubling
the number of iterations. Whether or not their efforts have resulted in an
algorithm more secure than DES remains to be seen.
Figure 1 One round of GOST (the boxed plus denotes addition modulo 2^32).
Figure 2 One round of DES (the circled plus denotes exclusive-OR).

Listing One 

/* The GOST 28147-89 cipher by Colin Plumb */

/* If you read the standard, it belabors the point of copying corresponding
 * bits from point A to point B quite a bit. It helps to understand that
 * the standard is uniformly little-endian, although it numbers bits from
 * 1 rather than 0, so bit n has value 2^(n-1). The least significant bit
 * of the 32-bit words that are manipulated in the algorithm is the first,
 * lowest-numbered, in the bit string.
 */

/* A 32-bit data type */
#ifdef __alpha /* Any other 64-bit machines? */
typedef unsigned int word32;
#else
typedef unsigned long word32;
#endif

static unsigned char const k8[16] = {
 1, 15, 13, 0, 5, 7, 10, 4, 9, 2, 3, 14, 6, 11, 8, 12 }; 
static unsigned char const k7[16] = {
 13, 11, 4, 1, 3, 15, 5, 9, 0, 10, 14, 7, 6, 8, 2, 12 };
static unsigned char const k6[16] = {
 4, 11, 10, 0, 7, 2, 1, 13, 3, 6, 8, 5, 9, 12, 15, 14 };
static unsigned char const k5[16] = {
 6, 12, 7, 1, 5, 15, 13, 8, 4, 10, 9, 14, 0, 3, 11, 2 };
static unsigned char const k4[16] = {
 7, 13, 10, 1, 0, 8, 9, 15, 14, 4, 6, 12, 11, 2, 5, 3 };
static unsigned char const k3[16] = {
 5, 8, 1, 13, 10, 3, 4, 2, 14, 15, 12, 7, 6, 0, 9, 11 };
static unsigned char const k2[16] = {
 14, 11, 4, 12, 6, 13, 15, 10, 2, 3, 8, 1, 0, 7, 5, 9 };
static unsigned char const k1[16] = {
 4, 10, 9, 2, 13, 8, 0, 14, 6, 11, 1, 12, 7, 15, 5, 3 };

/* Byte-at-a-time substitution boxes */
static unsigned char k87[256];
static unsigned char k65[256];
static unsigned char k43[256];
static unsigned char k21[256];

/* Build byte-at-a-time substitution tables.
 * This must be called once for global setup.
 */
void
kboxinit(void)
{
 int i;
 for (i = 0; i < 256; i++) {
 k87[i] = k8[i >> 4] << 4 | k7[i & 15];
 k65[i] = k6[i >> 4] << 4 | k5[i & 15];
 k43[i] = k4[i >> 4] << 4 | k3[i & 15];
 k21[i] = k2[i >> 4] << 4 | k1[i & 15];
 }
}

/* Do the substitution and rotation that are the core of the operation, like
 * the expansion, substitution and permutation of the DES. It would be possible
 * to perform DES-like optimisations and store the table entries as 32-bit
 * words, already rotated, but the efficiency gain is questionable.
 * This should be inlined for maximum speed.
 */
#if __GNUC__
__inline__
#endif
static word32
f(word32 x)
{
 /* Do substitutions */
#if 0
 /* This is annoyingly slow */
 x = k8[x>>28 & 15] << 28 | k7[x>>24 & 15] << 24 |
 k6[x>>20 & 15] << 20 | k5[x>>16 & 15] << 16 |
 k4[x>>12 & 15] << 12 | k3[x>> 8 & 15] << 8 |
 k2[x>> 4 & 15] << 4 | k1[x & 15];
#else
 /* This is faster */
 x = k87[x>>24 & 255] << 24 | k65[x>>16 & 255] << 16 |
 k43[x>> 8 & 255] << 8 | k21[x & 255];
#endif

 /* Rotate left 11 bits */
 return x<<11 | x>>(32-11);
}

/* The GOST standard defines the input in terms of bits 1..64, with
 * bit 1 being the lsb of in[0] and bit 64 being the msb of in[1].
 * The keys are defined similarly, with bit 256 being the msb of key[7].
 */
void
gostcrypt(word32 const in[2], word32 out[2], word32 const key[8])
{
 register word32 n1, n2; /* As named in the GOST */

 n1 = in[0];
 n2 = in[1];

 /* Instead of swapping halves, swap names each round */
 n2 ^= f(n1+key[0]);
 n1 ^= f(n2+key[1]);
 n2 ^= f(n1+key[2]);
 n1 ^= f(n2+key[3]);
 n2 ^= f(n1+key[4]);
 n1 ^= f(n2+key[5]);
 n2 ^= f(n1+key[6]);
 n1 ^= f(n2+key[7]);

 n2 ^= f(n1+key[0]);
 n1 ^= f(n2+key[1]);

 n2 ^= f(n1+key[2]);
 n1 ^= f(n2+key[3]);
 n2 ^= f(n1+key[4]);
 n1 ^= f(n2+key[5]);
 n2 ^= f(n1+key[6]);
 n1 ^= f(n2+key[7]);

 n2 ^= f(n1+key[0]);
 n1 ^= f(n2+key[1]);
 n2 ^= f(n1+key[2]);
 n1 ^= f(n2+key[3]);
 n2 ^= f(n1+key[4]);
 n1 ^= f(n2+key[5]);
 n2 ^= f(n1+key[6]);
 n1 ^= f(n2+key[7]);

 n2 ^= f(n1+key[7]);
 n1 ^= f(n2+key[6]);
 n2 ^= f(n1+key[5]);
 n1 ^= f(n2+key[4]);
 n2 ^= f(n1+key[3]);
 n1 ^= f(n2+key[2]);
 n2 ^= f(n1+key[1]);
 n1 ^= f(n2+key[0]);

 /* There is no swap after the last round */
 out[0] = n2;
 out[1] = n1;
}
 

/* The key schedule is somewhat different for decryption. (The key table is 
 * used once forward and three times backward.) You could define an expanded 
 * key, or just write the code twice, as done here.
 */
void
gostdecrypt(word32 const in[2], word32 out[2], word32 const key[8])
{
 register word32 n1, n2; /* As named in the GOST */

 n1 = in[0];
 n2 = in[1];

 n2 ^= f(n1+key[0]);
 n1 ^= f(n2+key[1]);
 n2 ^= f(n1+key[2]);
 n1 ^= f(n2+key[3]);
 n2 ^= f(n1+key[4]);
 n1 ^= f(n2+key[5]);
 n2 ^= f(n1+key[6]);
 n1 ^= f(n2+key[7]);

 n2 ^= f(n1+key[7]);
 n1 ^= f(n2+key[6]);
 n2 ^= f(n1+key[5]);
 n1 ^= f(n2+key[4]);
 n2 ^= f(n1+key[3]);
 n1 ^= f(n2+key[2]);
 n2 ^= f(n1+key[1]);

 n1 ^= f(n2+key[0]);

 n2 ^= f(n1+key[7]);
 n1 ^= f(n2+key[6]);
 n2 ^= f(n1+key[5]);
 n1 ^= f(n2+key[4]);
 n2 ^= f(n1+key[3]);
 n1 ^= f(n2+key[2]);
 n2 ^= f(n1+key[1]);
 n1 ^= f(n2+key[0]);

 n2 ^= f(n1+key[7]);
 n1 ^= f(n2+key[6]);
 n2 ^= f(n1+key[5]);
 n1 ^= f(n2+key[4]);
 n2 ^= f(n1+key[3]);
 n1 ^= f(n2+key[2]);
 n2 ^= f(n1+key[1]);
 n1 ^= f(n2+key[0]);

 out[0] = n2;
 out[1] = n1;
}

/* The GOST "Output feedback" standard. It seems closer morally to the counter
 * feedback mode some people have proposed for DES. 
 *
 * The IV is encrypted with the key to produce the initial counter value. 
 * Then, for each output block, a constant is added, modulo 2^32-1 (0 is 
 * represented as all-ones, not all-zeros), to each half of the counter, and
 * the counter is encrypted to produce the value to XOR with the output.
 *
 * Len is the number of blocks. Sub-block encryption is left as an exercise
 * for the user. Remember that the standard defines everything in a 
 * little-endian manner, so you want to use the low bit of gamma[0] first.
 *
 * OFB is, of course, self-inverse, so there is only one function.
 */

/* The constants for addition */
#define C1 0x01010104
#define C2 0x01010101

void
gostofb(word32 const *in, word32 *out, int len,
 word32 const iv[2], word32 const key[8])
{
 word32 temp[2]; /* Counter */
 word32 gamma[2]; /* Output XOR value */

 /* Compute starting value for counter */
 gostcrypt(iv, temp, key);

 while (len--) {
 temp[0] += C2;
 if (temp[0] < C2) /* Wrap modulo 2^32? */
 temp[0]++; /* Make it modulo 2^32-1 */
 temp[1] += C1;
 if (temp[1] < C1) /* Wrap modulo 2^32? */
 temp[1]++; /* Make it modulo 2^32-1 */

 gostcrypt(temp, gamma, key);

 *out++ = *in++ ^ gamma[0];
 *out++ = *in++ ^ gamma[1];
 }
}

/*
 * The CFB mode is just what you'd expect. Each block of ciphertext y[] is
 * derived from the input x[] by the following pseudocode:
 * y[i] = x[i] ^ gostcrypt(y[i-1])
 * x[i] = y[i] ^ gostcrypt(y[i-1])
 * Where y[-1] is the IV.
 *
 * The IV is modified in place. Again, len is in *blocks*.
 */

void
gostcfbencrypt(word32 const *in, word32 *out, int len,
 word32 iv[2], word32 const key[8])
{
 while (len--) {
 gostcrypt(iv, iv, key);
 iv[0] = *out++ = *in++ ^ iv[0];
 iv[1] = *out++ = *in++ ^ iv[1];
 }
}
void
gostcfbdecrypt(word32 const *in, word32 *out, int len,
 word32 iv[2], word32 const key[8])
{
 word32 t;
 while (len--) {
 gostcrypt(iv, iv, key);
 t = *in++;
 *out++ = t ^ iv[0];
 iv[0] = t;
 t = *in++;
 *out++ = t ^ iv[1];
 iv[1] = t;
 }
}

/* The message authentication code uses only 16 of the 32 rounds. There *is*
 * a swap after the 16th round. The last block should be padded to 64 bits 
 * with zeros. len is the number of *blocks* in the input.
 */
void
gostmac(word32 const *in, int len, word32 out[2], word32 const key[8])
{
 register word32 n1, n2; /* As named in the GOST */

 n1 = 0;
 n2 = 0;

 while (len--) {
 n1 ^= *in++;
 n2 ^= *in++;

 /* Instead of swapping halves, swap names each round */
 n2 ^= f(n1+key[0]);
 n1 ^= f(n2+key[1]);
 n2 ^= f(n1+key[2]);
 n1 ^= f(n2+key[3]);
 n2 ^= f(n1+key[4]);
 n1 ^= f(n2+key[5]);
 n2 ^= f(n1+key[6]);
 n1 ^= f(n2+key[7]);

 n2 ^= f(n1+key[0]);
 n1 ^= f(n2+key[1]);
 n2 ^= f(n1+key[2]);
 n1 ^= f(n2+key[3]);
 n2 ^= f(n1+key[4]);
 n1 ^= f(n2+key[5]);
 n2 ^= f(n1+key[6]);
 n1 ^= f(n2+key[7]);
 }
 out[0] = n1;
 out[1] = n2;
}

#ifdef TEST

#include <stdio.h>
#include <stdlib.h>

/* Designed to cope with 15-bit rand() implementations */
#define RAND32 ((word32)rand() << 17 ^ (word32)rand() << 9 ^ rand())

int
main(void)
{
 word32 key[8];
 word32 plain[2];
 word32 cipher[2];
 int i, j;

 kboxinit();

 printf("GOST 28147-89 test driver.\n");

 for (i = 0; i < 1000; i++) {
 for (j = 0; j < 8; j++)
 key[j] = RAND32;
 plain[0] = RAND32;
 plain[1] = RAND32;

 printf("%3d\r", i);
 fflush(stdout);

 gostcrypt(plain, cipher, key);
 for (j = 0; j < 99; j++)
 gostcrypt(cipher, cipher, key);
 for (j = 0; j < 100; j++)
 gostdecrypt(cipher, cipher, key);


 if (plain[0] != cipher[0] || plain[1] != cipher[1]) {
 fprintf(stderr, "\nError! i = %d\n", i);
 return 1;
 }
 }
 printf("All tests passed.\n");
 return 0;
}
#endif /* TEST */





















































UNDOCUMENTED CORNER


Windows 90+5




Andrew Schulman


When you turn on a PC running MS-DOS 6, possibly with the intention of running
Windows 3.1, one of the first things you see on the screen is the message
"Starting MS-DOS."
If you install Windows 95 (aka "Chicago") on this PC, the message that greets
you the next time you turn it on will be "Starting Windows." 
This is a dramatic demonstration (well, as dramatic as a PC sign-on message
can get) that Microsoft wants you to view Windows, not MS-DOS, as the
operating system: You turn on your PC, and it tells you you're starting
Windows. All that's missing, it seems, are an icon with a smiling computer and
the words, "Welcome to Macintosh"--I mean, "Welcome to Windows." 
The overall goal behind Windows 95 is to turn the PC into a machine which no
longer makes you wonder why you're not using a Macintosh, especially now that
Apple's prices have come down far enough that the Mac competes in the same
market as the PC. Microsoft's Brad Silverberg, quoted in an important Fortune
article, "What's Driving the New PC Shakeout" (September 19, 1994), sums up
the goal: "A PC should be like an appliance. Using one should be as easy as
toasting a bagel." 
In a leaked memo, Bill Gates reportedly said that after Windows 95 ships,
"there will be no reason to buy an Apple Macintosh" (Computer Reseller News,
March 21, 1994). Many reviewers have been impressed with how close Windows 95
comes
to this goal of turning the PC into a high-volume, low-price Windows
appliance. For example, here's another use of the Windows-as-toaster metaphor:
Starting a Chicago PC is like turning on a toaster: Just hit the button. Sure,
you'll still see BIOS information and other machine-specific hieroglyphics
scroll by, but far fewer than under Windows 3.1. And then, instead of pausing
at the nasty old C> prompt, your PC will charge directly into Chicago's
Desktop. (PC World, August 1994)
Indeed, the ability to boot, seemingly seamlessly, into Windows is impressive.
But what does it tell us about the architecture of Windows 95? 
The old "Starting MS-DOS" message is produced by a hidden MS-DOS file called
IO.SYS. IO.SYS and a second file called MSDOS.SYS form the kernel of the
real-mode MS-DOS operating system. With MS-DOS, what Microsoft calls the
"startup record" (books on the DOS file and disk system usually call it the
"boot record"), occupying the first sector of the DOS bootable disk partition,
contains code to load IO.SYS, as seen in the hex dump from the DOS DEBUG
utility in Figure 1.
In Windows 95, the "Starting Windows" message is produced by a file called
WINBOOT.SYS, which plays the same role in Windows 95 that IO.SYS and MSDOS.SYS
did in earlier versions of MS-DOS. As seen in Figure 2, the startup/boot
record in Windows 95 looks for WINBOOT.SYS rather than for IO.SYS and
MSDOS.SYS. Notice that, not only has WINBOOT.SYS replaced the hidden kernel
files IO.SYS and MSDOS.SYS, but what Microsoft calls the "OEM name" in the
boot record has changed too, from "MSDOS5.0" to "MSWIN4.0." 
Everywhere you look in Windows 95, the name "DOS" has been crossed off and the
name "Windows" written in. This has impressed the computer trade press, as
seen in its reaction to Beta-1 (May 1994) of Chicago:
Chicago bypasses DOS and runs completely in protected mode, although on
startup it can stop briefly in real mode to process the now-optional
CONFIG.SYS and AUTOEXEC.BAT for loading TSRs and old device drivers.
(PC/Computing, July 1994)
First of all, there is no DOS hiding under Windows anymore. For the sake of
compatibility with legacy applications, Chicago will read and respect
CONFIG.SYS and AUTOEXEC.BAT files, but it doesn't require them. (InfoWorld,
July 4, 1994) 
Whereas previous versions of Windows merely hid real-mode DOS, Windows 95
appears to have abolished it. As the statements above indicate, in Windows 95
you don't need CONFIG.SYS or AUTOEXEC.BAT to run Windows. Windows 95 can
automatically load the files necessary to run Windows--not only WIN.COM, but
also two DOS device drivers, HIMEM.SYS and IFSHLP.SYS--without any instruction
from CONFIG.SYS or AUTOEXEC.BAT. 
This is analogous to the introduction in MS-DOS 6.0 of the ability to silently
and automatically "preload" DoubleSpace disk compression. All previous disk
compression such as Stacker required assistance (sometimes a lot of
assistance) from DOS initialization files. Likewise until Windows 95,
CONFIG.SYS and AUTOEXEC.BAT were needed to get Windows up and running. And
just as DOS 6's undocumented preload interface (over which Microsoft and Stac
Electronics got into an interesting court battle; see "Undocumented Corner,"
DDJ, May 1994) made disk compression transparent, likewise Windows 95's
ability to autoload HIMEM.SYS, IFSHLP.SYS, and WIN.COM makes Windows
transparent. You turn on the PC, and it boots into Windows.


Bypassing COMMAND.COM...


If you have Chicago, you can try out its ability to transparently boot
Windows, even if you normally use a CONFIG.SYS or AUTOEXEC.BAT file, simply by
pressing F5 for a moment when the machine starts. F5 initiates a so-called
"fail safe" mode, in which WINBOOT.SYS ignores CONFIG.SYS and AUTOEXEC.BAT,
and loads HIMEM.SYS, IFSHLP.SYS and, if needed, SETVER.EXE and EMM386.EXE.
WINBOOT.SYS then loads WIN.COM which, in Windows 95 just as in previous
versions of Windows, will load the Windows Virtual Machine Manager (VMM) and
Virtual Device Driver (VxD) layer, which in turn will load the Windows
graphical user interface.
In all the files loaded by WINBOOT.SYS, notice that one well-known file hasn't
been mentioned: the DOS command interpreter, COMMAND.COM. Indeed, WINBOOT.SYS
only requires COMMAND.COM in order to process AUTOEXEC.BAT, so if you boot
Windows 95 without an AUTOEXEC.BAT, WINBOOT.SYS will directly load
Windows--without a copy of the DOS command interpreter sitting under it.
The WINPSP program in my upcoming book, Unauthorized Windows 95 (IDG
Programmer Press, 1995) shows this. The PSP (Program Segment Prefix) is a
real-mode DOS data structure; every running real-mode DOS program has a PSP,
and the real-mode address of a program's PSP acts as its process identifier.
WINPSP is a protected-mode Windows program that prints out some information
about each PSP, such as its real-mode address and the name of its owner. When
Windows 95 has been booted with an AUTOEXEC.BAT file, COMMAND is plainly
visible in the WINPSP output: 1293 COMMAND 0140.
But as seen in Figure 3, when there's no AUTOEXEC.BAT--WINPSP shows that
COMMAND.COM is gone. The Windows loader, WIN.COM, has replaced the DOS command
interpreter, COMMAND.COM: whereas COMMAND was loading at address 1293h, WIN is
now loading at the nearly identical address 1292h. This is great!


But Bypassing DOS Too?


Windows 95's ability to dispense with COMMAND.COM has been the source of much
confusion. For example, one particularly ill-informed writer claimed, before
Windows 95 was even in beta (and long before Microsoft decided that it would
not give this new version the name "Windows 4.0"), that:
If no real-mode DOS device drivers or TSRs have been loaded in CONFIG.SYS or
AUTOEXEC.BAT, Windows 4 should be able to remove itself entirely from
real-mode DOS, relying on the VMM/VxD operating system. ("Undocumented
Corner," DDJ, February 1994)
Leaving aside the curiously inelegant phrase, "remove itself entirely from,"
and the way he hedged his bets with that word "should," this writer was
obviously deeply confused. COMMAND.COM is not MS-DOS; it is merely the DOS
command interpreter. If there's no AUTOEXEC.BAT file, Windows 95 can dispense
with COMMAND.COM, but this hardly means that Windows 95 can dispense with DOS.
Even though it is well known that COMMAND.COM and the C:\> prompt are not
synonymous with MS-DOS--you can replace them by naming some other program in
the CONFIG.SYS SHELL= statement--many people who do know the difference
between MS-DOS and COMMAND.COM nonetheless persist in viewing Windows 95's
ability to bypass COMMAND.COM as somehow synonymous with bypassing real-mode
DOS code. For example, responding to a reader's letter that took exception to
his July 1994 assertion that Windows 95 is "a complete operating system in its
own right," a writer for Windows Magazine (October 1994) offered the following
substantiation for this assertion:
Here's a test anyone with a Chicago beta system can perform: Rename your
AUTOEXEC.BAT and CONFIG.SYS to AUTOEXEC.OLD and CONFIG.OLD; then shut down and
restart Chicago. It will boot back up--into Windows. You will see no command
prompt... That looks to me like a full-up operating system, not an
"environment" that runs on top of such a system.
As we've seen, Windows 95's ability to boot right into Windows is indeed
impressive, but because the C:\> prompt is not part of the real-mode DOS
kernel, this impressive feat tells us nothing about Windows 95's status as a
"full-up operating system." Yet, Windows 95's ability to bypass COMMAND.COM
must in part lie behind the widespread notion that, in the absence of any
real-mode TSRs or device drivers, Windows 95 can "push" real-mode DOS off the
machine. Certainly it's great not having Windows resting on top of
COMMAND.COM. And booting Windows right out of WINBOOT.SYS makes a big
difference to a user's perception of their PC. But the idea that Windows 95
"pushes the real-mode code aside," "bypasses DOS and runs completely in
protected mode," or "should be able to remove itself entirely from real-mode
DOS" (whatever that ugly expression was supposed to mean), simply has no
foundation in any version of Chicago that has shipped to date, and is
extremely unlikely to have any foundation in the final retail release of
Windows 95.
COMMAND.COM is not MS-DOS. It is merely the MS-DOS user interface, the
provider of the familiar but contemptible C:\> prompt. Windows 95 makes it
possible to run Windows without ever seeing a C:\> prompt. Great! But does
this mean, as Adrian King puts it in Inside Windows 95 (Microsoft Press, 1994)
that in Windows 95 "if you run only Windows applications, you'll never execute
any MS-DOS code," and that "Windows 95 finally breaks all ties with the real
mode MS-DOS code"? No, it doesn't. 
The "if you run only Windows applications, you'll never execute any MS-DOS
code" statement is false: In Windows 95, even if you run only Windows
applications, and only Win32 applications at that, you'll still execute some
important pieces of MS-DOS code. As just one example, the IFSMgr (Installable
File System manager) VxD, which is the basis for both 32-bit file access
(32BFA) and long filenames, requires a real-mode DOS device-driver "helper,"
IFSHLP.SYS.
But equally important, the statement appears to assume that, if you do run
non-Windows (that is, DOS) applications, it necessarily follows that you'll
execute some MS-DOS code. This doesn't follow at all. Support for the DOS INT
21h interface tells you absolutely nothing about whether the real-mode DOS
code is present or not. 
If you've ever run Windows NT or OS/2, you know that you can have a DOS box
and the DOS INT 21h programming interface, without having real-mode DOS. These
operating systems emulate DOS; the presence of a DOS box really tells you
nothing about the presence or absence of real-mode DOS itself. So there's no
particular reason why opening a DOS box would by itself make Windows 95 more
dependent on real-mode DOS than if you only ran Windows applications. 
In fact, Windows 95 does emulate DOS to a large extent. But this emulation is
neither complete nor new. To the large but incomplete extent that Windows 95
emulates (rather than relies upon) the real-mode DOS code, Windows for
Workgroups (WfW) 3.11 with 32BFA does as well.
Even more significant than the strong similarities between Windows 95 and WfW
3.11 is the fact that all the technology behind Windows 95's impressive,
albeit partial, emulation of DOS goes back to Windows 3.0 Enhanced mode,
introduced in 1990 (we might as well call it "Windows 90"). In particular, the
seeds for Windows 95 were sown in a well-documented but little-known VMM
service named Hook_V86_Int_Chain. Over the past few years, Microsoft has been
using Hook_V86_Int_Chain, together with other VMM services such as
Begin_Nest_V86_Mode, Exec_Int, Set_PM_Int_Vector, Allocate_V86_Call_Back,
Install_V86_Break_Point, and Call_Priority_VM_Event, to turn MS-DOS into a
32-bit, protected-mode operating system. And this has been going on, under our
noses, since the debut of Windows 90.
Meanwhile, we still need to see whether or not Windows 95 rests upon real-mode
DOS. Let's close the DOS box and return to the WINPSP program shown earlier.
This is a Win16 program. If "Windows 95 bypasses DOS" and "there is no DOS
hiding under Windows anymore," then this ought to be reflected in WINPSP's
output. It's not. I didn't say anything about it at the time, but when we used
WINPSP to show that Windows 95 can run without COMMAND.COM, it also happened
to show that all Windows programs running under Windows 95 have real-mode DOS
PSPs.
It is particularly revealing that both a Win32 program like the Windows 95
Cabinet/Explorer (CAB32), and the Win32 kernel (KERNEL32) should have
real-mode DOS PSPs. Microsoft says, "if you run only Windows applications,
you'll never execute any MS-DOS code"--well, here we are, running only Windows
applications, but what do you think is creating those PSPs? DOS is creating
them, and the Win16 KERNEL is asking DOS to create them by calling an
undocumented DOS function, INT 21h function 55h (Create PSP).
This finding, though surprising at first, actually follows logically from
other well-known aspects of the Windows 95 architecture:
Microsoft has said quite explicitly that the windowing and messaging system in
Windows 95 uses the Win16 USER module, even for Win32 apps: "Most of the code
in the 32-bit User DLL is little more than a layer that accepts 32-bit API
calls and hands them to its 16-bit counterpart for processing" (Inside Windows
95, p. 148). All WM_XXX messages, even those intended for Win32 applications,
are first processed in the Win16 USER module.
The Win16 USER messaging system in turn depends on a Win16 KERNEL data
structure called the Task Database (TDB). For example, the TDB contains a
pointer to the application message queue (see Undocumented Windows,
Addison-Wesley, 1994). Even the most cursory examination of Windows 95 shows
that every process--including every Win32 process--has an associated Win16
TDB. 
The TDB, in its turn, depends on the DOS PSP. In Windows Internals
(Addison-Wesley, 1994), Matt Pietrek shows that the CreateTask function in the
Win16 KERNEL calls an internal BuildPDB function, which in turn calls INT 21h
function 55h (PDB, meaning Process Data Block, is another name for PSP). 

Thus, it makes perfect sense that Windows 95 depends on the DOS INT 21h
interface for PSP management if for nothing else, and that this has nothing to
do, one way or the other, with whether or not you run any DOS programs.
One point does not necessarily follow, however: Making an INT 21h call does
not necessarily mean calling the real-mode DOS code. You must get used to the
fact that in Windows, an INT 21h call--even one coming from a real-mode DOS
program, device driver, or TSR--is not always handled by the real-mode DOS
code. There are two reasons why INT 21h ain't necessarily real-mode DOS:
First, Windows runs the real-mode DOS code in Virtual-8086 (V86) mode. This is
not a pedantic point. V86 mode is hardly at all like real mode, and more
closely resembles a one-megabyte protected mode. The best way to wrap your
mind around Windows' relationship to DOS is to keep telling yourself that
Windows runs the real-mode DOS code in protected mode. 
Second, Windows avoids sending most INT 21h calls down to the real-mode DOS
code (which, as just mentioned, Windows is effectively running in protected
mode), because these calls--whether coming from a DOS program running in a DOS
box, or from some piece of DOS software loaded before Windows, or from a
Windows application--are handled by VxDs. As noted earlier, the
Hook_V86_Int_Chain service provided by VMM and used by VxDs such as IFSMgr is
the basis for 32-bit protected-mode emulation of the DOS interface.
Still, the DOS PSP management services, including functions 50h (Set PSP), 51h
and 62h (Get PSP), and 55h (Create PSP), are not among those INT 21h services
that the Windows 95 VxDs currently emulate in protected mode. Every time you
start up an application in Windows, be it a DOS program, Win16 program, or
even the newest Win32 application, Windows will ask DOS to create a PSP.
What we're left with, then, is that Windows 95 can dispense with COMMAND.COM,
purveyor of the nasty C:\> prompt. This is terrific. A Windows 95 machine can
boot right into Windows without a CONFIG.SYS or AUTOEXEC.BAT. Fantastic! But
is this what everyone means when they say that Windows 95 "doesn't require
DOS"? It would appear so.
WINBOOT.SYS contains the old real-mode DOS code, and Windows 95 calls down to
this code quite frequently (albeit in V86 mode which, you need to keep telling
yourself, is really protected mode). Now, the VxD layer in Windows 95 does
handle most INT 21h calls entirely in 32-bit protected mode, without calling
DOS. This, too, deserves two and possibly even three cheers. But 32BFA in WfW
3.11 did the exact same thing, and met with considerably less fanfare than
Windows 95, and in fact the trade press generally complained that WfW 3.11's
"bypassing DOS" was some sort of bug that caused all sorts of
DOS-compatibility problems. 
So what could the "Windows 95-bypasses-DOS" claim possibly mean?
Perhaps it simply refers to the packaging of Windows 95. For example, the
Microsoft Windows "Chicago" Reviewer's Guide says "Chicago will be a complete,
integrated protect-mode operating system that does not require or use a
separate version of MS-DOS." Note Microsoft doesn't say that Windows 95 will
not require or use DOS. Microsoft says Windows 95 won't require or use a
separate version of MS-DOS. Perhaps Microsoft is simply telling us that all
the functionality formerly associated with MS-DOS will now be brought out
under the Windows 95 brand name. If everything formerly thought to be part of
MS-DOS is by executive fiat now part of Windows, then the "you'll never
execute any MS-DOS code" claim makes sense, I suppose. 
If I had to explain how Windows 95 relates to DOS in 25 words or less, then,
I'd say this: Windows 95 relates to DOS the same way that WfW 3.11 does.
Windows 95 provides 32BFA. For nonfile calls, it calls (in V86 mode) the
real-mode DOS code in WINBOOT.SYS. Windows 95 is a genuine operating system;
so were WfW 3.11, Windows 3.1 Enhanced mode, and Windows 3.0 Enhanced mode.
Windows 95 is nothing more, but also nothing less, than Windows 90+5.
Figure 1: The MS-DOS boot record looks for IO.SYS and MSDOS.SYS.
C:\WINDOWS>debug
-L 100 2 0 1 ;;; load 1 sector (the boot sector, logical sector 0) from drive C: into address 100h
-d 100 300 ;;; now dump addresses 100h through 300h
7431:0100 EB 3C 90 4D 53 44 4F 53-35 2E 30 00 02 10 01 00 .<.MSDOS5.0.....
7431:02E0 61 64 79 0D 0A 00 49 4F-20 20 20 20 20 20 53 59 ady...IO SY
7431:02F0 53 4D 53 44 4F 53 20 20-20 53 59 53 00 00 55 AA SMSDOS SYS..U.
C:\WINDOWS>dir \*.sys /a:h
IO SYS 40,566 09-30-93 6:20a
MSDOS SYS 38,138 09-30-93 6:20a
Figure 2: The Windows 95 boot record looks for WINBOOT.SYS.
C:\Windows>debug
-l 100 2 0 1
-d 100 300
77AB:0100 EB 3C 90 4D 53 57 49 4E-34 2E 30 00 02 08 01 00 .<.MSWIN4.0.....
77AB:02F0 00 57 49 4E 42 4F 4F 54-20 53 59 53 00 00 55 AA .WINBOOT SYS..U.
C:\Windows>dir \*.sys /a:h
WINBOOT SYS 288,030 06-10-94 4:22a
Figure 3: Output from WINPSP in a Windows 95 system booted without an
AUTOEXEC.BAT file; notice that COMMAND.COM is not present in memory. But also
notice that even Win32 tasks such as CAB32 (the Windows 95 shell), KERNEL32
(the Win32 kernel), and WINBEZMT (a multithreaded Bézier demo) have DOS PSPs.


































PROGRAMMER'S BOOKSHELF


Making Programs Go Faster




Peter Gulutzan


Peter is president of Ocelot Computer Services and co-author of Optimizing SQL
(R&D Publications, 1994). He can be contacted at Suite 1104, Royal Trust Tower,
Edmonton, AB or on CompuServe at 71022,733.


In an etymology-based world, "optimization" books would show how to make the
"best" code for a given algorithm, and "best" would cover a range of concepts:
clarity, portability, connectivity, and the like. In such a world, the title
of Michael Abrash's book, Zen of Code Optimization, would be Zen of Code
Acceleration, because it's about making programs go faster--period. That's a
narrow way to define optimizing, but there are enough generalist books out
already. Speed specialists will know that this book is for them as soon as
they read the first sentence in the Preface: "This book is the diary of a
personal passion, my quest for ways to write the fastest possible software for
IBM-compatible computers in C, C++, and assembly language."
In fact, the approximate ratios are 5 percent C++, 25 percent C, and 70
percent assembly language. No surprise there: To really control the cycles,
programmers have to use assembly language. But Zen of Code Optimization
introduces this proposition in a sober fashion: Chapter 1 is an example of a C
program that really can't be improved by just dropping in some inline assembly
code. The first step--and Zen never stints on the warnings that this must be
the first step--is to improve the algorithm: time it, time the alternatives,
stop and think, decide whether any performance improvement would really be
worth the effort, and then--only then--rewrite the critical routine in
assembly language.
But how can you make your routine faster if you can't figure out how fast the
routine is running, before and after your changes? Zen solves this problem in
Chapter 3 by introducing a timing program called "the Zen Timer," which comes
on the 3.5-inch diskette packaged with the book. Certainly, a profiler will do
a better job of describing what routines are in use and where apparent
bottlenecks lie. However, a profiler is not a precise instrument. Typically,
profilers depend on the computer's system clock to interrupt them, which
happens 18.2 times a second--a relatively infrequent occurrence on a computer
that's doing several million instructions per second. Not only that, the
profiler and the clock-interrupt-routine themselves consume cycles, so the act
of measurement affects the thing being measured. To get a meaningful timing
out of a profiler, you have to run a routine several million times in a loop.
The Zen Timer presents no such difficulties. It works by reprogramming the
8253 (or equivalent) timer chip that comes standard with every "IBM
compatible" computer. The 8253 increments an internal counter
approximately 1,000,000 times per second. The Zen Timer retrieves the 8253's
counter value, then masks all interrupts and executes whatever routine has to
be timed. As soon as that's over, it gets the 8253's current count (the 8253
keeps incrementing independently of what's happening on the CPU). It
subtracts the count value that it got before entering the loop, and lo! The
result is a timing of the routine's speed to the nearest microsecond, give or
take a bit (some caveats apply).
As far as I can tell though, Zen of Code Optimization does not mention a
potentially irritating detail: The Zen Timer does not work from Windows or
from the Windows DOS box. So the question arises: If you have to go outside
Windows when you use the Zen Timer to test a routine, are the results valid
when you put the same routine in a Windows application? In any ordinary
situation, yes. There are a few instructions that run differently in Windows'
386 Enhanced mode: POP ES, MOV DS,AX, or anything else that causes a segment
register to change. Still, such instructions are too rare to make the Zen
Timer's results meaningless. This utility is the best thing about the book.


Zen: The 486 Avatar


How fast does an instruction go on an Intel 486? My assembler's reference
guide is supposed to answer that but--fascinatingly--it's often wrong.
Sometimes my guide is simply misprinted, like the TASM 3.x manual, which says
that JCXZ takes one or three cycles (it should say five or eight cycles, a
whopping difference). Sometimes my guide is simply incomplete: It was years
before I found out that ADD mem,1 can be slower than INC mem, and I search in
vain for that kind of information in the "manuals" that came with my assembler
package. I do better if I go to the real information source (Intel's
documents). Yet even there, some details are missing or hidden in a terse
appendix. In short, nobody can answer the question, "How fast does an
instruction go on an Intel 486?" I have to time it myself, and until
recently, my code-timing method involved wasting a lot of my own time.
This is where Zen comes in. I've solved the timing problem by plugging in the
Zen Timer, which shows with reasonable clarity where my cycle times are going.
What about the problem of finding out how fast a CPU really goes? Since (to my
knowledge) Zen is the only trade book that even tries to address this
question, I'm sure that many people will rejoice in Zen's revelations.
Still, Zen's Chapters 12 and 13--the two chapters that address the Intel
486--need a close look.
Chapter 12 contains a section titled, "Calculate Memory Pointers Ahead of
Time," which warns against loading a value into a register and then using the
same register as a pointer. For example, if your first instruction is MOV
BX,5, then your second instruction better not be MOV AX,[BX], because the
486's pipeline stalls when it can't figure out an address in advance. Intel's
documentation says there is a penalty for doing this. Zen's contribution is to
point out that the penalty is really two cycles. But Zen misses the
exceptions. For example, if a two-cycle penalty always applied, then Example
1(a) would take two cycles longer than Example 1(b) (assuming BX starts equal
to offset Mem)--but it does not. I timed them together, and the penalty in
this case is one cycle.
A bit later you come to the section entitled, "Problems With Byte Registers,"
which begins with the statement, "There are two ways to lose cycles by using
byte registers, and neither of them is documented by Intel, so far as I know."
We then see that the first rule is a matter of using a 16-bit register as an
instruction's source operand right after "loading" one half of the register.
In Example 2, the two instructions together will run in three cycles--one more
than you'd expect if you read that MOV is a one-cycle instruction.
Actually, Intel does warn that there will be a penalty for loading half of a
register and then using the whole register as a source in the next
instruction. When they say "half of a register," they mean the 16-bit half of
a 32-bit register, so Zen is apparently right in saying that Intel doesn't
document the effect. However, Zen's rule merely extrapolates Intel's warning
to 8- and 16-bit registers.
In reality, Zen's rule is only one manifestation of a rule that's much more
widely applicable, but also much more complex. To merely describe it would
take a page, so I'll limit myself to two examples. In Example 3(a), CX is the
destination, not the source, but there is a penalty anyway. In Example 3(b),
however, there is no penalty because a penalty is already being applied.
Neither of these byte-register effects fits either Zen rule. In short, you
should be aware that there's more to the story than Zen implies.
Chapter 13 has four pages of personal anecdotes and a restatement of the 486's
most important pipeline problem (changing a register just before using it in
an address), then a couple of pages on the obscure BSWAP instruction ("BSWAP:
More Useful than You Might Think"). To justify this attention, Zen says, 
Unfortunately the x86 instruction set doesn't provide any way to work directly
with only the upper half of a 32-bit register. The next best solution is to
rotate the register to give you access in the lower 16 bits to the half you
need at any particular time....
Zen dismisses using ROR for this, because "shifts and rotates are among the
worst performing instructions of the 486, taking two to three cycles to
execute." To the rescue comes BSWAP, which "executes in just 1 cycle." The
discussion culminates in Example 4, which shows that you can use the top half
of the ECX register as a loop counter and the bottom half as a "skip value,"
swapping the top half with the bottom half when you need to directly refer to
the loop-counter half.
This would be very useful, except that BSWAP is not just a one-cycle
instruction--it requires from one to three cycles. In most situations, BSWAP
ECX takes three cycles; ROR ECX,16 and BSWAP ECX take the same time.
Furthermore, you certainly can work directly with the upper half of a 32-bit
register. If you do, your code will be faster than if you used BSWAP and (I
think) quite a bit clearer; see Example 5.
Incidentally, my claim that BSWAP ECX is a three-cycle instruction may
surprise some people. Doesn't Intel say that BSWAP takes one cycle? Yes, but
Intel also says to add one cycle for prefixes (there are some exceptions, but
that's the general rule, and it's certainly applicable in these examples).
Aha, everyone thinks, we're operating on a 32-bit operand; therefore, there's
a "32-bit operand" prefix (DB 66h), so BSWAP takes 1+1=2 cycles. That's right,
but not everyone realizes that BSWAP, like many other instructions, always has
another prefix (DB 0Fh), so BSWAP takes 1+1+1=3 cycles. Intel's documentation
has a little note that 0Fh is a prefix, but nowhere have I seen them spell out
the horrible implication: Most of the time on 386/486s, all instructions whose
first machine opcode byte is 0Fh take one cycle longer than the manual says
they will. This includes many of the instructions that appeared with the
introduction of the 386, including BSR, BT, BTC, BTR, BTS, CMPXCHG, IMUL
<register>,<register>; Jxx <rel16>; LFS, LGS, MOVSX, MOVZX, POP FS, POP GS,
PUSH FS, PUSH GS, SHLD, SHRD, SETxx, XADD, and various protected-mode
instructions--and BSWAP. I'll suggest that BSWAP is not "more useful than you
think."
By way of emphatic disclaimer: In pointing out these exceptions, I am not
panning Zen's advice. I cowrote an "optimizing tips" book (on a different
topic) and fully expect people to find exceptions to the so-called rules in
it--that's how we learn. 
This is not a single book with a single unified plan--it is two books threaded
together. Book 1 is for "speed freaks," who want to learn quickie fixes, speed
up their tightest routines, and count cycles. Book 2 is for "Zen disciples,"
who want to exercise their minds, experience the thought processes of the
masters, and be better programmers. If you're a speed freak but have no
patience for all that Zen stuff, buy the Intel manuals. If you're a speed
freak but you want to be a Zen disciple too, buy this book.
Zen of Code Optimization
Michael Abrash 
Coriolis Group Books, 1994; 449 pp.; $39.95 
ISBN 1-883577-03-9
Example 1: If a two-cycle penalty always applied, then (a) would take two
cycles longer than (b), but it doesn't.
(a)
MOV BX,offset Mem
ADD word ptr [BX+4],55
(b)
MOV CX,offset Mem
ADD word ptr [BX+4],55
Example 2: These two instructions run in three cycles.
MOV AL,5 ;"Loading" AL, which is one half of AX
MOV DX,AX ;Using AX, the whole register, as the source
Example 3: (a) CX is the destination; (b) there is no execution penalty
because a penalty is already being applied.
(a) This sequence:
DEC CL
SUB CX,5
is slower than:
DEC CL
SUB AX,5

(b) This sequence:
MOV BL,BL
MOV [BX],BX
is NOT slower than:
MOV BL,BL
MOV [BX],CX
Example 4: You can use the top half of the ECX register as a loop counter.
mov cx,[InitialValue]
bswap ecx ;Put skip value in upper half of ECX
mov cx,64h ;Put loop count in CX
looptop:...
bswap ecx ;Make skip value word accessible in CX
add bx,cx ;Skip BX ahead
inc cx ;Set next skip value
bswap ecx ;Put loop count in CX
dec cx ;Count down loop
jnz looptop ;The loop will repeat 64h times.
Example 5: This code is faster than using BSWAP.
mov cx,[InitialValue]
and ecx,0000ffffh ;Clear the upper part of ECX to 0
or ecx,00630000h ;Put 63h directly in the upper part of ECX
looptop:...
add bx,cx ;Skip BX ahead
inc cx ;Set next skip value
sub ecx,00010000h ;Count down loop
jnc looptop ;The loop will repeat 64h times.








































SWAINE'S FLAMES


An Invitation


In the December column, I announced an Emoticontest, in which I invented two
emoticons, or smileys, and asked whom they might represent. Answers started
coming in before the end of October, the first being from Scot Wingo, on
October 26 at 5:09 p.m., PST. Scot's answer was "Siskel and Ebert," which was
absolutely correct. To see the famed movie critics, rotate the page 90 degrees
clockwise and think of thumbs. Scot's prize is the fame that comes from having
your name printed in Dr. Dobb's Journal and an official DDJ T-shirt. When I
say that Scot's answer was correct, I mean that it agreed with mine. Other
good answers, like Laurel and Hardy, didn't, so they weren't. 
&8-) 7
(:-\ L
Symbols mean what you make them mean. It's due to a quirk of print publishing
that readers can respond to the December issue of a magazine in October of the
same year. That I can respond to these responses in the very next issue is
actually more impressive, given certain other quirks of print publishing. My
ability to respond so quickly in the magazine is due to two things: the
Internet and a recklessly tolerant deadline.
If I am any example, the Internet will be saturated with users when everyone
on the planet has seven e-mail addresses. I realize that seven is not a lot;
you probably have more. I have had more, or at least different, addresses. I
guess it was about ten years ago that I got into this online stuff, when I
announced plans for a Dr. Dobb's bulletin board in my first editorial for this
magazine. (They made me write editorials back then. Same deadline.) Not long
after that, I began writing this column, with a title that suggests that it
has some connection with this online stuff. Until now, that has not been the
case. Now I've started listing an e-mail address with the column, and I invite
you to send me stuff so I don't have to write so much. Make it really funny. 
While doing the research for this month's column--yes, I actually did research
for this month's column; I had to find that reference to the Dr. Dobb's
bulletin board--I reread issues of this publication from the early 1980s, and
this sent me off to some history books, which, coincidentally, I was reading
for another writing project. While doing all this research, I came across that
famous 1976 Bill Gates letter to computer hobbyists. "Most of you steal your
software," Bill tactlessly but accurately chastised the programming community.
But being accurate doesn't make you right. If you weren't planning to read my
"Programming Paradigms" column in this issue, just take a look at the end of
it and read what Fortran inventor John Backus had to say about stealing
software in the 1950s. Never mind; I'll quote it here: "An idea was the
property of anyone who could use it." An interesting angle on intellectual
property, no? In the 1950s, intellectual property didn't seem to apply to
programming. There was no real software market; the technology had not
stabilized to the point where a market made sense. This same situation held
true in the hobby computer world of the late 1970s, the world that Bill Gates
wrongly thought should function like a market. But being wrong didn't make his
strategy incorrect. 
So my question for you is, has the online world evolved to the point where a
market makes sense? I invite your thoughts on the commercialization of the
Internet.
Michael Swaine
mikeswaine@eworld.com
editor-at-large














































OF INTEREST
ISDN*tek has released the CyberSpace Internet Card, an ISDN-compatible,
half-size plug-in card that enables high-speed data transfers for ISDN users
between Windows- or OS/2-based PCs and the Internet or other UNIX hosts.
Running six times faster than modems, the CyberSpace Internet Card allows
connection to the Internet using compatible TCP/IP software to support Mosaic,
Gopher, and other applications. 
The card can also be used for high-speed data transfers directly between ISDN
users or sites which have HDLC and synchronous PPP hardware and software. The
CyberSpace Card supports all Basic Rate ISDN in the U.S. and automatically
configures itself for AT&T 5ESS, Northern Telecom DMS, and National ISDN-1
standards. It offers a 56/64 data call on either B-channel and is supplied
with interface drivers for any WinISDN compatible TCP/IP software. The card
comes with a Windows DLL driver that implements the WinISDN API, a public
interface developed by ISDN*tek, NetManage, and Performance Systems
International that uses 18 calls to provide ISDN-specific data services
between the hardware and
software.
The CyberSpace Internet Card sells for $395.00. Reader service no. 20.
ISDN*tek 
P.O. Box 3000
San Gregorio, CA 94074
415-712-3000
Microware has announced that its OS-9 real-time operating system is available
for the PowerPC platform. Additionally, the company has ported FasTrak, an
integrated C cross-development environment for UNIX and Windows (based on the
Microware Ultra C compiler), and DAVID (digital audio/video interactive
decoder) to the PowerPC.
To ensure deterministic, real-time response, OS-9 for PowerPC takes advantage
of the PowerPC's visual caching mechanism, which enables developers to lock
time-critical sections of code in the cache. Likewise, the FasTrak debugger
has also been optimized for the PowerPC by making use of the processor's
watchpoint registers.
The developer's package consists of various file managers ranging from ISP and
NFS to ISDN and MPEG, as well as the development environment and drop-in
board; it sells for $7500.00. Reader service no. 21.
Microware Systems
1900 NW 114th Street
Des Moines, IA 50325-7077
515-224-1929
A series of books for Mosaic users has been released by O'Reilly & Associates:
The Mosaic Handbook for Microsoft Windows, The Mosaic Handbook for the
Macintosh, and The Mosaic Handbook for the X Window System, by Dale Dougherty
and Richard Koman (Paula Ferguson was also a co-author on the X Window book).
The books introduce readers to Mosaic and its use in navigating and finding
information on the World Wide Web. The Microsoft and Macintosh versions come
with a copy of Mosaic on disk; the X Window version comes with a CD-ROM. The
books sell for $29.95 each. Reader service no. 22.
O'Reilly & Associates
103A Morris St.
Sebastopol, CA 95472
707-829-0515 
Software Blacksmiths has begun shipping C-Doc 6.0, an automatic C/C++ program-
documentation tool for Windows 3.x, NT, OS/2, and DOS. C-Doc consists of six
modules: C-Call, for tree diagrams, table of contents, and function
cross-references; C-Cmt, for function-block comments; C-List, for action
diagrams and standardized formatting; C-Ref, for cross-reference of locals,
globals, defines, and parameters; C-Metric, for complexity metrics; and
C-Browse, for graphical viewing of C-Call function trees and C-Ref class
trees.
Among the features Version 6.0 supports are OS/2 and Win32 long filenames, RTF
output, and one-million-line capacity for Win32, OS/2, and extended DOS. 
The complete C-Doc 6.0 Professional package sells for $299.00, although
individual modules can be purchased separately. Reader service no. 23.
Software Blacksmiths
6064 St. Ives Way
Mississauga, ON
Canada L5N 4M1
905-858-4466
d-Time 1.1, a software accelerator that makes CD-ROMs run as fast as hard-disk
drives, has been released by Ballard Synergy. The company claims that access
improves by 20 times when the accelerator is installed. In addition to DOS and
Windows support, d-Time works with network CD-ROMs. The program sells for
$69.95. Reader service no. 24.
Ballard Synergy
10715 Silverdale Way, Suite 208
Silverdale, WA 98383
206-656-8070
Datman, a software package that transforms a 4-mm DAT tape drive into a
high-performance, 8-gigabyte, MS-DOS file system, has been released by
Pixelab. Datman supports up to 256,000 files and 16,000 subdirectories, and
its data is directly accessible by DOS or Windows applications. Once the
Datman device
driver is installed, the tape drive behaves like a floppy disk with
large-volume data-transfer operations exceeding 15 Mbyte/minute. 
Datman 1.0, which is ASPI-compatible, sells for $150.00. A full DAT tape
subsystem, which includes the Datman software, a tape drive, SCSI host
adapter, cable, and the like, sells for $1199.00. A Datman developer's kit
that includes file-access functions for C programmers is also available.
Reader service no. 25.
Pixelab
1212 S. Naper Blvd., Suite 119
Naperville, IL 60540
708-369-7112
Texas Instruments has released the TMS320C44, a floating-point DSP chip
targeted for multiprocessor telecommunication systems. Although based on the
TI C40 processor, the C44 differs in that it has a reduced pin count,
lower-cost packaging, reduced power requirements, and power management.
Because it is software compatible with the C40, all existing C40-based
software tools can be used for system development. The C44 is available in
40/50/60-MHz versions, starting at $130.00 each in quantities of 1000. Reader
service no. 26.
Texas Instruments
Semiconductor Group, SC-94112
P.O. Box 172228
Denver, CO 80217
800-477-8924 x4500
Pentium Processor Optimization Tools, by occasional DDJ author Mike Schmit,
has been released by AP Professional. Among other topics, the book provides
coverage of superscalar programming, pipeline operation, Pentium optimization,
FPU math, and the like. The book includes a source code disk. ISBN
0-12-637230-1. The 400-page book sells for $39.95. Reader service no. 27.
AP Professional
525 B Street, Suite 1900
San Diego, CA 92101-4495
619-699-6735
Object Bridge, middleware technology that provides interoperability for
object-based systems, has been announced by Visual Edge. The Object Bridge SDK
is a C++ class library that initially supports Microsoft's COM and OLE
Automation, Iona's Orbix, and IBM's SOM/DSOM. Central to the Object Bridge
technology is a class registry in which the features of each object system
are described. Added to the class registry are Object System Adapters (OSAs)
that describe each supported object system. These OSAs can be purchased and
dynamically added to Object Bridge as required. 
The Object Bridge SDK lets you write C++ for a target-object model, then
access objects from other systems using familiar native mechanisms (such as
COM's IClassFactory). Reader service no. 28.
Visual Edge Software
3950 Côte Vertu, Suite 100
St-Laurent, PQ 
Canada H4R 1V4
514-332-6430 

Rimstar Technologies has released Version 2.1 of its Rimstar Programmer's
Editor. In addition to supporting OS/2, this version of the tool includes hex
mode and syntax coloring. The editor provides a graphical user interface for
OS/2, Windows, and NT, and is fully configurable. Its multithreading
capabilities allow multiple simultaneous compilations. It also
provides keystroke macro recording, smart indenting, bookmarks, a C source
browser, and a C macro language. The OS/2 version sells for $299.00, while the
Windows and NT versions sell for $199.00. Reader service no. 29.
Rimstar Technologies
91 Halls Mill Road
Newfields, NH 03856
603-778-2500
The Fuzzy Logic CD-ROM Library, recently released by AP Professional, includes
The Fuzzy Systems Handbook, by Cox, Fuzzy Systems Theory and Its Applications,
by Terano, Fuzzy Sets and Systems, by DuBois and Prade, Introduction to the
Theory of Fuzzy Subsets, by Kaufmann, and Fuzzy Sets and their Applications to
Decision and Cognitive Processes, by fuzzy-logic inventor Lotfi Zadeh. The
CD-ROM, which is available for Windows, Macintosh, and UNIX systems, includes
a search engine and additional software and sells for $59.95. ISBN
0-12-059755-1. Reader service no. 30.
AP Professional
525 B Street, Suite 1900
San Diego, CA 92101-4495
619-699-6735
Wireless Connect has announced its CDPD Starter Kit and CDPD SDK. The CDPD
Starter Kit includes source code for Windows, UNIX, Macintosh, and DOS; two
wireless modems; and airtime. The CDPD SDK is a full set of tools and
libraries--including built-in encryption, compression, and modem
independence--that lets developers build Cellular Digital Packet Data (CDPD)
applications. CDPD is similar to packet radio networks in that data is moved
in small packets that can be checked for errors and retransmitted. However,
CDPD does this in the current cellular voice network using a technique known
as "channel hopping" to locate idle voice channels and weave data packets into
them. Reader service no. 31.
Wireless Connect
2177 Augusta Place
Santa Clara, CA 95051
408-448-3844
CSIM 17, from Mesquite Software is a C/C++ toolkit for implementing
process-oriented, discrete-event simulation models. The library supports UNIX
workstations, PCs, and Macintosh platforms. Among the features this version
provides are synchronous facilities and storages, timed operations,
random-number streams, and moving window averages. The package sells for under
$500.00. Reader service no. 32.
Mesquite Software
8920 Business Park Drive
Austin, Texas 78759
512-338-9153
Voysys, a supplier of voice-processing systems, has announced VoysAccess, a
software tool that lets developers create applications for receiving,
inputting, and updating data in a database using a TouchTone telephone, a
process referred to as "interactive voice response" (IVR). VoysAccess will
initially support Microsoft's FoxPro language; subsequent releases will
support languages such as Borland dBase and Microsoft Visual Basic. In
addition, VoysAccess supports Microsoft's Telephony Advanced Programming
Interface (TAPI) standard for Windows, a set of program specifications that
allows developers to incorporate telephony features in their applications.
The VoysAccess Software Development Kit (SDK) includes a set of
database-development language extensions to enable telephone access and the
VoysAccess Server, which is capable of handling two telephone lines in a
single PC; an Expansion Kit will support up to 24 lines. The SDK also includes
a one-port PC board to enable rapid development and Windows sound capture and
editing that enables IVR applications to "speak" database information. In
addition, Voysys will provide three operational mini-applications to help
developers get started quickly. The VoysAccess SDK sells for $895.00. Reader
service no. 33.
Voysys Corp.
48634 Milmont Dr.
Fremont, CA 94538
510-252-1100
The C++ Compilation System 2.0 for UnixWare 1.1 from Novell is an add-on to
the UnixWare SDK 1.1 that includes a C++ compiler, with template support and
optimization for the Pentium processor, C++-standard components libraries (a
set of foundation classes and tools), graphical debugger with interface
customization and animation features, C++ and system header files, and support
for shared libraries. The C++ Compilation System 2.0 for UnixWare 1.1 is
available for $99.00 on CD-ROM and QIC-24 tape formats. Reader service no. 34.
Novell Inc.
P.O. Box 4100
Crawfordsville, IN 47933
800-457-1767
The SoftPub Yellow Pages, a directory of suppliers in the software industry,
has been released by the SoftPub Group. The directory provides contact
information for over 100 categories in the software industry, including
disk/CD duplicators, user-interface consultants, localization specialists, and
the like. The directory sells for $49.00. Reader service no. 35.
The SoftPub Group
24705 214th Ave. SE
Maple Valley, WA 98038
206-852-7440
Nu-Mega has released Version 2.2 of its Bounds-Checker for Windows that
supports automated debugging for Visual Basic Custom Controls (VBXs). In
addition to detecting memory leaks, Version 2.2 also detects resource leakage
and heap and data corruption in VBXs from C, C++, and Visual Basic. Likewise,
T-View, Bounds-Checker for Windows' integrated event-logging utility, also
supports VBXs so that you can capture and play back all VBX-specific events
and messages.
Bounds-Checker for Windows 2.2 sells for $249.00. Reader service no. 36.
Nu-Mega Technologies
P.O. Box 7780
Nashua, NH 03060-7780
603-889-2386
Crescent Software has announced the release of its QuickPak Professional
add-on library for PowerBasic. QuickPak Professional is a library of nearly
600 subroutines and functions for everything from handling data entry input
fields to accessing DOS and BIOS interrupts. The QuickPak Professional library
sells for $199.00. Reader service no. 37.
Crescent Software
11 Bailey Avenue
Ridgefield, CT 06877-4505
203-438-5300
















EDITORIAL


Net High Jinks, or Life on the High Wire


Just when the "J" word seemed comfortably under the carpet, Microsoft had to
make another pre-announcement, this time for its upcoming Microsoft Network
online communications service. What with their crying for "justice" (as in
Justice Department), you'd think online kingpins like CompuServe, Prodigy, and
America Online had been pinched. Maybe they yet will be. In any event, the
specter of recent battles with the Federal Trade Commission over fair trade
and antitrust has again cropped up. 
What Microsoft's potential competitors are yelping about is a communications
service that promises to be easy, inexpensive, and directly accessible from
the Windows 95 operating environment. The problem, online vendors say, is that
the only way to get on the Microsoft Network will be through Windows
95--presumably, third-party communications tools won't allow access. Assuming
Microsoft ships 35 million copies of Windows 95 this year and 10 percent of
Windows 95 users sign up for the network (Redmond's projections, not mine),
Microsoft's 3.5 million subscribers will outdistance CompuServe's 2.5 million
and America Online's 800,000 users in a single swoop. 
At issue is whether or not Microsoft will unfairly use its clear advantage.
One measure of fairness will be how public Microsoft makes the Windows
95/Microsoft Network interface. If it is open so that third parties can create
access tools using Windows 95 facilities that enable Microsoft Network access,
then there's little room for complaint. If not, then competitors will rightly
keep on hollering, and authors of books on undocumented interfaces can start
counting their royalty chickens before Windows 95 even hatches.
(C)
To champions of communication networks, virtual environments such as distance
learning and distributed businesses have proven to be winners. The courts,
however, don't agree--at least when it comes to virtual desegregation. In part
to achieve court-established desegregation goals while cutting down on the
costs associated with busing public-school students back and forth throughout
the city, the Kansas City, Missouri school district last year set up ShareNet,
a resource-sharing network designed to electronically link inner-city students
with their suburban counterparts. But the three-judge panel making up the 8th
U.S. Circuit Court of Appeals pulled the plug on the project, saying it
appeared to show little promise of meeting desegregation goals. 
Maybe not, but none of the court's ideas have worked either, at least by their
own yardstick of measuring achievement-test scores. I guess it's too much to
think that people in decision-making positions would be forward-thinking
enough to chance new solutions to thorny problems. 
(C)
I can see it now--high-tech bounty hunters scouring the net, clutching digital
wanted posters with thousands of dollars in reward money at stake. Microsoft
for one is ready to pony up $10,000 for information leading to the arrest,
conviction, and presumably execution of the fun-loving culprit who posted a
beta version of Windows 95 on an Internet site at Florida State University.
Likewise, DeScribe is pitching in $20,000 for information resulting in
conviction of the person who posted its word-processing software.
As with last fall's posting of the RC4 algorithm and other such anonymous
actions, irresponsible use of the networks can harm software vendors and
endanger Internet hosts. We're already seeing a host of host-liability
lawsuits, leading the National Law Journal to coin the term "cybertort."
Prodigy, for instance, is being sued by a Long Island investment bank, which
alleged libel and negligence due to postings that accused the bank of fraud
and criminal activity. As part of the settlement, the online provider has
agreed to help track down the person who posted the message. Then there's the
case of the former Procter & Gamble employee who was awarded $15 million in a
defamation suit because of a derogatory statement posted on the company's
network.
Whether or not you agree that some of this anonymous network activity has been
irresponsible, you can't deny that anything putting smiles on lawyers' faces
(not to mention money in their pockets) has to have a downside. Think about
that next time you publicly flame on the network.
(C)
Internet domain site-registration requests coming in at the rate of 300 per
day are swamping Network Solutions, the organization that manages Internet
addresses. The usual ten-day registration wait now takes weeks. Muddying the
waters are spats over who owns the rights to what names. The tussles between
former MTV video jockey Adam Curry and the MTV cable network over who owns
"mtv.com" are well documented. Likewise, you can imagine the surprise of
McDonald's when the fast-food giant found that someone already owned
"ronald@mcdonalds.com." Even telecommunication megacorps like MCI have been
scooped, as the company discovered when it attempted to register "mci.net" and
found that some prankster at rival Sprint already had.
This suggests that latter-day '49ers far-sighted enough to stake out dubious
domain-name claims before selling them back to the highest bidder have mined
the real gold on the Internet. 
Jonathan Erickson, editor-in-chief









































LETTERS


Detecting the Pentium Math Bug


Dear DDJ,
Have you heard about the fdiv bug in the Intel Pentium chip that Thomas R.
Nicely, a mathematics professor at Lynchburg College, ran across? The problem
is that the Pentium produces faulty numbers in the ninth place to the right of
the decimal point and beyond. Nicely found it when checking double, triple,
and quadruple prime numbers--specifically in analyzing properties of twin
prime numbers. Although Intel says the average user should only run into this
problem once every 27,000 years, they're still offering to replace buggy chips
if you call 800-628-8686 and demand new ones. Example 1(a) is a Basic and
Example 1(b) a C program that test whether or not a computer is using a chip
with this bug.
Harry J. Smith
Saratoga, California
DDJ responds: Thanks Harry. Yes, we heard about the fdiv problem. We also
heard why Intel didn't name the "Pentium" the "586." (When they added 486+100
on the first Pentium, they got 585.999983605.)


Greek to Me


Dear DDJ,
Although well versed in a number of computer languages, I'm not up on what
appears to be the Latin in Jonathan Erickson's January 1995 editorial--"Cursor
sine termino." What the heck is that all about?
Zac Davis
Austin, Texas
DDJ responds: Yep, it's Latin, and it literally means "a runner with no
limits," the formal way of saying "running light without overbyte." Thanks to
our staff Latin expert (the publisher's mother).


BBS Update


Dear DDJ,
In the article, "Building an E-Mail Manager" (DDJ, December 1994), Michael
Floyd praised Qmodem from Mustang Software, as do I and many people I know.
However, readers should note that Mustang Software Inc. is known for its
Wildcat! BBS package, not its Mustang BBS, as the article states.
As a Wildcat! BBS System Operator, I wanted to point this out. Please keep the
quality articles coming. Maybe even do a story about the new programming
language that is included with the latest versions of Wildcat! (4.0 and 4.01).
Dave Noice
Columbus, Ohio 
DDJ responds: Thanks Dave. We thought fellow readers might like to know more
about your BBS. Super-Port is a BBS Service Bureau, providing BBS services to
individuals, associations, and businesses. The only BBS Super-Port runs for
itself is for customer demonstrations (Wildcat!). To arrange for
demonstrations, or to request information packets, call Super-Port at
614-385-2003. You can also e-mail requests to: dave.noice@commport.org;
include your name, address, phone numbers (voice and FAX) and e-mail address.
Dave uses Wildcat! BBS software for most of the system, but has the ability
and expertise to use any commercial BBS package currently available. He
provides complete BBS packages (turnkey systems), including hardware,
software, custom display screens, menu design, file and message areas, and
conferences. 


More Mind and Life 


Dear DDJ,
With regards to Homer Tilton's letter, "Mind and Life" (DDJ, December 1994),
I'd like to take issue with his ad hominem attack on those who feel that
mechanistic models of human thought are inadequate. One could argue just as
convincingly that proponents of such models find them comforting because they
can understand them without having to think too hard.
In any case, both he and Michael Swaine seem to have missed part of the point
of Roger Penrose's The Emperor's New Mind (and the recently published Shadows
of the Mind). Penrose is not arguing from quantum uncertainty (being one of
Britain's leading mathematical physicists, I suspect that he is as familiar
with the literature as Mr. Tilton), but rather from the noncomputability of
the deterministic quantum laws. He is not a mystic (as he takes some pains to
explain in Shadows), but simply believes that there are aspects of human
awareness and understanding (such as the proofs of Gödel's Incompleteness
Theorems) that transcend what a Turing Machine can accomplish and that such
transcendence is mediated by a physical phenomenon whose workings further
research can bring to light.
There is great comfort in thinking that we know everything about a particular
area of knowledge, but we should recall the feeling prevalent in the late 19th
century that "everything has been discovered and it only remains to work the
laws out to a few more decimal places." History teaches us that shortly
afterwards, Planck, Einstein, and Dirac revolutionized physics, while Gödel,
Turing, and Church demolished David Hilbert's ambitious program to mechanize
all of mathematical reasoning. Penrose may not be right (as he freely admits)
but he is at least open to the possibility that we do not yet have all the
answers--or even all the questions. I personally find this attitude to be more
intellectually stimulating than wandering around with a mechanistic hammer
treating the world as the unfortunate metaphorical nail.
Richard Wesley
Seattle, Washington


More Emoticons 


Dear DDJ,
In his December 1994 "Swaine's Flames," Michael Swaine asked us to guess what
two people were represented by: 
&8-) 7 and (:-\ L
I may be wrong, but this looks suspiciously like the occupants of 1600
Pennsylvania. The &8-) 7 is Hillary with a bow in her hair. The (:-\ L is Bill
frowning about Newt as the new Speaker of House. 
I hereby request my 15 minutes of fame.
Vance Rigg 
Newport Beach, California
How about Penn & Teller?

P.S. What does the "CyberSpeak Pronunciation Guide" say about the
pronunciation of "emoticon"? Or is it one of those words to be seen and not
heard? No, no. I've got it! Siskel and Ebert! I could tell from the bald pate
and the glasses...
Kerry Burton
kerrykjb1@aol.com
My answers for who those emoticons represent are: Somebody bald & somebody
with hair.
Kevin Haskel Rubin 
gnome@teleport.com
Is it Siskel and Ebert?
Jerry Chadwick 
jerry_chadwick@Novell.COM
Yo, Mike:
My guess for the smiley contest: Penn and Teller. (The top smiley is Penn.) By
an amazing coincidence, I'm now working on a book of Teller's collected
quotations. (I've already finished a similar work on Harpo Marx.)
Mark Gingrich 
st190022@s1.csuhayward.edu
*** This letter was assembled from 100% recycled bits. ***
&8-)7 (:-\L
Looks like Laurel and Hardy to me.
Scott Redding
sreddin@apg-9.apg.army.mil
It looks like Siskel and Ebert, thumbs up and thumbs down!
Todd Hale
thale@novell.com
Unofficially speaking __.--. 3-D waterskiing!
 () () () ()
 /[]\ /[]\ /[]\ /[]\
 .-##..___.##--..__##.---..##_..--
 .-//___..//.___.//-.___//--.
 .---.___.---.___.---.___.---.__
 .---.__.---.__.---.__.---.__.---
Siskel & Ebert:
&8-) 7 is Siskel (or, the fat one... I forget which one is which)
(:-\ L is Ebert (or, the skinny one... I forget which one is which)
Gregg Cooke
gcooke@rt66.com
(:-\L = Stan Laurel
&8-)7 = Oliver Hardy
Denis Blodgett
blodgett@monet.vill.edu
Siskel and Ebert. But they only get quoted when they agree.
John Maxfield
73523.736@compuserve.com 
The emoticons represent Laurel & Hardy. Thank you. 
Jay Joiner
70263.1054@compuserve.com
Laurel and Hardy perhaps? :-}
N N E E E E -------------------------------------
N N N E N E E D H A M ' S E L E C T R O N I C S
N N N E E E Device Programmers
N N N E -------------------------------------
N N E E E E
Eric Cox 
needhams@crl.com
&8-)7 + (:-\L == Oliver Hardy + Stan Laurel
tamortir@cris.com
Unknown @ Cogitate Inc.
My guess is &8-) 7 is Bill Gates, and (:-\ L is/are his lawyer(s). Do I win
the prize?
Ed Remmell
eremmell@Internet.cnmw.com 
Hi Mike,
I have always enjoyed your flamorous editorial/commentary articles, even though
I sometimes don't understand all of the fancy terms. On your emoticons, I could
guess only one person:

&8-) 7 is Gates,
Who is (:-\ L ? P. Kahn of Borland?
Thanks & how about
 O O
 .^. __,
 \\ //
 \//
na...
Lan Tran
LTRAN@CSTP.UMKC.EDU
Example 1: Detecting the Pentium math bug.
(a)
10 DEFDBL A-Z: COLOR 14, 1: CLS : PRINT 'Yellow on Blue
20 PRINT "FDIV - This program checks for an error > 10^-15 when using"
30 PRINT "fdiv. When detected it warns you that you probably have the"
40 PRINT "Pentium fdiv bug."
50 PRINT
60 PRINT "GWBASIC Version 1.0, last revised: 1994-11-26"
70 PRINT "Copyright (c) 1994 by author: Harry J. Smith,"
80 PRINT "19628 Via Monte Dr., Saratoga, CA 95070."
90 PRINT
100 FOR i% = -3 TO 3
110 in = 824633702449# + i%
120 ou = (1# / in) * in
130 IF ABS(ou - 1#) <= .000000000000001# THEN GOTO 150
140 PRINT "You have the Pentium bug"
150 PRINT in; "produced an error of"; ABS(ou - 1#)
160 IF ou <> 1# THEN PRINT ou; "<> 1#"
170 NEXT i%: PRINT
180 PRINT "This program is a modification of a C program I found on the"
190 PRINT "Internet in the newsgroup comp.sys.intel, in a message posted by"
200 PRINT "Bill Broadley Broadley@math.ucdavis.edu UCD Math Sys-Admin"
210 'If you get the message "1 <> 1#" without the message "You have the
220 'Pentium bug" the program may be using floating point emulation and not
230 'checking the CPU fdiv instruction. This is the case for the GWBASIC
240 'interpreter. It is better if you compile this program to FDIV.EXE
250 'before running.

(b)

#include <stdio.h>
#include <math.h>

#define C 824633702449.0


double test(double x)
{
    return (1.0 / x) * x;
}

int main(void)
{
    double delta = 1e-15;
    volatile double in1, in2, in3, out1, out2, out3;

    in1 = C - 1; out1 = test(in1);
    in2 = C;     out2 = test(in2);
    in3 = C + 1; out3 = test(in3);

    printf("Program checks for an error > %e when using fdiv\n", delta);
    printf("When detected it warns you that you probably have the\n");
    printf("Pentium fdiv bug.\n\n");

    if (fabs(out1 - 1.0) > delta)
        printf("You have the Pentium bug\n");
    printf("%lf produced an error of %e\n\n", in1, fabs(out1 - 1.0));

    if (fabs(out2 - 1.0) > delta)
        printf("You have the Pentium bug\n");
    printf("%lf produced an error of %e\n\n", in2, fabs(out2 - 1.0));

    if (fabs(out3 - 1.0) > delta)
        printf("You have the Pentium bug\n");
    printf("%lf produced an error of %e\n", in3, fabs(out3 - 1.0));
    return 0;
}



















































Distributed Computing and the OSF/DCE


Building client/server apps for distributed systems




John Bloomer


John is a staff scientist at GE's corporate R&D center, working on distributed
multimedia and video applications. His publications and patents cover the
areas of distributed computing, medical imaging, information theory, and
neural nets. He is author of Power Programming with RPC, published by O'Reilly
& Associates.


Distributed computing typically involves two or more computers on a network,
computing in cooperation and sharing resources ranging from CPU cycles to
databases. While distributed systems provide end users advantages such as
greater reliability and flexibility (as compared to centralized environments),
developers are faced with a number of hurdles when creating distributed
applications. These challenges include:
Resource location. Interested applications must be able to locate and
discriminate between network resources.
Managing data consistency. Multiple copies of the same data must often be
shared for reading and writing across a network.
Synchronizing systems. The clocks across a distributed environment must be
kept in sync to ensure each machine has the same view of the network. Without
this, tasks such as file sharing, backup, or policing security become
impossible.
Managing security. In a decentralized system, user and resource identities are
more complicated to manage. Separate tools for authentication, data integrity,
and access control, once embedded in centralized systems, are now required for
application development.
The Open Software Foundation's (OSF) Distributed Computing Environment (DCE)
is an integrated suite of tools and services that support the development of
distributed applications. DCE provides interoperability and portability across
heterogeneous platforms across LANs and WANs. One global namespace makes the
resources across interconnected LANs and WANs look like a hierarchical file
system (X.500 or DNS) through the directory-services API. A user on the
distributed system can share resources (data, services, or whatever) by
finding them or placing a mention of them in the namespace. 
In addition, DCE Release 1.1, available for UNIX, MVS, Windows, HP-UX, Alpha
NT, VMS, and OS/2, provides a consolidated interface for system administration
throughout DCE, plus remote startup and shutdown of remote services. It also
provides a generic security-service API (GSSAPI) which allows non-RPC-based
systems to take advantage of DCE security, extended registry attributes
allowing various proprietary systems to be registered in the DCE security
registry, and security-delegation and auditing capabilities. Release 1.1 also
supports internationalization, including standardized POSIX and X/Open
interfaces which provide character-code interoperability. 
DCE makes possible client/server architectures and data sharing using a remote
procedure call (RPC) paradigm. In a client/server model, anyone (a client)
interested in a particular resource on a network (a server) uses a formally
defined application protocol to find and request a service. The server sends a
reply, completing the "request-reply cycle."
Typically, servers are continuously running daemons that specialize in
performing a few functions or services. The nature of the client/server
communication is often synchronous, with the client waiting for the server to
send a response. This is not necessary, though. Application protocols may also
be designed to support asynchrony if necessary. 


OSF/DCE Elements


As Figure 1 illustrates, DCE resides between distributed applications and the
operating system, network transports, and protocols. It isolates the
programmer from low-level, platform-specific nuances when communicating across
a network between heterogeneous machines.
Threads are the most fundamental component specified by DCE. DCE needs to be
able to run multiple threads of execution simultaneously on the involved
machines to facilitate things like asynchronous I/O and concurrent servicing.
Since many operating systems do not inherently support multithreaded
execution, a user-level (nonkernel) threading package is included with DCE and
is compliant with POSIX 1003.4a, Draft 4. This package, known as "DCE
Threads," gives users the ability to create, schedule, synchronize, and
otherwise manage multiple threads in a single process. 
Often, clients and servers are first implemented as single-threaded processes.
As the application evolves, it may require that multiple RPCs be placed from
one client at the same time. Servers may have to field many requests at once,
or applications may need to maintain a live user interface while placing RPCs.
To achieve this, you'll need to split processing at the local machine into
threads--to allow one thread, for example, to remain blocked, waiting on I/O,
while another thread executes.
Use of communication and synchronization agents between threads is crucial to
multithreading. DCE Threads provides mutual exclusion objects (mutexes) to
limit access to associated resources to one thread at a time. Resources can be
locked and unlocked to be shared among threads. Condition variables provide a
mechanism for threads to be notified when another thread has completed some
task or access. A thread can effectively wait for another thread to signal it
and for a specified condition to be met before continuing. A joining mechanism
allows one thread to wait for another to complete. Combined with the DCE
thread-scheduling tools and exception-handling API, an elaborate multitasking
system can be quickly assembled. 
The network data representation (NDR) in DCE specifies a standard,
architecture-independent format for data, to facilitate data transfers between
heterogeneous architectures. This serial encode/decode scheme insulates
application code from the differences in data types, enabling application
portability and interoperability. NDR uses the receiver-makes-right paradigm,
making it the receiver's job to convert the data into the form required of
that architecture. Compared with the single canonical wire format used by Open
Network Computing (ONC) RPC, the receiver-makes-right scheme distributes the
translation work across machines, so conversion cost grows only with the
actual heterogeneity of the network.
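A toy decoder illustrates receiver-makes-right: the sender tags the stream
with its own byte order and writes integers natively, and only a receiver
whose order differs does any swapping. The tag values and helper names below
are invented; real NDR format labels also describe character set and
floating-point format:

```c
#include <stdint.h>

#define TAG_BIG_ENDIAN    0   /* illustrative tags, not real NDR labels */
#define TAG_LITTLE_ENDIAN 1

/* Byte order of the machine we are running on. */
static int local_order(void)
{
    uint16_t probe = 1;
    return *(uint8_t *)&probe ? TAG_LITTLE_ENDIAN : TAG_BIG_ENDIAN;
}

/* Reverse the four bytes of a 32-bit value. */
static uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0xff00) |
           ((v << 8) & 0xff0000) | (v << 24);
}

/* Receiver-makes-right: convert only if the sender's order differs. */
uint32_t ndr_read_u32(int sender_tag, uint32_t wire_value)
{
    return (sender_tag == local_order()) ? wire_value : swap32(wire_value);
}
```

When sender and receiver agree, the value crosses the wire untouched; only a
mismatched pair pays for a swap, which is why conversion load scales with
heterogeneity.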
DCE RPC is layered on top of DCE Threads. All DCE services are based on RPC.
RPC calls are function calls outside of your address space that look and feel
like local procedure calls. DCE RPC specifies its own network-access
mechanisms, different from the ONC RPC on which systems like NFS are based.
For example, a local database query might look like Example 1(a). A remote
database query to a potentially remote database would be similar to Example
1(b). The parameter someRemoteBank may be optional, depending on whether you
want a specific bank (service) or one that is automatically searched-out and
used at run time according to some externally specified criteria.
A protocol or interface compiler is used to generate stubs (containing
low-level network-interface code) from a textual definition that describes the
application protocol between clients and servers; see Figure 2. The stubs
generated by the compiler allow a network client to execute a procedure on a
remote host as if it were a local procedure call in its own address space. The
stubs include numerous calls to the services in Figure 1, and they handle
packaging and encoding/decoding of procedure parameters (both outward and
inward bound). This packaging of parameters is called "marshaling." 
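What the generated stubs do for each parameter can be sketched by hand for a
single invented structure: marshaling flattens the fields into a byte buffer
in a fixed layout, and unmarshaling rebuilds them on the far side. Real stubs
add NDR byte-order handling and error checking; the account structure here is
hypothetical:

```c
#include <stdint.h>
#include <string.h>

/* A hypothetical RPC parameter to be marshaled. */
struct account {
    uint32_t id;
    int32_t  balance;
};

/* Marshal: copy each field into the buffer at a fixed offset.
   Returns the number of bytes written. */
size_t marshal_account(const struct account *a, unsigned char *buf)
{
    memcpy(buf,     &a->id,      4);
    memcpy(buf + 4, &a->balance, 4);
    return 8;
}

/* Unmarshal: the inverse, rebuilding the struct from the wire bytes. */
size_t unmarshal_account(struct account *a, const unsigned char *buf)
{
    memcpy(&a->id,      buf,     4);
    memcpy(&a->balance, buf + 4, 4);
    return 8;
}
```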


DCE Services and Utilities 


As we move away from centralized systems, we need tools to locate resources
(data, services, and the like) on the network, keep systems synchronized, and
provide network security. Consequently, DCE provides a directory service,
distributed time service, and security service, which are layered on top of
the basic mechanisms.
The directory service is used on a network to store and retrieve information
about distributed resources--users, machines, and services--with specific
attributes like location of a home directory or the host a service is running
on. Integral to the directory service is the concept of a "cell." A DCE cell
is a group of machines running a minimum set of DCE services and sharing a
common network. For DCE applications to function, a directory service, a
security service, and a distributed time service must each be running on some
machine in the cell.
Additionally, machines in a cell can be running any combination of other DCE
services, including GDS, DFS, and diskless support, and may require clerk
daemon processes to cache and otherwise forward requests for standard DCE
services. Basically, a cell is the smallest organization of machines on which
you can perform DCE computing. Vended DCE software and development
environments (layered products) typically ship in per-machine or per-cell
measures. Linkages across cells are provided via the directory services'
global naming tools. Figure 3 illustrates the connectivity of DCE service
components. The cell directory service (CDS) manages naming and directory
services for a cell and is first to be consulted for resource location. Should
the resource not be local to the cell, CDS must consult other connected cells.
If the DCE global directory service (GDS, an X.500 implementation) or
domain-naming service (DNS) is installed, the global directory agent (GDA) is
directed to consult with them to determine in which known foreign cell the
resource is listed. A DCE application can use the X/Open directory service
(XDS) API standard to access the DCE directory-service library. The XDS
library can determine from the format of the name to be looked up whether to
direct the look up to CDS or GDS, making your source code independent of
service location.
A CDS namespace server for a cell stores names and other information in
databases called "clearinghouses." These databases have a hierarchical
structure similar to a file system, with nodes that can be object entries
(network resources), soft links to other entries in the namespace, or child
pointers to subordinate directories. 
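The three node kinds can be modeled as a tagged union; the type and field
names here are invented for illustration, not CDS's internal representation:

```c
/* Sketch of a clearinghouse node: an object entry, a soft link,
   or a child pointer to a subordinate directory. Names are invented. */
enum node_kind { OBJECT_ENTRY, SOFT_LINK, CHILD_POINTER };

struct cds_node {
    enum node_kind kind;
    const char *name;
    union {
        const char *binding;        /* OBJECT_ENTRY: where the resource lives */
        struct cds_node *target;    /* SOFT_LINK: the aliased entry */
        struct cds_node *child_dir; /* CHILD_POINTER: subordinate directory */
    } u;
};

/* Follow soft links until we reach a real entry or directory. */
const struct cds_node *resolve(const struct cds_node *n)
{
    while (n && n->kind == SOFT_LINK)
        n = n->u.target;
    return n;
}
```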
Database nodes can be master or read-only replicas. Read-only replicas are
updated from master replicas either through immediate update requests or a
"skulking" process. The latter facilitates the update of replicated entries of
databases whose server may have been unavailable when the master was first
altered and changes propagated. CDS servers skulk themselves, thereby updating
themselves from associated master nodes--either on demand, when other
management activity requires, or automatically in the background. Master
entries can be read/write accessed if the user meets prescribed security and
access-control constraints. 
Client applications in search of resources actually send their lookup requests
through local "clerk" processes. As Figure 4 shows, a CDS clerk first checks
its cache of transactions that have not grown stale to see if naming
information can be returned immediately. If no cache entry matches, the
request is forwarded to the cell CDS server. In the third and fourth steps,
the CDS server searches its clearinghouse databases for the
requested information. For step five, the closest possible match to the
requested information is returned. This could range from nothing to the actual
object entry with all the necessary information for the client to locate and
bind to a server. It might be a pointer to another CDS server to query, for
example, telling the clerk that all print objects or services are located in
another cell, thereby linking cell namespaces. The clerk continues the
strategy prescribed to it to find the requested resource. The clerk finally
returns the requested data to the client and caches any useful information.
All this is transparent to the client.
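The clerk's cache-first step can be sketched as a lookup that consults a
table of unexpired entries and signals a miss (or a stale hit) so the request
can be forwarded to the cell's CDS server; the structures are invented for
illustration:

```c
#include <string.h>
#include <time.h>

/* Invented cache entry: a name, its resolved binding, and an expiry time. */
struct cache_entry {
    const char *name;
    const char *binding;
    time_t      expires;
};

/* Cache-first lookup: return a fresh cached binding, or NULL to signal
   that the clerk must forward the request to the cell's CDS server. */
const char *clerk_cache_lookup(const char *name,
                               const struct cache_entry *cache, int n,
                               time_t now)
{
    for (int i = 0; i < n; i++)
        if (strcmp(cache[i].name, name) == 0 && cache[i].expires > now)
            return cache[i].binding;  /* fresh hit: no network traffic */
    return NULL;                      /* miss or stale: ask the server */
}
```

On a NULL return, a real clerk would forward the request, then cache whatever
useful information comes back, which is the behavior shown in Figure 4.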
You can find the CDS daemon (service) cdsd, the GDA daemon gdad, and the GDS
daemon gdsd running on a cell CDS master host. Each machine's cdsadv process
proactively looks for networked CDS servers and receives unsolicited broadcast
advertisements from them to build cell-to-cell linkages. cdsadv also spawns
any necessary CDS clerk services (cdsclerk) for caching, and so on. The cdscp
is a control program acting as a client interface to cdsd. It is used by the
cell administrator to manage (add, delete, and so on) namespace entries.
(cdsbrowser is a handy Motif application that is also a cdsd client.)
The security and time services are vital to the existence of a cell. The DCE
security-service daemon secd provides a way for clients and servers to prove
their identities. Identities (users, servers, and computers) known to the
security service are called "principals." Entries for entities are stored in a
database called the "registry." It contains group, organization, account, and
administrative-policy information in addition to principal definitions. A
separate registry service facilitates slave replicas of the registry to
increase availability and aid in the management of user and group information.
The rgy_edit command provides the cell or security-administrator principal
with a way to manage the information in the registry. This information is
usually derived from /etc files using other DCE setup utilities. The security
service includes an authentication and privilege service in the form of
libraries that call security services.
The authentication service provides a trustworthy principal-identification
scheme. DCE system users log into principal accounts with the dce_login
password-checking utility. A secret key shared with the authentication system
is generated. Credentials have finite lifetimes, so identities must be
periodically reverified via the authentication service. To reduce the chance
of tampering, DCE uses an extended version of the Kerberos shared
secret-key-authentication encryption scheme. The privilege service uses the
verified principal's identity to see to which groups the user belongs.
Privilege-attribute certificates (PACs) establish the rights of DCE
principals to access networked services or resources. DCE calls its networked
resources "objects," providing a unique identifier for each. Objects have
methods, equivalent to procedures, that make up services. The
security service also includes an extensive access control list (ACL)
environment. Access rights to network resources are determined by using the
principal's identity and group membership to consult a list associated with
that resource. ACLs can be managed with the acl_edit utility. Most standard
DCE services use ACLs. cdsd, for example, uses ACLs to enforce read/write
access on the clearinghouse. ACL interaction at a server is managed through a
boilerplate code module known in DCE lingo as the "ACL Manager." 
The DCE distributed time service (DTS) is essential to maintaining time stamps
of data throughout the DCE system, such as service-database updates,
credential checks, and file-system management. The dtsd daemon runs on each
DCE machine. Most are configured as clerks, responsible for retrieving current
time and adjusting the local clock. Those configured as servers are
responsible for synchronizing time amongst themselves as well as performing
clerk tasks. A master on the cell may derive the de facto notion of time for
that cell from its own clock or from external sources such as an NTP
daemon or an external clock. The dtscp control program is a client
interface to dtsd, allowing the administrator to configure and manage DTS or
change the nature of the background updates taking place across a cell.


Threads Package


An RPC inherits its synchronous behavior from the local procedure calls within
the stubs. Unless asynchronous programming is used to facilitate multiple
concurrent threads of execution or nonblocking I/O, a server is capable of
servicing only one request at a time and a client is blocked while waiting for
the server to return a reply. Figure 5 illustrates the steps behind a
synchronous RPC.
Asynchronous programming tools native to most versions of UNIX can provide
nonblocking (remote) procedure calls: forking, multithreading, or lightweight
processes; asynchronous, nonblocking reads/writes through I/O control calls;
or event-driven programming such as X11, signals, timers, and the like. 

All of the DCE RPC libraries and services are thread-safe or reentrant, making
it possible for multiple threads of execution to access the same code at the
same time. For application programmers, this means any resources shared
between threads of execution must be independently managed or owned by a
thread. Synchronization and locking primitives exist to make sequencing of
tasks and sharing of resources possible. The DCE Threads (pthreads) package
implements Draft 4 of the POSIX 1003.4a standard, plus some additional _np
suffix (not portable) routines. Note that linked libraries must also be
reentrant or "thread safe."
A thread can be in one of four states: waiting, ready, running (actually
executing), or terminated. In Figure 6, for example, threads A--Z may be vying
for CPU time, as moderated by the priority-driven, preemptive scheduling
algorithm you've selected. When thread A executes, it becomes I/O blocked,
yields control to thread Z, and is marked as waiting until the I/O condition
clears. When thread Z executes, it is preempted by thread B, possibly
because a time-sharing scheme was specified. Several types of scheduling are
available within pthreads: first-in, first-out, round-robin, and three types
of time-slicing across all priorities.


The Mechanism Behind an RPC


Figure 7 outlines the steps behind performing RPCs. Notice that the client and
server processes perform all their network communication--RPCs to directory or
endpoint servers--through another entity called the "RPC run time." Before an
RPC can be performed, a client must get a service address and other
information necessary to "bind" the client and server together. You can do
this explicitly in your client program, implicitly, or automatically delegate
this responsibility to the run-time libraries. Binding information at a client
resides behind a "binding handle" and includes: a protocol sequence specifying
the network; transport and RPC protocol to use; network-address information
including the host name and endpoint on that host at which a service is
listening; transfer syntax (really a nonissue here); and version number of the
client/server RPC interface. 
Step 1: The server registers itself with the system. This includes exporting
some of the information necessary for building the client/server communication
channel or binding into the local namespace. Since it is often done as a
supervisory function without a specific instance of a service up and running
on a host, only part of the information necessary for binding is exported.
Essentially, a mapping between an interface specification and a server host
name with acceptable protocol sequences is passed to the cell-namespace
service. This service listing is available to all hosts in that cell and other
cells that can access it through global directory services. Protocol sequences
will be discussed in detail later. To complete the client/server binding,
specific endpoint information is necessary. On startup, a server must register
the endpoints it will use with the local endpoint-mapping service, rpcd. Today
these take the form of UDP or TCP port numbers. In the case where servers wish
to use well-known endpoints, these are established typically in the interface
definition, and rpcd never gets involved. The CDS can be found without
consulting any other directory service as it runs on a predefined host. The
endpoint-mapping services for each machine run at well-known ports.
Step 2: The client consults the directory service. The cell-directory service
attempts to match service interfaces registered with it (or peers it can
contact) with those asked for by the client. Interfaces are registered by
universal unique identifiers (UUIDs). The directory service returns, one at a
time as demanded, matching interfaces that meet version-compatibility
criteria. When successful, the client imports the part of the binding
information necessary including at least the server's host name.
Step 3: The client requests a specific service procedure. With only partial
binding information, the first RPC is directed to the server host's endpoint
mapping service. From there the binding information is completed with port
information added, and the call is passed on to the target service. The server
replies if possible. 
Step 4: Subsequent RPCs from the client are placed directly with the server as
the binding is complete. 
The RPC run-time API and stubs isolate the client and server development
process from the nitty-gritty details of the DCE service and utility APIs.
This reduces volumes of function calls to a manageable number of rpc_
calls, at the cost of some flexibility. As Figure 8 shows, your
application code will depend heavily on the code generated into the stub, as
well as the run-time API. Most applications will require only a few RPC
run-time calls, and often no calls directly to the DCE services and utilities.


Developing a Distributed Application


To illustrate how you develop a typical distributed application, I've taken an
image-database management application from a single process to a network form.
Listing One is im.c, the flat-file, single-process implementation of the
database, while Listings Two and Three are rim_client.c and rim_server.c,
respectively, the distributed client/server implementation of the database.
The entire source code (including support files) for both the local and
client/server distributed versions is available electronically; see
"Availability," page 3. Noting the differences between the single-process and
distributed versions of this database makes it clear that developing a
distributed application can be a complicated process, and walking step-by-step
through the development process is beyond the scope of this article.
Consequently, I'll provide an overview of the steps required to move from a
local to a distributed application. For more information, I recommend the
works listed at the end of this article.
Figure 9 illustrates how to develop DCE applications. Figure 10 and Table 1
list the files you'll author or generate while developing the client and
server parts of the application. The files you are responsible for developing
are shown as highlighted ovals. 
You start by developing the protocol-specification file appn.idl (where appn
is the application name). uuidgen -i > appn.idl is run once to start things
off, generating a UUID by which the interface will be known. (The
attribute-control file or appn.acf is an optional way to alter the behavior of
the stubs produced.) After running the protocol compiler with a command such
as idl appn.idl to produce the header file and stubs, you proceed to develop
the client and server functions. On the client side this may be solely a main
procedure with a user interface to the remote procedures. At the server, you
must not only codify the procedures to be executed (the service "manager"
code), but also develop a main that initializes the server when first invoked.
You then compile and link your client and server code with the associated
stubs to create client and server executables. 
Code-generation technology like this is sensitive to source-code automation
and management. Use of make and a source-code management system such as SCCS
is advised for even modest-sized projects. Take note that the default
client/server header filename is appn.h. You will have to use another header
filename to isolate your own generic definitions and prototypes. I'll use
appn_util.h here. 


Debugging a Distributed Application


Distributed-application debugging can be very challenging. What you have in
your favor is the similarity between remote and local procedure-call models.
It's extremely productive to first link the service procedures directly with
the client side of your application, as shown in Figure 11. By sidestepping
the network and RPC calls, you can expose and debug parameter passing and
overall functionality before distributing the application. It may be necessary
to use preprocessor directives in your client and server code to make this
linking possible.
Once local debugging is complete and the functionality of the client and
server has been fleshed out, you can run the client and server applications in
separate debuggers, each in its own process. Be sure to use a thread-aware
debugger or inhibit threading at the client and server. It is then that
additional violations to the protocol prescribed by your .idl file are found.
Common protocol-programming mistakes include:
Strings without NULL terminators.
Linked lists or trees without NULL next pointers.
NULL reference pointers.
Vectors or arrays that are longer than the specified maximum.
Volatile variables in service procedures or poor memory management.


The Image Database Application


The local database application (Listing One) makes no major assumptions about
operating-system, C, or system-support libraries. It basically offers a way to
add, extract, delete, and list entries in a database designed for imagery,
organized as a flat file with organization embedded as headers for each entry.
It provides local users with a repository with which to share images, thereby
conserving disk space and allowing version management. Since it retrieves and
records user identity (thereby requiring a notion of system user ID), it
cannot enforce any access control or security measures. It can only access
database files in a file system accessible to all interested users for reading
and writing.
You might argue that if the necessary machines were networked and a package
like PC-NFS or OSF-DFS installed, common mount points would make image
databases accessible across the network. But what if all machines aren't
sharing the same file system, or they (as is common) have different notions of
mount points, or user identity is not maintained or consistent across the
network? What if you want to give remote dial-in users access without bringing
up a shared file system? Most importantly, what if one machine is to be
dedicated to serving image archive requests, perhaps because it has optical
drives or because it has the horsepower needed to compress and decompress
images? For these reasons and more, it is important to think about how you
would develop a truly distributed version of this application. 
For obvious reasons, you'll want to establish an interface to this database
that's accessible across the network as a service. This will allow you to
craft different clients to achieve different purposes, all sharing the same
data through the same consistent interface. A two-tiered information system
results. Tools such as Visual Basic, Visual C++, and PowerBuilder are good for
crafting GUIs that access commercial databases to form two-tiered systems.
Nontrivial database applications warrant adding an additional layer of
services or proxies between the clients and the data or other resources being
shared. A middle layer isolates the reusable low-level routines used by
different types of clients as separate services. This three-tiered strategy
allows clients to keep network interactions at an abstract, potentially
database-independent level, thereby concentrating on the user interface and on
unique client-processing needs. 
Several questions regarding client/server partitioning must be addressed
before writing the local-procedure calling code:
Is there a functional client/server partition? It's probably not as easy as
putting main() at the client and all function calls at the server. It is true
that modular code is the easiest to distribute. If it's not modular, bite the
bullet and reorganize it. In a similar fashion, object-oriented applications
can extend interfaces across the network, moving all or some of the objects'
methods into remote services.
Is there a data-driven client/server partitioning scheme? Just because clean
function-module or object-interface boundaries exist does not mean that it is
a good client/server partitioning. If large amounts of data (as compared to
computation) are passed on the stack or shared globally, the partitioning will
cause burdensome network I/O. You may need to rework module or object
partitions to reduce request-and-reply passing overhead.
Is there extensive use of global variables or objects? With RPCs, all
variables get to the server via request messages. Two strategies exist for
programming around global variables: Encapsulate them as another outgoing
request argument for each procedure, or identify an additional remote
procedure whose purpose is to share global parameters between client and
server as they change. You may need RPCs in both directions (client to server
and server to client). Keep in mind that memory-address spaces are different
between client and server.
Does the current application use communication schemes other than procedure
calling? If so, you may have to craft RPC replacements to the semaphores,
signaling, or shared-memory mechanisms used, as these support interprocess
communication on the local host. Semaphores, signaling and shared memory can
still be used if contained within the client or the server.
All GUI calls may need to be isolated into a single thread if a thread-safe
GUI is not used.
In addition, questions related to the RPC system that you need to ask include:
What security measures are needed? Standard OS-user, password, and group
credentials may not facilitate or guarantee client and server authenticity
across a network. You may require authentication credentials that are more
difficult to falsify. 
What access-control policy is required? Security schemes only validate or
invalidate the identity of the client and/or server, or secure the channel.
With the service made available to everyone on the network, you must decide
what type of access protocol you will have. Before you start coding, decide
what you will do if insufficient authentication occurs. Often some functions,
like listing an archive, can be made available to all. Some functions might be
restricted to certain users, whose identity must be validated. 
Do you have nonidempotent procedures (procedures that cannot be called more
than once without changing the state of the server or the response to the
client)? Such procedures require either a reliable transport or the use of
request/reply queuing on an unreliable transport. Unadulterated UDP
transmissions are unreliable and may never get to the server--or they may get
there multiple times. TCP requests are reliable but incur more overhead. When
porting an application to RPC, it's possible to add code that makes behavior
more robust in the face of network problems. If you're developing an RPC
application from scratch, design robustness in from the start. Another option,
requiring less user code, is to use a transaction-processing monitor like
Transarc's Encina.
What about error reporting and recovery? As in the local procedure-call
application, you need a well-defined error-reporting scheme. The DCE RPC
library provides error reporting at the RPC and communication-protocol levels.
But you must now address new areas: retaining state and recovering from
crashes or dropped connections.
Should your client or server retain state? This influences the nature of the
connection that clients should make with servers and how losses of context
should be recovered from. 
Should your client or server be able to recover from a crash? Often you need
more than a reliable transport to recover from a crash. Without some
nonvolatile record of where the application was when the machine died (as
available from a disk file or another client/server), the application might
come up in a bad state. 
What should you do about dropped connections? If the client or server
connection is severed during an RPC, should you attempt to automatically
reconnect, just stop, or look for another connection?


Conclusion



The DCE system and API are broad, deep, and potentially intimidating.
Nonetheless, distributed computing will likely shape the industry in the
coming years, and programmers will need to come to grips with this
complexity.


References


Bloomer, John. Power Programming With RPC. Sebastopol, CA: O'Reilly &
Associates, 1992.
Borghoff, L.M. Distributed File/Operating Systems. Berlin: Springer-Verlag,
1992. 
Corbin, John. The Art of Distributed Applications. Berlin: Springer-Verlag,
1991.
Lyons, Tom. Network Computing System Tutorial. Englewood Cliffs, NJ:
Prentice-Hall, 1990.
Rosenberry, Ward. Understanding DCE. Sebastopol, CA: O'Reilly & Associates,
1992.
Shirley, John. Developing Distributed Applications with DCE. Sebastopol, CA:
O'Reilly & Associates, 1992.
Stevens, W. Richard. Advanced Programming in the UNIX Environment. Reading,
MA: Addison-Wesley, 1992.
Stevens, W. Richard. UNIX Network Programming. Englewood Cliffs, NJ:
Prentice-Hall, 1990.
Example 1: (a) Local database query; (b) remote database query.
(a)
 bucks = getAcctBalance(acctName);
(b)
 bucks = getAcctBalance(acctName, someRemoteBank);
Figure 1 The elements of DCE.
Figure 2 RPCs look and feel like local calls.
Figure 3 Connectivity of DCE directory-services components.
Figure 4 Under the hood of a CDS lookup.
Figure 5 Flow of control during a synchronous RPC.
Figure 6 States of a thread.
Figure 7 The steps in binding a sequence of RPCs.
Figure 8 The relationship between DCE application, stub, run-time, and
service/utility operations.
Figure 9 RPC distributed-application development steps.
Figure 10 DCE RPC client and server development steps.
Figure 11 Linking around RPC calls lets you debug in a single process first,
thereby speeding development.
Table 1: Files for DCE RPC client and server development.
File You Develop Purpose 
appn.idl Interface-description file
appn.acf Attribute-control file (optional)
appn_client.c Client functions, including main()
appn_server.c Server functions (manager) and initialization
Produced by Protocol Compiler
appn.h Client/server header file
appn_cstub.c Client stub
appn_sstub.c Server stub
appn_caux.c Client auxiliary functions (optional)
appn_saux.c Server auxiliary functions (optional)
Target Executables
appn_server Server
appn_client Client

Listing One 

#include <stdio.h>
#include <string.h>
#include <pwd.h>
#include "im.h"
#define USAGE() { fprintf(stderr, "Usage: %s ", argv[0]); \
 fprintf(stderr, "\t-a imageName \"comments\" width height depth compressType"); \
 fprintf(stderr, "\n\t\t\t\t\tadd an image from file 'imageName'\n"); \
 fprintf(stderr, "\t\t-d imageName\t\tdelete an image\n"); \
 fprintf(stderr, "\t\t-x imageName\t\textract an image to file 'imageName'\n"); \
 fprintf(stderr, "\t\t-l\t\t\tlist contents of archive\n"); \
 exit(1); }
#define PRINTHEAD(pI) { \
 printf("name:\t%s\n\towner: %s\n\tcomments: %s\n\tdate: %s\n", \
 pI->sN, pI->sO, pI->sC, pI->sD); \
 printf("\tbytes: %d\twidth: %d\theight: %d\tdepth: %d\tcompress: %d\n", \
 pI->b, pI->x, pI->y, pI->d, pI->c); }
image *readImage();
FILE *fp;
main(argc, argv)
 int argc;
 char *argv[];
{
 pStr expectEmpty; /* a NULL if success, else an error string */
 imageList *pIL;
 image *pI;
 pStr sImageName;
 int arg;
 /* Parse the command line, doing local procedure calls as requested. */
 if (argc < 2) {
 USAGE();
 }
 for (arg = 1; arg < argc; arg++) {
 if (argv[arg][0] != '-')
 USAGE();
 switch (argv[arg][1]) {
 case 't':
 arg++;
 break;
 case 'a':
 if ((argc - (++arg) < 6) || !(pI = readImage(argv, &arg)))
 USAGE();
 expectEmpty = add(pI);
 if (expectEmpty[0] != '\0')
 fprintf(stderr, "local call failed: %s", expectEmpty);
 break;
 case 'd':
 if (argc - (++arg) < 1)
 USAGE();
 sImageName = (pStr) strdup(argv[arg]);
 expectEmpty = delete(sImageName);
 if (expectEmpty[0] != '\0')
 fprintf(stderr, "local call failed: %s", expectEmpty);
 break;
 case 'x':
 if (argc - (++arg) < 1)
 USAGE();
 sImageName = (pStr) strdup(argv[arg]);
 expectEmpty = extract(sImageName, &pI);
 if (expectEmpty[0] != '\0')
 fprintf(stderr, "local call failed: %s", expectEmpty);
 else
 (void) writeImage(pI, sImageName);
 break;
 case 'l':{
 if (!(pIL = list()))

 fprintf(stderr, "local call failed:");
 else
 for (pI = pIL->pImage; pIL->pNext; pIL = pIL->pNext, pI = pIL->pImage)
 PRINTHEAD(pI);
 break;
 }
 default:
 USAGE();
 }
 }
}
image *
readImage(argv, pArg)
 char **argv;
 int *pArg;
{
 static image im;
 char buffer[MAXBUF];
 char null = '\0';
 u_int reallyRead;
 u_int imageSize = 0;

 /* Build the header information then look at stdin for data. */
 im.sN = (pStr) strdup(argv[*pArg]);
 im.sO = UIDTONAME(getuid());
 im.sC = (pStr) strdup(argv[++*pArg]);
 im.x = atoi(argv[++*pArg]);
 im.y = atoi(argv[++*pArg]);
 im.d = atoi(argv[++*pArg]);
 im.c = atoi(argv[++*pArg]);
 im.sD = &null; /* don't forget to terminate those empty strings! */
 im.data = (char *) malloc(0);
 if (!(fp = fopen(im.sN, "r"))) {
 fprintf(stderr, "error opening imageName \"%s\" for reading\n", im.sN);
 return (0);
 }
 while (reallyRead = fread(buffer, 1, MAXBUF, fp)) {
 im.data = (char *) realloc(im.data, imageSize + reallyRead);
 (void) bcopy(buffer, im.data + imageSize, reallyRead);
 imageSize += reallyRead;
 }
 im.b = imageSize;
 fclose(fp);
 return (&im);
}
writeImage(pImage, sImageName)
 image *pImage;
 pStr sImageName;
{
 if (!(fp = fopen(sImageName, "w"))) {
 fprintf(stderr, "error opening imageName \"%s\" for writing\n", sImageName);
 return (1);
 }
 PRINTHEAD(pImage);
 if (fwrite(pImage->data, 1, pImage->b, fp) != pImage->b) {
 fprintf(stderr, "error writing imageName \"%s\" data\n", sImageName);
 fclose(fp);
 return (1);
 }

 fclose(fp);
 return (0);
}



Listing Two

/* rim_client.c - client application for remote image database service */
#include <malloc.h>
#include <stdio.h>
#include <string.h>
#include <pwd.h>
#include <dce/rpc.h>
#include <pthread.h>
#include "rim.h"
#include "rim_util.h"
#define USAGE() { fprintf(stderr, "commands:\n"); \
 fprintf(stderr, "\ta imageName \"comments\" width height depth compressType"); \
 fprintf(stderr, "\n\t\t\t\t\tadd an image from file 'imageName'\n"); \
 fprintf(stderr, "\td imageName\t\tdelete an image\n"); \
 fprintf(stderr, "\tx imageName\t\textract an image to file 'imageName'\n"); \
 fprintf(stderr, "\tl\t\t\tlist contents of archive\n"); \
 fprintf(stderr, "\tq\t\t\tquits\n"); }
#define PRINTHEAD(pI) { \
 printf("name:\t%s\n\towner: %s\n\tcomments: %s\n\tdate: %s\n", \
 pI->sN, pI->sO, pI->sC, pI->sD); \
 printf("\tbytes: %d\twidth: %d\theight: %d\tdepth: %d\tcompress: %d\n", \
 pI->b, pI->x, pI->y, pI->d, pI->c); }
typedef struct work_arg {
 pthread_t *thread_id;
 int server_num;
 char *server_name;
 rpc_binding_handle_t bind_handle;
 image *pImage;
} work_arg_t;
image *readImage();
FILE *fp;
#define MAX_SERVERS 100
pthread_mutex_t WorkMutex;
pthread_cond_t WorkCond;

/* The single-arg wrapper routine around the list() RPC accessed by each
 * thread we ask to list - must be reentrant */
void list_wrapper(work_arg_t * work_arg_p)
{
 imageList *pIL;
 image *pI;
 if (!(pIL = list(work_arg_p->bind_handle))) {
 fprintf(stderr, "remote call failed:");
 pthread_exit((pthread_addr_t *)1);
 } else {
 for (pI = pIL->pImage; pIL->pNext; pIL = pIL->pNext, pI = pIL->pImage)
 PRINTHEAD(pI);
 iLFreeOne(pIL);
 }
 pthread_exit((pthread_addr_t *)0);

}
/* the wrapper around the add() RPC */
void add_wrapper(work_arg_t * work_arg_p)
{
 pStr expectEmpty; /* a NULL if success, else an error string */
 expectEmpty = add(work_arg_p->bind_handle, work_arg_p->pImage);
 if (expectEmpty[0] != '\0') {
 fprintf(stderr, "remote call failed: %s", expectEmpty);
 pthread_exit((pthread_addr_t *)1);
 }
 pthread_exit((pthread_addr_t *)0);
}
/* the wrapper around the delete() RPC */
void delete_wrapper(work_arg_t * work_arg_p)
{
 pStr expectEmpty; /* a NULL if success, else an error string */
 expectEmpty = delete(work_arg_p->bind_handle, work_arg_p->pImage->sN);
 if (expectEmpty[0] != '\0') {
 fprintf(stderr, "remote call failed: %s", expectEmpty);
 pthread_exit((pthread_addr_t *)1);
 }
 pthread_exit((pthread_addr_t *)0);
}
/* the wrapper around the extract() RPC */
void extract_wrapper(work_arg_t * work_arg_p)
{
 image *pI;
 pStr expectEmpty; /* a NULL if success, else an error string */
 expectEmpty = extract(work_arg_p->bind_handle, work_arg_p->pImage->sN, &pI);
 if (expectEmpty[0] != '\0') {
 fprintf(stderr, "remote call failed: %s", expectEmpty);
 pthread_exit((pthread_addr_t *)1); 
 } else {
 (void) writeImage(pI, pI->sN);
 iFreeOne(pI);
 }
 pthread_exit((pthread_addr_t *)0);
}
main(argc, argv)
 int argc;
 char *argv[];
{
 int server_num, nservers;
 work_arg_t work_arg[MAX_SERVERS];
 char *server_name[MAX_SERVERS];
 rpc_binding_handle_t *binding;
 /* Check usage and initialize. */
 if (argc < 2 || (nservers = argc - 1) > MAX_SERVERS) {
 fprintf(stderr, "Usage: %s server_name ...(up to %d server_name's)...\n",
 argv[0], MAX_SERVERS);
 exit(1);
 }
 for (server_num = 0; server_num < nservers; server_num += 1) {
 server_name[server_num] = (char *) argv[1 + server_num];
 /* Import binding info from namespace and annotate handles for security. */
 binding = importAuthBinding(rim_v1_0_c_ifspec,
 SERVER_PRINC_NAME, server_name[server_num],
 '\0', 1, rpc_c_protect_level_pkt_integ,
 rpc_c_authn_dce_secret, '\0', rpc_c_authz_name);

 }
 /* Initialize mutex and condition variable. */
 printf("Client calling pthread_mutex_init...\n");
 if (pthread_mutex_init(&WorkMutex, pthread_mutexattr_default) == -1) {
 dce_err(__FILE__, "pthread_mutex_init", (unsigned long) -1);
 exit(1);
 }
 printf("Client calling pthread_cond_init...\n");
 if (pthread_cond_init(&WorkCond, pthread_condattr_default) == -1) {
 dce_err(__FILE__, "pthread_cond_init", (unsigned long) -1);
 exit(1);
 }
 /* Initialize work args that are constant throughout main loop. */
 for (server_num = 0; server_num < nservers; server_num += 1) {
 work_arg[server_num].server_num = server_num;
 work_arg[server_num].server_name = server_name[server_num];
 work_arg[server_num].bind_handle = binding[server_num];
 work_arg[server_num].pImage = (image *) malloc(sizeof(image));
 work_arg[server_num].thread_id = (pthread_t *) '\0';
 }
 /* Transaction loop -- exits with a 'q' and reaps threads. */
 while (1) {
 /* Per-loop initialization. We're single-threaded here, so locks and
 * reentrant code are unnecessary. For each server... */
 char line[256];
 char args[7][256];
 int argc, argcc;
 void *local;
 /* scrape up to 7 args from the command line (fgets, not gets,
 * to avoid overrunning the buffer) */
 if (!fgets(line, sizeof(line), stdin))
 exit(0);
 argc = sscanf(line, "%s%s%s%s%s%s%s", args[0], args[1], args[2], args[3],
 args[4], args[5], args[6]);
 server_num = (server_num + 1) % nservers; /* NEXT! */

 local = (void *)'\0';
 switch (tolower(args[0][0])) {
 case 'a':
 argcc = 1;
 if ((argc != 7) || (!(work_arg[server_num].pImage = 
 readImage(args, &argcc))))
 USAGE()
 else
 local = &add_wrapper;
 break;
 case 'd':
 if (argc != 2) USAGE()
 else {
 work_arg[server_num].pImage->sN = (pStr) strdup(args[1]);
 local = &delete_wrapper;
 }
 break;
 case 'x':
 if (argc != 2) USAGE()
 else {
 work_arg[server_num].pImage->sN = (pStr) strdup(args[1]);
 local = &extract_wrapper;
 }
 break;
 case 'l':

 local = &list_wrapper;
 break;
 case 'q':
 /* If we ever started a thread for a server, wait for it to die if not 
 already dead, print exit status. Note they have not been
 detached yet so we have status available */
 for(server_num=0; server_num<nservers; server_num++) {
 pthread_addr_t status;
 if (work_arg[server_num].thread_id) {
 pthread_join(*(work_arg[server_num].thread_id), &status);
 printf("thread %d exit status %d\n", server_num, status);
 }
 }
 exit(0);
 default:
 USAGE();
 break;
 }
 if (local) {
 fprintf(stderr, "threading for the call to server %s...\n", 
 server_name[server_num]);
 work_arg[server_num].thread_id = (pthread_t*)malloc(sizeof(pthread_t));
 pthread_create(work_arg[server_num].thread_id, pthread_attr_default, 
 (void *)local, (void *)&work_arg[server_num]);
 }
 }
}
image *
readImage(argv, pArg)
 char argv[7][256];
 int *pArg;
{
 static image im;
 char buffer[MAXBUF];
 idl_char null = '\0'; /* note the idl_*/
 u_int reallyRead;
 u_int imageSize = 0;

 /* Build the header information then look at command line for data. */
 im.sN = (pStr) strdup(argv[*pArg]);
 im.sO = (idl_char *) UIDTONAME(getuid());
 im.sC = (pStr) strdup(argv[++*pArg]);
 im.x = atoi(argv[++*pArg]);
 im.y = atoi(argv[++*pArg]);
 im.d = atoi(argv[++*pArg]);
 im.c = atoi(argv[++*pArg]);
 im.sD = &null; /* don't forget to terminate those empty strings! */
 im.data = (idl_char *) malloc(0); /* note the idl_*/

 if (!(fp = fopen(im.sN, "r"))) {
 fprintf(stderr, "error opening imageName \"%s\" for reading\n", im.sN);
 return (0);
 }
 while (reallyRead = fread(buffer, 1, MAXBUF, fp)) {
 im.data = (idl_char *) realloc(im.data, imageSize + reallyRead);
 (void) bcopy(buffer, im.data + imageSize, reallyRead);
 imageSize += reallyRead;
 }
 im.b = imageSize;

 fclose(fp);
 return (&im);
}
writeImage(pImage, sImageName)
 image *pImage;
 pStr sImageName;
{
 /* same as in Listing One*/
 }
 fclose(fp);
 return (0);
}
/* The next four routines are just image linked-list maint. stuff. */
image *
iAllocOne()
{ /* allocate one image structure */
 image *pI = (image *) calloc(sizeof(image), 1);
 pI->sN = (pStr) calloc(MAXSTR, 1);
 pI->sO = (pStr) calloc(MAXSTR, 1);
 pI->sC = (pStr) calloc(MAXSTR, 1);
 pI->sD = (pStr) calloc(MAXSTR, 1);
 return (pI);
}
imageList *
iLAllocOne()
{ /* allocate one imageList structure */
 imageList *pIL = (imageList *) malloc(sizeof(imageList));
 pIL->pImage = iAllocOne();
 pIL->pNext = '\0';
 return (pIL);
}
iFreeOne(pI)
 image *pI;
{
 cfree(pI->sN);
 cfree(pI->sO);
 cfree(pI->sC);
 cfree(pI->sD);
 cfree(pI);
}
iLFreeOne(pIL)
 imageList *pIL;
{
 imageList *pil;
 imageList *pil_prev = '\0';
 while (pIL) {
 for (pil = pIL; (pil->pNext) != '\0'; pil_prev = pil, pil = pil->pNext);
 iFreeOne(pil->pImage);
 cfree(pil);
 if (pil_prev) {
 pil_prev->pNext = '\0';
 }
 if (pil == pIL)
 break;
 }
}




Listing Three

/* rim_server.c - server initialization and procedures for remote
 * image database service */
#include <stdio.h>
#include <sys/types.h>
#include <sys/time.h>
#include "rim.h"
#include "rim_util.h"

#define FGETS(ptr, max, fp) { fgets(ptr, max, fp); ptr[strlen(ptr)-1] = '\0'; }
#define READHEADER(n, o, c, d) \
 { FGETS(n,MAXSTR,fp); FGETS(o,MAXSTR,fp); \
 FGETS(c,MAXSTR,fp); FGETS(d,MAXSTR,fp); }

FILE *fp;
imageList *iLAllocOne();
image *iAllocOne();

/* ref_mon()- reference monitor for rim. It checks generalities, then calls
 * is_authorized() to check specifics. */ 
int
ref_mon(bind_handle)
 rpc_binding_handle_t bind_handle;
{
 int ret;
 rpc_authz_handle_t privs;
 unsigned_char_t *client_princ_name, *server_princ_name;
 unsigned32 protect_level, authn_svc, authz_svc, status;
 /* Get client auth info. */
 rpc_binding_inq_auth_client(bind_handle, &privs, &server_princ_name,
 &protect_level, &authn_svc, &authz_svc, &status);
 if (status != rpc_s_ok) {
 dce_err(__FILE__, "rpc_binding_inq_auth_client", status);
 return (0);
 }
 /* Check if selected authn service is acceptable to us. */
 if (authn_svc != rpc_c_authn_dce_secret) {
 dce_err(__FILE__, "authn_svc check", (unsigned long) -1);
 return (0);
 }
 /* Check if selected protection level is acceptable to us. */
 if (protect_level != rpc_c_protect_level_pkt_integ
 && protect_level != rpc_c_protect_level_pkt_privacy) {
 dce_err(__FILE__, "protect_level check", (unsigned long) -1);
 return (0);
 }
 /* Check if selected authz service is acceptable to us. */
 if (authz_svc != rpc_c_authz_name) {
 dce_err(__FILE__, "authz_svc check", (unsigned long) -1);
 return (0);
 }
 /* If rpc_c_authz_dce were being used instead of rpc_c_authz_name, privs
 * would be a PAC (sec_id_pac_t *), not a name as it is here. */
 client_princ_name = (unsigned_char_t *) privs;
 /* Check if selected server principal name is supported. */
 if (strcmp(strrchr(server_princ_name, '/'), strrchr(SERVER_PRINC_NAME, 
 '/')) != 0) {
 dce_err(__FILE__, "server_princ_name check", (unsigned long) -1);

 return (0);
 }
 /* Now that things seem generally OK, check the specifics. */
 if (!is_authorized(client_princ_name)) {
 dce_err(__FILE__, "is_authorized", (unsigned long) -1);
 return (0);
 }
 /* Cleared all the authorization hurdles -- grant access. */
 return (1);
}
/* is_authorized() - check authorization of client for this service. We could
 * check on a per-procedure basis, rather than once for the interface, to give
 * more control over access. Typically, an application (i.e., one using PACs &
 * ACLs) would be using sec_acl_mgr_is_authorized(). */
int
is_authorized(client_princ_name)
 unsigned_char_t *client_princ_name;
{
 /* Check if we want to let this client do this operation. A list or
 ACL would be better */
 if (strcmp(strrchr(client_princ_name, '/'), strrchr(CLIENT_PRINC_NAME, 
 '/')) == 0) {
 /* OK, we'll let this access happen. */
 return (1);
 }
 return (0);
}
void
die(rpc_binding_handle_t bind_handle)
{
 printf("server answering the call...\n");
 /* should de-register endpoints and directory info */
 exit(0);
}
void
restart(rpc_binding_handle_t bind_handle)
{
 /* should de-register endpoints and directory info */
 (void) execl(SERVERPATH, SERVERPATH, (char *) 0); /* arg0, then terminator */
}
pStr
add(rpc_binding_handle_t bind_handle, image *argp)
{
 static pStr result;
 static idl_char msg[MAXSTR];
 static char N[MAXSTR], O[MAXSTR], C[MAXSTR], D[MAXSTR];
 char head[MAXSTR];
 int fstat, b, x, y, d, c;
 time_t tloc;
 result = msg;
 msg[0] = '\0';
 printf("server answering the call...\n");
 if (!(fp = fopen(SERVERDB, "r"))) {
 sprintf(msg, "cannot open server database %s for reading\n", SERVERDB);
 return (result);
 }
 /* First make sure such an image isn't already archived. */
 while ((fstat = fscanf(fp, "%d%d%d%d%d\n", &b, &x, &y, &d, &c)) == 5) {
 READHEADER(N, O, C, D);

 if (!strcmp(N, argp->sN))
 break;
 fseek(fp, (long) b, 1);
 }
 switch (fstat) {
 case EOF: /* not found - that's good */
 fclose(fp);
 if (!(fp = fopen(SERVERDB, "a"))) {
 sprintf(msg, "cannot open server database %s to append\n", SERVERDB);
 return (result);
 }
 break;
 case 5: /* there already is one! */
 sprintf(msg, "%s archive already has a \"%s\"\n", SERVERDB, argp->sN);
 fclose(fp);
 return (result);
 default: /* not a clean tail... tell user and try */
 repairDB(msg); /* to recover */
 fclose(fp);
 return (result);
 }
 CompressImage(1, argp); /* compress as specified */
 /* Get the date, add the image header and data, then return. */
 time(&tloc);
 sprintf(head, "%d %d %d %d %d\n%s\n%s\n%s\n%s",
 argp->b, argp->x, argp->y, argp->d, argp->c,
 argp->sN, argp->sO, argp->sC, (char *) ctime(&tloc));
 if ((fwrite(head, 1, strlen(head), fp) != strlen(head)) ||
 (fwrite(argp->data, 1, argp->b, fp) != argp->b))
 sprintf(msg, "failed write to server database %s\n", SERVERDB);
 fclose(fp);
 return (result);
}
/* This is included for the sake of completeness but is brute-force. */
pStr
delete(rpc_binding_handle_t bind_handle, pStr argp)
{
 FILE *fpp;
 int fstat;
 static pStr result;
 static idl_char msg[MAXSTR];
 char N[MAXSTR], O[MAXSTR], C[MAXSTR], D[MAXSTR];
 char *buffer;
 int bufSize, bytesRead, b, x, y, d, c;
 int seekPt = 0;

 printf("server answering the call...\n");

 msg[0] = '\0';
 result = msg;
 if (!ref_mon(bind_handle)) { /* a simple monitor */
 dce_err(__FILE__, "ref_mon - not allowed to delete", (unsigned long) -1);
 sprintf(msg, "not authorized to delete\n");
 return (result);
 }
 if (!(fp = fopen(SERVERDB, "r"))) {
 sprintf(msg, "cannot open server database %s for reading\n", SERVERDB);
 return (result);
 }

 /* Look thru the DB for the named image. */
 while ((fstat = fscanf(fp, "%d%d%d%d%d\n", &b, &x, &y, &d, &c)) == 5) {
 READHEADER(N, O, C, D);
 fseek(fp, (long) b, 1); /* fp stops at next entry */
 if (!strcmp(N, argp))
 break;
 seekPt = ftell(fp);
 }
 switch (fstat) {
 case EOF: /* not found */
 sprintf(msg, "%s not found in archive\n", argp);
 break;
 case 5: /* This is the one! Remove it by copying the bottom up. */
 bufSize = MIN(MAX(1, b), MAXBUF);
 buffer = (char *) malloc(bufSize);
 fpp = fopen(SERVERDB, "r+");
 fseek(fpp, seekPt, 0); /* fpp is at selected image */
 while (!feof(fp)) {
 bytesRead = fread(buffer, 1, bufSize, fp);
 fwrite(buffer, 1, bytesRead, fpp);
 }
 seekPt = ftell(fpp);
 fclose(fpp);
 truncate(SERVERDB, (off_t) seekPt);
 break;
 default: /* not a clean tail... */
 repairDB(msg);
 }
 fclose(fp);
 return (result);
}
static image *pIm = '\0'; /* keep this around as we are iterative now */
pStr
extract(rpc_binding_handle_t bind_handle, pStr argp, image **ppIm)
{
 int fstat;
 static pStr result;
 static idl_char msg[MAXSTR];

 printf("server answering the call...\n");
 result = msg;
 msg[0] = '\0';

 if (!(fp = fopen(SERVERDB, "r"))) {
 sprintf(msg, "cannot open server database %s for reading\n", SERVERDB);
 return (result);
 }
 /* Free previously allocated memory. Look thru the DB for the named image. */
 if (pIm != '\0')
 free(pIm);
 pIm = *ppIm = iAllocOne();
 while ((fstat = fscanf(fp, "%d%d%d%d%d\n", &(pIm->b), &(pIm->x), &(pIm->y),
 &(pIm->d), &(pIm->c))) == 5) {
 READHEADER(pIm->sN, pIm->sO, pIm->sC, pIm->sD);

 if (!strcmp(pIm->sN, argp))
 break;
 fseek(fp, (long) pIm->b, 1);
 }

 switch (fstat) {
 case EOF: /* not found */
 sprintf(msg, "%s not found in archive\n", argp);
 break;
 case 5: /* this is the one! */
 pIm->data = (idl_char *) malloc(pIm->b);
 if (fread(pIm->data, 1, pIm->b, fp) != pIm->b) {
 sprintf(msg, "couldn't read all of %s\n", argp);
 repairDB(msg);
 }
 break;
 default: /* not a clean tail... */
 repairDB(msg);
 }
 fclose(fp);
 return (result);
}
static imageList *pIList = '\0'; /* keep this around as we are iterative now */
imageList *
list(rpc_binding_handle_t bind_handle)
{ /* inconsistent - should return a string, but there's a reason... */
 imageList *pIL;
 int fstat;
 printf("server answering the call...\n");
 /* Free previously allocated memory. Build a list. */
 if (pIList)
 iLFreeOne(pIList);
 pIL = pIList = iLAllocOne();
 if (!(fp = fopen(SERVERDB, "r"))) {
 sprintf(pIL->pImage->sN, "cannot open server database %s for reading\n", 
 SERVERDB);
 pIL->pNext = iLAllocOne(); /* needs a dangler...:-( */
 return (pIList);
 }
 while ((fstat = fscanf(fp, "%d%d%d%d%d\n", &(pIL->pImage->b),
 &(pIL->pImage->x), &(pIL->pImage->y),
 &(pIL->pImage->d), &(pIL->pImage->c))) == 5) {
 READHEADER(pIL->pImage->sN, pIL->pImage->sO,
 pIL->pImage->sC, pIL->pImage->sD);
 fseek(fp, (long) pIL->pImage->b, 1);
 pIL->pNext = iLAllocOne(); /* hang an empty one on the end */
 pIL = pIL->pNext;
 }
 if (fstat != EOF) { /* not a clean tail... */
 repairDB(pIL->pImage->sN);
 }
 fclose(fp);
 return (pIList);
}
/* The next four routines are just image linked-list maint. stuff. */
imageList *
iLAllocOne()
{ /* allocate one imageList structure */
 imageList *pIL = (imageList *) malloc(sizeof(imageList));
 pIL->pImage = iAllocOne();
 pIL->pNext = '\0';
 return (pIL);
}
image *

iAllocOne()
{ /* allocate one image structure */
 image *pI = (image *) calloc(sizeof(image), 1);
 pI->sN = (pStr) calloc(MAXSTR, 1);
 pI->sO = (pStr) calloc(MAXSTR, 1);
 pI->sC = (pStr) calloc(MAXSTR, 1);
 pI->sD = (pStr) calloc(MAXSTR, 1);
 return (pI);
}
iLFreeOne(pIL)
 imageList *pIL;
{
 imageList *pil;
 imageList *pil_prev = '\0';
 while (pIL) {
 for (pil = pIL; (pil->pNext) != '\0'; pil_prev = pil, pil = pil->pNext);
 iFreeOne(pil->pImage);
 cfree(pil);
 if (pil_prev) { pil_prev->pNext = '\0'; }
 if (pil == pIL) break;
 }
}
iFreeOne(pI)
 image *pI;
{
 cfree(pI->sN);
 cfree(pI->sO);
 cfree(pI->sC);
 cfree(pI->sD);
 cfree(pI);
}
repairDB(s) /* doesn't do much, yet... */
 pStr s;
{
 sprintf(s, "server database %s data hosed, repaired\n", SERVERDB);
}
CompressImage(d, pIm) /* compression and decompression */
 int d;
 image *pIm;
{
 /* omitted */
}
/******** server initialization starts here *********/
#ifndef LOCAL /* go LOCAL if you want to link with rim_client.c */
#include <dce/rpc.h>

#define MAX_CONC_CALLS_PROTSEQ 5 /* max conc calls per protseq */
#define MAX_CONC_CALLS_TOTAL 10 /* max conc calls total */
/* definitions, generated by IDL, are all that is necessarily unique below */
#define SERVER_IF rim_v1_0_s_ifspec

char *server_name;
/* main() Get started; set up server how we want it, and call listen loop. */
int
main(argc, argv)
 int argc;
 char *argv[];
{
 rpc_binding_vector_t *bind_vector_p;

 unsigned32 status;
 int i;
 /* Check usage and initialize. */
 if (argc != 2) {
 fprintf(stderr, "Usage: %s namespace_server_name\n", argv[0]);
 exit(1);
 }
 server_name = argv[1];
 /* Register interface with rpc runtime - no type_uuid/epv associations */
 rpc_server_register_if(SERVER_IF, '\0', '\0', &status);
 if (status != rpc_s_ok) {
 dce_err(__FILE__, "rpc_server_register_if", status);
 exit(1);
 }
 /* Tell rpc runtime we want to use all supported protocol sequences. */
 rpc_server_use_all_protseqs(MAX_CONC_CALLS_PROTSEQ, &status);
 if (status != rpc_s_ok) {
 dce_err(__FILE__, "rpc_server_use_all_protseqs", status);
 exit(1);
 }
 /* Ask the runtime which binding handle(s) it's going to let us use. */
 rpc_server_inq_bindings(&bind_vector_p, &status);
 if (status != rpc_s_ok) {
 dce_err(__FILE__, "rpc_server_inq_bindings", status);
 exit(1);
 }
 /* Register authentication info with rpc runtime. */
 rpc_server_register_auth_info(SERVER_PRINC_NAME, 
 rpc_c_authn_dce_secret, '\0', KEYTABFILE, &status);
 if (status != rpc_s_ok) {
 dce_err(__FILE__, "rpc_server_register_auth_info", status);
 exit(1);
 }
 /* Register binding info with endpoint mapper. No object UUID vector */
 rpc_ep_register(SERVER_IF, bind_vector_p, '\0',
 (unsigned_char_t *) "rim explicit secure server, version 1.0", &status);
 if (status != rpc_s_ok) {
 dce_err(__FILE__, "rpc_ep_register", status);
 exit(1);
 }
 /* Export binding info to the namespace. */
 rpc_ns_binding_export(rpc_c_ns_syntax_dce, server_name,
 SERVER_IF, bind_vector_p, '\0', &status);
 if (status != rpc_s_ok) {
 dce_err(__FILE__, "rpc_ns_binding_export", status);
 exit(1);
 }
 /* Listen for service requests. */
 fprintf(stdout, "server %s ready.\n", server_name);
 rpc_server_listen(MAX_CONC_CALLS_TOTAL, &status);
 if (status != rpc_s_ok) {
 dce_err(__FILE__, "rpc_server_listen", status);
 exit(1);
 }
 /* Not reached. */
}
#endif


































































Distributed Real-Time Operating Systems


The next generation




E. Douglas Jensen


Doug is the technical director for real-time computer systems at Digital
Equipment Corp. His responsibilities include establishing a technology vision
and strategy for Digital's efforts in real-time systems. Doug has 27 years of
experience in real-time computing, including eight years on the
computer-science faculty at Carnegie Mellon University.


To date, virtually all real-time computing has been at the lowest level of the
application-control hierarchy--embedded computers, controllers, appliances,
and other intelligent devices. However, more and more real-time systems are
being specified at the higher levels, including decentralized production
operations and business-management systems.
Unfortunately, the requirements of this expanding real-time domain violate the
assumptions implicitly underlying conventional real-time computing. Moreover,
advanced, complex, and distributed real-time applications typically need more
operating system (OS) technology than their smaller, simpler, executive-based
brethren. Thus, the requirements are becoming too broad for any single,
general-purpose, real-time OS to satisfy. The obvious alternative--multiple,
additional real-time OSs--is logistically and economically infeasible for
vendors and users alike. 
Consequently, a new generation of real-time control, computing, and operating
systems needs to be born. These OSs should be modular, adaptable, and scalable
in functionality. They should support global, distributed, and cooperative
computing across and between levels--and nodes--in the application-control
hierarchy. The new OSs should perform dynamic resource management in the face
of application and system uncertainties, enforce end-to-end timeliness of the
total control system, and support variable degrees of "hardness" and
"softness" of the real-time application.
One scalable, real-time OS architecture across many levels of hardware would
benefit both vendors and users. It would improve software reusability while
accommodating evolving needs and technologies. It would reduce engineering
costs and time-to-market. Plus, it would use a single, common,
software-development environment across the entire application regime.


The Changing Real-Time Application Environment


Traditional, relatively small, simple, centralized, real-time applications
exist everywhere. For example, it's now common to see 8-bit embedded-chip
valve controllers running on top of a home-grown or vendor-proprietary OS,
32-bit industrial microcomputers providing supervisory cell control while
running VxWorks or a proprietary OS, and UNIX workstations providing the
operator interface to a remote console. In commercial, real-time
applications--online transaction-processing applications such as lottery
systems, bond trading, currency markets, and the like--predictable and timely
response is mandatory, but not on microsecond time scales. In the consumer
world, the antilock-brake system (ABS) ensures real-time safety for a car's
occupants.
As Figure 1 illustrates, real-time applications usually exist in a stylized,
restricted hierarchy consisting of three levels: control, supervisory, and
management. Generally, the control and computing in this hierarchy have been
local, centralized, and autonomous; they use elementary client/server
relationships between levels, and their real-time nature is different at each
level.
Control, the lowest level of the hierarchy, is typically reactive--small,
stand-alone, and generally oblivious to other systems. Control applications
are relatively simple real-time subsystems for low-level, sampled-data
monitoring and control (such as regulatory loops in process-control
applications). They use static, priority-based resource management and have
highly predictable behavior. Almost all real-time computing has been at this
lowest level of the application-control hierarchy.
The second level is usually a supervisory control and computing system with
loosely defined real-time operations. Typical applications in industrial
plants include production scheduling and control, quality management, and
process optimization. In commercial aircraft, this level would be the
mission-management computer system, which supervises the flight control,
communications, and propulsion subsystems, among others.
At the highest level, the management level, computing is non-real-time. These
systems generally handle business operations, such as manufacturing-resource
planning (MRP II), maintenance management, and order processing.
The three-level hierarchy in Figure 1 implies the need for only two kinds of
real-time computer systems. The control level typically has small,
proprietary, "hard" real-time executives that provide limited functionality;
Wind River's VxWorks is typical of these executives. The supervisory- and
management-level systems, on the other hand, usually have full-function,
"soft" real-time OSs such as DEC OSF/1 and VMS.


Real-Time Rules Have Changed


The evolution of industrial automation systems is pulling real-time computing
up from its familiar niche at the lowest level of the system hierarchy to the
higher (supervisory and management) levels. This movement is expected to
improve product quality and yield predictability, asset utilization, and plant
flexibility, among other benefits.
To achieve these benefits, the concepts and techniques of real-time computing
must be clarified, improved, and generalized. In technology, high-level
real-time control and computing will have to be adaptive, self-directing,
global, and distributed (translevel and transnode). In particular:
Application entities will be more abstract.
Data will have more complex syntax and semantics.
Input events will have greater magnitudes and variances.
State spaces will be larger.
Tasks will be more aperiodic and asynchronous.
Uncertainty, and thus nondeterminism, will increase.
While (fortunately) the operational time frames are normally slower at the
higher levels, ranging from seconds on up, the computing must still be hard
real time in the sense that task timeliness has to be predictable.


One Size Can't Fit All


Many real-time systems use special-purpose OSs for a particular application or
to best balance performance; functionality; hardware and software costs; and
hardware size, weight, and power. Increasingly, the economies of developing
and supporting these OSs, together with the nonportability of their
applications, are forcing users to move to commercial, general-purpose,
real-time OS products. For a user with only one application in the real-time
domain, these OS products are often adequate. Users having a variety of
different real-time applications--especially higher-level, distributed
applications--are finding that these existing OS products do not encompass all
their requirements. Theory and practice show that general-purpose OSs cannot
accommodate the broad set of control and computing needs across the entire
real-time control hierarchy. Moreover, OS vendors wishing to serve real-time
user needs up and down the application hierarchy cannot afford to offer a
multiplicity of different real-time OSs, nor can users afford to use a
multiplicity of OSs--the support costs are too high in staff retraining,
time-to-market, budgets, and software quality.
Consequently, the application hierarchy needs a scalable OS because "one size
doesn't fit all." Scalable, in this case, means an operating system is highly
modular, accommodating application specificity; and highly adaptable,
accommodating execution-time situational specificity.


Current OS Limitations: Timeliness


Timeliness is one limitation to the current generation of operating systems.
Timeliness is affected by factors such as the hard and soft dichotomy,
performance metrics, priorities, exception cases, the division of
responsibilities between OS and applications programmers, and determinism.
Hard versus soft real time. Hard and soft real time are much more complex
technical issues than popular usage of the terms would suggest. Hard real time
conventionally means that all important tasks have deadlines that must always
be met, otherwise the system has failed. Conversely, soft real time is not
hard in that tasks might not have deadlines. Even if they do, missing a
deadline is not necessarily a system failure. Most real-time systems fall into
this soft category: their tasks may have deadlines or may simply need to be
completed as soon as possible, and a task's result may still be more or less
acceptable if it completes within some specific, suboptimal time.

For example, missing a sensor sample will create a discontinuity in the sensor
reading or a click in the audio signal; neither are catastrophic events, and
the next sample will override the missed data point. Alternately, an
application may require at least 85 percent of the tasks to be no more than 20
percent late, as long as no two tasks in a row miss their deadline. 
So the bad news is that traditional soft real time is undefined, and the worse
news is that most real-time systems are soft.
The information in both hard and soft real-time systems is highly perishable,
and the system (OS and application) has to act on that information while it is
current. At the core of real-time computing are issues of predictability, as
opposed to how long the system takes to complete a set of tasks.
Performance metrics. A computing system or operating system is real time to
the degree that it explicitly manages resources so tasks are completed at
acceptable times. Timely completion may instead happen implicitly (by luck) or
by hardware brute force. Such systems may successfully operate in real time;
they may even be
rational, cost-effective solutions for certain applications. However, they are
not genuine real-time systems because they do not use real-time resource
management.
Historically, the real-time OS development and user communities have
subliminally conspired in the belief that some form of interrupt response time
is the real-time performance metric. However, starting the most
eligible--traditionally, the highest-priority--task as fast as possible is
necessary, but not sufficient. What really matters is that tasks complete at
acceptable times.
The response-time artifact arises from the implication that if you start a
task fast enough, it will complete on time. This implication often holds in
small, simple systems, but not in larger, more complex systems, such as at
higher levels of the application hierarchy. In these systems, where there are
dynamic resource conflicts, interrupt response time is insufficient to
characterize the system's real-time performance.
A richer, more powerful approach to expressing application-specific timeliness
is necessary. This is especially true for the more sophisticated real-time
computing necessary at higher levels of the control hierarchy.
Priorities. Programmers normally characterize individual task-timeliness
requirements with priorities. Priorities arose in the context of simple
systems, where they are adequate, but they are not adequate for large,
complex, real-time systems because they:
Do not distinguish between urgency (time criticality) and importance (relative
functional criticality).
Do not adequately represent a task's actual completion-time constraint and
acceptable timeliness.
Often allow only fixed assignments, which limits functionality, performance,
and adaptability, resulting in high-cost design, integration, testing, and
modification.
Cannot support SMP machines, which require intentional processor-idle times.
Have different mechanisms and semantics in the interrupt hardware versus the
OS.
Exception cases. System performance should be optimized for the most important
cases. At the higher levels of the control hierarchy, these cases are often
the high-stress exceptions rather than the most frequent (normal, uneventful)
cases. These exceptions are inherent in some applications or in emergencies,
such as plant upsets. It is in these exception cases that system and OS
performance are most critical.
The traditional real-time approach to dealing with exception cases is through
determinism. Only the most frequent cases are accommodated, and it is presumed
that no exception cases will suffer. Or, the most demanding case, however
infrequent, is identified and satisfied in advance--regardless of the
consequences to overall system performance and cost. 
Division of programmer responsibilities. The responsibility for completing
real-time applications on time is generally divided between the application
software and the OS. 
Historically, most of this responsibility has fallen on the application
programmers. They normally construct a static mapping of task-completion times
to priorities. They do this in a way that is usually ad hoc and experimental.
The OS contributes only fast, tightly bound interrupt latency, plus
fixed-priority scheduling. This imbalance has higher costs because:
Different user resource-management efforts will be inconsistent.
Applications have access to fewer hardware and software resources than the OS.
Application programmers are kept from writing applications. 
Determinism. Real-time people are obsessed with the idea that a system has to
be deterministic to behave predictably. This mistake confuses ends and means.
To achieve determinism, real-time designers must attempt to anticipate every
contingency and reserve the appropriate compute and data-network resources in
advance. So they build rigid systems overendowed with resources that might
never be used. Unfortunately, they might still be unprepared for an entirely
different set of contingencies. For complex and distributed systems, this
approach can be fatal.
Communications people tend not to make this mistake. For example, when you
pick up the phone and call your mom or log onto CompuServe and send packets,
it's highly probable that mom or another CompuServe node will be at the other
end. However, unbeknownst to you, all kinds of uncertainties are present:
links break, buffers get congested, and so on. While all of that goes on, the
telephone or data network dynamically reconfigures itself transparently. This
reconfiguring is considered a feature; the dynamic routing in the network
provides robustness. However, if you are a classic real-time bigot, the
reconfiguring is considered a bug because you don't know exactly how that
routing was accomplished.


Current OS Limitations: Distribution


Another factor limiting the real-time effectiveness of current operating
systems is distribution. System designers typically do not build distributed
systems so much as they build networks--collections of processors connected
together. The result is a non-real-time network of real-time systems, such as
the collection of machining centers and materials-handling equipment on the
factory floor. 
Many system designers are happy with networking centralized real-time
subsystems; they've been doing that for years. But many more designers today
want to build entirely real-time process-control applications physically
dispersed across multiple computers. Such applications must contend with the
end-to-end timeliness across the entire network.
In addition, application programmers must know the identities and physical
locations of the computers and the software functions on them. Application
programmers are also responsible for coordinating concurrent execution and
data accesses on each of these computers. Conventional real-time OSs provide
no support for any of these activities.
The solution to many of these issues is to provide some decentralization and
better resource management through middleware at intermediate levels,
including such conventional and object-oriented distributed execution
environments as the Open Software Foundation's Distributed Computing
Environment (DCE) and the Object Management Group's Object Management
Architecture (OMA). However, no off-the-shelf real-time distribution
middleware products exist at present, and middleware typically does not have
direct access to kernel- and OS-level resources, thus limiting the real-time
capabilities of the system.


Technologies for Next-Generation Real-Time OSs


Overcoming the limitations in OS timeliness and distribution requires not only
a shift in mindset, but also new technologies. Today, a variety of computer
and software vendors are performing advanced development in real-time OS
architectures, particularly in the areas of timeliness as expressed in the
benefit-accrual model, distributed threads, and passive objects. What is
likely to happen is that portions of these developments--whether as ideas or
implementations--will be added to existing and new OSs. 
For the next-generation real-time computer systems, the operating-system
architecture must have a framework for expressing highly scalable timeliness
specifications. That is, it must encompass a wide continuum of real-time
hardness and softness in a unified manner. 
A time constraint, such as the archetypal deadline, is conventionally thought
of as a point on a timeline; see Figure 2. Classic scheduling theory often
measures a task's timeliness in terms of lateness, where
lateness = deadline - completion time. For a soft deadline, timeliness is equal
to the lateness value. For a hard deadline, timeliness is equal to the sign of
that value; that is, if it's negative, the task is late.
A better approach would be to think of a task's timeliness in two dimensions:
benefit or contribution to the system over the time required to complete the
task. Graphically, a hard deadline is a binary, downward step function with a
lower range of either zero (missing the deadline is nonproductive) or a
negative number (missing the deadline is counterproductive); see Figure 3. 
In the real world of computing, applications are rarely black and white--real
time is a continuum. With some systems, all tasks and processes have a
deadline that must be met or the system has failed. At the other end of the
continuum, some systems do not have any time constraints. In the middle, some
tasks and processes can occasionally, or even always, be late; such
requirements can be relatively difficult to express.
Many of these applications, especially higher-level ones, require individual
task-completion times that are softer in the sense of not being deadlines.
Nonetheless, these completion times must be specified and enforced. Figure 4
illustrates the two-dimensional view of real-time application, where:
Some diminished timeliness is attained when completing the task within an
allowable tardiness period.
Timeliness is not constant before the "deadline."
Timeliness is not constant after the "deadline."
The measure and range of timeliness is application specific.
For example, consider a satellite communications system which has an optimal
window of opportunity for sending and receiving data between the satellite and
the ground station. On each side of that window is a period of time during
which communications can take place, but at a lower rate because of poorer
signal-to-noise ratios. Abstracting this natural analog continuum of
timeliness into an artificial, binary deadline can be highly disadvantageous.
Expressing time constraints in two dimensions lets you represent a wide range
of hardness and softness coherently and methodically, thus letting the OS
satisfy those specifications. Moreover, application programmers can derive
actual timeliness specifications directly from the requirements and behavior
of the system. 
One framework for expressing timeliness is called the "benefit-accrual model"
and is based on three orthogonal functions for specifying timeliness:
Time constraint for each real-time task.
Collective timeliness of a set of real-time tasks.
Collective timeliness acceptability of a set of real-time tasks.
The benefit-accrual model expresses an individual task's time constraint in
terms of a timeliness metric called "benefit."
Graphically, the origin of the benefit-function axes is the current time, tC
(the value of the system clock). The earliest time for a benefit function is
called the initial time, tI, and the latest, the terminal time, tT; see Figure 5.
The benefit function is evaluated only between the current and the terminal
time. Using these terms, the hard benefit function in Figure 6 has:
A zero or constant negative value before the later time, tL.
An infinite discontinuity in its first derivative at tL if tL>tI.
A due time, tD (which is also equal to the sooner time, tS, and the expiration
time, tE).
A constant value between tL and tD.
A constant value between tD and tT.
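A minimal sketch of such a hard benefit function; the region boundaries and region values are passed in as illustrative parameters rather than taken from any published model:

```c
/* Hard benefit function in the style of Figure 6: a constant (zero or
   negative) value before the later time tL, a constant plateau between
   tL and the due time tD, and another constant between tD and the
   terminal time tT. The function is only evaluated up to tT. */
double hard_benefit_fn(double t, double tL, double tD, double tT,
                       double before_val, double plateau_val, double after_val) {
    if (t > tT) return 0.0;          /* beyond terminal time: not evaluated */
    if (t < tL) return before_val;
    if (t <= tD) return plateau_val;
    return after_val;
}
```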

Conversely, a soft benefit function can have arbitrary values before and after
the optimal values at tS, but it need not have constant values on each side of
tL and tD, nor expiration times; see Figure 7.
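A soft benefit function might be sketched as a piecewise-linear curve; the triangular shape here is purely illustrative, rising to a peak at the sooner time tS and decaying to zero at the terminal time tT:

```c
/* Illustrative soft benefit function: zero outside [tI, tT], rising
   linearly from tI to a peak of 1.0 at tS, then falling linearly to
   zero at tT. Assumes tI < tS < tT. */
double soft_benefit(double t, double tI, double tS, double tT) {
    if (t < tI || t > tT) return 0.0;
    if (t <= tS) return (t - tI) / (tS - tI);   /* rising edge */
    return (tT - t) / (tT - tS);                /* falling edge */
}
```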
Individual tasks, in general, also have two other attributes: dynamic
dependencies (for example, precedence and resource conflicts) and relative
importance (functional criticality) that an advanced scheduling policy must
consider. This importance is orthogonal to timeliness, and may be a function
of time and other parameters that reflect the application and computing system
state.
Usually real-time applications include multiple tasks that may each have time
constraints. Collective timeliness, another function in the benefit accrual
model, indicates how timeliness, in terms of system benefits, is "accrued" by
the collection of tasks.
One of the challenges of an advanced scheduler is to optimize collective
timeliness specified by these task time constraints. The scheduler considers
all time constraints it knows about, and creates one or more schedules by
assigning estimated (or expected) execution-completion times. This results in
estimated initiation times and an order for executing the tasks. 
Yet another function in the benefit-accrual model, collective-timeliness
acceptability, specifies the acceptability of the completion times for a set
of tasks. Acceptability of certain tasks or combinations of tasks may be
conditional on the present state of the system, such as other tasks'
timeliness, resource availability, and application mode. Realize that the
semantics and metrics of timeliness acceptability are application specific.
For example, "unacceptable" may mean either nonproductive or counterproductive
in some way. 
Larger, more complex, more distributed, mission-critical real-time systems
usually call for softer collective-timeliness acceptability criteria. These
systems must dynamically adapt to situational uncertainties to remain robust.
For example, a particular group of tasks may be acceptable if they complete at
times yielding at least 75 percent of their maximum possible collective
benefit, provided that no more than two of the tasks completing within 100
msec of each other yield a timeliness benefit of less than 90 percent of their
maximums.
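That acceptability rule can be sketched as a predicate over a set of task results; the struct layout and the interpretation of "no more than two" are assumptions for illustration, not part of the benefit-accrual model's actual API:

```c
#include <math.h>

typedef struct {
    double completion_ms;   /* when the task finished */
    double benefit;         /* benefit actually accrued */
    double max_benefit;     /* best possible benefit for this task */
} task_result;

int collectively_acceptable(const task_result *t, int n) {
    double total = 0.0, max_total = 0.0;
    int i, j, clustered_weak = 0;

    for (i = 0; i < n; i++) {
        total += t[i].benefit;
        max_total += t[i].max_benefit;
    }
    if (total < 0.75 * max_total)      /* first condition: 75% overall */
        return 0;

    /* Count tasks under 90% of their own maximum that completed within
       100 msec of another such task; allow at most two. */
    for (i = 0; i < n; i++) {
        if (t[i].benefit >= 0.90 * t[i].max_benefit)
            continue;
        for (j = 0; j < n; j++) {
            if (j != i &&
                t[j].benefit < 0.90 * t[j].max_benefit &&
                fabs(t[i].completion_ms - t[j].completion_ms) <= 100.0) {
                clustered_weak++;
                break;
            }
        }
    }
    return clustered_weak <= 2;
}
```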
Real-time applications at higher levels include tasks that span levels of the
control hierarchy and nodes. These tasks are typically "faked" by breaking
them into centralized tasks on each node, which communicate by messages.
Alternatively, next-generation real-time operating systems can provide actual
distributed (transnode) tasks. The technology for accomplishing this is called
"distributed threads." These threads can transparently and reliably extend
themselves across address spaces, and thus among computing nodes. This
transparency minimizes the effect of physical dispersal on software costs and
lets programmers use familiar centralized programming techniques and tools.
These threads also maintain their identities and attributes. In particular, a
distributed thread includes all the real-time scheduling information needed to
enforce its end-to-end timeliness.
Distributed threads provide the opportunity for managing resources coherently,
according to a common performance-optimization criterion, such as meeting
timeliness constraints. For example, the scheduling policy employed for
processor cycles can also be used for managing synchronizers, such as locks,
semaphores, and transactions. Coherent resource management is the only
complete solution to the common problem called "priority inversion."
Separating the task entity--the distributed thread--from the code it executes
and the data it accesses requires a programming model that includes entities
consisting of only code and data. In object-oriented programming, these
entities are abstract data types called "passive objects" (see Figure 8). The
number of distributed threads that can be concurrently active in a passive
object and their synchronization constraints are determined by the object
programmer.
In contrast, active objects often have only one captive thread, and typically
communicate among themselves with asynchronous messages. Because active
objects are a special case of passive objects, they are easily provided by
OSs, if desired. (Active objects are common because of their automatic
compatibility with existing OS process models.) The OS should accept
responsibility for the basic integrity of distributed threads by, for example,
providing orphan detection and elimination, or allowing situation-specific
invocation of failure semantics and recovery policies.
Another important technology that differentiates true distributed computing
from networking is distributed concurrency control. Multiple, mutually
asynchronous, distributed threads must coordinate their concurrent execution
and data accesses. This way, the system remains correct and consistent. The OS
must accomplish this by providing the equivalent of semaphores and locks, but
without shared primary memory among nodes. Technologies for accomplishing this
include distributed agreement, atomic broadcasts, and transaction-like
constructs.


The Challenge of the Next-Generation OS


After remaining relatively unchanged for over 30 years, the real-time
computing domain is expanding into higher-level applications. Traditional
real-time computing concepts and techniques need to scale up to satisfy these
new requirements. This calls for a new real-time paradigm--one that more
carefully defines and generalizes the traditional real-time approach.
This should improve the economies and productivity of the entire
application-control hierarchy and enable new services that are not available
today in real-time applications. At the same time, the continual improvement
in performance and cost-effectiveness of microprocessors increases the need
for these new real-time technologies. 
The new paradigm will include a hierarchy of real-time distributed objects and
threads. Recognizing these as strategic assets represents a business challenge
to traditionally hardware-oriented computer vendors, system suppliers, and
users.
Figure 1 Three-level control hierarchy.
Figure 2 Time constraint as a point on a timeline.
Figure 3 Traditional real-time interpretation of a hard deadline. 
Figure 4 Examples of soft individual time-constraint functions.
Figure 5 Benefit function defined over a range of time.
Figure 6 A "hard" benefit function.
Figure 7 A "soft" benefit function.
Figure 8 Passive objects are abstract data types.


































The Condor Distributed Processing System


Checkpoint and migration of UNIX processes 




Todd Tannenbaum and Michael Litzkow


Todd is the director of the Model Advanced Facility, which pilots leading-edge
computing technology into the College of Engineering at the University of
Wisconsin-Madison. Michael is a researcher in the computer science department
at the University of Wisconsin-Madison and is the primary author of Condor.
You can contact them at condor@cs.wisc.edu.


Over the years, the University of Wisconsin-Madison has developed a powerful,
distributed batch-processing system for UNIX called "Condor." Condor allows us
to utilize otherwise idle CPU cycles in a "pool"--that is, a cluster of
workstations connected via a network that are watched over by the Condor
"central manager" (see Figure 1). Users can submit jobs to Condor from any
workstation in the pool. Condor will then find an idle workstation in the pool
and run the job there until someone starts using that workstation again.
Through remote system calls, Condor ensures that the file system and other
machine characteristics on a remote-execution machine appear identical to
those of the job's submitting machine. These calls allow Condor to provide
necessary file access for the job even in environments where files are not
generally available via more-common file-sharing mechanisms such as NFS or
AFS. When it detects activity on a workstation upon which it is running a job,
Condor creates a "checkpoint" of the job before killing it. This checkpoint is
written to disk and contains all of the process's state information necessary
for Condor to restart the job exactly where it left off. Condor keeps the
checkpoint queued on disk until another workstation in the pool becomes idle
and, thus, available. Condor then transfers the checkpoint to this new
workstation and restarts the job right where it left off, effectively
migrating the process from one available workstation to another.
A complete discussion of Condor is clearly beyond the scope of this article.
Instead, we will discuss exactly how Condor transparently implements
checkpoint/restart of a UNIX process and how remote procedure calls (RPCs)
transparently provide a consistent file-system environment suitable for
process migration.


The Basic Condor Framework


One of the major components of Condor is its facility for transparently
checkpointing and subsequently restarting a process, possibly on a different
machine. By "transparent" we mean that the user code is not specially written
to accommodate the checkpoint/restart or migration and generally has no
knowledge that such an event has taken place. This mechanism is implemented
entirely at user level, with absolutely no modifications to the UNIX kernel. 
Condor supports checkpoint and migration of most user programs without forcing
users to change a single line of source code. However, checkpoint/migration
support does require users to link their binary with the Condor Checkpointing
Library (the Condor system includes a utility that will do this with one
command). The Checkpointing Library first installs a signal handler which
contains the code to asynchronously checkpoint the process. Then it augments a
wide array of UNIX system calls to support checkpoint/restart as well as
migration.
Condor controlling daemons (background UNIX programs) continuously monitor
system activity. When Condor notices external system activity (such as someone
typing on the keyboard) on a machine running a Condor user job, the daemon
sends the job a "checkpoint" signal; see Figure 2. This signal invokes code in
the Checkpointing Library that writes out a checkpoint file to disk and then
terminates the process. The checkpoint file is transferred back to the machine
from which the Condor job was originally submitted. When the Condor central
manager locates a newly available idle machine in the pool, the job's original
binary executable is transferred to the machine, along with the most recent
checkpoint file for that job. Now when the user's job is executed, the Condor
Checkpointing Library (which is linked in with the job's binary) will know
that this is not the first time this process has run. It will therefore
restart by reading in the accompanying checkpoint file and manipulating its
state so as to emulate as accurately as possible the state of the old process
at checkpoint time. The checkpointing process was invoked by a signal, and now
at restart time, things are manipulated so that it appears to the user code
that the process has just returned from that signal handler. 


Inside the Checkpointing Library


To checkpoint and restart a process, you must consider all the components that
constitute the state of that process. UNIX processes consist of an address
space generally divided into text, data, and stack areas, along with other
miscellaneous state information maintained by the kernel; see Figure 3. The
state of the process's registers, any special handling requested for various
signals, and the status of open files and file descriptors fall into this
category.


Text and Data Areas


Statically linked UNIX processes are born with their entire text loaded into
virtual memory by the kernel, generally beginning at address 0. Since exactly
the same executable file serves for both the original invocation and the
restarted process, we don't have to do anything special to save and restore
the text area. (Note that modern programming practice requires that text be
loaded read-only, so there is no chance that the text will be modified at run
time.)
The data space of a UNIX process generally consists of three areas:
initialized data, uninitialized data, and the heap. Initialized and/or
uninitialized data contains global and statically declared variables.
Initialized data is given values by the programmer at compile time.
Uninitialized data is space allocated at compile time, but not given values by
the programmer (the kernel will zero fill this area at load time). The heap is
data allocated at run time by the brk() or sbrk() UNIX system calls, typically
used by the C function malloc(). A process's data generally begins at some
pagesize boundary above the text and is a contiguous area of memory: The
initialized data begins at the first pagesize boundary above the text, the
uninitialized data comes next, and this is followed by the heap, which grows
toward higher addresses at run time. Note that once the process begins
execution, the initialized data may be overwritten, and thus at restart time,
we cannot depend on the information in the executable file for this area.
Instead, the entire data segment is written to the checkpoint file at
checkpoint time and read back into the same address space at restart time. All
you need to know are the starting and ending addresses of the data segment.
The starting address is platform specific, but is usually a static value which
can be found as a linker directive or, on some versions of UNIX, in the man
pages. The ending address of the data segment is effectively the top of the
heap and can be obtained within UNIX via the sbrk() system call. Condor
restores the data space early in the restart process.
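On systems that provide the classic sbrk() interface, the end of the data segment can be queried as shown below. This is a sketch; sbrk() is deprecated on some modern platforms, and the feature-test macro is needed to expose its declaration on glibc:

```c
#define _DEFAULT_SOURCE
#include <unistd.h>

int initialized_global = 42;    /* lives in the initialized data area */

/* The current program break is the top of the heap and therefore the
   end of the contiguous data segment described above. */
char *data_segment_end(void) {
    return (char *) sbrk(0);
}
```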


Stack Area


The stack area is that part of the address space allocated at run time to
accommodate the information needed by the procedure-call mechanism,
procedure-call arguments, and automatic variables and arrays. The size of the
stack varies at run time whenever a procedure is entered or exited. On some
systems, the stack begins at a fixed address near the top of virtual memory
and grows toward lower addresses; on others, it begins at an address in the
middle of virtual memory (to allow space for heap allocation) and grows toward
higher-numbered addresses. When porting Condor to a new UNIX platform, Condor
must be told in which direction the stack grows.
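One way to probe the growth direction is to compare addresses of locals in nested call frames. This relies on implementation-defined address ordering (and a GCC/Clang attribute to defeat inlining), which is exactly why a port must ultimately be told the answer explicitly rather than trusting such a probe:

```c
#include <stdint.h>

/* Compare the address of a local in a deeper call frame with one in a
   shallower frame; 1 means the stack grows toward lower addresses. */
__attribute__((noinline))
static int deeper_frame(uintptr_t outer_addr) {
    char inner;
    return (uintptr_t) &inner < outer_addr;
}

int stack_grows_downward(void) {
    char outer;
    return deeper_frame((uintptr_t) &outer);
}
```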
Preserving the state of the stack requires saving and restoring two distinct
pieces of information. First, the stack context (or "stack environment") must
be saved. The stack context is a small collection of stack-related information
that most notably contains the stack pointer (which keeps track of a process's
current position within the stack data area). Second, the data which makes up
the stack itself (the "stack space") must be saved.
To save and restore the stack context, Condor uses the standard C functions
setjmp() and longjmp(). You call setjmp() with a pointer to a system-defined
type called a JMP_BUF. setjmp() saves the current stack context into the
JMP_BUF and returns 0. If longjmp() is then called with a pointer to the
JMP_BUF and some value other than 0, the stack context saved in JMP_BUF is
restored and we return to the point in the code where the original setjmp()
call was made. This time, the return value from setjmp() is the one specified
in the longjmp() call--that is, something other than 0. 
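The round trip can be demonstrated in a few lines, using the standard jmp_buf type (Condor's JMP_BUF and SETJMP are its own wrappers around these):

```c
#include <setjmp.h>

static jmp_buf env;

/* Save the stack context, jump back into it, and observe the second
   return from setjmp() carrying the value passed to longjmp(). */
int setjmp_roundtrip(void) {
    switch (setjmp(env)) {
    case 0:                 /* first return: context just saved */
        longjmp(env, 7);    /* never falls through */
    case 7:                 /* second return, via longjmp() */
        return 7;
    default:
        return -1;
    }
}
```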
However, a limitation of setjmp()/longjmp() is that the JMP_BUF does not
contain the actual data contained in the stack space itself, only pointers
into the stack space. The developers of setjmp()/longjmp() designed it to work
within the lifespan of a single process. However, when Condor restarts a
checkpointed process, it first creates a new process, then manipulates its
state so as to emulate the state of the old process. This new process will
have its own new stack space. Therefore, Condor also needs to save the actual
contents of the stack space at checkpoint time. At restart time, Condor must
replace the stack space of the new process with the checkpointed stack space
before utilizing longjmp() to reset the stack context back to its state at
checkpoint time.
Saving the contents of the stack space into the checkpoint file is trivial.
Again, all we need to know is the stack's start and end points. The start of
the stack is a well-known static location defined as a constant on some
platforms and obtained from the man pages on others. The end of the stack, by
definition, is pointed to by the stack pointer. Thus, to determine the end of
the stack, Condor does a setjmp() and pulls the stack-pointer value out of the
JMP_BUF.
Restoring the stack is trickier because we would like to be able to use the
stack space (for local variables) while we are replacing it. Directly
replacing the stack space of a process with the space saved in the checkpoint
file is a sure way to send the process off to never-never land! To avoid this,
Condor moves the stack pointer to a safe buffer reserved in the process's data
area. The stack pointer is moved with yet another call to setjmp(), manually
manipulating the stack pointer in the JMP_BUF to point to our buffer in the
data area, followed by a longjmp(). Then, with the Condor stack-restore
procedure using a secure stack space in the data area, we can safely restore a
new process's original stack space with the one previously saved in the
checkpoint file.


Open Files


Files held open by a process at checkpoint time should be reopened with the
same "attributes" at restart. The attributes of an open file include its
file-descriptor number, the mode in which it is opened (read, write, or
read/write), the offset to which it is positioned, and whether or not it is a
duplicate of another file descriptor. Since much of this information is not
made available to user code by the kernel, we record several attributes at the
time the file descriptor is created via an open() or dup() system call.
Information recorded includes the pathname of the file, the file-descriptor
number, the mode, and (if it is a duplicate) the base-file descriptor number.
The offset at which each file descriptor is positioned is captured at
checkpoint time by performing an lseek() system call upon each descriptor. All
of this information is kept in a table in the process data space (recall that
the data space is restored early in the restart process). Later in the restart
process, we walk through this table and reopen and reposition all of the files
as they were at checkpoint time. Of course, an important part of a file's
state is its content; we assume that this is stored safely in the file system
and that nobody tampered with it between checkpoint and restart times.
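The offset capture is a one-line lseek(); the surrounding table entry shown here is a hypothetical layout for illustration, not Condor's actual structure:

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical file-state table entry: the pathname, descriptor number,
   and open mode are recorded at open() time; the offset is filled in at
   checkpoint time. */
struct file_state {
    char  pathname[256];
    int   fd;
    int   open_flags;
    off_t offset;
};

/* Capture the current file offset without disturbing it. */
off_t capture_offset(int fd) {
    return lseek(fd, 0, SEEK_CUR);
}
```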
An interesting method is used to record information from a system call such as
open() without the need for any modification of the user code. We do this by
providing our own versions of open() and close(), which record the information
in the table, then call the original, system-provided open() or close()
routines. A straightforward implementation of this would result in a naming
conflict; for example, our augmented open() routine would cover up the system
open() routine. To see how we avoid the naming conflict, see the accompanying
text box entitled, "Augmenting UNIX System Calls."



Signals


An easily overlooked part of the state of a process is its collection of
signal-handling attributes. In UNIX processes, signals may be blocked or
ignored; they may take default action or invoke a programmer-defined signal
handler. At checkpoint time, a table is built, again in the process's data
segment, which records the handling status for each possible signal. The set
of blocked signals is obtained from the sigprocmask() system call, and the
handling of each individual signal is obtained from the sigaction() system
call. During restart, Condor restores signal state by stepping through the
table. If a signal has been sent to a process while that process has the
signal blocked, the signal is said to be "pending." If a signal is pending at
checkpoint time, the same situation must be recreated at restart time. To
handle pending signals, we determine the set of pending signals at checkpoint
time with the sigpending() system call. During restart, the Condor library
code will first block each pending signal, then send itself an instance of
each pending signal. This ensures that if the user code later unblocks the
signal, it will be delivered.


CPU State


Saving and restoring the state of a process's CPU is potentially the most
machine-dependent part of the checkpointing code. Various processors have
different numbers of integer and floating-point registers, special-purpose
registers, floating-point hardware, instruction queues, and so on. You might
think that it would be necessary to build an assembler-code module for each
CPU to accomplish this task, but we've discovered that the signal mechanism
already available within the UNIX system call set can (in most cases) be
leveraged to do this work without assembler code. This is why a checkpoint is
always invoked by sending the process a signal. A characteristic of the UNIX
signal mechanism is that the signal handler could do anything at all with the
CPU, but when it returns, the interrupted user code should continue on without
error. This means that the signal-handling mechanism provided by the system
saves and restores all the CPU state we need.


Pulling It Together: A Complete Checkpoint and Restart Cycle 


Now that we've discussed some of the details, we can return to the big
picture: how a Condor checkpoint/restart is actually accomplished. Listing One
contains selected functions from Condor that illustrate this. When our
original process is born, Condor code installs a signal handler for the
checkpointing signal, initializes its data structures, and calls the user's
main(). At some arbitrary point during execution of the user code, the process
will receive a checkpoint signal which invokes checkpoint(). This routine
records information about the stack context, signal state, and open files into
data structures in the process's data area. Then, it writes the data and stack
spaces into a checkpoint file, and the process exits. At restart time, Condor
executes the same program with a special set of arguments that cause restore()
to be called instead of the user's main(). The restore() routine overwrites
its own data segment with the segment stored in the checkpoint file. Now it
has the list of open files, signal handlers, and so on, in its own data space,
and restores those parts of the state. Next, it switches its stack to a
temporary location in the data space and overwrites the stack of its own
process with the stack saved in the checkpoint file. The restore() routine
then returns to the stack location that was current at the time of the
checkpoint; that is, restore() returns to checkpoint(). Now checkpoint()
returns, but recall that this routine is a signal handler: It restores all CPU
registers and returns to the user code that was interrupted by the checkpoint
signal. The user code resumes where it left off and is none the wiser. 


Location-Independent File-System Access via Remote System Calls


Process migration requires that a process can access the same set of files in
a consistent fashion from different machines. While this functionality is
provided in many environments via a networked file system (NFS, for example),
it is often desirable to share computing resources between machines which do
not have a common file system. For example, we have migrated processes between
our site at the University of Wisconsin-Madison, and several sites in Europe
and Russia. Since these sites certainly do not share a common file system,
Condor provides its own means of location-independent file access. This is
done by maintaining a process (called a "shadow") on the machine where the job
was submitted, which acts as an agent for file access by the migrated process
(see Figure 2). All calls to system routines that use file descriptors by the
user's code are augmented by the Condor Checkpoint Library so that they are
rerouted via RPCs to the shadow. (Again, see "Augmenting UNIX System Calls.")
The shadow process then executes the system call and passes the result back to
the Condor Checkpointing Library, which passes back the result to the user
code. Whether the user code uses write() directly, calls printf(), or calls
some other routine we have never heard of that ultimately exercises the
write() system call somewhere along the line, this redirection for functions
at the system-call level will ensure correct action.
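The interposition idea can be sketched with an in-process stand-in for the shadow. Here remote_write() fakes the RPC by copying into a local buffer; nothing about Condor's actual wire protocol is implied:

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>
#include <string.h>

static int    remote_mode = 0;      /* nonzero when running on a remote node */
static char   shadow_buf[256];      /* stand-in for the shadow's file system */
static size_t shadow_len = 0;

/* Stand-in for the RPC that ships a write to the shadow process. */
static ssize_t remote_write(int fd, const void *buf, size_t len) {
    (void) fd;
    memcpy(shadow_buf + shadow_len, buf, len);
    shadow_len += len;
    return (ssize_t) len;
}

/* Augmented write(): same name and signature as the system call, so
   user code (and printf() underneath it) is rerouted transparently. */
ssize_t write(int fd, const void *buf, size_t len) {
    if (remote_mode)
        return remote_write(fd, buf, len);
    return syscall(SYS_write, fd, buf, len);
}
```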


Limitations 


While the designers of truly distributed operating systems such as the V
kernel and Sprite have carefully defined and implemented their process models to
accommodate migration, UNIX users are not so fortunate. In Condor, we have
taken the viewpoint that we can save and restore enough of the state of a
process to accommodate the needs of a wide variety of real-world user code.
There is, however, no way we can save all the state necessary for every kind
of process. The most glaring deficiency is our inability to migrate one or
more members of a set of communicating processes. In fact, no attempt is made
to deal with processes that execute fork() or exec(), or that communicate with
other processes via signals, sockets, pipes, files, or any other means. Some
inventive users have found ways to use Condor for communicating processes, but
they were forced to change their code to accommodate our limitations. Another
major limitation is that the Condor Checkpointing Library must be linked in
with the user's code. This is fine for folks who build and run their own
software, but it does not work for users of third-party software, who do not
have access to the source. We have considered schemes to provide a
checkpointing C library for dynamically linked third-party programs, but so
far we have not implemented anything. A major obstacle to such work is the
fact that shared-library implementations vary widely across platforms, and
such a facility would not be very portable.


Availability


Work on Condor is on-going, and Condor is available for free. Both binary-only
distributions as well as distributions with complete source code are available
for many different UNIX platforms via anonymous FTP over the Internet to
ftp.cs.wisc.edu in the /condor directory.
Augmenting UNIX System Calls
The UNIX man pages distinguish between "system calls" and "C-library routines"
(system calls are described in section 2, and library routines are described
in section 3). However, from the programmer's point of view, these two items
appear to be very similar. There may seem to be no fundamental difference
between a call to write() and a call to printf(); each is simply a procedure
call requesting some service provided by "the system." To see the difference,
consider the plight of a programmer who wants to alter the functionality of
each of these calls, but doesn't want to change their names. Preserving the
names is crucial if you want to link the altered write() and printf() routines
with existing code, which should not be aware of the change. The programmer
wanting to change printf() has at his disposal all the tools and routines
available to the original designer of printf(), but the programmer wanting to
change write() has a problem. How can you get the kernel to transfer data to
the disk without calling write()? We cannot call write() from within a routine
called write(); that would be recursion, and definitely not what we want here.
The solution is a little-known routine called syscall().
Every UNIX system call is associated with a number (defined by a macro in
<syscall.h>). You can replace an invocation of a system call with a call to
the syscall routine. In this case, the first argument is the system-call
number, and the remaining arguments are just the normal arguments to the
system call. For instance, the write() in Example 1 counts the number of times
write() was called in the program; otherwise, it acts exactly like a normal
write().
Interestingly, this trick works even if the user code never calls write()
directly, but only indirectly via standard C-library calls--printf(), for
example.
The Condor checkpointing code uses this mechanism to augment the functionality
of a number of system calls. For example, we augment the open() system call so
that it records both the name of the file being opened and the file-descriptor
number returned. This information is later used to reopen the file at restart
time.
--M.L.
Figure 1 When any machine in the Condor pool submits a job, the Central
Manager will select an otherwise-idle machine to perform the remote execution.
Figure 2 Framework for remote job execution between a submitting machine and a
remote-execution machine.
Figure 3 Address-space layout of UNIX processes.
Example 1: Augmenting the write() system call.
int number_of_writes = 0;

int write( int fd, void *buf, size_t len )
{
    number_of_writes++;
    return syscall( SYS_write, fd, buf, len );
}

Listing One 

/** This is the signal handler which actually effects a checkpoint. This
function must be previously installed as a signal handler, since we assume the
signal-handling code provided by the system will save and restore important
elements of our context (register values, etc.). A process wishing to
checkpoint itself should generate the correct signal, not call this function
directly. */

void
Checkpoint( int sig, int code, void *scp )
{
 if( SETJMP(Env) == 0 ) { // Save place here
 // dprintf() will log messages into condor system admin log files
 dprintf( D_ALWAYS, "About to save MyImage\n" ); 
 // This routine steps through our file-state table and fills in
 // information; for instance, it lseeks each open file descriptor to
 // record where the file pointer for each open file is
 SaveFileState();
 // Here we fill in Signal state table (which signals are pending, etc)
 SaveSignalState();
 // These will now write out our data & stack into the checkpoint file
 MyImage.Save();
 MyImage.Write();
 dprintf( D_ALWAYS, "Ckpt exit\n");
 // now that we have saved all of our state, we terminate this process
 // we terminate with a signal so that the condor controlling daemons
 // (of whom one is our parent) will know that we exited after a 
 // checkpoint, as opposed to the user code exiting on its own
 terminate_with_sig( SIGUSR2 ); // this exits 
 } else {
 // We get here from the longjmp in RestoreStack() during restart!
 // patch_registers() handles any messy CPU register business which is not
 // handled by UNIX itself as part of a signal handler. Note that
 // on all platforms except HPUX, this is a null procedure
 patch_registers( scp );
 // Close the checkpoint file, etc, before we re-open all files
 MyImage.Close();
 // Re-open and lseek back all previously opened files, then
 // re-install any user signal handlers and block/resend any pending 
 // signals.
 RestoreFileState();
 RestoreSignalState();
 return; // here we go back to user code (end of signal handler)
 }
}

/** Given an "image" object containing checkpoint information which we have
 just read in from disk, this method actually effects the restart. **/
void
Image::Restore()
{
 int save_fd = fd;
 int user_data;
 // Overwrite our data segment with the one saved at checkpoint time.
 RestoreSeg( "DATA" );
 // We have just overwritten our data segment, so the image
 // we are working with has been overwritten too. Fortunately,
 // the only thing that has changed is the file descriptor, which we 
 // also saved on the stack above at the start of this function.
 fd = save_fd;
 // Now we're going to restore the stack, so we move our execution
 // stack to a temporary area (in the data segment), then call
 // the RestoreStack() routine.
 ExecuteOnTmpStk( RestoreStack );
 // RestoreStack() also does the jump back to user code
 fprintf( stderr, "Error, should never get here\n" );
 exit( 1 );
}
void
RestoreStack()
{
 // This function rewrites the stack data area. Thus, we are called from
 // ExecuteOnTmpStk() which has repositioned the stack pointer into a safe
 // temporary chunk of memory in the data area
 // First, call our routine to restore stack data from the checkpoint file
 MyImage.RestoreSeg( "STACK" );
 // Now, restore the stack context, i.e. put the stack pointer back where
 // it was at checkpoint time. Do this via a LONGJMP using a JMP_BUF
 // we created at checkpoint time in the Checkpoint() routine.
 // Will move execution back to the else clause in Checkpoint() routine!!
 LONGJMP( Env, 1 ); 
}
static void (*SaveFunc)();
static jmp_buf Env;

const int TmpStackSize = 4096;
static char TmpStack[ TmpStackSize ]; // buffer will end up in data area
/* Execute the given function on a temporary stack in the data area. */
void
ExecuteOnTmpStk( void (*func)() )
{
 jmp_buf env;
 SaveFunc = func; // save in global; we're going to lose stack frame
 if( SETJMP(env) == 0 ) {
 // First time through - move SP
 if( StackGrowsDown() ) {
 JMP_BUF_SP(env) = (long)TmpStack + TmpStackSize;
 } else {
 JMP_BUF_SP(env) = (long)TmpStack;
 }
 LONGJMP( env, 1 );
 } else {
 // Second time through - call the function
 SaveFunc();
 }
}
























Extending C++ for Distributed Applications


One approach to implementing groupware




Patrick Suel 


Patrick holds advanced degrees in theoretical physics and computer science. He
works at ILOG Inc. and can be reached at suel@ilog.com.


In recent years, networking has completely revolutionized organizations by
allowing workers in different locations to share and access information. The
main problem now is to make sure that this valuable information stays
consistent and can be easily manipulated by multiple applications. A groupware
situation exists whenever a piece of information (that is, an object) is
manipulated by two actors (applications, processes, or other objects) at the
same time. This kind of application integration is only now gaining support in
the form of development tools for groupware. In this article, we'll explore
issues relating to groupware development and deployment, and describe ILOG
Server, a tool that enables the development of dynamic servers of C++ objects.
Given an appropriate object request broker, these servers can be distributed
across a network in a transparent manner.
ILOG Server implements a system that, among other things, automatically
manages object integrity, allows the programmer to define constrained values
for structures, provides a facility for computing cross-references on
structures, and provides a notification mechanism for structures that ensures
the consistency of views. The major constraint in using ILOG Server is its
restriction to C++ as the sole implementation language. This is because ILOG
Server is implemented as an extension to the C++ language via a preprocessor
that generates portable, standard C++ code. Presently, ILOG Server runs on a
range of UNIX platforms, as well as Windows, Windows NT, and OS/2.


The Example Application


To illustrate the concept of distributed groupware, we're presenting an
application that simulates the visualization of air traffic in the United
States. This application can display various airlines, routes (which we call
"lines"), flights, and airports. The application is partitioned into four
types of processes: 
The airline-data server, which manages the shared data structures representing
companies, lines, airports, and flights.
The agency client, which displays a map of the United States with all the
airline routes, similar to that of the Federal Aviation Administration (FAA).
The airline-company client, which an airline would use to choose what routes
it operates between cities and to specify how many flights there are on each
airline. A company can open/close its lines and modify the schedule of its own
flights.
The airport client, which would be used by airport staff to see all the
arriving and departing flights from all airlines. Each client shows two flight
boards: one for arrivals and one for departures. 
In Figure 1, for example, a line (a route) can be represented simultaneously
in three different ways: as an arrow between airports on graphical maps, as a
textual entry in the table of lines of the airline, and as a whole table
containing the line's flights.
Eliminating an arrow in the airline map will therefore trigger multiple
actions: removal of the corresponding arrow from the agency map, removal of an
entry from the table of airlines, closure of the list of flights for this
line, and removal of these flights from airport departure and arrival boards.
This example illustrates a principal characteristic of groupware applications:
Many clients are able to simultaneously see and manipulate the same
information displayed locally under different representations. As soon as a
modification of the structure occurs, all the clients viewing that information
are notified within their own context.


The MVC Paradigm


Today, the language of choice for implementing real-world, object-oriented
applications is C++. However, the earliest general-purpose approach to
managing the consistency of applications can be found in Smalltalk, in the
form of the well-known Model-View-Controller (MVC) paradigm for application
architectures.
The MVC paradigm was developed at Xerox PARC in the late 1970s, and is used in
the classic Smalltalk-80 system for presenting different graphical views of
the same object. For example, an integer value such as Temperature can be
displayed as a number in a text box, as the position of a needle in a
gauge, or as a point on a graph. The MVC approach, in its initial form,
applied only to objects within the same Smalltalk program. However, this
paradigm can be extended to sharing objects across different applications, and
to more-generic, nongraphical models.
In a distributed scheme, MVC can separate application objects (found in a
server) from representation objects or views (generally found in clients).
This is more than just good programming practice: It allows a single object to
have multiple dynamic representations attached to it. In such a case, the
creation and destruction of views is independent from the creation and
destruction of application objects in the server. 
An MVC architecture is best implemented in a language that provides dynamic
binding, allowing methods to dispatch on the runtime types of their arguments.
Unlike Smalltalk, C++ in its current form does not provide this functionality.
Moreover, the MVC approach has some limitations. Its use in the context of
structured objects is difficult. The notification mechanism that propagates
update information cannot be made incremental (as would be done in a diffusion
model). Going beyond fundamental datatypes with MVC can become complicated if
you stick to standard C++.
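To make the boilerplate concrete, here is a minimal hand-written model/view link in plain C++; all names are illustrative and not part of ILOG Server. Every class that needs notification must repeat this pattern by hand, which is exactly what a code-generation approach automates:

```cpp
#include <functional>
#include <vector>

// Illustrative hand-rolled model/view coupling: a Temperature model
// notifies every attached view whenever its value changes.
class Temperature {
public:
    // A "view" here is just a callback invoked with the new value.
    void attach(std::function<void(int)> view) { views_.push_back(view); }
    int  value() const { return value_; }
    void value(int v) {                  // mutator notifies attached views
        value_ = v;
        for (auto &notify : views_) notify(value_);
    }
private:
    int value_ = 0;
    std::vector<std::function<void(int)>> views_;
};
```

A text box, a gauge, and a graph would each attach their own callback; the model never knows what kinds of views exist.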
One way to overcome the lack of dynamic binding in C++ is via code generation.
A tractable way of generating code is to extend C++ with keywords that can be
used to annotate code in header files. A preprocessor parses these annotated
headers and automatically generates the appropriate C++ code. Example 1
illustrates the annotation technique: the annotated header is shown in Example
1(a) (specifically, the ILB_ENTRY keyword), while the corresponding
preprocessor output is Example 1(b). The preprocessor generates C++ code,
which declares and implements accessors and mutators on annotated data. In
this example, the class Flight is updated via the generated mutator function:
void Passengers(int). 


Composition Versus Inheritance


Object-oriented languages, unlike object-oriented methodologies, place a
strong emphasis on inheritance and have little direct support for expressing
relationships. However, most applications seem to use the composition relation
to a greater extent than inheritance. ILOG Server extends C++ to model this
kind of relationship, which is crucial to implementing consistent object
models.
In an object-oriented design, attaching a view corresponds to attaching a C++
class. Let's imagine that we need two views on the model:
List all lines for an airline company, showing the arrival and departure
airports (this is the case of a view attached to an Airline). 
List all departures and arrivals for a given airport (a view attached to an
Airport).
To meet the first requirement, the program needs to access the arrival and
departure airport for each line. In C++, this can be done by keeping two
pointers to airports in each Line object. The second requirement is difficult
since one needs to provide a return pointer from the airports to the lines.
This can lead to situations where an airport may not be connected to the
correct line. 


Bidirectional Smart Pointers


A way to solve this problem is to extend the C++ notion of pointer so that it
is a reversible relation. ILOG Server provides annotations for specifying
relations between classes via two keywords: ILB_USES and ILB_HAS; see Example
2. This relationship can also bear cardinalities that will automatically
manage the maximum and minimum number of target objects for the relation. The
ILB_HAS keyword expresses a notion of exclusive ownership: A given object
(say, a Flight) is owned once by another object (an Airline). The ILB_USES
keyword introduces the concept of utilization: The object Airline uses the
object Departure. ILOG Server relies on "smart pointers," so that the
developer does not have to explicitly destroy an object by calling the
operator delete, which can be fatal when dealing with a large network of
interrelated objects.
When the preprocessor encounters ILB_USES or ILB_HAS in a declaration, it
generates functions and data members for class Airline. The member functions
are generated with the degree of access current in the declaration (in this
case, public). These member functions make it possible to access objects that
are the target of the relations Departure, Arrival, and Flight, and thus
manipulate them and the data members generated in the private part (and stored
in the relation).
The member functions generated for Departure are shown in Example 3 and
perform the following tasks:

Departure() is an accessor that returns a smart pointer to the target object
of the relation. The type ILB_SMART(Airport) participates in the automatic
management of object destruction. It is generated automatically by ILOG Server
and can be used as a pointer to Airport.
Departure(ILB_SMART(Airport) target) is a mutator that replaces the target
object of Departure by the structure given in argument.
Without creating an explicit symmetric relation from an Airport to an Airline,
it is possible for the developer to implement an Airport function returning
the arriving and departing Lines. This means that ILOG Server automatically
generates bidirectional pointers that ensure coherence of the model. 
Moreover, the inverse relations are used for referential integrity. For
instance, if an Airport is removed, the related Lines no longer exist and
disappear from their parent companies. Not only are the data structures
automatically updated, but object destruction is also carried out
automatically.
ILOG Server provides generic mechanisms to ensure referential integrity
through annotation and also offers the possibility to locally adapt the
relation behavior to fit specific needs. The referential integrity of a model
is not predetermined but depends on the form of model itself. Unlike a garbage
collector, which only reacts to local pointers, ILOG Server performs nonlocal
operations on structures.


Information-Sharing Models


Once an object model has been designed and implemented, its objects can
provide multiple views for different clients. In that case, the object model
becomes an object server that can notify the various clients connected to it
when the model changes. The notification mechanism that animates views is the
heart of this groupware application.
Currently, there are three principal models for sharing object information in
a groupware application: the facet, coupled, and diffusion models. In the
facet model, each actor (a process or program) is aware of all the other
actors with which it exchanges information. Adding a new view generally
affects the implementation of all the other actors. This model does not scale
well, resulting in a combinatorial explosion due to the lack of abstraction. 
In the coupled model, application objects are clearly separated from
representation objects. Each action performed on the application object
incorporates the feedback to each view. Adding a new view requires
modification of the actions and therefore of the model. This model also
suffers from a combinatorial explosion.
The diffusion model is derived from the coupled model and decouples the
feedback from the actions by propagating the very same application object to
all views. Each view is then responsible for decoding the notification it
receives. Performance is generally poor because the notification cannot be
made incremental. When the application becomes distributed, network bandwidth
becomes a scarce resource. Moreover, as in the previous cases, the server does
not respect the client API. MVC-influenced implementations generally use a
diffusion model.
To deal with these shortcomings, we introduce the object-server model. This
model may or may not be distributed. In fact, the developer should not worry
about this issue and can decide to distribute the system later without changes
to the source. In our approach, an object model is created with a set of views
attached to clients within the server. The notification is selective and
adapted to each client's API. Traffic from the server to the client is
incremental and reduced to selected service calls to the client's API (this is
particularly interesting in the case of distributed objects). Each client is
independent from the other, and adding a view does not require modifying the
model. This architecture is shown in Figure 2.
A view is a class, separated from the object model, which contains a number of
notification functions. In a view, the programmer specifies the classes of the
model that will be notified. You can then define three different types of
notification functions on the object of the model: creation, destruction, and
modification.
The most important feature of the object-server architecture is that the
server adapts itself to both the API and the logic of clients. If a client is
a spreadsheet view, then its cells, upon modification of the objects in the
server they represent, will receive a specific, spreadsheet-cell notification,
not an abstract message they will need to decode.
An object server can work in both linked and distributed mode. For instance,
one can implement a linked object server which enables different workstations
to participate in a groupware application. In fact, the object-server
architecture is invariant whether one uses a network or not. 


Derived Attributes


In object-oriented programs, there is often the notion of an attribute (a data
member). The annotation technique in ILOG Server can enable the notification
of attributes. We distinguish two types of attributes: entry (using the
keyword ILB_ENTRY) and derived (keyword ILB_DERIVED). To illustrate these
attribute types, consider the case of a spreadsheet cell whose formula can be
statically defined through C++ functions. Entry attributes are those not
constrained by others but that still need to be notified. Derived attributes
have values that are functions of other attributes.
Going back to the airline example, let's add a data member to the class
Airline. This new data member will count the number of passengers on the line
at a given time. The number of passengers is the sum of all the passengers
traveling on all the flights of that airline. We will assume that the number
of passengers on a flight is a data member of that flight as well. As shown in
Example 4(a), the data member Passengers, annotated by the keyword
ILB_DERIVED, constitutes the declaration of the active value. The member
function countPassengers() runs through the list of flights and sums up the
passengers from each flight. To define the rule for computing the passengers,
we simply have to define the function ILB_EVALUATE(Airline, Passengers), as
shown in Example 4(b).
Once a derived attribute has been declared and defined, the data member it
controls will be recomputed automatically based on various updates in the
model, such as adding/removing passengers from a flight on the line, or
adding/removing entire flights on the line. Derived data members are
simultaneously sensitive to modifications of other data members, even those
remote in the structure, and are also sensitive to establishing or breaking
off relations among objects.


Handling Complex Updates 


After designing an object server, the development of the views can be done in
parallel with the development of the clients, since only their APIs have to be
known. Attaching the notification mechanisms is a simple step that can be
performed at the end without any surprises. 
Consider a relatively complex application scenario. In the airline
application, routes (lines) belong to airline companies and are displayed in
multiple views (tables, graphs). Similarly, one can view arriving and
departing flights for a given airport. This view is a cross-section of the
model, compared to the views by company or line. One application function that
may be needed is the transferring of an entire line from one company to
another. This operation impacts all the views opened on the model. First, the
line needs to disappear from the original company; then, it needs to appear on
the map of the target company with the correct color; lastly, the different
company line and airport tables must be updated. Using ILOG Server, adding
this functionality is a matter of the five lines of code shown in Example 5.
The cut() function is automatically generated by ILOG Server and performs a
cut operation on the object--that is, it is removed from its owner (Airline).
The cut object must then be attached to a new owner by adding it to an
internal list. ILOG Server adapts itself to the locality of the view that the
client has on the information. In the case of Line::Transfer(), one
destruction and one creation operation are triggered in completely different
representations, while the object just moved from one structure to another.
Since the model has been modified, all relevant structures will be
automatically notified and updated. This kind of operation is difficult to do
using the facet or coupled models.


From Groupware to Systems Integration


Deploying groupware technology in an existing enterprise is only successful if
existing heterogeneous systems can be integrated. With the object-server
architecture, it is possible to create a server of C++ objects to which
different applications, even legacy systems, can connect and access common
services. Instead of drastically modifying existing applications, you can
extend them, as long as they offer a C++-compatible API. Each application
becomes a client of the newly created object server when connecting to a view.
Such integration can extend to databases. A database can be considered a
client of an object server through its API. Doing so enables the server to
selectively notify the database, in real time, of object updates. This can
transform any standard relational database into a persistent repository for
C++ objects.
Going back to the airline-management example, this application has to manage a
common repository stored in a database, offer multiple views of the same
information under different representations (tables, graphs, maps, lists), and
ensure consistency between views. If any of the clients already existed as a
separate application, one would only have to create a small API around it and
add the corresponding view in the object server. 
Example 1: (a) A simple class, as annotated for the preprocessor; (b) the
corresponding preprocessor output.
(a)
class Flight
{
 public:
 ILB_ENTRY int Passengers; // data member subject to notification
};
(b)
class Flight
{
 public: // these are generated functions
 int Passengers(); // an accessor function (to get value)
 void Passengers(int); // a mutator function (to set value)
 private:
 int _Passengers; // the real data member is private
};

Figure 1 Multiple views in the Airline application.
Figure 2 The architecture of the object server model and its clients.
Example 2: Specifying relationships between classes via annotations.
class Airline
{
 public:
 ILB_USES Airport *Departure; // Departing airport
 ILB_USES Airport *Arrival; // Arriving airport
 ILB_HAS Flight *Flight {0, ...}; // Flight with cardinality unlimited
};
Example 3: Member functions generated for Departure relation.
ILB_SMART(Airport) Departure ();
ILB_SMART(Airport) Departure ( ILB_SMART(Airport) target);
Example 4: (a) A derived attribute for class Airline; (b) function that
calculates the derived attribute.
(a)
class Airline
{
 public:
 // ...other data members...
 ILB_DERIVED int Passengers; // passenger count
 int countPassengers(); // evaluation function
};
(b)
int ILB_EVALUATE(Airline, Passengers) ()
{
 return owner().countPassengers();
}
Example 5: Transferring a route (line) from one company to another.
void Line::Transfer(char* new_co_name)
{
 Airline *new_co = Airline::get(new_co_name); // Get the company
 if(new_co)
 {
 cut(); // detach this Line from its current Airline.
 new_co->Lines().cons(this); // paste Line into the new Airline.
 }
}


























Reading GIF Files


Manipulating graphics files




Wilson MacGyver Liaw


Wilson, who holds a computer-science degree from Ohio State University, can be
reached at macgyver@cis.ohio-state.edu.


The Graphics Interchange Format (GIF) has become one of the more popular
formats for storing images. First developed by CompuServe in 1987 as a way of
exchanging images across different platforms, it has since become the de facto
graphics interchange standard for the Internet as well.
The original GIF87a format supported 256 colors and compressed images with a
variant of the LZW algorithm. Although limited by today's standards, GIF was
still an instant success. This is somewhat surprising since GIF, unlike other
graphics file formats, is protected by CompuServe copyrights and built upon
the patented LZW compression scheme. However, the only restriction on using
GIF is that you acknowledge the CompuServe copyright.
The GIF standard was revised in 1989, resulting in the newer standard known as
"GIF89a." GIF87a/GIF89a-compliant encoders and readers are available for most
platforms. In this article, I'll focus on the reading process. For information
on writing and otherwise manipulating GIF files, I recommend Bitmapped
Graphics Programming in C++, by Marv Luse (Addison Wesley, 1993), or
Programming for Graphics Files in C and C++, by John Levine (John Wiley &
Sons, 1994). 


The GIF Format


Every GIF file starts with a header block identifying the file as a GIF file.
The header block is always six bytes long, and the value is either GIF87a or
GIF89a. 
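A reader can validate the signature with a few lines of C; this is a sketch, and check_gif_header() is an assumed name:

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if the next six bytes of fp are a valid GIF signature. */
int check_gif_header(FILE *fp)
{
    char sig[7] = {0};
    if (fread(sig, 1, 6, fp) != 6)
        return 0;
    return strcmp(sig, "GIF87a") == 0 || strcmp(sig, "GIF89a") == 0;
}
```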
Following the header block is the logical screen-descriptor block (see Figure
1) containing:
 Logical screen width, a 2-byte value.
 Logical screen height, a 2-byte value.
 A 1-byte packed field, containing the global color-table flag, color
resolution, sort flag, and size of global color table.
 Background-color index, storing the index value to the global color table for
drawing any area not covered by an image.
 Pixel aspect ratio, a 1-byte value used to approximate the original image's
aspect ratio (aspect ratio=(pixel aspect ratio+15)/64). This allows for a
range of pixel widths, from 4:1 for the widest pixel, to 1:4 for the tallest
pixel.
If the 1-bit global color-table flag is set to 1, the global color table will
follow the logical screen-descriptor block. The 3-bit color-resolution value
indicates the number of bits per primary color available, minus one. 
The sort flag indicates whether the global color table has been sorted. If the
value is 1, then the table is sorted in order of decreasing importance. The
size of the global color table (three bits) encodes 2^(value+1) entries, which
yields the maximum of 256 colors.
A global color-table entry is a triple in the form of red, green, and blue.
Each color value occupies one byte. Thus, each entry is three bytes.
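Unpacking the descriptor's packed byte is straightforward bit manipulation. The following sketch (struct and function names are assumed) follows the bit layout just described:

```c
/* Decoded fields of the logical screen descriptor's packed byte.
 * Bit layout, MSB first: global color-table flag (1 bit), color
 * resolution (3 bits), sort flag (1 bit), size of table (3 bits). */
struct lsd_fields {
    int gct_flag;     /* global color table follows?             */
    int color_res;    /* bits per primary color (stored minus 1) */
    int sort_flag;    /* sorted by decreasing importance?        */
    int gct_entries;  /* 2^(value+1) entries, at most 256        */
};

struct lsd_fields unpack_lsd(unsigned char packed)
{
    struct lsd_fields f;
    f.gct_flag    = (packed >> 7) & 0x01;
    f.color_res   = ((packed >> 4) & 0x07) + 1;
    f.sort_flag   = (packed >> 3) & 0x01;
    f.gct_entries = 1 << ((packed & 0x07) + 1);
    return f;
}
```

If gct_flag is set, the reader then loads gct_entries three-byte RGB triples.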
After the program has processed these structures, the logical drawing space is
ready to read in the images. Every image starts with the image descriptor
block, followed by an optional local color table and the LZW-compressed image
data.
The image descriptor block (see Figure 2), which is similar to the logical
screen descriptor block, consists of:
 Image separator (one byte), which contains the value 0x2C. This indicates
that a new image descriptor is starting.
 Image left position (two bytes), which contains the initial X position of the
image. The left-most X is 0.
 Image top position (two bytes), which contains the initial Y position of the
image. The top-most Y is 0.
 Image width (two bytes), which contains the width of the image.
 Image height (two bytes), which contains the height of the image.
 A 1-byte packed field, with the following values: local color-table flag,
interlace flag, sort flag, and size of local color table. 
The local color-table flag (one bit) indicates the presence of the local
color-table block. If present, the local color table is read and used instead
of the global color table for this image only. The local color table follows
the same triple format as the global color table. 
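Reading the descriptor into a struct can be sketched as follows; the names are assumed, and the code relies on the fact that GIF stores multi-byte values least-significant byte first:

```c
#include <stdio.h>

/* In-memory form of the image descriptor; the 0x2C image separator
 * is assumed to have been consumed already. */
struct image_desc {
    int left, top, width, height;
    int local_ct_flag, interlace_flag, local_ct_size;
};

static int read_u16(FILE *fp)          /* little-endian 16-bit value */
{
    int lo = fgetc(fp), hi = fgetc(fp);
    return (hi << 8) | lo;
}

int read_image_desc(FILE *fp, struct image_desc *d)
{
    d->left   = read_u16(fp);
    d->top    = read_u16(fp);
    d->width  = read_u16(fp);
    d->height = read_u16(fp);
    int packed = fgetc(fp);
    if (packed < 0) return -1;         /* unexpected end of file */
    d->local_ct_flag  = (packed >> 7) & 1;
    d->interlace_flag = (packed >> 6) & 1;
    d->local_ct_size  = 1 << ((packed & 7) + 1);
    return 0;
}
```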
The interlace flag (one bit) indicates whether the image is stored in the
four-pass interlace pattern. If the value is 1, the image data is stored in
the interlace form. The data is then arranged in the following manner: The
first group of data gives pixels of every eighth row, starting with row 0. The
second group gives pixels of every eighth row, starting with row number four.
The third group gives pixels of every fourth row, starting with row number
two. The last group gives every second row, starting with row number one.
Table 1 illustrates this grouping.
The sort flag indicates if the table is sorted in order of decreasing
importance, as it does in the global color table. The size of the local color
table is determined in the same way as the global color table, by computing
2^(value+1).
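The four-pass ordering described above can be generated directly; this is a sketch, and the name interlace_order() is assumed:

```c
/* Fill order[] with the order in which image rows appear in a
 * four-pass interlaced GIF: every 8th row from row 0, every 8th
 * from row 4, every 4th from row 2, every 2nd from row 1.
 * order[] must hold at least height entries. */
void interlace_order(int height, int *order)
{
    static const int start[4] = { 0, 4, 2, 1 };
    static const int step[4]  = { 8, 8, 4, 2 };
    int n = 0;
    for (int pass = 0; pass < 4; pass++)
        for (int row = start[pass]; row < height; row += step[pass])
            order[n++] = row;
}
```

A decoder can write each decompressed line to order[n] instead of n, de-interlacing on the fly.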
The first byte of the compressed image data is the LZW minimum code size,
followed by a series of bytes. The compressed image data is terminated with an
end-of-information code. 
GIF compresses the data using the LZW algorithm, with two major differences:
First, it uses a variable-length code size. The initial number of bits per
compression code is the LZW minimum code size plus one. When
the number of patterns detected by the encoder exceeds the maximum number of
patterns allowed by the current bit size, the number of bits per code is
increased by one. GIF allows up to 12 bits per code, thus the maximum is 4096
codes. 
Second, GIF adds two special codes. The first is a clear code, defined to be
2^(minimum code size). For example, if the minimum code size is 2, then the
clear code would be 4. The clear code resets and initializes the table back to
the startup state.
The second code is an end-of-information code. The value is defined as (Clear
Code+1). This marks the end of the LZW-compressed image data. All other
aspects of the GIF-variant LZW algorithm are the same as the standard LZW
algorithm. For more information on the LZW algorithm, refer to the article,
"LZW Data Compression," by Mark R. Nelson (DDJ, October 1989). Listing One
contains C code, written by Steven A. Bennett, that deals with the GIF-variant
LZW algorithm.
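The special codes and the initial code width all follow from the minimum code size alone; this sketch (struct and function names are assumed) collects them:

```c
/* GIF-specific LZW bookkeeping derived from the minimum code size. */
struct lzw_params {
    int clear_code;   /* 2^(min code size): resets the string table */
    int eoi_code;     /* clear code + 1: end-of-information         */
    int first_free;   /* first dynamically assigned table code      */
    int init_width;   /* initial bits per code; grows up to 12      */
};

struct lzw_params lzw_setup(int min_code_size)
{
    struct lzw_params p;
    p.clear_code = 1 << min_code_size;
    p.eoi_code   = p.clear_code + 1;
    p.first_free = p.clear_code + 2;
    p.init_width = min_code_size + 1;
    return p;
}
```

When the next table slot to be assigned reaches 1 << init_width, the decoder widens the code by one bit, stopping at the 12-bit, 4096-code limit.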


Extension and Trailer Blocks


GIF89a introduces extension blocks that add extra functions in GIF without
requiring massive changes in the format itself. There are many types of
extension blocks. They all start with a byte called the "extension introducer"
which always contains the value 0x21. Every extension block is also terminated
with a block terminator (a byte containing the value 0x00). This allows the
reader to skip the extension block.
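In practice the extension body arrives as a series of data sub-blocks, each prefixed with its length in one byte, and the block terminator is simply a zero-length sub-block; that is what lets a reader skip an extension it does not understand. A sketch (skip_extension() is an assumed name):

```c
#include <stdio.h>

/* Skip an extension block whose introducer (0x21) and label bytes
 * have already been read: hop over each length-prefixed sub-block
 * until the zero-length block terminator (0x00). */
int skip_extension(FILE *fp)
{
    int size;
    while ((size = fgetc(fp)) > 0)
        if (fseek(fp, size, SEEK_CUR) != 0)
            return -1;
    return (size == 0) ? 0 : -1;   /* EOF before terminator is an error */
}
```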
For example, the comment extension block starts with the extension introducer,
followed by the comment label, comment data, and the block terminator. The
1-byte comment label has the value of 0xFE and identifies the extension block
as a comment extension block. The comment data are 7-bit ASCII characters. The
comment extension block is used to comment the image file. For information on
other extension blocks, see Graphics Interchange Format Version 89a, available
on the CompuServe Graphics Support Forum, Library 17.
All GIF files conclude with the trailer block, a single byte containing the
value 0x3B, which indicates the end of the GIF file.
Table 1: Interlace groupings.
 Row Number   1st Group   2nd Group   3rd Group   4th Group
 0            x
 1                                                x
 2                                    x
 3                                                x
 4                        x
 5                                                x
 6                                    x
 7                                                x
 8            x
 .
 .
 .
Figure 1 Logical-screen-descriptor layout.
Figure 2 Image-descriptor-block layout.

Listing One 

/* decode.c is Steven Bennett's code with some minor changes */
/* the original copyright from him follows */
/* Wilson MacGyver Liaw */

/* DECODE.C - An LZW decoder for GIF
 * Copyright (C) 1987, by Steven A. Bennett
 * Permission is given by the author to freely redistribute and include
 * this code in any program as long as this credit is given where due.
 * In accordance with the above, I want to credit Steve Wilhite who wrote
 * the code which this is heavily inspired by...
 * GIF and 'Graphics Interchange Format' are trademarks (tm) of
 * Compuserve, Incorporated, an H&R Block Company.
 * Release Notes: This file contains a decoder routine for GIF images
 * which is similar, structurally, to the original routine by Steve Wilhite.
 * It is, however, somewhat noticeably faster in most cases.
 */

/* the defined ERRS */
#define OUT_OF_MEMORY -10
#define BAD_CODE_SIZE -20
#define READ_ERROR -1
#define WRITE_ERROR -2
#define OPEN_ERROR -3
#define CREATE_ERROR -4

extern char *malloc(); /* Standard C library allocation */

/* extern int get_byte() - This external (machine specific) function is 
 * expected to return either the next byte from the GIF file, or a negative 
 * number, as defined. */
extern int get_byte();

/* extern int out_line(pixels, linelen)
 * unsigned char pixels[];
 * int linelen;
 * This function takes a full line of pixels (one byte per pixel) and
 * displays them (or does whatever your program wants with them...). It
 * should return zero, or negative if an error or some other event occurs
 * which would require aborting the decode process... Note that the length
 * passed will almost always be equal to the line length passed to the
 * decoder function, with the sole exception occurring when an ending code
 * occurs in an odd place in the GIF file... In any case, linelen will be
 * equal to the number of pixels passed... */

extern int out_line();

/* extern int bad_code_count; - This value is the only other global required 
 * by the using program, and is incremented each time an out of range code is 
 * read by the decoder. When this value is non-zero after a decode, your GIF 
 * file is probably corrupt in some way... */
extern int bad_code_count;

#define NULL 0L
#define MAX_CODES 4095

/* Static variables */
static short curr_size; /* The current code size */
static short clear; /* Value for a clear code */
static short ending; /* Value for a ending code */
static short newcodes; /* First available code */
static short top_slot; /* Highest code for current size */
static short slot; /* Last read code */

/* The following static variables are used for separating out codes */
static short navail_bytes = 0; /* # bytes left in block */
static short nbits_left = 0; /* # bits left in current byte */
static unsigned char b1; /* Current byte */
static unsigned char byte_buff[257]; /* Current block */
static unsigned char *pbytes; /* Pointer to next byte in block */

static long code_mask[13] = {
 0,
 0x0001, 0x0003,
 0x0007, 0x000F,
 0x001F, 0x003F,
 0x007F, 0x00FF,
 0x01FF, 0x03FF,
 0x07FF, 0x0FFF
 };
/* This function initializes the decoder for reading a new image. */
static short init_exp(size)
 short size;
 {
 curr_size = size + 1;
 top_slot = 1 << curr_size;
 clear = 1 << size;
 ending = clear + 1;
 slot = newcodes = ending + 1;
 navail_bytes = nbits_left = 0;
 return(0);
 }
/* get_next_code() - gets the next code from the GIF file. Returns the code,
 * or else a negative number in case of file errors... */
static short get_next_code()
 {
 short i, x;
 unsigned long ret;
 if (nbits_left == 0)
 {
 if (navail_bytes <= 0)
 {
 /* Out of bytes in current block, so read next block */
 pbytes = byte_buff;

 if ((navail_bytes = get_byte()) < 0)
 return(navail_bytes);
 else if (navail_bytes)
 {
 for (i = 0; i < navail_bytes; ++i)
 {
 if ((x = get_byte()) < 0)
 return(x);
 byte_buff[i] = x;
 }
 }
 }
 b1 = *pbytes++;
 nbits_left = 8;
 --navail_bytes;
 }
 ret = b1 >> (8 - nbits_left);
 while (curr_size > nbits_left)
 {
 if (navail_bytes <= 0)
 {
 /* Out of bytes in current block, so read next block */
 pbytes = byte_buff;
 if ((navail_bytes = get_byte()) < 0)
 return(navail_bytes);
 else if (navail_bytes)
 {
 for (i = 0; i < navail_bytes; ++i)
 {
 if ((x = get_byte()) < 0)
 return(x);
 byte_buff[i] = x;
 }
 }
 }
 b1 = *pbytes++;
 ret = b1 << nbits_left;
 nbits_left += 8;
 --navail_bytes;
 }
 nbits_left -= curr_size;
 ret &= code_mask[curr_size];
 return((short)(ret));
 }
/* The reason we have these separated like this instead of using a structure
 * like the original Wilhite code did, is because this stuff generally
 * produces significantly faster code when compiled. This code is full of
 * similar speedups. (For a good book on writing C for speed or for space
 * optimization, see Efficient C by Tom Plum, published by Plum-Hall
 * Associates.) */
static unsigned char stack[MAX_CODES + 1]; /* Stack for storing pixels */
static unsigned char suffix[MAX_CODES + 1]; /* Suffix table */
static unsigned short prefix[MAX_CODES + 1]; /* Prefix linked list */

/* short decoder(linewidth)
 * short linewidth; * Pixels per line of image *
 * - This function decodes an LZW image, according to the method used
 * in the GIF spec. Every *linewidth* "characters" (ie. pixels) decoded
 * will generate a call to out_line(), which is a user specific function
 * to display a line of pixels. The function gets its codes from
 * get_next_code(), which is responsible for reading blocks of data and
 * separating them into the proper size codes. Finally, get_byte() is
 * the global routine to read the next byte from the GIF file.
 * It is generally a good idea to have linewidth correspond to the actual
 * width of a line (as specified in the Image header) to make your own
 * code a bit simpler, but it isn't absolutely necessary.
 * Returns: 0 if successful, else negative. (See ERRS defined) */
short decoder(linewidth)
 short linewidth;
 {
 register unsigned char *sp, *bufptr;
 unsigned char *buf;
 register short code, fc, oc, bufcnt;
 short c, size, ret;
 /* Initialize for decoding a new image... */
 if ((size = get_byte()) < 0)
 return(size);
 if (size < 2 || 9 < size)
 return(BAD_CODE_SIZE);
 init_exp(size);
 /* Initialize in case they forgot to put in a clear code.
 * (This shouldn't happen, but we'll try and decode it anyway...) */
 oc = fc = 0;
 /* Allocate space for the decode buffer */
 if ((buf = (unsigned char *)malloc(linewidth + 1)) == NULL)
 return(OUT_OF_MEMORY);
 /* Set up the stack pointer and decode buffer pointer */
 sp = stack;
 bufptr = buf;
 bufcnt = linewidth;
 /* This is the main loop. For each code we get we pass through the
 * linked list of prefix codes, pushing the corresponding "character" for
 * each code onto the stack. When the list reaches a single "character"
 * we push that on the stack too, and then start unstacking each character
 * for output in the correct order. Special handling is included for the
 * clear code, and the whole thing ends when we get an ending code. */
 while ((c = get_next_code()) != ending)
 {
 /* If we had a file error, return without completing the decode */
 if (c < 0)
 {
 free(buf);
 return(0);
 }
 /* If the code is a clear code, reinitialize all necessary items. */
 if (c == clear)
 {
 curr_size = size + 1;
 slot = newcodes;
 top_slot = 1 << curr_size;
 /* Continue reading codes until we get a non-clear code
 * (Another unlikely, but possible case...) */
 while ((c = get_next_code()) == clear)
 ;
 /* If we get an ending code immediately after a clear code
 * (Yet another unlikely case), then break out of the loop. */
 if (c == ending)
 break;
 /* Finally, if the code is beyond the range of already set codes,
 * (this had better NOT happen. I have no idea what will result from
 * this, but I doubt it will look good) then set it to color zero. */
 if (c >= slot)
 c = 0;
 oc = fc = c;
 /* And let us not forget to put the char into the buffer. And if, on
 * the off chance, we were exactly one pixel from the end of the line,
 * we have to send the buffer to the out_line() routine... */
 *bufptr++ = c;
 if (--bufcnt == 0)
 {
 if ((ret = out_line(buf, linewidth)) < 0)
 {
 free(buf);
 return(ret);
 }
 bufptr = buf;
 bufcnt = linewidth;
 }
 }
 else
 {
 /* In this case, it's not a clear code or an ending code, so
 * it must be a code code... So we can now decode the code into
 * a stack of character codes. (Clear as mud, right?) */
 code = c;
 /* Here we go again with one of those off chances... If, on the
 * off chance, the code we got is beyond the range of those already
 * set up (Another thing which had better NOT happen...) we trick
 * the decoder into thinking it actually got the last code read.
 * (Hmmn... I'm not sure why this works... But it does...) */
 if (code >= slot)
 {
 if (code > slot)
 ++bad_code_count;
 code = oc;
 *sp++ = fc;
 }
 /* Here we scan back along the linked list of prefixes, pushing
 * helpless characters (ie. suffixes) onto the stack as we do so. */
 while (code >= newcodes)
 {
 *sp++ = suffix[code];
 code = prefix[code];
 }
 /* Push the last character on the stack, and set up the new
 * prefix and suffix, and if the required slot number is greater
 * than that allowed by the current bit size, increase the bit
 * size. (NOTE - If we are all full, we *don't* save the new
 * suffix and prefix... I'm not certain if this is correct...
 * it might be more proper to overwrite the last code... */
 *sp++ = code;
 if (slot < top_slot)
 {
 suffix[slot] = fc = code;
 prefix[slot++] = oc;
 oc = c;
 }
 if (slot >= top_slot)

 if (curr_size < 12)
 {
 top_slot <<= 1;
 ++curr_size;
 } 
 /* Now that we've pushed the decoded string (in reverse order) onto
 * the stack, lets pop it off and put it into our decode buffer.
 * And when the decode buffer is full, write another line... */
 while (sp > stack)
 {
 *bufptr++ = *(--sp);
 if (--bufcnt == 0)
 {
 if ((ret = out_line(buf, linewidth)) < 0)
 {
 free(buf);
 return(ret);
 }
 bufptr = buf;
 bufcnt = linewidth;
 }
 }
 }
 }
 ret = 0;
 if (bufcnt != linewidth)
 ret = out_line(buf, (linewidth - bufcnt));
 free(buf);
 return(ret);
 }

































RTFHelp for Windows Help Files


An intuitive, easy way to generate help files




Joseph Hlavaty


Joe is a systems programmer at a major hardware vendor. He is a graduate of
Georgetown University and currently lives and works in the Washington, DC
area. He can be contacted at jhlavaty@aol.com.


At some time or another, every Windows-application developer has to generate
help files. However, the standard method of generating Windows help files is
tedious and complicated, requiring nonintuitive commands that give little
indication of the actual function they represent. For example, $\footnote
creates a topic title. 
Furthermore, the Windows 3.1 SDK help compiler (HC) requires a Rich Text
Format (RTF) file format, locking you into using a word processor or text
editor that supports the RTF specification. The 3.1 SDK does introduce an
"extended subset" RTF spec, which means that you can write help topic files
without an RTF-generating word processor, but this doesn't help you associate
commands with function (for example, \uldb marks a "hot spot" in help text).
You could use the RTF specification to discover these commands, if you did not
have access to the Windows 3.1 SDK. However, you would only know through trial
and error which tags are understood by the help compiler. (Interestingly, the
Windows 95 "WordPad" applet generates RTF files, unlike the Windows 3.1 Write
program.)
This process is further complicated by the fact that the resulting RTF file
can only be understood by an expert in Windows help-file generation. Many of
the Windows help specification's commands are buried in footnotes that are
difficult to associate to the text they are generated from. 
The RTFHelp filter I present here is an intuitive, readable process for easy
Windows help-file generation. At the very least, it is more intuitive and
readable than the usual process for Windows help-file generation. The RTFHelp
filter takes a simple tagged file--created with any text editor--and generates
an RTF file that can be passed through the Windows help compiler. The tagged
file that RTFHelp processes can use any of the tags from the list in Table 1.
Note, however, that certain help functions require specific tags in a
particular order. This is a requirement of the Windows help specification, not
RTFHelp. The RTFHelp executable and source code is available electronically;
see "Availability," page 3. 


RTFHelp Background 


RTFHelp is a bound executable. The version available electronically was
developed with MSC 6.x using standard C libraries for OS/2. The resulting
executable (a protected-mode OS/2 1.x character-based application) is run
through the bind utility, which creates a DOS character-based version of the
application in the same executable. The resulting executable is larger than
normal, but runs as both an OS/2 application and a DOS application. Of course,
if you do not have access to OS/2 libraries, you can build just the DOS
version of RTFHelp by linking with a DOS C library. RTFHelp source, however,
is completely operating-system independent and makes only C standard-library
calls. 
The RTFHelp process and the normal Windows process are shown in Table 2.
RTFHelp does add a small (automated) step to Windows help-file generation,
namely the compilation of a help description code (HDC) file into an RTF file
for the help compiler. With the RTFHelp process, instead of requiring an
expensive, hard-to-use Windows word processor, you can use any editor that can
create ASCII text files. All you do is write your help text, tag it, run it
through RTFHELP to generate a RTF file, and then pass the resulting file to
the Windows help compiler to generate an HLP file for your Windows
application. 


A Sample HDC File for RTFHelp


The file SAMPLE.HDC in Listing One is the tagged file that is the source for
the Windows Help file that accompanies this article (SAMPLE.HLP). All RTFHelp
keywords must be typed in upper case, minimizing the potential for
user-untagged help text to conflict unintentionally with RTFHelp keywords.
Upper case also helps the keywords stand out in the source file. 
I put initial values for margins and the like at the start of the HDC file.
This help file begins with the DEFFONT 0 line, which selects the first font in
the font table (which is zero-based) to be used as the default font for this
help file. The font table is RTFHelp dependent. The FONTSIZE 24 line sets a
12-point (1/6") font as the default font size. The LEFTINDENT 720 and
RIGHTINDENT 720 lines give us 1/2" left and right margins
in our help window. This means that no matter what the size of our help
window, Windows will try to maintain one-half inch left and right empty
margins surrounding our text within the help window. Note that the FONTSIZE
command and the LEFTINDENT command use two different units: The FONTSIZE
parameter is given in half-points, and LEFTINDENT parameters are given in
twentieths of a point (twips). These are restrictions of the Rich Text Format
specification itself. RTFHelp could be modified to give some standard
measurement for many such commands, but this would require analysis and
conversion of these units internally. RTFHelp does not currently do this; all
values are simply passed straight through the filter to the generated RTF
file. If you desire such a function, it's best to take TWIPs (the smallest
unit of measure) as your standard unit for ease and reliability of
conversions; see Table 3 for conversions. 
Help files are broken down into topics, with each topic a collection of all
information found on a single page. The first topic consists of all comments
and text up to (and not including) the next PAGE tag that will begin a new
page. The first topic is usually the table of contents of the help file. You
can make another topic the contents by adding a CONTENTS=<context-string>
statement to the [OPTIONS] section of your help project file. The
<context-string> must be a literal value defined by a DEFINEPOPUP or
DEFINELINK statement in your help file. If no CONTENTS string is given, then
the first topic is assumed to be the table of contents.
The table of contents is often the first topic that the user sees, and it's
used frequently. I like tables of contents to consist mostly of links and
contain little raw text. The table of contents is quickly accessible by
clicking on the Contents button while the help file is loaded, so this topic
can be used to quickly locate a particular section of a help file, much as a
book's table of contents might be used to isolate a search for information to
a particular chapter. If finer granularity is needed, then the Search dialog
can be used. The sample help file's table of contents is almost entirely hot
spots: either descriptions of the source files making up RTFHelp or links to
other topics within the help file.
Almost all topics contain a TITLE tag. The text in this title statement is
visible to the user in the Search and History dialog boxes, so it should
clearly distinguish this topic from all others in the help file. The TITLE
text does not appear in the topic itself. For this topic, the multiword
string: "The first topic is usually the TABLE OF CONTENTS for the help file"
is the title. This command resolves to a $\footnote group in the generated RTF
file.
Following the TITLE text, we find a KEYWORD tag. This tag can have multiple
arguments, and the KEYWORD tag itself can occur more than once in the same
topic. Keywords can be separated by semicolons and can include spaces. All
keywords in the help file will be displayed to the user in the Search dialog;
the user can then select one to isolate the search to one (or more) topics
containing that keyword. The same keyword can be found in any number of
topics. The more topics that contain a keyword, however, the less effective
that keyword becomes. This tag resolves to a K\footnote group in the generated
RTF file.
Next we find a BROWSESEQ tag. The parameter to a BROWSESEQ tag can be any
string. Take care to keep all strings in all your BROWSESEQ tags of identical
length and form, as WinHelp maintains the order of your tags by sorting the
strings alphabetically. This tag resolves to a +\footnote group in the
generated text file.
None of these commands have added any visual information to the topic as far
as WinHelp is concerned. For tags that generate text in the topic window,
let's first look at SA 90, which generates 1/16" of whitespace following the
current paragraph. Then the following PARA tag ends the current paragraph.
Placing SA 90 before the PARA is functionally equivalent to placing SB 90 at
the start of the next paragraph. Use whichever you are more comfortable with.
At this point, I define a hot spot--a pop-up window with the text "rtfhmain.c"
(given as the argument to the PUTPOPUP command). This command requires an
XREFID (context string), which identifies the text to be displayed in the
pop-up window. The POPUP command resolves to a \ul group, and the XREFID to a
\v group in the generated RTF file. Note that we added a PARA tag to end the
current paragraph here so that each of our pop-up window links would be on
separate lines. This is simply a matter of style. You could code your own help
files so as to have multiple links on the same line in your topic. Note also
that I ended the last link with a PARA tag--otherwise the following paragraph
of text would begin immediately after our last link, which is not what we
intended. I prefer to give a standard prefix (PTR_) to all of my pop-up window
XREFIDs. This makes it easier for me to recognize them when scanning the file.
You may use any prefix that you wish, or none at all.
Any line of text not beginning with a keyword is considered raw text, and is
simply written out to the generated RTF file unprocessed by RTFHelp. This
section of the topic contains a paragraph of instructions to the user, ended
with a PARA command. Remember that this paragraph is using the margins and
font selected earlier. As noted previously, many formatting properties are
"inherited" from one formatting group to another so there is no need to change
margins or font or other defaults. I do change the font size to an 8-point
font with the FONTSIZE 16 statement.
Following the pop-up window hot spots and paragraph of instructions for the
user, I add a number of jump hot spots. Notice that while both pop-up windows
and jumps have a similar structure and form, their function is different. A
popup may be viewed as an extension of the current topic (it adds something to
it, such as defining a term used in the topic). A jump is more of a split from
the current topic to a new topic, even though the two topics may be related in
some way.
A jump is added to the text file with the PUTLINK command. The first jump is
PUTLINK Margins and Spacing, where, as with PUTPOPUP, "Margins and Spacing" is
the text to be displayed in the hot spot for the user. As with PUTPOPUP, the
PUTLINK command must immediately be followed by an XREFID tag (context string)
that names the topic to be linked to (in this case, LINKTO_MARGINSANDSPACING).
The PUTLINK command resolves to a \uldb group, and the XREFID, to a \v group
in the generated RTF file. Again, you add a PARA tag to the end of every
PUTLINK/XREFID pair so that each is displayed on an individual line. You could
just as easily have put any number of these on a single line of text or in a
single paragraph. For consistency, I begin all my links with the LINKTO_
prefix. You can use a different prefix (or none at all). 
I explicitly end this topic with the PAGE command. Any text or commands
following this PAGE command are considered to be in the next topic. Note that
the first line of text is considered the start of a new paragraph, and thus
the default SB distance will be skipped before displaying the text of the
paragraph, as with any other paragraph. The PAGE command resolves to the \page
control word in the generated RTF file.
The first new tag in this topic is DEFINELINK. Note that the
LINKTO_MARGINSANDSPACING argument to this tag is the same string that appeared
in an XREFID tag in the first topic. As you might expect, only one topic in
any given help file can have a particular DEFINELINK string, otherwise the
help compiler will not know which topic to resolve the link to when the user
clicks on the hot spot to the jump. There may, however, be any number of
XREFIDs that point to the same DEFINELINK, as this situation is not ambiguous.
The DEFINELINK command resolves to a #\footnote string in the generated RTF
file.
Another tag used in this topic is PARD, which sets the attributes of the
current paragraph to the defaults, unless otherwise overridden within the
paragraph. The paragraph will not inherit formatting from the previous
paragraph. FIRSTINDENT sets a special indent for the first line of a
paragraph. The FIRSTINDENT value is added to the LEFTINDENT value to get the
actual horizontal offset. If FIRSTINDENT is 0, then the first line and all
other lines of a paragraph will be positioned at the same horizontal offset.
If FIRSTINDENT is positive, then the first line of a paragraph will be further
from the left border than other lines. If FIRSTINDENT is negative, then the
first line of a paragraph will be closer to the left border than the other
lines of a paragraph. For example, to get a 1/2" left margin with the first
line of a paragraph at the left border (0), use LEFTINDENT 720 and FIRSTINDENT
-720. FIRSTINDENT, LEFTINDENT, and RIGHTINDENT resolve to the \fi, \li, and
\ri control words respectively in the generated RTF file.
In the next topic you find even more text-formatting tags, such as BOLD,
ITALIC, and SMALLCAPS. These tags are not inheritable. Their formatting
property exists only for the text given as their argument. Note the use of the
SPACECHAR character to force a space on lines after the first. This "hard
space" is necessary because RTFHelp removes spaces found between a tag and its
argument. Raw lines of text never require the use of the SPACECHAR character,
as spaces will always be preserved. These commands resolve to the \b, \i, and
\scaps groups, respectively, in the generated RTF file. The SPACECHAR
character ensures that when the help compiler wraps the text (and it will, or
it will remove the space), spacing will still be appropriate. 
The next topic uses the BMC command to include a bitmap file in the help file
at the current text position. If it is not found in the help directory, the
bitmap should also be listed in the [BITMAPS] section of the help project
file. Copy pyramid.bmp from your Windows 3.1 directory or add any other bitmap
to build a bitmap into your help file. Alternatively, comment out this bmc
command in the hdc file.
The BOX tag is used to draw a single-line border around the current paragraph.
The BOX attribute is inheritable. If you do not want all other paragraphs
following this one to also be boxed, then you should reset the following
paragraph to the defaults with the PARD command. This tag resolves to the \box
control word in the generated RTF file.
RTFHelp also has support for tab characters and tab stops. You can use the
DEFTAB command to set the tab stops every so many units. For example, DEFTAB
2880 gives you tab stops every two inches. To tab, simply insert the TAB tag
in your text. These commands resolve to the \deftab and \tab control words,
respectively, in the generated RTF file. 
Lastly, you encounter the DEFINEPOPUP tag. I put all of my DEFINEPOPUPS last
in my file as the help compiler requires each topic to have only one
DEFINEPOPUP tag. Because of this, RTFHelp must generate a \page tag for each
one. The generated RTF text for the DEFINEPOPUP tag is \page #\footnote
<context string>. The DEFINEPOPUP tag is followed by any amount of text, which
will be the information displayed in the pop-up window when the user selects
the hot spot for this popup. The context string, of course, must have been
referenced by an XREFID elsewhere in the file in order for it to be
potentially visible to the user.


The RTFHelp Help Project (HPJ) File


A project file is conceptually a makefile for help compilation. The help
compiler uses it to find out what files to process and what options to use
during the compilation. A project file is made up of sections, each of which
has a name in brackets. The only required section of the HPJ file is the
[FILES] section, which lists the topic files to be compiled.
The [OPTIONS] section keywords control the generation of a help file. The
[OPTIONS] section should be the first section in your help project file to
ensure that all options will apply to the full help-compiler build. The TITLE
keyword sets the title-bar text when the .HLP file is loaded into WinHelp to
that given by the author as the value of the TITLE keyword. The COPYRIGHT
keyword adds a user copyright notice to the help file displayed when the user
selects Help, About from WinHelp. One option not used by RTFHelp's sample file
is the CONTENTS keyword. This keyword specifies a context string (see
DEFINELINK) that will be the Contents topic (table of contents) for this help
file.

The [FILES] section lists the topic files to be input to the help compiler.
Each file is listed on a separate line.
The [BITMAPS] section lists the bitmap files to be used in the help file. These
file names should be fully qualified if they are not in the same directory as
the topic files that reference them. 
The [CONFIG] section can contain a number of help macros. Example 1 uses the
BrowseButtons() macro to enable browse buttons in WinHelp for our help file.
For more information on the RTF commands, refer to the Rich Text Format (RTF)
Specification, Version 1.3, available from Microsoft product-support services
and numerous other online services.
Table 1: The RTFHelp tagging language.
Command Function 
BMC Inserts a bitmap file in your help file.
BOLD Text following the tag is displayed in bold.
BOX Puts a boxed border around the current paragraph.
BROWSESEQ Assigns a browse sequence number to a topic.
CENTER Ends a paragraph and aligns it centered within the margins.
DEFFONT Sets a new default font from the font table.
DEFINELINK Marks a topic that is the target of a link.
DEFINEPOPUP Defines the content of the popup.
DEFTAB Assigns a new default tab width.
FIRSTINDENT Defines the offset of the first line of a paragraph.
FONT Sets current font to a new font.
FONTSIZE Sets size of the current font to a new size.
ITALIC Displays text following the tag in italic.
JUST Ends paragraph and justifies the text in it.
KEEP Keeps the lines of this paragraph together (no wrap).
KEEPNEXT Creates a nonscrollable region at the top of this topic.
KEYWORD Defines a keyword (found during a search in the SEARCH box).
LEFT Ends current paragraph and aligns paragraph to the left.
LEFTINDENT Defines the offset of the left margin of a paragraph.
LINE Begins new line within the current paragraph.
PAGE Starts new page, and, hence, a new topic in the help file.
PARA Ends paragraph.
PARD Assigns default settings to the paragraph and ends same.
PLAIN Resets paragraph formatting to plain settings.
PUTLINK Marks hot spot to link.
PUTPOPUP Marks hot spot to pop-up window.
RIGHT Aligns paragraph to the right.
RIGHTINDENT Defines the offset of the right margin of a paragraph.
SA Space after, places vertical whitespace after a paragraph.
SB Space before, places vertical whitespace before a paragraph.
SL Space between, places vertical whitespace between
 lines of a paragraph.
SMALLCAPS Text following tag is displayed in small caps.
SPACECHAR Assigns new character to be used as a hard space.
TAB Inserts TAB character.
TITLE Sets title for this topic (found during a search in the GOTO box).
TX Sets position of tab stop, defaults to every 1/2" if not set.
TQC Tab center, advances to the next tab stop and centers text.
TQR Tab right, advances to the next tab stop and aligns text right.
XREFID ID of the term (UPPERCASE_LETTER).
* Begin comment (in column 1).
; Equivalent to "*", a comment.
Table 2: (a) Windows help process; (b) RTFHelp process.
(a)
1. Edit file using full-featured word processor and WinHelp specification
 commands. Save file in RTF format.
2. Run RTF output (first pass) through the Windows help compiler (HC) to
 generate a help file.
3. Launch WinHelp and verify the generated .HLP file.
(b)
1. Edit file in any editor or word processor that supports ASCII using
 RTFHelp keywords. Save file as ASCII text.
2. Run output of first pass through RTFHelp to generate an RTF file.

3. Run RTF output (second pass) through the Windows help compiler to
 generate a help file.
4. Launch WinHelp and verify the generated .HLP file.
Table 3: Conversions between points, 1/2 points, TWIPs, and inches (assuming
72 points to the inch, a standard Windows approximation).
 Points 1/2 points TWIPs Inches 
 144 288 2880 2
 72 144 1440 1
 54 108 1080 3/4
 36 72 720 1/2
 24 48 480 1/3
 18 36 360 1/4
 12 24 240 1/6
 9 18 180 1/8
 4.5 9 90 1/16
Example 1: A sample help-project (HPJ) file.
; J. Hlavaty, 1994
[OPTIONS]
TITLE = RTFHelp Sample Help File
COPYRIGHT = Help file content
 copyright (C) J. Hlavaty, 1994

[FILES]
sample.rtf

[BITMAPS]
pyramid.bmp

[CONFIG]
BrowseButtons()

Listing One 

* J. Hlavaty, RTFHelp sample file
* defaults for the file go here
 DEFFONT 0
 FONTSIZE 24
* Justified text, with a 1/2" left and 1/2" right margin (1440 twips/inch)
 LEFTINDENT 720
 RIGHTINDENT 720
* Now begin the first topic
* The TITLE tag introduces a topic (page) of a help file.
 TITLE The first topic is usually the TABLE OF CONTENTS for the help file
* The following statements select 12 point TmsRmn out of RTFHelp's default
* font table. What we'll actually get depends on the available fonts
* in the session of Windows in which the help file is executing.
* As noted in the text, measurements for tags like LEFTINDENT are in TWIPs
* (an abbreviation for twentieths of a point). To convert inches to twips,
* simply multiply inches by 1440 (72 points in an inch times twenty).
* Other tags, like FONTSIZE are in 1/2 points so FONTSIZE 72 is 36 point
* (meaning 1/2") type. Note that LEFTINDENT (twips) tag arguments are
* ten times the equivalent FONTSIZE argument
 KEYWORD Table of Contents

* note that if you wish your pop-up topics or links to be each on
* an individual line, then you must use PARA tags to force the separation.
* Don't forget the last PARA tag, or the text following these definitions
* will begin on the same line as the last one
 BROWSESEQ rtfhelp:005
* Add 1/16th" (90 twips) of whitespace after each of the following
* definitions for readability
 SA 90
 PARA
 PUTPOPUP rtfhbann.c
 XREFID PTR_RTFHBANN_C
 PARA
 PUTPOPUP rtfhcp85.c
 XREFID PTR_RTFHCP85_C
 PARA
 PUTPOPUP rtfhdata.c
 XREFID PTR_RTFHDATA_C
 PARA
 PUTPOPUP rtfhhelp.c
 XREFID PTR_RTFHHELP_C
 PUTPOPUP rtfhmain.c
 XREFID PTR_RTFHMAIN_C
 PARA
 PUTPOPUP rtfhpars.c
 XREFID PTR_RTFHPARS_C
* Without the following PARA, this text would be right next to RTFHHELP.C's
* PUTPOPUP
* Switch to a smaller font
FONTSIZE 16
PARA
 Please select one of the terms above by single-clicking on it with the left
 mouse button to view its definition, or go to a topic below by clicking on it.

PARA
FONTSIZE 24
* The PUTLINK tag tells the user where the link will go
 PUTLINK Margins and spacing
* The XREFID tag must also be used to tell the help compiler what DEFINELINK
* to go to
 XREFID LINKTO_MARGINSANDSPACING
 PARA
 PUTLINK Special Effects
 XREFID LINKTO_SPECIALEFFECTS
 PARA
 PUTLINK Bitmaps and graphics
 XREFID LINKTO_BITMAPSANDGRAPHICS
 PARA
 PUTLINK WinHelp Terms
 XREFID LINKTO_WINHELPTERMS
 PARA
 PUTLINK RTFHelp Terms
 XREFID LINKTO_RTFHELPTERMS

* Now explicitly start a new topic (PAGE) with the PAGE tag, which has no
* arguments. The PAGE tag ends the current page.
* The help compiler will not like an extra \page tag on the first
* page. It will assume that the blank page is the table of contents
* if you do not have a CONTENTS tag in your HPJ file
 PAGE
 TITLE Text formatting in a help file
 KEYWORD text formatting; left indent; right indent; hanging indent; 
 default paragraph settings
 BROWSESEQ rtfhelp:010
 DEFINELINK LINKTO_MARGINSANDSPACING
* now reset the spacing after each PARA to the default
* 90 = 1/16th inch glue following each PARA
 SA 90
 JUST
 LEFTINDENT 0
 RIGHTINDENT 2880
This is text with a two inch right margin. This is just done for some contrast
with the paragraph below.

 PARA
* Produces a "hanging" indent for the first line only. Note negative offset
* The actual value used for the first line's offset is LEFTINDENT+FIRSTINDENT
 FIRSTINDENT -720
 LEFTINDENT 720
 RIGHTINDENT 720

This is text with a "hanging" indent; as you'll see, the rest (lines after
 the first) of this paragraph is indented normally.
PARA
 SA 720
* Reset the firstindent/leftindent values to the default (720 ? )
 FIRSTINDENT
 LEFTINDENT
This paragraph has 1/2" "glue" (empty space) following. Notice that
 the indents of the first and second lines are identical.
 PARA
 SA 0
This paragraph has no glue following
 PARA
 PARD
 This is a paragraph using default paragraph settings
 PARA

 PAGE
 TITLE Text formatting in a help file, continued
 KEYWORD text formatting; italic; smallcaps; bold
 BROWSESEQ rtfhelp:015
 DEFINELINK LINKTO_SPECIALEFFECTS
* now reset the spacing after each PARA to the default
 SA
 LEFTINDENT 720
 RIGHTINDENT 720
ITALIC This is italic text with half-inch left and right margins. Should 
ITALIC %also appear as Justified Text if JUST tag is supported by the HC.
* JUST
PARA
 LEFTINDENT 2880
 RIGHTINDENT 0
* note that RTFHelp considers the end of a line to end certain tags. This is
* intentional so that certain formatting properties (bold, italic and so
* forth do not span lines)
JUST
LEFTINDENT 1440
SMALLCAPS This is text with a one inch left margin in small caps.
* Note the use of the hard space character (%) to force a space.
SMALLCAPS %flexibility that you can put in your help file. 
 PARA
* Reset leftindent to default
LEFTINDENT
BOLD This text is displayed in bold type. 
BOLD %text (similar to underlining). Underlining is not supported by
BOLD %RTFHelp as the HC gives a specific meaning to the RTF underline
BOLD %command in help files.
PARA
 PAGE
 TITLE Bitmaps and graphics
 KEYWORD bitmap; graphics
 BROWSESEQ rtfhelp:020
 DEFINELINK LINKTO_BITMAPSANDGRAPHICS
*
* You will need to copy pyramid.bmp from your \windows directory to the
* directory containing your HDC file for the bitmap to be found
*
 BMC pyramid.bmp
 This is a sample bitmap caption

SA 180
* begin a new paragraph and draw a box around it
PARA
 BOX
 DEFTAB 2880
 TAB
1
 TAB
This is one
 LINE
 TAB
2
 TAB
This is two
PARA
PARD

 PAGE
 TITLE WinHelp Terminology
 BROWSESEQ rtfhelp:025
* note that a ; separates multiple keywords (as multiword keywords are 
* permitted) you can also have multiple KEYWORDs in the same topic
 KEYWORD WinHelp terminology
 KEYWORD topic file; topic; title; link; pop-up topic
 DEFINELINK LINKTO_WINHELPTERMS
 SA 90
The document containing the Rich Text Format (hereafter abbreviated RTF) text
 input to the help compiler will be called the TOPIC FILE. One or more topic
 files are used to generate a help file (a binary file with the .HLP
 extension). Every topic file is made up of a series of "pages" or "screens"
 of related information known as TOPICS.
PARA
Each topic can contain a variety of constructs, such as titles,
 links and pop-up topics. A TITLE is not visible in the topic itself, but is
 used in the Search and other WinHelp dialog boxes to allow the user to
 distinguish this topic from others in the help file.
PARA
 A LINK hooks one topic
 to another so that when the user clicks on the link's hot spot, the topic
 that the link references will become the current topic displayed to the user.
PARA
Finally, a POP-UP TOPIC when clicked on will create a child window on top of
 the help window which contains information for the user. These pop-up topics
 are often used for definitions of terms.

 PAGE
 TITLE RTF Terminology
 BROWSESEQ rtfhelp:030
 * note that a ; separates multiple keywords (as multiword keywords are 
 * permitted) you can also have multiple KEYWORDs in the same topic
 KEYWORD RTFHelp terminology
 KEYWORD control words; control symbols; groups
 DEFINELINK LINKTO_RTFHELPTERMS
The topic file is a collection of RTF tags and raw text. These tags
 are known as CONTROL WORDS and all contain a backslash as their first
 character and consist of entirely lower case letters (RTF is case-sensitive).
 One such tag is \\par to end a paragraph.
PARA
There is a small set of non-alphabetic control words known as CONTROL
 SYMBOLS, such as \\~ which signifies a hard (non-breaking) space.
PARA
Control words and control symbols together with the text making up the
 topic file are collected in GROUPS. A group begins with an open curly brace
 and continues until an equally-nested closing curly brace. Any formatting
 properties that are found within a group apply to all and only elements of
 that group. Some groups can even inherit specific formatting properties from
 the group preceding them. For details on exactly when and how this occurs,
 refer to the RTF specification listed in the reference section.
PARA
RTFHelp uses the tagged file passed in by the user to generate an
 RTF topic file containing a mixture of control words, control symbols and
 groups.

 PAGE
* Here are the definitions for our pop-up topics

* A few warnings are in order. The DEFINEPOPUP generates a new page, so I
* generally put my DEFINEPOPUPs at the end of my help file.
* Note that the context ID (such as PTR_RTFHMAIN_C below) cannot have any
* text following it. This is because RTF assumes that any text following
* the context ID is actually a part of the context ID and includes it in
* the #{\footnote} tag that it builds. The help compiler will complain
* about this mixture of text and context ID. Because of this, there is a
* restriction that text that is the definition of a DEFINEPOPUP must begin
* on the line following the DEFINEPOPUP tag and ID
 DEFINEPOPUP PTR_RTFHMAIN_C
 RTFHMain.c is the main module for RTFHelp. It reads the tagged input file, 
 and writes the formatted RTF text to the output file.
 DEFINEPOPUP PTR_RTFHCP85_C
 RTFHCP85.c is the module containing the translation table from CP850 
 (the so-called Latin 1 code page). This code page roughly corresponds to those
 characters needed to display Western European languages. It converts to the 
 Windows default (ANSI) code page.
 DEFINEPOPUP PTR_RTFHDATA_C
 RTFHData.c is the module containing global data.
 DEFINEPOPUP PTR_RTFHBANN_C
 RTFHBann.c is the module containing the banner (logo) information
 that is displayed when using the RTFHelp compiler.
 DEFINEPOPUP PTR_RTFHHELP_C
 RTFHHelp is the module that processes RTFHelp's command line arguments.
 Named because it also displays help for the user.
 DEFINEPOPUP PTR_RTFHPARS_C
 RTFHPars.c is the module that contains the RTFHelp parser.


Identifying Serial Port IRQs


Detecting how many ports are installed and which IRQ they're set to




John Ridley


John is a programmer in Ypsilanti, Michigan. He can be contacted at
john.ridley@hal9k.com.


While developing installation programs for a hardware vendor, I collected code
to identify equipment in the PC. I was fascinated by programs, such as
Microsoft Diagnostics, which are able to detect most of the standard hardware
in a machine. Eventually, I had a collection of code for detecting everything
from the type of CPU to the kind of soundboard.
My employer had decided that it was worth his money to have good installation
software for even fairly minor things, particularly serial ports, because
although they are among the simplest elements in a PC, serial ports were
eating up a lot of tech-support time. They are also one of the most common
add-ins. If we could mail a disk that costs 75 cents and save an 800-number
tech-support call, the company would save money and the tech-support people
would have more hair left on their heads. (Try explaining to a stockbroker
over the phone which of those 12 jumpers to move where--then discovering that
you first have to explain what a jumper is.)
I soon had software that would detect existing serial ports in the machine
and, after prompting the user for which of the remaining ports he wanted, draw
a picture of how to set the jumpers. What I didn't have, however, were code
fragments to detect which interrupt request (IRQ) the ports were using. 
It used to be okay to assume that "standard" interrupts were being used--for
serial ports, ports 1 and 3 on IRQ4, and 2 and 4 on IRQ3. But now that nearly
every new PC is running Windows or some other multitasker, sharing interrupts
is no longer acceptable. If you want to run a mouse and a modem under Windows,
for instance, they had better not be using the same interrupt.
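The conventional map the paragraph above refers to can be tabulated; these are only the factory-default assumptions that, as the article argues, can no longer be trusted:

```c
/* Conventional IBM PC COM-port resources: port base addresses plus the
   "standard" IRQ assumption (COM1/COM3 on IRQ4, COM2/COM4 on IRQ3). */
static const unsigned short com_base[4] = { 0x3f8, 0x2f8, 0x3e8, 0x2e8 };
static const unsigned char  com_irq[4]  = { 4, 3, 4, 3 };
```

Note that under this default map COM1 and COM3 share IRQ4, and COM2 and COM4 share IRQ3, which is exactly the sharing problem described above.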
A little while later, while working on setup for a communications package, I
ran across the same problem. The users didn't know which IRQ their ports were
set to, and support was sometimes just having them try all the settings until
one worked. Unfortunately, even that didn't always work, because the computer
may come from the factory with settings that work only in limited situations
(don't use the mouse and modem at the same time) or not at all. So if your
spiffy new modem software is being installed in a Windows environment where
the modem and the mouse share an interrupt, the mouse is going to stop moving
as soon as you grab that interrupt. You want to be able to tell the user why
and give them advice on how to fix it.
This problem seemed to me like a gaping chasm in my knowledge. I was soon on a
nightly quest to untangle the problem of IRQ identification. I knew it was
possible--after all, Microsoft Diagnostics tells you the interrupts for the
COM ports, doesn't it?
Well, I thought so. After some experimentation, however, I discovered that it
actually does no such thing. You can move the actual IRQ around, and Microsoft
Diagnostics will not reflect the change--apparently, it detects that the ports
are there, then assumes "standard" IRQ settings. Still, I remained convinced
that there was a better solution.
The most accurate solution would have been to backtrace the interrupt vectors
to find the attached drivers. However, this is not feasible for shared
interrupts, and the serial ports usually don't have any drivers attached to
them unless a mouse is attached.
Next, I tried to determine if there were any way to tell from the serial-port
end which IRQ the port was attached to. A short look through the schematics of
a serial port dismissed this option. Serial ports are primitive devices;
certainly miles away from being able to read back their configurations.
After about a week of trying different methods, I realized there was no
elegant solution. There just isn't any passive way to tell what is attached to
an IRQ, nor is there any way to query the port to find the IRQ. The only
solution, then, is to cause the serial port to generate an interrupt and watch
to see which interrupt happens.
To do this, you have to learn how to deal with the 8259, or PIC (Programmable
Interrupt Controller). Since the PIC has no pretty interface, you have to
twiddle its bits. Fortunately, Jeff Duntemann's columns ("Structured
Programming," DDJ, June--September 1991) on serial-port programming provide a
good reference. For the purposes of this article, only a little PIC
programming is necessary, and it is commented pretty well.
The IsUART and WhichUART routines in SPINFO.C (see Listing Two) are borrowed
almost directly from Jeff's columns. What comes after these routines is the
meat of this article. (Listing One is SERPORT.H, the necessary include file,
and Listing Three is PORTINFO.C, a demo program to utilize SPINFO.C, which
scans normal IBM PC serial-port addresses COM1 through COM4, then determines
whether an 8250 or compatible UART is at the location, what kind of UART it
is, and which IRQ it is set to.)
It is relatively easy to cause a serial port to generate an interrupt. There
are several methods--the trick is to use one that doesn't cause grief to any
attached devices. I enabled the "Transmitter Holding Register Empty"
interrupt. Assuming the chip is not currently trying to transmit anything,
this will cause an interrupt to occur as soon as it is enabled. Also, after
you disable the interrupt again, the chip is exactly as it was when you came
in.
I first intercepted each interrupt in turn and triggered an interrupt to see
if the interrupt-handler function was called. This actually worked; it
successfully found the IRQs. Success at last--or so I thought.
About ten seconds later, I went back into the editor and found my mouse had
stopped working. By the time the program finished checking all the ports, it
had completely confused the mouse, as well as any other similar device
drivers. To understand why, consider this example: COM1 has a mouse attached
and is set to IRQ4. COM3, the port we're trying to find out about, is also on
IRQ4. This is a very common situation. We will quite likely not have IRQ4
intercepted when we cause our first interrupt, so the interrupt will go to the
mouse driver, which is first in line for IRQ4. The mouse driver has received
this interrupt and has no clue what to do with it. Chaos ensues.
I tried a lot of different methods to avoid stomping on serial device drivers,
with varying amounts of success, but none were 100 percent correct. The best
one tracked down the mouse driver, identified which port it was using, and
avoided stomping on this interrupt. Unfortunately, this only works for mice,
and it has trouble with some configurations. 
After a day or two, I finally gave in to the inevitable: a straightforward--if
not very elegant--solution is to intercept all the interrupts on which you
think your port might be, cause the interrupt, and then check to see what
happened. This method can also be used to resolve existing conflicts, since it
will return correct values even if an interrupt is shared. Though the concept
and the code seem pretty simple, the timing was hairy. A few instructions had
to be juggled to allow the PIC and UART time to set up and interact properly.
Since I'm not much of a cycle counter, this tuning was done empirically, by
trial and error.
At first, I got a lot of false triggering; apparently there were interrupts
pending much of the time, and when I opened the PIC floodgates I got several
false interrupts. This situation was taken care of by inserting the first
enable/disable pair; the serial port in question hasn't had its interrupts
enabled yet, so anything that comes in at this time is garbage and is thrown
out.
After enabling the UART interrupt, all interrupts are enabled for just a
moment, during which all pending interrupts are serviced, hopefully including
the one you want. Then the pending interrupts are disabled again. Keeping the
time short minimizes false triggering (by the user bumping the mouse, or
something). Then the UART status is restored, and the IRQ bitmap returned.
In the real world, the calling program will want to check for multiple
returned bits, which normally indicate an IRQ occurring simultaneously on
another device, or possibly a hardware fault. If this occurs, the detecting
routine can be called several times, to try to get a consistent result.
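One way to act on this advice is to sample the bitmap several times and accept a result only when every sample has exactly one bit set and all samples agree. A sketch (the names are illustrative; on real hardware, samples[] would be filled by repeated calls to WhichIRQ):

```c
/* Count set bits in an IRQ bitmap. */
static int popcount16(unsigned short v)
{
    int n = 0;
    while (v) { n += v & 1; v >>= 1; }
    return n;
}

/* Accept a detection only if every sample has exactly one bit set and all
   samples agree; returns the IRQ number, or -1 if inconclusive. */
static int consistent_irq(const unsigned short *samples, int n)
{
    int i, irq = 0;
    unsigned short first;
    if (n <= 0 || popcount16(samples[0]) != 1)
        return -1;                     /* none, or shared/faulty hardware */
    first = samples[0];
    for (i = 1; i < n; i++)
        if (samples[i] != first)       /* inconsistent across runs */
            return -1;
    while (!(first & 1)) { first >>= 1; irq++; }
    return irq;
}
```

Returning -1 for a multi-bit sample, rather than guessing, lets the installer tell the user about a probable conflict instead of silently picking one IRQ.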
You can expand this procedure to any device that you can cause to generate an
interrupt without endangering its normal operation. Sometimes this is
impossible; the printer port, for example, is a really dumb device; the only
way to get it to generate an interrupt is to actually send something to the
printer and wait for the ACK signal. Luckily this doesn't really matter too
much for MS-DOS, since the printer IRQ is almost never used. Note, however,
that the code does not always work correctly when running in a DOS window in
Windows Enhanced mode, seemingly because Windows sets up "virtual chips" that
can get into odd states. If this occurs, the code will not detect an
interrupt. This rarely happens, but once it does, only restarting Windows
seems to clear it up. I have not found a solution to this (other than not
running it under Windows).
You can now detect all relevant information for serial ports. You can tell how
many ports are installed, where they are, and which IRQ they are set to. With
this information, you can write software to handhold the end user through the
installation. Users will get their systems going faster and gain some
confidence in the process; you'll have another 100 lines of code in the
toolbox.
Perhaps someday, every last PC will support smart configuration schemes such
as Plug and Play. Meanwhile, methods such as this one will help you keep the
frustration levels down.

Listing One 

#ifndef _SERPORT_CONSTANTS_
#define _SERPORT_CONSTANTS_
#define COM1 0
#define COM2 1
#define COM3 2
#define COM4 3
#define NOPORT 0 /*types of UARTS we recognize */
#define T8250 1
#define T16450 2
#define T16550 3
#define T16550AF 4
#define COM1port 0x3f8
#define COM2port 0x2f8
#define COM3port 0x3e8
#define COM4port 0x2e8
#define HW_IRQ_OFF 8 /* Add to IRQ# to get software IRQ vector */
#define LOOPBIT 0x10 /* MCR reg loopback bit */

#define DLAB 0x80 /* DLAB flag position */

/************************ 8250 family UART registers ************************/
/*************** 0 1 2 3 4 5 6 7 */
#define RBR 0 /* ==========Receiver Buffer Register (read only)========== */
#define THR 0 /* ========Transmitter Holding Register (write only)======= */
#define DLL 0 /* Divisor Latch LSB: load these bytes with 115200 / BAUD */
#define DLM 1 /* Divisor Latch MSB: */
 /* NOTE: Set DLAB in LCR to write DLL,DLM, then clear DLAB */
#define IER 1 /* ===============Interrupt Enable Register================ */
 /* Rcv Xmitr Line Modem 0 0 0 0 */
 /* Data Empty Status Status */
#define IIR 2 /* ======Interrupt Identification Register (read only)===== */
 /* 0=IRQ ---IRQ ID-- 0 0 0 0 0 */
 /* pending bit 0 Bit 1 */
#define LCR 3 /* ================Line Control Register========== Div.lat.*/
 /* -Word Length- Stop Parity Even Stick Set AccessBt*/
 /* bit 0 bit 1 Bits Enable Parity Parity Break (DLAB) */
#define MCR 4 /* ================Modem Control Register================== */
 /* DTR RTS Out1 Out2 Loopback 0 0 0 */
 /* NOTE: Out2 must be set (1) on PC's to enable serial port */
#define LSR 5 /* =================Line Status Register=================== */
 /* Data Overrun Parity Framing Break Xmit Xmit 0 */
 /* Ready Error Error Error Interpt Hold Empty */
 /* Empty */
#define MSR 6 /* ===============Modem Status Register==================== */
 /* Delta Delta TrailEdg Delta CTS DSR RI CD */
 /* CTS DSR RingInd. CD */
#define SCR 7 /* Scratch Register: not present in early 8250's (8250A) */
/****************************************************************************/

/**************************** 8259 control ports ****************************/
/*************** 0 1 2 3 4 5 6 7 */
#define OCW1 0x21 /* Operation Control Word 1 */
 /* Zero in these bits enables the respective IRQ (0-7) */
#define OCW2 0x20 /* Operation Control Word 2 */
 /* 001xxxxx : Non-Specific End Of Interrupt (EOI) */
 /* 011xxxxx : Specific EOI */
 /* 101xxxxx : Rotate on Non-Specific EOI */
 /* 100xxxxx : Rotate in Automatic EOI mode (SET) */
 /* 000xxxxx : Rotate in Automatic EOI mode (CLEAR) */
 /* 11100vvv : Rotate on Specific EOI, vvv indicates IRQ */
 /* 11000vvv : Set Priority Command " " " */
 /* 010xxxxx : No Operation */
/****************************************************************************/
short IsUART(short which);
short WhichUART(short which);
#endif



Listing Two

#include "serport.h"
/********************************** IsUART **********************************/
short IsUART(short PortAddr)
 {
 short HoldMCR, HoldMSR, Port, retval;
 HoldMCR = _inp(PortAddr+MCR); /* Get Modem Control contents */

 _outp(PortAddr+MCR, HoldMCR | LOOPBIT); /* Turn on loopback */
 HoldMSR = _inp(PortAddr+MSR); /* Get Modem Status Register */
 /* so we can restore MCR later */
 _outp(PortAddr+MCR, 0x0a | LOOPBIT); /* Turn on RTS */
 if ((_inp(PortAddr+MSR) & 0xf0) == 0x90) /* If CTS is on, there's a UART */
 retval = TRUE;
 else
 retval = FALSE;
 _outp(PortAddr+MSR, HoldMSR); /* Restore MCR/MSR */
 _outp(PortAddr+MCR, HoldMCR); /* Turn off loopback */
 return retval; 
 }
/******************************** WhichUART *********************************/
short WhichUART(short PortAddr)
 {
 short Port, Temp, retval=-1;
 Temp = _inp(PortAddr+SCR) ^ 0xff; /* Check scratch register */
 _outp(PortAddr+SCR, Temp); /* Output complement */
 if (_inp(PortAddr+SCR) != Temp) /* check for same return val */
 return(T8250); /* No scratch reg: 8250 */
 _outp(PortAddr+IIR,7); /* Enable FIFO */
 switch(_inp(PortAddr+IIR) & 0xc0) /* Check high bits of IRQ ID reg*/
 {
 case 0: retval = T16450; break;
 case 0x80: retval = T16550; break;
 case 0xc0: retval = T16550AF; break;
 }
 _outp(PortAddr+IIR,0); /* turn off FIFO */
 return retval;
 }
/************************* IRQ Identification code **************************/
short CurPortBase; /* shared info: port base addr */
void (__interrupt __far *old_irq[4])(void); /* save old IRQ vectors */
short IRQ_Happened; /* bitmap: which IRQ happened? */
#define IRQbit 0x3c /* 8259 enable IRQ 2-5 */
void __interrupt __far our_irq2()
 {
 _enable(); /* Enable CPU interrupts */
 IRQ_Happened = 4; /* Flag which IRQ happened */
 _outp(CurPortBase+1, 0x0); /* Mom, make him shut up! */
 _outp(OCW2,0x20); /* EOI to 8259 */
 }
void __interrupt __far our_irq3() /* Same thing for IRQ3-5 */
 {
 _enable();
 IRQ_Happened = 8;
 _outp(CurPortBase+1, 0x0); 
 _outp(OCW2,0x20); 
 }
void __interrupt __far our_irq4()
 {
 _enable(); 
 IRQ_Happened = 16;
 _outp(CurPortBase+1, 0x0); 
 _outp(OCW2,0x20); 
 }
void __interrupt __far our_irq5()
 {
 _enable(); 

 IRQ_Happened = 32;
 _outp(CurPortBase+1, 0x0); 
 _outp(OCW2,0x20); 
 }
void GrabAllIRQ()
 {
 short IRQ;
 for (IRQ = 2; IRQ <= 5; IRQ++)
#pragma warning(disable:4113)
 old_irq[IRQ-2] = _dos_getvect((unsigned)IRQ+HW_IRQ_OFF);
#pragma warning(default:4113)
 _dos_setvect((unsigned)2+HW_IRQ_OFF,our_irq2);
 _dos_setvect((unsigned)3+HW_IRQ_OFF,our_irq3);
 _dos_setvect((unsigned)4+HW_IRQ_OFF,our_irq4);
 _dos_setvect((unsigned)5+HW_IRQ_OFF,our_irq5);
 }
void ReleaseAllIRQ()
 {
 short IRQ;
 for (IRQ = 2; IRQ <= 5; IRQ++)
 _dos_setvect((unsigned)IRQ+HW_IRQ_OFF,old_irq[IRQ-2]);
 }
unsigned short WhichIRQ(short PortAddr)
{
short IRQ, HoldOCW1, HoldIER, HoldMCR;
 if (!IsUART(PortAddr)) /* Don't bother testing */
 return 0;
 CurPortBase = PortAddr;
 HoldOCW1 = _inp(OCW1); /* remember status of 8259 */
 HoldIER = _inp(CurPortBase+IER); /* and of our UART */
 HoldMCR = _inp(CurPortBase+MCR);
 _disable(); /* Be safe... */
 _outp(CurPortBase+MCR, 0x03 | 0x08); /* enable port */
 GrabAllIRQ(); /* We see all now! */
 _outp(OCW1, HoldOCW1 & (~IRQbit)); /* enable 8259 */
 _enable(); /* Clear pending IRQ's */
 _disable(); /* Ready for the real thing now */
 IRQ_Happened = 0; /* Clear bitmap */
 _outp(CurPortBase+IER, 0x02); /* enable xmt empty int. */
 _enable(); /* BANG! */
 _disable(); /* OK, we're done. */
 _outp(OCW1, (_inp(OCW1) & (~IRQbit)) /* Restore 8259 */
  | (HoldOCW1 & IRQbit));
 _outp(CurPortBase+IER, HoldIER); /* Restore our UART */
 _outp(CurPortBase+MCR, HoldMCR);
 ReleaseAllIRQ(); /* Let go of IRQ vectors */
 _enable(); /* and relinquish control */
 return IRQ_Happened; /* Send back bitmap */
}



Listing Three

/******************************** PORTINFO.C ********************************/

#include <dos.h>
#include <stdlib.h>
#include <stdio.h>

#include <memory.h>
#include <conio.h>

#define TRUE 1
#define FALSE 0

#include "spinfo.c"
short PortBases[]={COM1port,COM2port,COM3port,COM4port};
char *PortType[]={"NO PORT","8250","16450","16550","16550AF"};
void main()
 {
 short Portnum, PortAddr;
 unsigned short IRQ,IRQ_bitmap;
 for (Portnum=COM1; Portnum <=COM4; Portnum++)
 {
 PortAddr = PortBases[Portnum];
 if (IsUART(PortAddr))
 {
 printf("COM%d: ",Portnum+1);
 printf("%s ",PortType[WhichUART(PortAddr)]);
 IRQ_bitmap = WhichIRQ(PortAddr);
 if (IRQ_bitmap == 0)
 printf("Unable to detect IRQ");
 else
 for (IRQ = 0; IRQ < 16; IRQ++)
 if (IRQ_bitmap & (1 << IRQ))
 printf("IRQ%d ",IRQ);
 printf("\n");
 }
 }
}


The Microsoft Flash File System


Flash file system mechanics 




Peter Torelli


Peter is a student at Rensselaer Polytechnic Institute and can be contacted
through the DDJ offices.


A file system stores and manipulates named objects in a hierarchical manner.
Named objects, called "files," can be organized and stored in sets, or
directories. In turn, directories can store other directories as well, leading
to a file-directory hierarchy. The Microsoft MS-DOS and Intel RMX operating
systems use a File Allocation Table (FAT) file system to maintain a
file-directory hierarchy. A flash file system implements this basic
file-directory scheme while exploiting the benefits of flash media. Microsoft
has implemented a flash file system for DOS, known as "FFS."
In his article, "Flash File Systems" (DDJ, May 1993), Drew Gislason described
an implementation for a basic flash translation layer. In this article, I'll
continue his discussion by examining a byte-oriented file system for flash
media, focusing on the mechanics behind a flash file system based on the
Microsoft data structures.


FTL and FFS


As he himself acknowledged, Drew's device driver isn't really a flash file
system, but rather a layer of code that translates the DOS FAT file system
requests from sectors to physical flash addresses. This approach to using
flash media as a DOS disk is commonly known as a "flash translation layer"
(FTL). Under this scheme, DOS still uses its FAT file system to process file
and directory operations, but the FTL block device driver accesses the media.
The device driver maintains the organization of the data within the media by
means of special headers that describe the arrangement of the sectors in each
block. When the DOS FAT file system needs to read a sector, the FTL processes
the request by looking up that sector's physical address in its headers and
returning the requested data to DOS. The same process occurs for writes,
unless no clean flash space exists. In this case, the FTL driver performs a
cleanup to reclaim deallocated space and then performs the write. 
FFS departs from the sector-oriented scheme by organizing and storing
byte-sized data within various structures. Unlike an FTL device driver, the
DOS FAT file system plays no part in FFS; instead, an "installable file
system" enters the picture. DOS provides an interface for non-FAT file systems
commonly known as the "INT2F Network Redirector Interface;" see Figure 1. The
redirector accepts requests from DOS at a level above that of a specific file
system, making it possible for any type of storage device or network computer
to interface with DOS. The FFS interface to DOS works the same way. 
Unlike the device driver, FFS performs no low-level I/O on the flash media.
Instead, a PCMCIA-compliant memory card device driver is needed to perform
certain functions through the DOS generic IOCTL call. BIOS vendors such as
SystemSoft, AMI, Award, Phoenix, and Intel provide these PCMCIA drivers.


Understanding FFS


Three concepts need to be understood before implementing an FFS: FFS data
structures, their arrangement in the media, and the basic manipulation of the
data structures inherent to a file system. (A fourth element, the interface to
the OS, requires a separate discussion. See Andrew Schulman's Undocumented
DOS, Addison-Wesley, 1993, for a description of the DOS INT2F redirector
interface.)
Linked lists form the basis of FFS. Files, directories, and data are all
stored in linked lists. The different types of data structures described here
serve as the links in these lists. Microsoft defines four different structures
for storing and arranging data in the FFS format: file entry, directory entry,
file info, and boot record. The boot record structure contains data describing
the media's geometry (size, number of blocks, erase block size, and so on), as
well as FFS version information. (Since only one copy of the boot record
exists, I'll exclude it when I refer to "data structures.")
The file-directory hierarchy exists in the file-entry directory-entry list
(FEDE). In an FFS-formatted flash card, the information displayed by typing a
DIR command corresponds to the file and directory entries in that directory's
FEDE chain, as shown in Figure 2. All files or subdirectory entries at the
same level are part of one FEDE chain and are referred to as "siblings." If a
subdirectory exists in that FEDE chain, it points to another FEDE chain. If
more subdirectories exist in that FEDE chain, the tree continues.
Actual file-entry file data resembles the FEDE list, except that each entry in
the file's list is a file-info structure. This list of file-info structures
points sequentially to the regions of the card that contain the file's data.
FFS performs a read-file request by locating the proper file entry, traversing
its file-info chain, and returning the requested data.
File-entry and directory-entry structures contain three pointers: sibling,
primary, and secondary (see Listing One). The sibling pointer always points to
the next entry in the same level as that structure. The primary pointer of a
directory entry points to the first entry in that directory's FEDE chain. The
primary pointer of a file entry points to its file-info chain. The secondary
pointers of both structures point to files or directories of the same name
that supersede the existing structures.
File-info structures point to "extents," regions of the card that contain file
data. The maximum size of an extent is 65,535 bytes. Like file and directory
entries, file-info structures contain sibling and secondary pointers, but the
primary pointer has been replaced by an extent pointer. The extent pointer
points to the first extent of that file's data. The sibling pointer addresses
the next file-info structure in the chain, and the secondary pointer indicates
where to find updated or superseded extent data.
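The pointer layout just described can be sketched as C structures. These are illustrative only, showing the relationships in the text; they are NOT Microsoft's actual on-media layouts, and the field widths are assumptions:

```c
#include <stdint.h>

struct fede_entry {            /* file entry or directory entry */
    uint32_t sibling;          /* next entry at the same level */
    uint32_t primary;          /* FEDE chain (dir) or file-info chain (file) */
    uint32_t secondary;        /* superseding entry of the same name */
};

struct file_info {             /* one link of a file's data chain */
    uint32_t sibling;          /* next file-info structure */
    uint32_t extent;           /* replaces "primary": first data extent */
    uint32_t secondary;        /* updated or superseding extent data */
};

#define MAX_EXTENT 65535UL     /* maximum extent size, per the text */

/* Minimum number of extents (and hence file-info links) that a file of a
   given nonzero size requires. */
unsigned long extents_needed(unsigned long bytes)
{
    return (bytes + MAX_EXTENT - 1) / MAX_EXTENT;
}
```

So a 200,000-byte file needs at least four extents, and FFS satisfies a read by walking the file-info chain extent by extent.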


Allocation of Flash Media Space for Structures and File Data


The pointers I've just discussed don't explicitly reference the physical
address of a structure or extent within a block; instead, they point to a
block-allocation member, which in turn points to the physical location of that
particular structure. Which brings us to the second detail of FFS:
block-allocation structures and members.
Each erase block contains one block-allocation structure (BAS). The BAS in
Listing Two exists at the topmost address range of every erase block. It
contains the block's logical number and erase count, whether or not it is a
spare, and other block-specific fields. It also marks the starting point to
the chain of block-allocation members (BAMs). FFS uses the BASs to determine
the logical block ordering (that is, by locating the boot record or finding a
spare block).
To maintain organization of the data structures and file extents, FFS uses
BAMs; see Listing Two. These 6-byte fields begin at the top of each erase
block, just below the BAS, and grow downward as more structures and extents
are written to that block. They contain the length of the pointed-to data
region, the beginning offset of that data relative to address zero of that
block, and a status field indicating whether or not the data being pointed to
is valid or deleted. FFS uses BAMs to locate the physical offset of a data
structure or extent, to determine (via the status field) whether that
structure or extent has been deleted, and to assist in performing reclaim.
As FFS writes files or directories to a block, the data structures and extents
grow upward from address zero of the block, while allocation members grow
downward from the top; see Figure 3.


Spare Blocks and the BAM: Reclaim Explained


Once a flash cell has been programmed (changed from 1 to 0), it must be erased
(changed from 0 to 1) before it can be reprogrammed. Intel Series 2 and 2+
Flash Cards contain multiple 128-Kbyte blocks that require one second to
change all of the 0s back to 1s. 
When an FFS user deletes a file or directory, FFS zeros the uppermost bit in
all of the BAM's status fields associated with that file or directory. The
data has been marked as deleted, but it still exists and the flash space it
takes up cannot be used again until it's erased. Deleted space is commonly
referred to as "dirty" or "deallocated" space. Since the rest of the block may
contain valid data, FFS uses a process called "reclaim" to erase the deleted
file's space while preserving the valid data.
FFS requires at least one block in an FFS partition to be reserved as a spare.
A spare block is indicated in that block's BAS by both the BlockSeq and
the BlockSeqChecksum fields equaling FFFFH. A spare block is always clean (all
1s).
FFS performs a reclaim when the array becomes filled with used and dirty
space. Before it can write another byte, some of the dirty space must be
reclaimed. FFS begins a reclaim by selecting a block with the most dirty
space. It copies that block's BAS into the spare block and then begins reading
the BAMs of that block. If the status bit indicates a valid structure or
extent, it gets copied into the spare block and the new BAM's physical offset
is adjusted to reference its new address. If the structure or extent has been
deleted, that BAM is skipped. The old block gets erased and becomes the new
spare; the old spare now has the logical number and valid data of the old
block, plus clean flash space. FFS can now write to this space. When that
space runs out, FFS performs further reclamation until the partition contains
nothing but valid data; see Figure 4.
Reclamation is the sole reason for FFS. BASs, BAMs, and spare blocks make
reclaim possible. A robust reclaim mechanism is the key to an efficient flash
file system. The more efficiently an FFS implementation handles reclamation,
the better it performs.


FFS Mechanics



Before FFS can use a flash array, it must be formatted by a special formatting
program. The formatter places a BAS at the top of every block, plus a boot
record, root-directory entry, and volume label. It also designates up to eight
blocks as spares. 
Moving through the FFS file-directory hierarchy requires several basic
functions: 
Adding entries to the FEDE chain.
Locating the boot record and root directory. 
Finding a matching filename in a FEDE chain.
Finding a particular structure in a list begins with a pointer. Primary,
secondary, sibling, boot-record, and root-directory pointers all reference a
BAM. Each pointer is a double word, but it is not a physical offset: The high
word contains the logical block number, and the low word contains the logical
BAM number. 
For example, if the sibling pointer of the root-directory entry equals
00000001H, that translates into logical block 0, logical BAM 1. Reading the
BAM's offset (24 bits) gives the physical offset to the sibling structure of
the root-directory entry in that block.
To find the physical location of a structure, the necessary data must be
obtained from the BAM. Listing Three returns the BAS of a given physical
block. The two procedures in Listing Four take a pointer from a structure
(sibling, primary, or whatever) and return the BAM associated with it; the
first of them, log2phy, translates a logical block number into a physical one
so that an absolute flash address may be accessed. The calling program can
then compute where in that block the structure exists: The procedure in
Listing Five returns the physical base address of the block a pointer
references, and Listing Six translates the BAM's 3-byte offset into an
address within that block. 
The next step in traversing a FEDE chain is to obtain the structure referenced
by the previous pointer. Listing Seven reads the FE or DE structure associated
with a BAM and returns the data it found in the FEDE structure.
Now that you can read a BAM and a FEDE entry, you can traverse the list
(Listing Eight) and find the BAM associated with the last structure. Passing
the BAM associated with any FEDE entry in a chain to this function causes it
to return the BAM of the last structure. Adding a file or directory requires
the procedure in Listing Nine. Assuming you have already written your new
structure and its BAM to flash, you append it to the current FEDE chain.
LinkBAMptr refers to the new structure, while CurrentBAMptr refers to the
current place in the chain.
The conventional heuristic for locating the root directory begins by searching
each BAS, first for a boot record, then for the root-directory entry.
Qualifying long pathnames often requires traversing several FEDE chains,
beginning with the root. 
You find the boot record by reading each block's BAS; see Listing Ten. You
can then use Listing Eleven to read the boot record using the BAM found in
Listing Ten. Finally, you use Listing Twelve to combine the functions in
Listings Ten and Eleven to obtain the root directory's BAM.
Another important feature of a file system is the ability to locate a
filename. Listing Thirteen searches a FEDE chain from its topmost entry until
it finds a file or directory entry whose name matches an 11-character DOS
name. It returns a BAM to the matching entry if one is found, or ERROR
(FNULL) if not. 


Conclusion


The first FFS implementation did not function smoothly: FEDE chains grew out
of control, reclaim would occur at the wrong moments, and so on. Today, FFS
implementations by SystemSoft and Microsoft have addressed these issues and
shown that FFS can achieve reasonable performance. 
Additionally, competition between the FTL and FFS approaches has caught the
attention of systems designers. SCM Microsystems and M Systems both supply FTL
solutions for DOS-based PCs. Still, using an FTL requires a file system, and
exchanging data between other FTL platforms does not guarantee compatibility
unless they use the same file system. The procedures I've presented in this
article cover the basic elements of flash file system mechanics so that you
can format a card, follow FEDE chains, add files or directories, and so on.
The header file for the procedures I've presented is in Listing Fourteen. 
If standards begin to solidify and enough resources are devoted to the
development of other operating-system FFS drivers, flash cards could become a
dominant form of data exchange for computer users.
Figure 1 DOS model for using flash cards.
Figure 2 FEDE chain examples.
Figure 3 FFS usage of an erase block.
Figure 4 Cleaning up dirty blocks.

Listing One 

struct FileOrDirectoryEntry { 
 word Status;
 dword SiblingPtr;
 dword PrimaryPtr;
 dword SecondaryPtr;
 byte Attributes;
 word Time;
 word Date;
 word VarStructLen;
 byte NameLen;
 byte Name[8];
 byte Ext[3];
};

struct FileInfoStructure {
 word Status;
 dword SiblingPtr;
 dword ExtentPtr;
 dword SecondaryPtr;
 byte Attributes;
 word Time;
 word Date;
 word VarStructLen;
 word UncompressedExtentLen;
 word CompressedExtentLen;
};




Listing Two

struct BlockAllocationStructure {
 dword BootRecordPtr;
 dword EraseCount;
 word BlockSeq;
 word BlockSeqChecksum;
 word Status;
};

struct BlockAllocationMember {
 byte Status;
 byte Offset[3];
 word Len;
};



Listing Three

struct BlockAllocStruct read_BAS( UINT block ) {
 struct BlockAllocStruct BAS;
 ULONG address=0UL;
 /* Calculate the top of the physical block address. */
 address = (ULONG) ( block * (ULONG) block_size );
 /* Subtract back the size of a BlockAllocStruct. */
 address -= sizeof( struct BlockAllocStruct );
 /* Read it. */
 read_memory( (UCHAR *) &BAS, address, sizeof( struct BlockAllocStruct ) );
 /* Return the BAS. */
 return BAS;
}



Listing Four

UINT log2phy( UINT lblock ) {
 struct BlockAllocStruct CurBAS;
 UINT block=0;
 ULONG address=0UL;

 /* Start at physical block 1. */
 block = 1;
 /* Read each BAS. */
 while( block <= number_of_blocks ) {
 CurBAS = read_BAS( block );
 /* Look for the sequence that matches our requested lblock. */
 if( CurBAS.BlockSeq == lblock ) break;
 block++;
 }
 return block;
}
struct BlockAllocMember get_BAM_data( ULONG BAMptr ) {
 struct BlockAllocMember BAM;
 UINT logical_BAM=0;
 UINT logical_block=0;
 UINT physical_block=0;

 ULONG address=0UL;
 /* Determine the logical Block and BAM number. */
 logical_block = (UINT) ( BAMptr >> 16 );
 logical_BAM = (UINT) ( BAMptr & 0xFFFF );
 /* Call the routine that determines the physical block number. This
 might actually parse each BAS, or it may be read from a look-up table
 that FFS creates for every card insertion and maintains thereafter. For
 now, translate by reading the BASs. */
 physical_block = log2phy( logical_block );
 /* Determine the top of the physical blocks address. */
 address = (ULONG) ( physical_block * block_size );
 /* Subtract from it the size of a BAS plus the number of BAMs
 preceding the BAM we wish to read. */
 address -= ( sizeof( struct BlockAllocStruct ) + 
 ( ( logical_BAM + 1 ) * sizeof( struct BlockAllocMember ) ) );
 /* Read the BAM. */
 read_memory( (UCHAR *) &BAM, address, sizeof( struct BlockAllocMember ) );
 /* Return the BAM. */
 return BAM;
}



Listing Five

ULONG get_block_base( ULONG BAMptr ) {
 UINT lblock=0;
 UINT block=0;

 /* Get the logical block # from the BAM. */
 lblock = ( (ULONG) BAMptr ) >> 16;
 /* Translate it to physical. */
 block = log2phy( lblock );
 /* Return its physical base address. */
 return ( ( block - 1 ) * block_size );
}



Listing Six

ULONG translate_BAM_offset( struct BlockAllocMember BAM ) {
 ULONG address=0UL;

 /* Sum the BAM offset components. */
 address = ( ( (ULONG) BAM.Offset[0] ) + 
 ( ( (ULONG) BAM.Offset[1] ) << 8 ) + 
 ( ( (ULONG) BAM.Offset[2] ) << 16 ) );
 return address;
}



Listing Seven

struct FEDE_Entry get_FEDE( ULONG CurrentBAMptr ) {
 struct BlockAllocMember BAM;
 struct FEDE_Entry FEDE;
 ULONG address=0UL;

 UINT length=0;
 
 /* Read the BAM data. */
 BAM = get_BAM_data( CurrentBAMptr );
 /* Determine the base absolute address of the structure's block. */
 address = get_block_base( CurrentBAMptr );
 /* Calculate the offset (in that block) of the structure. */
 address += translate_BAM_offset( BAM );
 /* Read the structure. */
 read_memory( (UCHAR *) &FEDE, address, sizeof( struct FEDE_Entry ) );
 /* And return it. */
 return FEDE;
}



Listing Eight

ULONG go_to_end_of_FEDE( ULONG CurrentBAMptr ) {
 struct FEDE_Entry FEDE;
 UCHAR buffer[12];

 /* Start traversing at the given point in list specified by CurrentBAMptr. */
 do {
 /* Read the FEDE structure. */
 FEDE = get_FEDE( CurrentBAMptr );
 /* Un-comment the following lines and this code will print out 
 the filenames as it goes. Passing it the PrimaryPtr of the parent
 directory would list all of the files/subdirs in that directory. */
 /* strncpy( buffer, FEDE.Name, 11 ); */
 /* buffer[11]=0; */
 /* printf("%s\n", buffer); */
 /* If this isn't the last element, then save the next pointer. */
 if( FEDE.SiblingPtr != FNULL ) CurrentBAMptr = FEDE.SiblingPtr;
 /* Otherwise exit the loop. */
 } while( FEDE.SiblingPtr != FNULL );
 /* The CurrentBAMptr now references the last structure of the chain. */
 return CurrentBAMptr;
}



Listing Nine

void add_FEDE_sibling( ULONG CurrentBAMptr, ULONG LinkBAMptr ) {
 struct BlockAllocMember BAM;
 struct FEDE_Entry FEDE;
 ULONG LastBAMptr=0UL;
 UINT length=0;
 ULONG address=0UL;

 /* Go to the end of the current FEDE chain. */
 LastBAMptr = go_to_end_of_FEDE( CurrentBAMptr );
 /* Read the last FEDE and its BAM. */
 FEDE = get_FEDE( LastBAMptr );
 BAM = get_BAM_data( LastBAMptr );
 /* Connect the link and update the status bits. */
 FEDE.SiblingPtr = LinkBAMptr;
 /* Clear the status bits indicating the SiblingPtr is valid. */
 FEDE.Status &= 0xFFCF;

 /* Since only the Sibling pointer and status get programmed (the other data
 remains unchanged) re-write the new data to the same location. */
 address = get_block_base( LastBAMptr );
 address += translate_BAM_offset( BAM );
 write_memory( (UCHAR *) &FEDE, address, sizeof( struct FEDE_Entry ) );
}



Listing Ten

ULONG find_Boot_Record( void ) {
 struct BlockAllocStruct BAS;
 UINT current_block=0;
 UCHAR got_BR_flag=FALSE;

 /* Start at physical block one. */
 current_block = 1;
 /* Loop through physical blocks until we find the Boot Record. */
 do {
 /* Read the BAS of the current block. */
 BAS = read_BAS( current_block );
 /* No Boot Record; go to the next block. */
 if( ( BAS.Status & 0x70 ) != 0x30 ) current_block++;
 /* Otherwise set our flag and break with the correct block. */
 else {
 got_BR_flag = TRUE;
 break;
 }
 /* Make sure we don't run out of blocks (external variable) just
 in case this card was not formatted previously. */
 } while( current_block <= number_of_blocks );
 /* If the flag wasn't set then we didn't find the Boot Record. */
 if( !got_BR_flag ) return ERROR; 
 /* Otherwise, return the Boot Record Pointer. */
 return BAS.BootRecordPtr;
}



Listing Eleven

struct BootRecord read_Boot_Record( ULONG BootRecBAMptr ) {
 struct BootRecord BR;
 struct BlockAllocMember BAM;
 ULONG address=0UL;
 UINT length=0;

 /* Given the Boot Record BAM pointer, read the BAM to find the structure. */
 BAM = get_BAM_data( BootRecBAMptr );
 /* Determine the base absolute address of the Boot Record's block. */
 address = get_block_base( BootRecBAMptr );
 /* Calculate the offset (in that block) of the Boot Record. */
 address += translate_BAM_offset( BAM );
 /* Read it. */
 read_memory( (UCHAR *) &BR, address, sizeof( struct BootRecord ) );
 /* Return the Boot Record. */
 return BR; 
}




Listing Twelve

ULONG get_ROOT_BAM( void ) {
 struct BootRecord BR;
 ULONG BAMptr=0UL;

 /* Search for the Boot Record BAM. */
 BAMptr = find_Boot_Record();
 /* Read the Boot Record. */
 BR = read_Boot_Record( BAMptr );
 /* Return the ROOT BAM Pointer. */
 return BR.RootDirectoryPtr;
}



Listing Thirteen

ULONG find_in_FEDE( char *DOSname, ULONG topBAMptr ) {
 struct FEDE_Entry FEDE;
 UCHAR index=0;
 UCHAR error=FALSE;
 char buffer[12];
 
 /* Start traversing the FEDE chain starting with topBAMptr. */
 do {
 error=FALSE;
 index = 0;
 /* Read each FEDE element. */
 FEDE = get_FEDE( topBAMptr );
 /* Compare the first 8 characters of the 11 char. DOS name. */
 do {
 /* If one character doesn't match, set the error flag and break. */
 if( DOSname[index] != FEDE.Name[index] ) {
 error = TRUE;
 break;
 }
 index++;
 } while( index < 8 );
 /* If the name matched, check the extension. */
 if( !error ) {
 do {
 if( DOSname[index] != FEDE.Ext[index-8] ) {
 error = TRUE;
 break;
 }
 index++;
 } while( index < 11 );
 }
 /* If we've finished the matching loop WITHOUT an error, then we found
 our entry. */
 if( !error ) break;
 topBAMptr = FEDE.SiblingPtr;
 } while( topBAMptr != FNULL );

 /* There's no point in checking topBAMptr to see whether we reached
 the end; ERROR is defined as -1, which is a 32-bit FNULL. */
 return topBAMptr;
}





Listing Fourteen

#define TRUE 1
#define FALSE 0
#define ERROR -1
/* Null flash pointer: all 1s, the same bit pattern as a 32-bit ERROR. */
#define FNULL 0xFFFFFFFFUL

/* The following assumptions are made about machine data sizes:
 ULONG = dword = 32 bits
 UINT = word = 16 bits
 UCHAR = byte = 8 bits
*/ 

typedef unsigned long ULONG;
typedef unsigned int UINT;
typedef unsigned char UCHAR;

struct BlockAllocStruct {
 ULONG BootRecordPtr;
 ULONG EraseCount;
 UINT BlockSeq;
 UINT BlockSeqChecksum;
 UINT Status;
};
struct BlockAllocMember {
 UCHAR Status;
 UCHAR Offset[3];
 UINT Len;
};
struct FEDE_Entry {
 UINT Status;
 ULONG SiblingPtr;
 ULONG PrimaryPtr;
 ULONG SecondaryPtr;
 UCHAR Attributes;
 UINT Time;
 UINT Date;
 UINT VarStructLen;
 UCHAR NameLen;
 UCHAR Name[8];
 UCHAR Ext[3];
};
/* Minimal Boot Record definition -- only the field used by this sample
 code is shown; a real Boot Record contains additional fields. */
struct BootRecord {
 ULONG RootDirectoryPtr;
};
/* Function prototypes for our sample code. */
struct BlockAllocStruct read_BAS( UINT block );
UINT log2phy( UINT lblock );
struct BlockAllocMember get_BAM_data( ULONG BAMptr );
ULONG get_block_base( ULONG BAMptr );
ULONG translate_BAM_offset( struct BlockAllocMember BAM );
struct FEDE_Entry get_FEDE( ULONG CurrentBAMptr );
ULONG go_to_end_of_FEDE( ULONG CurrentBAMptr );
void add_FEDE_sibling( ULONG CurrentBAMptr, ULONG LinkBAMptr );
ULONG find_Boot_Record( void );
struct BootRecord read_Boot_Record( ULONG BootRecBAMptr );
ULONG get_ROOT_BAM( void );
ULONG find_in_FEDE( char *DOSname, ULONG topBAMptr );


/* The following two procedures are hardware dependent and must be 
included by the OEM. */
UINT read_memory( UCHAR *to_buffer, ULONG from_address, UINT size );
UINT write_memory( UCHAR *from_buffer, ULONG to_address, UINT size );

/* The following external data describe the flash array. */
extern const ULONG block_size;
extern const UINT number_of_blocks;






















































Designing Servers with CPI-C


Achieving client/server portability




Peter J. Schwaller and John Q. Walker II


Peter, who currently develops ATM software in IBM's Networking Hardware
Division, can be reached on CompuServe at 73602,3201. John also works in IBM
Networking and can be reached on CompuServe at 72440,1544. They are the
authors of CPI-C Programming in C: An Application Developer's Guide, published
by McGraw-Hill.


Advanced Program-to-Program Communication (APPC), also known as LU 6.2, is
software that enables high-speed communications between programs on different
computers, from portables and workstations to midrange and host computers.
APPC software is available for many different operating systems, either as
part of the operating system or as a separate package. 
APPC provides a rich set of functions for creating "conversations" between
programs. Its original design, however, did not specify a common API for
implementing these functions. Consequently, each operating system that
originally supported APPC developed its own native APPC API. Until now, if you
were designing APPC programs for different operating systems, you had to learn
a distinctive verb syntax for each different platform. 
The Common Programming Interface for Communications (abbreviated CPI-C, and
pronounced "sip-ick") eliminates this problem. The CPI-C standard provides a
consistent set of calls for all systems that support it. Although these calls
correspond to APPC verbs, they are easier to use, since the names of the
calls, constants, and variables are the same across all platforms and
programming languages. Whether you are coding for Windows, OS/2, UNIX, AS/400,
CICS, or MVS, you need to learn only one set of calls to write client/server
applications for different systems. 
Almost every CPI-C application is a client/server app. The client program
starts the conversation by issuing a pair of CPI-C calls named
Initialize_Conversation() and Allocate(); the server program connects by
issuing an Accept_Conversation() or Accept_Incoming(). Often, many client
programs want to connect to the same server program. In this article, we'll
discuss server designs that handle multiple clients, even when the server's
resources are constrained.
When the number of clients is small and the transaction rate low, it's okay to
dedicate a server-program instance to each client. As the number of clients
increases, all platforms will, at some point, run out of resources to support
this operating model. Short conversations and accepting multiple conversations
improve server-program throughput and work within resource constraints. 


Using Short Conversations


Long conversations are maintained even when no work is being done. Although
they can result in a lot of idle time, the startup cost of initializing long
conversations is incurred only once. An application using short conversations,
however, deallocates the conversation during idle times. This frees up the
network and server resources for other clients or applications to use. The
disadvantage of short conversations is the overhead of starting a conversation
every time the client needs work from the server. However, that overhead is
less than that of starting a new process.
On the server side, short conversations have the following advantages:
Support for more clients. The server has a finite number of processes or
threads that can be dedicated to serving clients. Short conversations let you
"timeshare" available tasks to the clients. If all available tasks are busy,
the next client waits for a task to become available. But, since you're using
short conversations, the wait is almost always short.
Increase in server throughput. To obtain the highest server throughput, we
always want the server to have some work to perform. In fact, we would like a
variety of tasks for the server to perform, to take advantage of the server
platform's power (for example, disk I/O that can run in parallel with a
calculation task). Short conversations reduce the amount of idle time in each
server task. The chances of having useful work to perform increase because we
don't have to dedicate a task to waiting on an inactive client.
Decrease in the number of server-platform sessions. Many applications can run
between the client and server platforms by reusing sessions: When each
conversation is active for only a short period of time, your application can
use a session while another application isn't using it, so more applications
can be run over fewer sessions. Sessions can also be brought down when they haven't been
used for a period of time. This is done by configuring your connection as a
limited resource.
Recovery in case of connection failure. Less data must be recovered in the
event of a short-conversation failure. In addition, the code to restart a
conversation is already written as part of the short-conversation design, so
the recovery and mainline logic are very similar. This results in more robust
applications.
You will likely first envision your application as a long conversation. Upon
further consideration, you may decide that you need the advantages of short
conversations. To move from the long- to short-conversation model, you first
identify situations when the conversation is inactive. In most cases, you'll
look for instances when the client is waiting for something to happen or to
complete before issuing another request. Examples are waiting for user input
and extensive processing of previously received data. You'll get the most
advantage from short conversations by eliminating as much idle time as
possible.
When breaking up long conversations, you should also determine the smallest
transaction unit that can exist on its own in a single conversation. This
transaction unit may span more than one request/reply, especially if the
requests are related. The conversation startup should not become a significant
portion of the total conversation time. If the conversations are too short,
clients could spend most of their time starting conversations instead of
getting work done. To illustrate how conversations can be broken up, let's
examine a file-transfer program that sends a set of files from the client to
the server. You could design this application with:
Long conversations, where all files are sent on the same conversation: The
client connects to the server, sends each file in succession, and requests
confirmation of all files that were sent, and the conversation is deallocated.
The server is tied up during the entire file transfer and cannot handle
another client. If user input is required between files, there will be
excessive idle time on the conversation. One problem in this design is error
handling. If only one of the files sent cannot be written to disk, the server
cannot interrupt the client without stopping all files already in transit.
There are two choices for handling errors: Either the server uses Send_Error()
whenever the error occurs and the client has to resend files that were already
in transit; or the server has to receive the file that cannot be processed and
discard the data, wasting network bandwidth if the file is large.
Short conversations, where each file is sent on a separate conversation: The
client connects to the server, sends a file, and requests confirmation that
the file was stored successfully. The conversation is deallocated and the
client goes through the previous steps for each file. 
Shorter conversations, where each data record of each file is sent on a
separate conversation: The client connects to the server, sends a file data
record, and requests confirmation. The conversation is deallocated and the
client goes through the previous steps for each data record in the file, then
repeats the process for each file. Since we're not sending very much data on
each conversation and confirming after each send, the conversation overhead is
likely not worth the cost.
Excessively short conversations, where each file byte is sent on a separate
conversation: The client connects to the server, sends a one-byte file-data
record, and requests confirmation. The conversation is deallocated and the
client goes through the previous steps for each byte in the file, then repeats
the process for each file. The number of conversations started is equal to the
number of bytes in all of the files combined (this is definitely not the way
you should design your applications).
Short conversations require that you be concerned with correlating
transactions across the different conversations. For example, in the "shorter
conversations" file-transfer example, the server would have to know what to do
with each data record when it arrives (store it in the file to which the
record belongs, for example).
To correlate short conversations, use an existing data item as a correlator.
In many instances, the resource that the server interacts with already has an
identifier that could be used as a correlator; for example, a file server
could use an operating-system file handle. If there is no acceptable existing
data item to use, you may have to invent your own correlator. If so, consider
using a combination of the client's LU name (from the
Extract_Partner_LU_Name() call) and a unique integer ID generated by the
server program.
One way to avoid correlating short conversations is to design a "stateless
server," where each client request includes all of the information necessary
to complete processing. Although this may result in more data in each request,
the request can be handled independently of any other requests, past or
future. In addition, the server is freed from having to maintain state
information on each client. Thus, increasing the number of clients does not
increase the server program's memory requirements.


Conversation-Startup Overhead


As we move toward using short conversations in servers, we start conversations
more often. Thus, conversation-startup overhead becomes a bigger part of our
performance concerns. To determine how to reduce it, let's look at the steps
that occur when the client connects to the server and see how long each step
takes. Assume the sequence of calls on the client and server shown in Example
1. At this point, the client has established a conversation and verified that
the server program is running. The elapsed times assume a LAN transport and,
thus, a short propagation delay.
The client's Initialize_Conversation() call pulls the necessary CPI-C
parameters from a side information table. This is usually stored in memory
while CPI-C is running and, therefore, is a very fast operation, usually on
the order of tens of milliseconds, at most.
The client's Allocate() call first ensures that a session is available for use
and allocates a conversation to it for use by the client program. The first
time you Allocate() your conversation, session activation is performed, taking
on the order of hundreds of milliseconds to complete. Subsequent Allocate()
requests can reuse that session (serially, not simultaneously). Then, the only
overhead of the Allocate() call is the matching of a conversation to an active
session, which takes on the order of tens of milliseconds. Many configuration
options exist to ensure that an active session will be available for use by
your program. (Most programs are not concerned with session activation and
have little control over it. Session activation is not normally a source of
performance problems since it is usually done only once.) 
Lastly, Allocate() puts an Attach into APPC's buffers to be sent to the server
platform. The Attach contains all of the program-startup and security
information for the conversation. In the client program in Example 1,
Confirm() flushes the Attach and sends it to the server platform along with
the confirmation request. 
On the server platform, the processing of the Attach header itself is usually
simple, taking only about 20 msecs. If the server program is already running,
Accept_Conversation() gets the conversation ID, and we're off and running. If
the server program is not already running, the server platform will have to
start the program. The overhead to start a program varies among platforms, but
a good rule of thumb is that program startup usually takes between 1 and 10
seconds to complete. In the server program in Example 1, Receive() and
Confirmed() take about another 10 msecs to complete. Table 1 summarizes where
the time is spent. 
Program startup is the last major element of startup overhead, and its time
varies from platform to platform. On a system like CICS, which was designed
for quick program startup and takedown, program startup is likely to be less
than 10 msecs. Although normal program-startup time on OS/2 is around 1 to 2
seconds, a slow PC running OS/2 with little memory could take minutes! 
To limit program-startup time, it's best for the server program to be running
when an Attach arrives from the client. Ideally, you would like to start one
copy (or many copies) of the server program and have it accept one
conversation after another without ending their processes. 
Since we're looking for optimal performance and we're using short
conversations, we cannot afford to start a copy of our server program for each
conversation. (An exception is CICS, which is optimized to make program load
blindingly fast.)
Starting the server program is usually the biggest part of
conversation-startup overhead. To avoid program-startup costs, we would like
to design our server program to accept multiple conversations without exiting.



Accepting Multiple Conversations



Starting with CPI-C version 1.2, programs have been able to accept multiple
conversations within a single program. Your programs can now handle multiple
conversations or multiple clients without the overhead of program startup for
each conversation.
Accepting multiple conversations in CPI-C 1.2 is easy; just issue another
Accept_Conversation() call. The easiest way to convert your programs to accept
multiple conversations is to add a loop around your main processing. In Figure
1, for instance, the program should exit whenever an Accept_Conversation()
call fails. An Accept_Conversation() failure usually indicates one of the
following:
The program is running on an old CPI-C platform that doesn't support accept
multiple, so your program will never be able to accept a new conversation.
The TP definition for the server program isn't set up correctly to accept
multiple conversations. For example, on the OS/2 Communications Manager, a TP
definition can specify that it is nonqueued, meaning that the attach manager
should start a new instance of the program running for each incoming Attach.
No incoming Attach arrived within a time-out period. Rather than tying up
resources longer than necessary, you should end your program and free up those
resources. Let the attach manager start a new server program when necessary.
The time-out period is usually a configuration option.
In each of these cases, you don't have to worry about servicing new
conversations since the attach manager will start new server programs as
necessary. Listing One, an adaptation of a simple server program, illustrates
how to code programs to accept multiple conversations. We've modified the main
loop to process the incoming data in a separate procedure. This just makes it
easier to see how the accept-conversation processing works and to convert this
program to use multiple threads. The only thing controlling how long the
program stays active is the return code from the Accept_Conversation() call.
As long as the return code is CM_OK, the program continues to accept
conversations. Listing Two shows a simple client program that connects to the
server program.
Although not specifically a CPI-C function, you can use multiple threads
within your server to handle multiple conversations simultaneously. Using
multiple threads allows your server to handle multiple clients without the
overhead of multiple processes. More clients are serviced with fewer server
resources. You can use multiple threads in your server programs in many
different ways. We'll examine two: 
A main thread accepts conversations, then starts worker threads to process
each conversation. This allows your server to process all the client
conversations that arrive, up to the system thread limit. This technique is
useful only if the number of conversations is not expected to grow beyond the
thread limit. If your program does reach the thread limit, it is difficult to
determine when threads are free to accept new conversations again.
A set of N threads is started, each accepting and processing conversations in
a loop. This allows your program to specify explicitly how many threads, and
thus how many resources, it will use. The actual number of threads may be
tuned to provide the best server throughput without overloading or thrashing
the server platform. Listing Three is an example of the server program adapted
to accept many conversations and start a thread to process each. 
You can also choose to write your server using CPI-C 1.2 nonblocking features.
The advantages are that the number of client conversations is limited by the
number of sessions, rather than the number of threads or processes. Also,
because nonblocking calls avoid operating-system threading dependencies, they
keep your program portable.
The disadvantage is the extra overhead required for nonblocking processing.
Although the overhead will be less than that for implementing nonblocking
using threads, a nonblocking call is more expensive than a normal procedure
call. Furthermore, your program must supply and maintain parameters for each
nonblocking call it issues. CPI-C keeps the addresses of your parameters until
the nonblocking call completes. If your program issues four nonblocking
Receive() calls, you must have four sets of Receive() parameters, including
four Receive() buffers. If you are using nonblocking calls, we recommend using
C structures to keep the sets of parameters together as one unit. Finally,
your program must maintain complete state information for each conversation.
When the nonblocking call completes, you are only told the conversation ID and
the return code. Your program must remember what CPI-C call actually completed
and what call should be issued next on that conversation. 


Conclusion


CPI-C is a powerful API for creating client/server applications. Early
versions of CPI-C made it easy to build portable clients, but server programs
were limited in their capacity. As CPI-C has become an industry standard, it
has been enhanced to allow building powerful servers, as well. 


References


Walker, John Q. II and Peter J. Schwaller. CPI-C Programming in C: An
Application Developer's Guide to APPC. New York: McGraw-Hill, 1994. ISBN
0-07-911733-3. 
The Best of APPC, APPN, and CPI-C. IBM CD-ROM #SK2T-2013.
Example 1: Usual sequence of startup calls. (a) Client; (b) server.
(a)
Initialize_Conversation()
Allocate()
Confirm()
(b)
Accept_Conversation()
Receive()
Confirmed()
Table 1: Conversation-startup overhead.
Initialize <10 msec
Session activation 100--1500 msec
Conversation allocation <10 msec
Attach About 20 msec
Program startup About 1--10 sec
Figure 1: Accepting multiple conversations.

Listing One 

/*---------------------------------------------------------------
 * CPI-C example program, displaying received records
 * server side (file SERVER1D.C)
 *-------------------------------------------------------------*/
#include <cpic.h> /* conversation API library */
#include <stdio.h> /* file I/O */
#include <stdlib.h> /* standard library */
#include <string.h> /* strings and memory */
#define RECEIVE_SIZE (10) /* receive 10 bytes at a time */

static void process_incoming_data(unsigned char *conversation_ID);

int main(void)
{

 unsigned char conversation_ID[CM_CID_SIZE];
 CM_RETURN_CODE cpic_return_code;
 setbuf(stdout, NULL); /* assure unbuffered output */
 do {
 cmaccp( /* Accept_Conversation */
 conversation_ID, /* returned conversation ID */
 &cpic_return_code); /* return code from this call */
 if (cpic_return_code == CM_OK) {
 printf("Accepted a conversation...\n");
 process_incoming_data(conversation_ID);
 }
 else {
 (void)fprintf(stderr,
 "Return code %lu on CMACCP\n", cpic_return_code);
 }
 } while (cpic_return_code == CM_OK);
 (void)getchar(); /* pause for a keystroke */
 return(EXIT_SUCCESS);
}
static void process_incoming_data(unsigned char *conversation_ID)
{
 unsigned char data_buffer[RECEIVE_SIZE];
 CM_INT32 requested_length = (CM_INT32)sizeof(data_buffer);
 CM_INT32 received_length;
 CM_DATA_RECEIVED_TYPE data_received;
 CM_REQUEST_TO_SEND_RECEIVED rts_received;
 CM_STATUS_RECEIVED status_received;
 unsigned done = 0;
 CM_RETURN_CODE cpic_return_code;

 while (done == 0) {
 cmrcv( /* Receive */
 conversation_ID, /* conversation ID */
 data_buffer, /* where to put received data */
 &requested_length, /* maximum length to receive */
 &data_received, /* returned data_received */
 &received_length, /* length of received data */
 &status_received, /* returned status_received */
 &rts_received, /* ignore this parameter */
 &cpic_return_code); /* return code from this call */
 /* replace the following block with the good algorithm
 * that's shown in the program sketch in the text. */
 if ((cpic_return_code == CM_OK) ||
 (cpic_return_code == CM_DEALLOCATED_NORMAL)) {
 /* write the received string to stdout */
 (void)fwrite((void *)data_buffer, (size_t)1,
 (size_t)received_length, stdout);
 if (data_received == CM_COMPLETE_DATA_RECEIVED) {
 (void)fputc((int)'\n', stdout); /* newline */
 }
 }
 if (cpic_return_code != CM_OK) {
 done = 1; /* CM_DEALLOCATED_NORMAL or unexpected */
 }
 }
}




Listing Two 

/*-----------------------------------------------------------
 * CPI-C example program, sending command-line parameters.
 * Client side (file HELLO5.C)
 *-------------------------------------------------------------*/
#include <cpic.h> /* conversation API library */
#include <string.h> /* strings and memory */
#include <stdlib.h> /* standard library */
#include <stdio.h> /* standard I/O */

/* this hardcoded sym_dest_name is 8 chars long & blank padded */
#define SYM_DEST_NAME (unsigned char*)"SERVERS "

int main(int argc, char *argv[])
{
 unsigned char conversation_ID[CM_CID_SIZE];
 CM_RETURN_CODE cpic_return_code;
 cminit( /* Initialize_Conversation */
 conversation_ID, /* returned conversation ID */
 SYM_DEST_NAME, /* symbolic destination name */
 &cpic_return_code); /* return code from this call */
 if (cpic_return_code != CM_OK) {
 printf("Error on CMINIT, RC was %ld\n",
 cpic_return_code);
 }
 cmallc( /* Allocate */
 conversation_ID, /* conversation ID */
 &cpic_return_code); /* return code from this call */
 if (cpic_return_code != CM_OK) {
 printf("Error on CMALLC, RC was %ld\n", cpic_return_code);
 }
 {
 /* send each command-line argument, one per send */
 int index;
 for (index = 0; index < argc; index++) {
 CM_REQUEST_TO_SEND_RECEIVED rts_received;
 CM_INT32 send_length = (CM_INT32)strlen(argv[index]);
 cmsend( /* Send_Data */
 conversation_ID, /* conversation ID */
 (unsigned char *)argv[index], /* send this */
 &send_length, /* length to send, no null */
 &rts_received, /* ignore this parameter */
 &cpic_return_code); /* return code */
 if (cpic_return_code != CM_OK) {
 printf("Error on CMSEND, RC was %ld\n", cpic_return_code);
 }
 }
 }
 cmdeal( /* Deallocate */
 conversation_ID, /* conversation ID */
 &cpic_return_code); /* return code from this call */
 if (cpic_return_code != CM_OK) {
 printf("Error on CMDEAL, RC was %ld\n", cpic_return_code);
 }
 return(EXIT_SUCCESS);
}




Listing Three

/*---------------------------------------------------------------
 * CPI-C example program, displaying received records
 * server side (file SERVER2D.C)
 *-------------------------------------------------------------*/
#include <cpic.h> /* conversation API library */
#include <stdio.h> /* file I/O */
#include <stdlib.h> /* standard library */
#include <string.h> /* strings and memory */
#include <process.h>
#define RECEIVE_SIZE (10) /* receive 10 bytes at a time */

static void process_incoming_data(void *void_conversation_ID);

int main(void)
{
 unsigned char * conversation_ID;
 CM_RETURN_CODE cpic_return_code;
 int thread_id;

 setbuf(stdout, NULL); /* assure unbuffered output */
 do {
 conversation_ID = malloc(CM_CID_SIZE);
 if (conversation_ID != NULL) {
 cmaccp( /* Accept_Conversation */
 conversation_ID, /* returned conv ID */
 &cpic_return_code);
 if (cpic_return_code == CM_OK) {
 printf("Accepted a conversation...\n");
 thread_id = _beginthread(
 process_incoming_data,
 NULL, /* have C allocate the */
 /* stack for the thread */
 8192, /* specify stack size */
 (void*)conversation_ID);
 if (thread_id == -1) {
 perror("Error creating thread.");
 }
 }
 else {
 (void)fprintf(stderr,
 "Return code %lu on CMACCP\n", cpic_return_code);
 }
 }
 else {
 printf("Error getting memory!\n");
 cpic_return_code = -1;
 }
 } while (cpic_return_code == CM_OK);
 (void)getchar(); /* pause for a keystroke */
 return(EXIT_SUCCESS);
}
static void process_incoming_data(void * void_conversation_ID)
{
 unsigned char data_buffer[RECEIVE_SIZE];
 CM_INT32 requested_length = (CM_INT32)sizeof(data_buffer);
 CM_INT32 received_length;

 CM_DATA_RECEIVED_TYPE data_received;
 CM_REQUEST_TO_SEND_RECEIVED rts_received;
 CM_STATUS_RECEIVED status_received;
 unsigned done = 0;
 CM_RETURN_CODE cpic_return_code;
 unsigned char * conversation_ID = (unsigned char *) void_conversation_ID;
 while (done == 0) {
 cmrcv( /* Receive */
 conversation_ID, /* conversation ID */
 data_buffer, /* where to put received data */
 &requested_length, /* maximum length to receive */
 &data_received, /* returned data_received */
 &received_length, /* length of received data */
 &status_received, /* returned status_received */
 &rts_received, /* ignore this parameter */
 &cpic_return_code); /* return code from this call */
 /* replace the following block with the good algorithm
 * that's shown in the program sketch in the text. */
 if ((cpic_return_code == CM_OK) ||
 (cpic_return_code == CM_DEALLOCATED_NORMAL)) {
 /* write the received string to stdout */
 (void)fwrite((void *)data_buffer, (size_t)1,
 (size_t)received_length, stdout);
 if (data_received == CM_COMPLETE_DATA_RECEIVED) {
 (void)fputc((int)'\n', stdout); /* newline */
 }
 }
 else {
 printf("unexpected error %lu\n", cpic_return_code);
 }
 if (cpic_return_code != CM_OK) {
 done = 1; /* CM_DEALLOCATED_NORMAL or unexpected */
 }
 }
 free(conversation_ID);
}



























Building an OLE Server Using Visual C++ 2.0


Embedding GIF files into OLE 2.0-compatible documents


John has been programming computers on a variety of platforms for 12 years. He
is currently a senior design engineer at Compton's NewMedia in Carlsbad,
California, where he works on multimedia CD-ROM titles.


The message from Microsoft is clear: If you are a Windows developer, you
should be writing 32-bit applications, developed on 32-bit platforms. The
proof is Microsoft's Visual C++ 2.0, a fully 32-bit hosted environment that
requires either Windows NT 3.5 or Windows 95 (Chicago) to run. Significantly,
Microsoft will no longer release new 16-bit-hosted compiler environments, nor
does this environment target 16-bit code. For 16-bit developers, Visual C++ 1.5
is still included in the box. The version I tested, however, supports creating
only 32-bit applications targeted at NT, Windows 95, and Win32s. Microsoft
promises future support for NT on MIPS, Alpha, and PowerPC as well as
Macintosh on 680x0 and PowerPC. 
Although the system requirements for the environment are a bit steep by
today's standards, I found developing code under Windows NT to be a surprising
pleasure. NT's responsiveness and robustness are more important assets than I
had realized. Preemptive multitasking and threads allow you to work in
multiple windows without any of the hiccups and hesitations we have grown used
to under Windows 3.1. NT's crash resistance allows uninterrupted debugging
sessions. Visual C++ 2.0 (VC++ 2.0) employs multithreading internally, which
allows you to compile in the background (with some limitations) and still do
other things such as edit files or resources in the foreground. 
To put VC++ to the test, I created an OLE server called "GIFSERV," which lets
you embed a CompuServe GIF image file into an OLE 2.0 compatible document. 
GIFSERV is a Single Document Interface (SDI) application, meaning that it will
open only one GIF image at a time. It supports a File/Open command that
allows opening and displaying an existing GIF image. For simplicity, GIFSERV
does not support image modifications. 


What's Inside


In setting up VC++, I opted for a full install, except for the "Books Online"
feature, which is of considerable size and is best left on the CD-ROM.
Overall, the installation consumed approximately 84 Mbytes of hard-disk space.
In addition to the compiler environment, several icons representing various
utilities are installed in Program Manager: Windiff, a file differencer
originally shipped with the SDK; Pview, a process viewer for viewing
statistics on any running process and its threads; Spy++, an improved version
of Spy for capturing Windows messages; and Tracer, a kind of global filter
that enables tracing of all kinds of system events. 
VC++ 2.0 also includes the Microsoft Foundation Classes 3.0 (MFC 3.0), which
adds several higher-level features such as enhanced toolbars, miniframe
windows, and tabbed dialogs (property sheets). Win32 enhancements include
support for multithreading, Unicode, shared 32-bit DLLs, and new 32-bit APIs.
The library also supports C++ 3.0 language enhancements, including templates
and exceptions. Templates are used in six collection classes. According to
Microsoft, true C++ exception handling is used widely throughout the class
library, although I wasn't able to verify this.


User-Interface Enhancements


The first thing you'll notice about the VC++ 2.0 environment is a plethora of
UI enhancements. Microsoft has made liberal use of the new Windows 95
controls, notably property sheets (tabbed dialogs) and dockable toolbar
windows, implemented with a new, chiseled, three-dimensional look. The
environment supports drag-and-drop for a number of features and has added
right-mouse-button-activated pop-up menus pretty much everywhere.
Property sheets go a long way toward reducing user-interface clutter in a
consistent way. A very effective use of property sheets, for example, is in
the Output window. This is a general-purpose, scrolling-text output window
that displays output from compilations, file searches, debug traces, and
profilers. Each type of output is kept in a separate property sheet within
this one window. You can access any of them by selecting among the tabs along
the bottom.
Dockable windows are dual-personality beasts. They are normal, floating,
overlapped windows, not MDI child windows. When dragged
into contact with one of the four sides of the application window, however,
they're "docked"--attached to the margin of the application where they were
dropped. A double-click around the perimeter of a docked toolbar restores it
to its overlapped personality.
My first impression of dockable toolbars was that they are gimmicky. After
some use, however, I came to like their flexibility. For quick reference, they
are ideal for popping up temporarily in the overlapped mode, then closing
right away. This allows you to leave your existing windows in place without
any fussing. For long-term reference, dockable toolbars are best docked
because they're kept visible. Arranging existing windows (by tiling, for
example) will automatically account for the docked toolbars' positions. I found
docked mode great for the watch window.
Drag-and-drop is an ergonomic, intuitive advantage when used appropriately.
For instance, customizing a toolbar is straightforward. Choosing the
Tools/Customize option from the main menu introduces a pop-up dialog box with
a palette of toolbar buttons to choose from. These are cascaded by category in
property sheets--another plus. Building your custom toolbar involves dragging
buttons between this dialog box and the toolbar you are customizing. Done. Any
of the floating/dockable toolbars can be customized at the same time, and each
responds immediately, displaying its new configuration.
Moving buttons on a toolbar is a simple matter of dragging them to their new
location.
ToolTips, another Windows 95 feature, provide small fly-out hints when the
cursor hovers over toolbar buttons, much like the familiar status-bar
hints that appear at the bottom of most applications. For some reason,
Microsoft implemented ToolTips with a short delay, which requires pausing the
mouse momentarily before they appear. I found this irritating, especially
since the delay seems to be unpredictable.
Editing resources in VC++ is straightforward. Each resource can be launched
into the appropriate editor window by double-clicking. The menu editor is easy
to use, relying on drag-and-drop semantics, allowing true visual editing of
menu layouts.


Putting VC++ to Work


My approach to the OLE server project was to use AppWizard to generate a
skeleton application as a starting point for development. Once generated, the
application cannot be further modified through the AppWizard. Therefore, it is
a good idea to get these selections right the first time. My selections for
GIFSERV included Single Document Interface, no ODBC support, and OLE 2.0
full-server capability with OLE 2.0 Automation. The wizard allows
customization of window styles, dialog as main window, toolbar, status bar,
built-in printing and print-preview support, context-sensitive help, support
for Most-Recently-Used (MRU) file list on the File menu, default filenames and
file types, OLE registry names, and names of all classes and generated files.
Listings One and Two are SRVRDOC.H and SRVRDOC.CPP, respectively. SRVRDOC
implements the CGIFServerDoc class, a CDocument-derived class representing GIF
documents. Except for minor additions, these two files were completely
generated by AppWizard.
AppWizard creates all source files, adding them automatically to the new
project file. It didn't miss a trick, even including a README.TXT file which
explained the function of each of the generated files. The Wizard also
generates a thorough set of resources, including several bitmaps (for the
toolbars, icons, menus, and accelerator tables--normal and OLE in-place), a
version-info resource, a dialog for a custom About Box, and a number of string
tables. This saves repetitive work, especially when it comes to nonessential
details such as the version and string-table resources, which always seem to
slip through the cracks when done by hand.
Project windows are graphically oriented, displaying a tree-like structure for
compilation-dependent nodes. VC++ does a thorough job with resources, tracking
the compilation dependencies created in the resource (.RC) file. This includes
files #included by the .RC file as well--a weakness in other environments.
Typically, bitmap and icon resources are kept in separate files. Other
resources are written directly into the resource file. Double-clicking on any
of the resources from the dependency tree automatically launches you into the
editor window for that resource.
The project created by AppWizard compiled fine, and I was able to run GIFSERV
immediately. Although the code to read a GIF image was yet to be added,
running the application one time caused my new OLE class to register itself in
the registry database as the "Gifser Document" class (Gifserv was truncated
due to a standard character limit on OLE class names--this can be changed).
This allowed the Gifser class to appear automatically in other OLE 2.0
container applications in the Edit/Insert Object dialog box.
Compilation speed is a two-edged sword in VC++. Microsoft has included an
incremental linker with VC++ to speed up linking of small code changes. I
found incremental links to be fast. This addition is worth the overhead.
Apparently, supporting incremental linking, plus VC++'s compiled browser
database, slows full builds to a crawl. For example, compiling
and incrementally linking a single .CPP file took 17 seconds, while a full
rebuild of the GIFSERV project took 1 minute, 40 seconds.
Thanks to multithreading, it is possible to continue working in the foreground
during builds, with one caveat: The background compilation thread stalls
whenever the foreground thread blocks in a modal dialog box. This appears to
be an architectural issue with Windows NT that impacts productivity. Whenever
I performed searches, search/replace, file open, and so on, the background
compilation would freeze until the operation had finished. Be careful not to
leave an open dialog when you head out for that coffee break--your compilation
will be sitting there staring at you when you come back.


App Development


The next step was to modify the AppWizard-generated classes to open and
display GIF images. I took the simple approach of borrowing an existing class,
XGIFPicture, that implements the GIF decoding. The complete GIFPICT.CPP code
is provided electronically; see "Availability" on page 3. The class
definition, however, is provided in Listing Four, page 113. The class lets you
create an object that represents an in-memory bitmap image. The constructor
takes a CFile object that is assumed to represent a GIF file on disk. After
construction, the object can be asked to draw itself into a DC. The class
supports width(), height(), and draw(CDC &, const RECT &) functions.
In anticipation of future expansion to other image formats, the XGIFPicture
class is derived from an abstract base class, XPictureBase; see Listing Three
for the class definition. The width(), height(), and draw(CDC &, const RECT &)
functions are all declared pure virtual in XPictureBase. For this reason, the
CDocument-derived class that handles the data, CGIFServerDoc, deals with a
pointer to an XPictureBase object, pPicture. All of the imaging code is
isolated in the XPictureBase-derived object. CGIFServerDoc only has to worry
about instantiating the picture, freeing it, and passing along the appropriate
WM_PAINT notifications by calling its draw() routine.
To correctly free the XPictureBase object in a Single Document Interface
environment, the MFC documentation says to override the
CDocument::DeleteContents() in my derived version of CDocument. I used
ClassWizard to do this. ClassWizard automates adding your own versions of
inherited virtual functions. It presents a list of inherited functions to
choose from. Once I chose DeleteContents(), ClassWizard added all its
declarations automatically. 
I also needed to add a new, noninherited function, CGIFServerDoc::draw(CDC &,
const Rect &). ClassWizard is only equipped to add limited kinds of functions
to a class; in particular, inherited virtual functions. Therefore I could not
coerce ClassWizard into adding this new function for me; I had to add it by
hand. Nor would it create a new class not derived from an existing MFC class.
This might have been a minor oversight, since with MFC, everything should
theoretically be derived from CObject. However, ClassWizard does not include
CObject in its selection of classes to derive from. Unfortunately, it only
offers limited automation. 
A capable browser has been packaged with VC++. Microsoft precompiles not only
the class-structure information, but also full cross-reference lists for all
symbols into the browser and function-call graph information. This was one of
my favorite features. First, you can right-mouse click any symbol in the
editor and jump to either its source definition or its references anywhere in
your code. This functionality is reproduced in the browser window as well,
along with additional browsing capability. In the browser window, you can
access these lists of references in full, organized by symbol, class or file,
and jump directly to those source files. The class-inheritance structure is
shown in the browser, with all the class members. As you highlight each
member, all of its references appear in an adjacent split window. The standard
options for filtering the class-member lists are available. Simple
function-call graphs are also available, showing which functions call a given
function and which functions it calls.
Having this browser capability made deciphering the AppWizard-generated
classes much easier on several occasions. I was immediately able to jump to
Foundation Class header files where symbols were defined to have a closer
look. However, I am puzzled as to why Microsoft didn't make the browser
accessible from the right-mouse button within the editor. 


Debugging



Two new debugging features in VC++ promise to be significant advances over
conventional debug environments: Just-in-Time debugging and OLE RPC (Remote
Procedure Call) debugging. Just-in-Time debugging lets you configure an
application so that if a fatal exception occurs, the debugger will
automatically launch. This occurs whether or not you have the VC++ environment
running or even have debug information compiled into the app. When such an
exception occurs, you are presented with a dialog box explaining the
exception, giving the code address, and asking if you want to debug it. If you
do, up pops an instance of VC++, showing you exactly where in the code
the exception occurred. You are locked out of normal project-building
operations while in Just-in-Time Debugging; otherwise, this is a very cool
feature and should make obsolete the standard post-mortem-type logging tools
such as Dr. Watson. 
I had more problems with OLE RPC debugging. RPC debugging allows you to debug
two communicating applications simultaneously. This will be particularly
valuable for OLE developers. The idea is to start up a debug session of your
OLE container application from one instance of the VC++ environment. As you
single step into a call that invokes an RPC operation in the other
application, a second instance of VC++ comes up automatically, with you in the
driver's seat, debugging the second (server) application. Control will revert
back to the container when the RPC operation is finished.
For example, I was using DRAWCLI, the sample container application that comes
with VC++, to test GIFSERV. It was having trouble activating GIFSERV.EXE
properly when I double-clicked on a GIFSERV object. By setting a breakpoint in
the OnButtonLDblClk() routine of DRAWCLI, then single stepping, I was able to
traverse the process boundary over to GIFSERV somewhere deep in the bowels of
MFC on a DoVerb() call. Another instance of VC++ popped up. I stepped through
that. It then reverted back to the calling instance of VC++ just fine. I was
impressed. However, it was impossible to repeat that cycle a second time. The
debugger often disabled all the single-step buttons, or I found myself in a
disassembly window stepping through kernel assembly code, never to launch the
second instance of VC++. Also, it appears that you must single-step to get the
transition to occur. When I ran the app, no process switch occurred. At the
time of this writing, I am still immersed in OLE documentation trying to
resolve this problem. This just goes to show you that AppWizard and MFC
encapsulation can take you only so far down the OLE path. At some point, you
will have to roll up those shirt sleeves and grunt it out if you are going to
do OLE.
The debugger offers other nice features. Drag-and-drop is supported for
viewing variables in watch and memory windows, for viewing code in the
disassembly window, or for viewing memory by dragging a register value to the
memory window. The debugger makes no distinction between application source
code and MFC class-library code. You can trace into the source code of any MFC
function. VC++ handles the details of locating source files smoothly.
Another nice touch is the ability of the QuickWatch window to detect the
actual object type when all you have is a pointer to an object of the
base-class type. For instance, it was a simple matter to view all the member
values for the XGIFPicture object stored in the CGIFServerDoc as the pPicture
pointer, even though pPicture is a pointer to the base class XPictureBase. See
Listing One for the pPicture declaration. Highlighting the pPicture variable
first, then launching the QuickWatch window, produced a display showing all
the XPictureBase class members, plus an extra "ghost" member of type
XGIFPicture*, which you could double-click to expand into all of its member
values for the current object. Surprisingly, this feature does not require that
your classes be derived from MFC's CObject class. It seems to use C++ 3.0's
run-time type checking. I would like to see this feature in the regular Watch
window as well.


Conclusion


Microsoft, it would seem, has put all of its wood behind the 32-bit arrow.
VC++ 2.0 is an impressive product. The OLE 2.0 support, RPC debugging, and
full 32-bit implementation all advance the state of the art for Windows
development environments. Microsoft has also concentrated heavily on usability,
which clearly puts this environment on par with, or ahead of, traditionally
user-friendly environments like Borland C++. However, there are speed issues
to be addressed. Finally, the benefit of instantly generating an
OLE-compatible application is invaluable, and I would not do without it.
However, when writing realistic OLE applications, the learning curve for OLE
is as it has always been--tough. 

Listing One 

/*------------------------- SRVRDOC.H ----------------------------*/
// This is the CDocument-derived class that encapsulates a picture
// object and forwards all drawing messages to it.

class CGIFServerItem;
class XPictureBase;

class CGIFServerDoc : public COleServerDoc
{
protected: // create from serialization only
 CGIFServerDoc();
 DECLARE_DYNCREATE(CGIFServerDoc)
// Attributes
public:
 CGIFServerItem* GetEmbeddedItem()
 { return (CGIFServerItem*)COleServerDoc::GetEmbeddedItem(); }
// Operations
public:
// Overrides
 // ClassWizard generated virtual function overrides
 //{{AFX_VIRTUAL(CGIFServerDoc)
 public:
 virtual BOOL OnNewDocument();
 virtual BOOL OnOpenDocument(LPCTSTR lpszPathName);
 virtual void DeleteContents();
 protected:
 virtual COleServerItem* OnGetEmbeddedItem();
 //}}AFX_VIRTUAL
// Implementation
public:
 virtual ~CGIFServerDoc();
 virtual void Serialize(CArchive& ar);
#ifdef _DEBUG
 virtual void AssertValid() const;
 virtual void Dump(CDumpContext& dc) const;
#endif
protected:
// Generated message map functions
protected:
 //{{AFX_MSG(CGIFServerDoc)
 // NOTE -the ClassWizard will add and remove member functions here.
 // DO NOT EDIT what you see in these blocks of generated code !
 //}}AFX_MSG

 DECLARE_MESSAGE_MAP()
 // Generated OLE dispatch map functions
 //{{AFX_DISPATCH(CGIFServerDoc)
 // NOTE -the ClassWizard will add and remove member functions here.
 // DO NOT EDIT what you see in these blocks of generated code !
 //}}AFX_DISPATCH
 DECLARE_DISPATCH_MAP()
// The following lines of code were added by hand:
public:
 void draw(CDC &, const RECT &);
private:
 XPictureBase * pPicture;
};



Listing Two

/*------------------------- SRVRDOC.CPP --------------------------*/
#include "stdafx.h"
#include "gifserv.h"
#include "srvrdoc.h"
#include "srvritem.h"
#include "gifpict.hpp"

#ifdef _DEBUG
#undef THIS_FILE
static char BASED_CODE THIS_FILE[] = __FILE__;
#endif

/////// CGIFServerDoc ////////
IMPLEMENT_DYNCREATE(CGIFServerDoc, COleServerDoc)
BEGIN_MESSAGE_MAP(CGIFServerDoc, COleServerDoc)
 //{{AFX_MSG_MAP(CGIFServerDoc)
 // NOTE - the ClassWizard will add and remove mapping macros here.
 // DO NOT EDIT what you see in these blocks of generated code!
 //}}AFX_MSG_MAP
END_MESSAGE_MAP()

BEGIN_DISPATCH_MAP(CGIFServerDoc, COleServerDoc)
 //{{AFX_DISPATCH_MAP(CGIFServerDoc)
 // NOTE - the ClassWizard will add and remove mapping macros here.
 // DO NOT EDIT what you see in these blocks of generated code!
 //}}AFX_DISPATCH_MAP
END_DISPATCH_MAP()

////// CGIFServerDoc construction/destruction ////// 
CGIFServerDoc::CGIFServerDoc()
{
 // TODO: add one-time construction code here
 pPicture = NULL;
 EnableAutomation();
 AfxOleLockApp();
}
CGIFServerDoc::~CGIFServerDoc()
{
 AfxOleUnlockApp();
}
BOOL CGIFServerDoc::OnNewDocument()
{
 if (!COleServerDoc::OnNewDocument())
 return FALSE;
 // TODO: add reinitialization code here
 // (SDI documents will reuse this document)
 TRACE("New Document\n");
 ASSERT(pPicture == NULL);
 return TRUE;
}
///// CGIFServerDoc server implementation /////
COleServerItem* CGIFServerDoc::OnGetEmbeddedItem()
{
 // OnGetEmbeddedItem is called by the framework to get the COleServerItem
 // that is associated with the document. It is only called when necessary.
 CGIFServerItem* pItem = new CGIFServerItem(this);
 ASSERT_VALID(pItem);
 return pItem;
}
///// CGIFServerDoc serialization /////
void CGIFServerDoc::Serialize(CArchive& ar)
{
 if (ar.IsStoring())
 {
 // TODO: add storing code here
 TRACE("Storing serial document\n");
 }
 else
 {
 // TODO: add loading code here
 TRACE("Loading serial document\n");
 ASSERT(pPicture == NULL); 
 pPicture = new XGIFPicture(*ar.GetFile());
 }
}
///// CGIFServerDoc diagnostics /////
#ifdef _DEBUG
void CGIFServerDoc::AssertValid() const
{
 COleServerDoc::AssertValid();
}
void CGIFServerDoc::Dump(CDumpContext& dc) const
{
 COleServerDoc::Dump(dc);
}
#endif //_DEBUG

///// CGIFServerDoc commands /////
BOOL CGIFServerDoc::OnOpenDocument(LPCTSTR lpszPathName) 
{
 if (!COleServerDoc::OnOpenDocument(lpszPathName))
 return FALSE;
 // TODO: Add your specialized creation code here
 TRACE("Open Document\n");
 return TRUE;
}
// This function was added automatically by ClassWizard.
void CGIFServerDoc::DeleteContents()
{
 // TODO: Add your specialized code here or call the base class

 TRACE("Delete Document Contents\n");
 // These two lines were added manually to free the picture object
 delete pPicture;
 pPicture = NULL;
 COleServerDoc::DeleteContents();
}
// This function was added by hand. ClassWizard could not do it. It basically
// passes the WM_PAINT message along to the picture object contained herein.
void CGIFServerDoc::draw(CDC & dc, const RECT & r2)
{
 if (pPicture != NULL)
 pPicture->draw(dc, r2);
}



Listing Three

/*------------------------- PICTURE.HPP --------------------------*/
// Abstract base class for all types of picture objects.

#ifndef __PICTURE_HPP
#define __PICTURE_HPP

#include <windows.h>
#include "error.hpp"

class XPictureBase
{
 protected:
 XeStatus mnStatus;
 public:
 XPictureBase();
 virtual ~XPictureBase();
 // Non-virtual member functions
 XeStatus status() const;
 BOOL isOK() const;
 // Pure virtual member functions
 virtual short height() const = 0;
 virtual short width() const = 0;
 virtual XeStatus draw(CDC & destDC, const RECT &destRect) = 0;
};
#endif // __PICTURE_HPP



Listing Four

/*------------------------- GIFPICT.HPP --------------------------*/
// Class XGIFPicture, creates picture objects derived from GIF
// image files on disk.
#ifndef __GIFPICT_HPP
#define __GIFPICT_HPP

#include "picture.hpp"

class XGIFPicture : public XPictureBase
{
 public:

 XGIFPicture(CFile & file);
 virtual ~XGIFPicture();
 // Inherited pure virtuals implemented in this class
 virtual short height() const;
 virtual short width() const;
 virtual XeStatus draw(CDC & destDC, const RECT &destRect);
 private:
 // Private member variables
 long mlImageSize; // Size of the image in bytes
 unsigned short munBitsPerPixel; // Usually 1, 4 or 8
 unsigned short munHeight; // Size of image in pixels
 unsigned short munWidth; // ""
 BITMAPINFO * mpsDibHeader; // DIB Header and palette
 unsigned char * mpPictureData; // Ptr to the DIB pixel data
 // Private member functions
 XeStatus loadImage(unsigned char * rawImage);
 XeStatus ParseGIFHeader( unsigned char * pGIF,
 unsigned char * & newPtr);
};
#endif //__GIFPICT_HPP











































Deploying DCE as an Infrastructure


One organization's experience with implementing DCE




Jack Danahy


Jack is with Hewlett-Packard's Chelmsford System Software Laboratory and can
be contacted at jake@ch.hp.com.


Distributed computing tools--the enablers of a new generation of platforms and
machines--provide transparent remote access to resources and services once
restricted to local connections. System security, account management, clock
synchronization, file storage, and others--once administered and delivered on
a per-machine basis--are now available through platform-independent,
distributed infrastructures such as Distributed Computing Environment (DCE)
technology from the Open Software Foundation.
This relocation of essential resources and services, however, will almost
inevitably create anxiety. Any deployment plan for a distributed
infrastructure needs to consciously address a user's need for stability
throughout the course of this migration. In this article, I'll describe the
process we followed in migrating Hewlett-Packard's Chelmsford Systems Software
Laboratory to a DCE-based infrastructure. This deployment provides a road map
of one successful path through the implementation of this new technology. I
hope that new DCE users will benefit from the decisions and discoveries made
through the exercise undertaken in our lab.


The Pre-DCE Environment


The Chelmsford Systems Software Laboratory is a group of more than 100
individuals engaged in the development of distributed-computing software and
strategies, including DCE. The majority of the group consists of engineers,
along with managers, marketing personnel, and a large documentation group.
This mix represented a variety of skill sets, needs and usage patterns,
levels of technical expertise, and willingness to migrate to new technology.
Before adopting DCE technology as our infrastructure, the lab employed
distributed solutions already present: The Network File System (NFS) and the
Andrew File System (AFS) provided distributed file services, while Kerberos
authentication secured mail and mailing lists. Naming services were provided
through the Network Information Service (NIS). While our patchwork solution
provided the basic services required for a distributed infrastructure, the
associated costs were high. Everyone was forced to authenticate several times,
once into each of the various security domains. Administrative tasks were also
undertaken separately in each of the named environments, increasing overhead.
Although most of us viewed the introduction of a single distributed framework
as solving specific problems in our infrastructure (not as a wholesale change
in operations), there were some staff members who were not yet taking
advantage of the distributed services already in place. Consequently, we had
to emphasize the promised benefits of DCE: lower administrative costs,
centralized backup and recovery, and a machine-independent view of the
namespace, all of which contribute to a more consistent user environment,
regardless of the current host.


The Approach to Deployment


While we all felt that thorough testing of a DCE product in a
corporate environment was a significant, worthwhile goal, none of us would
willingly accept reduced productivity, reliability, or stability in our
everyday work environment during the transition. To compound this, the DCE
software to be implemented in our lab was still in its developmental stages
and not yet completely tested. Consequently, we decided to pursue a phased
approach to implementation. Other sites will surely vary in their adoption
model, and while the phases I describe here were appropriate for our
implementation of a DCE infrastructure, another site (or technology) may find
this approach too aggressive or too methodical. These phases are only an
example of one ultimately successful migration.


Phase 1: Plan and Establish the Cell


In DCE terms, a "cell" is a secure administrative domain, containing
interrelated services to provide a distributed environment to users. Services
within a cell include: 
Security, maintained on one or more security servers, which contains account
information and provides authorization and secure communication between
processes.
Naming, maintained on one or more cell-directory servers, which contains the
DCE namespace.
Clock synchronization, maintained on one or more distributed time servers,
which ensures that the daemons running on separate nodes in the cell possess a
consistent view of times for authentication and access.
File service, maintained on one or more distributed file servers, which
services requests for data from the volumes in the namespace.
These services may reside on a single machine, or they may be spread over
multiple machines. The distribution of services and servers forms the
configuration of the DCE cell.
We considered stability and performance most important in our cell
configuration. As mentioned earlier, aggressive schedules and a production
environment meant that we needed to achieve a high level of productivity in
this deployment. To eliminate down time, we employed replicas of the security
and naming services. When the master of either service fails, the replicas
provide access to the information, and can be promoted to master in the event
of catastrophic loss. The same replicas provide a performance benefit to the
community. For most operations, clients bind against the first server they
find, be it master or replica, decreasing the load on master servers and
increasing responsiveness.
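The binding behavior just described can be sketched as a simple search over the known servers. The names and types below are illustrative only and are not the DCE API:

```cpp
#include <string>
#include <vector>

// Hypothetical descriptor for a security or naming server in the cell.
struct Server {
    std::string name;
    bool master;   // true for the master, false for a replica
    bool up;       // is the server currently responding?
};

// For most operations a client binds against the first server it finds,
// be it master or replica. Returns the index of the chosen server,
// or -1 if no server responds.
int bindFirstAvailable(const std::vector<Server> &servers) {
    for (size_t i = 0; i < servers.size(); ++i)
        if (servers[i].up)
            return static_cast<int>(i);
    return -1;   // total outage: a replica would then be promoted to master
}
```

Because requests are satisfied by whichever server answers first, replicas both absorb load from the master and keep the service available while the master is down.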
Adequate planning in the creation of the DCE cell facilitates the addition of
more users as the cell and the community grow. Planning should address
resource allocation and administrative overhead, as a cell is not of fixed
size or capability. Establishing site-specific metrics for acceptable
performance is recommended as a means of triggering increased server capacity
or distribution. Performance evaluation derived from mean time of request for
authentication or access into the namespace provides insight into the type of
service that requires additional resources.
Our deployment was ready to exit this first phase once the cell had been
designed and created, and the responsibilities of daily administration had
been assigned.
We made the following decisions in our deployment:
Scalability/performance. The number of cell users (about 100) is well within
the documented level of usage for a single server cell. Therefore, we didn't
expect the traffic to necessitate the addition of more servers to handle
requests at a level of performance consistent with the distributed services
already in place.
Down time. Any instances of down time would have doomed the deployment, and
possibly the timely release of the product as well. In order to mitigate the
fear and likelihood of down time, replica security and naming servers were
added.
File storage. During the first phase of the deployment, little data was
contained within the file storage in the cell, so only one distributed file
server was configured. A second machine resource, however, was reserved as a
means of quickly expanding as the need arose.


Phase 2: Create Common Resource Areas


In Phase 2 of the deployment of DCE, the objective was to create resource
areas within the cell. These areas contain information that users want or need
to access, and their existence within the cell encourages users to migrate for
their own benefit. During this phase, users install DCE software on their
systems and exercise the basic functionality of the distributed file services
(DFS) and DCE core services.
The level of distributed-computing knowledge among the user community
determines the ease with which a distributed infrastructure will be introduced
in Phase 2. Phase 1 requires very little acceptance on the part of the user
community, but Phase 2 is keyed upon some active user participation. In cases
where DCE is implemented as a solution to the complete lack of a distributed
environment, good choices for migration are tools, games, and commonly read
datafiles.
In our lab, data formerly distributed by other means (such as NFS and AFS) was
relocated to DCE/DFS-based areas. Through the course of Phase 2, the gradual
relocation of commonly used resources to the new areas provided users with a
high comfort level, as they saw the new DCE technology providing support in
areas where they were familiar with sharing resources. Maintaining copies of
this data at the old locations enabled users to move to the new technology
with confidence, as they always had the ability to retreat to the earlier
solution in the event of catastrophic failure.
The resources chosen were those frequently accessed by the community, but
seldom modified. Read-only resources do not require user authentication into
the security domain of the DCE cell.

We were ready to exit Phase 2 once 75 percent of the lab members had installed
the new DCE software and were accessing the common data through the DCE cell.
This percentage represents an approximation of a median-usage case in the
community. Any higher, and the threshold might never have been reached--any
lower, and the performance analysis and resource allocation might have proven
inadequate as a model of a full adoption. This percentage provided an adequate
measure of the new technology, but it did not impel the community to move
wholesale.
Measuring this adoption rate, however, was not so straightforward. A strict
implementation requires that users remove all but the DCE-based references
to these common data areas. Percentages were arrived at by checking the hosts
incorporated as clients in the DCE namespace against the total host base in
the lab.
Because we focused on providing preexistent resources in a new location, the
new technology was rapidly accepted. Users took the time to install the DCE
software and enter the cell because these tasks had been tailored into scripts
to make the operation straightforward and consistent. The scripts loaded the
DCE software and configured the host namespace to utilize DCE and DFS paths in
the place of the previous non-DCE/DFS locations. The resources maintained
under the previous technologies were linked into lesser-known paths for quick
relocation in the event of a need to fall back.
Administratively, this phase provided a smooth ramp for the corporate internal
technical-support team, as they viewed the move as a switch in tools, not as a
sweeping change in technology and architecture.


Phase 3: Utilize Security and Read/Write Resources


In Phase 3, the objectives were to populate the DCE registry with user-account
data and to create read/write resource areas within the cell. The emphasis was
on establishing security for the cell, and thus on creating more restrictive
permissions for volumes under the DFS. This phase performed several functions:
Read/write areas exercise and demonstrate the capabilities of the DFS.
Registry information is used to control access to the read/write areas.
Logging into the DCE acquaints users with the security component of the DCE
and with the concepts of "principals" and "access-control lists."
Having met the exit criteria for Phase 2, in Phase 3 our users had only to
change their login habits. During Phase 3, users logged in daily to access the
data in the cell. In DCE, administrators are allowed to set the time that a
DCE login authentication is valid, and this period is generally less than 24
hours. Setting the expiration at 24 hours acquainted users with a security
scheme that requires daily interaction. This login habit was encouraged by
placing adequate restrictions on read/write information in the cell. We were
forced to authenticate to avoid the annoyance of a failed access. Logging into
an additional secure domain was not resisted, as we were already logging into
UNIX, AFS, and Kerberos daily.
The new read/write resources included those that had been read/write under a
different distributed file system, and new resources that provided benefit to
the user community. Status directories for project groups, commentary files on
the progress of the deployment, and sign-up lists of various types were all
created as common read-write areas.
To ensure actual interaction with the data under the DCE, each project group
created its own status area within the DCE cell, to which it accorded
permissions as it saw fit. Lab-wide data and administrative tips were
collected and stored under more openly accessible areas. In this manner, there
was a high level of traffic for both the read/write volumes in the common
areas and the security and authentication services, as individual groups
required their own levels of security.
The security replica created during Phase 1 provided uninterrupted security
service for logging in, even when the primary server was being updated with
new software. This made users comfortable with restricting the permissions on
resources: They felt confident that they would not be locked out through an
inability to login to the security server.
Administrative personnel had to populate the registry with the user base and
assign users to the correct project groups. This was simple, but time
consuming. However, once the registry was populated, the control of individual
files and directories was left to the groups that had created them, lessening
the need for administrative overhead.
In Phase 3, the participation of the project-management teams led to a rapid
acceptance of the technology. Specific project information, stored in the
DCE/DFS areas, forced team members to both login and access these resources,
increasing their familiarity and confidence in the product.
We were ready to exit Phase 3 when 75 percent of the lab was logging into the
DCE cell daily. This percentage was measured by checking the requests made for
login to the security server, and balancing this against the total lab
population.


Phase 4: Completing the Transition


Phase 4 was the final transition of user services from existing distributed
services (NIS, AFS, NFS) to DCE-based services for the same functions (CDS,
Security, DFS). Once this transition was accomplished, DCE could realistically
be considered a solution for distributed-computing deployment and a reliable
infrastructure for application development. 
As Phases 1, 2, and 3 were accomplished, cell administration became a
background task. The natural progression of the phases created sufficient ramp
time for both users and administrators to understand the DCE technology.
Allowing the first three phases to reach their exit criteria simplified the
final phase.
Phase 4 was divided into three separate activities, the first of which was the
integration of login utilities. This integration let us log in simultaneously
to AFS, DCE, UNIX, and Kerberos security from the normal system login. The new
utilities were loaded and enabled on the client side, and the necessary
security domains were individually activated. Because multiple login commands
were replaced with a transparent acquisition of credentials, users
automatically acquired all the credentials they needed at login.
These credentials were obtained through an extension of the existing login
scheme. Code from the existing system commands was altered to change the path
through which credentials were generated.
In the initialization phase, the process reads an authentication configuration
file. An authorization policy is derived, based on the secure domains to be
contacted. Calls through the authorization library create an internal data
structure, containing field limits for user information, time-out limits based
on the type of services, and a message list (typically for errors).
Once initialized, the process continues with the local-machine entitlement
utilities, populating the data structure with the username and password as
entered at entitlement. This information is then passed through the
primary-registry login interface, as described in the authorization policy. In
our case, this was the DCE registry. Once this authentication is obtained, all
other registry authentication is performed sequentially, based on the
authorization policy. Once all registries have been contacted, the following
are returned: a group list, the password data structure, the environment for
setup, and another message list.
The last step is to set the group access list for the process and assign the
process uid and gid to be the user's uid and gid; see Figure 1. At this point,
the user has obtained the credential information necessary to proceed as an
authenticated user in any of the configured domains.
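The login flow described in the preceding paragraphs can be sketched as follows. Every name below is hypothetical; this is neither the HP code nor a DCE interface, just the shape of the sequential, policy-driven authentication:

```cpp
#include <string>
#include <vector>

// Credentials accumulated by the integrated login; the fields mirror the
// values the text says are returned (the entered domains and a message list).
struct Credentials {
    std::vector<std::string> domains;    // domains successfully entered
    std::vector<std::string> messages;   // message list, typically errors
};

// Stand-in for one registry's login interface. A real registry would
// verify the password against its own account database.
static bool registryLogin(const std::string &domain,
                          const std::string &user,
                          const std::string &password) {
    return !domain.empty() && !user.empty() && !password.empty();
}

// Authenticate against the primary registry first (DCE, in the lab's
// policy), then against the remaining registries sequentially.
Credentials integratedLogin(const std::vector<std::string> &policy,
                            const std::string &user,
                            const std::string &password) {
    Credentials cred;
    for (size_t i = 0; i < policy.size(); ++i) {
        if (registryLogin(policy[i], user, password))
            cred.domains.push_back(policy[i]);
        else
            cred.messages.push_back("login failed: " + policy[i]);
    }
    // A real implementation would now set the process group access list
    // and assign the user's uid and gid, as Figure 1 shows.
    return cred;
}
```

Ordering the policy list with the primary registry first reproduces the behavior described above: the DCE registry is contacted first, and every other domain is tried in turn so that one password entry yields a full set of credentials.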
The second activity in Phase 4 was the movement of home directories into the
DCE cell. This is the directory that the user sees most frequently, and where
most user modification takes place. For this reason, the movement of home
directories to a central location can be subject to more resistance than the
other phases of the deployment, but the benefits are many.
The first benefit was a consistent user environment, regardless of the
physical machine. User permissions, files, and initialization processes
remained consistent because our home directories were accessible from any
node. This eliminated the down time caused by individual machine failure.
Centrally located home directories enabled centralized backup. This decreased
the network traffic during backup periods and the support load for
administrators performing the backups. From an architectural viewpoint,
centralized home directories focused the users on the client/server model and
decreased personal administrative costs.
The move of home directories to DFS was a very basic change for us. Because
the loss of a home directory would be so catastrophic, fallback solutions were
key in encouraging our lab to migrate to an unreleased DCE product. We created new
paths to the old data in home directories, and this provided us with an option
to return to a known state and to get work done in the event that the DCE cell
or DFS directories experienced prolonged downtime. To provide recovery of any
directories that could be lost, a detailed backup plan was created by the
technical-support group to ensure timely, restorable backup states for all
DFS-based information.
This movement was the greatest leap of faith for some users, but it provided
increased functionality and ease of use. We were now totally immersed in the
DCE functionality, and only accessed multiple distributed environments to
interact with other groups and other labs.


Monitoring DCE Performance in Cell


The performance of the DCE cell was monitored throughout the deployment, as it
was one of our highest priorities. The results served as input to the
administrative team and shaped the topology of the cell as it developed. Most
of this monitoring took the form of constantly evaluating the satisfaction of
our user community, as its members were in the best position to determine an
acceptable level of performance.
Both the performance and the reliability of the DCE cell proved much higher
than we had originally expected, and users found themselves virtually unaware
that the home directories had moved from AFS to DFS. Users of NFS were
pleasantly surprised with the performance improvements over traditional
NFS-mounted volumes and with the security of the DCE/DFS home directories. The
ongoing performance monitoring did lead the administrative group to relocate
highly used volumes onto separate servers, redistributing the load on the
servers themselves, but these moves were obvious, and the improvements to
performance, immediate.


Conclusions


The deployment of DCE in the Chelmsford Lab provided concrete benefits to the
community in the areas of reduced administrative overhead, single login, and
distributed file services. The model described here proved successful, and it
can be applied to the introduction of any new technology into an existing
environment. Rational steps with immediate and well-known fallbacks, in case
of emergency, comfort an organization in change and lower the resistance to
new technology.
Members of the DCE customer community can apply these results to their own
needs, as appropriate. Solid reliability, minimal down time, reduced overhead,
increased services, and a proven transition plan for the DCE services provided
us with a usable framework for distributed development in a production
environment.
Figure 1 Implementing integrated login.












PROGRAMMING PARADIGMS


The Invention of the Compiler




Michael Swaine


Programming is old enough now to have a history. Several of them, in fact.
There are legends and folktales of programming's past, squabbles over priority
of invention, alternate versions of events now known only through written
records.
And there is a history to programming paradigms. Now and then, it seems
appropriate that this column devote its space to some of those paradigms past,
or paradigms lost, that can only be found in the early days of programming.


Paradigm Shifts


Sometimes a revolutionary new paradigm will be opposed by the graybeards of
the field because it represents a threat to their assumptions, or their
values, or their jobs. But sometimes the graybeards enthusiastically embrace
the new paradigm--and get it all wrong.
One of the most significant paradigm shifts in programming took place as the
result of an invention that we take for granted today: the compiler. Strange,
then, that the compiler's inventor remains virtually unknown.
But maybe not so strange, at that. The compiler, which today we would call an
unqualified success, wasn't always seen as even a particularly good idea. In
fact, at a certain point in the history of programming, it took real courage
to advance the idea of compiling programs.
At just this point in programming history, one rash programmer had the
temerity to write and promote the first real compiler, and to do so pretty
much in the face of one of the legendary figures in computing. This is that
story.


Before Compilers


In his essay "Programming in America in the 1950s: Some Personal Impressions,"
John Backus paints a picture of the world BC--that is, before compilers. His
perspective is probably uniquely relevant: Not only was Backus actively
programming back then, but he led the team that defined the first high-level
language, still in use today, Fortran. He understands clearly what the
transition from BC to the post-compiler era has meant.
"Programming in the early 1950s was a black art," Backus says, "a private
arcane matter involving only a programmer, a problem, a computer, and perhaps
a small library of subroutines and a primitive assembly program."
Most of 1950s programmers' time was spent working around the absurd
difficulties that the machines forced on them. They had to fit their programs
into a tiny data store, overcome bizarre difficulties in getting information
in and out, and work with a severely limited and often downright peculiar set
of instructions. There were no general algorithms for anything, no system
documentation, no system in fact; just the big iron.
That, Backus explains, is what made it so much fun.
Backus and Cuthbert Hurd, who managed computer operations at IBM in the early
1950s, give similar descriptions of the Selective Sequence Electronic
Calculator, or SSEC, in service from 1948 to 1952. The SSEC typified the
hassles that the 1950s programmer faced. It had a 150-word memory, and programs
were read in on any of 66 tape readers, with the tape glued into a closed
loop. Backus recalls a mysterious cyclic error that defied diagnosis until
someone noticed that one of the intermediate tapes had been glued to form a
Möbius strip, so that on every other reading of the tape the entire tape was
read backward.
Programming under these rugged conditions made the early programmers feel like
bold explorers of a new land. "Programming had a vital frontier enthusiasm,"
Backus reports, "virtually untainted by either the scholarship or the
stuffiness of academia. Recognition in the small programming fraternity was
more likely to be accorded for a colorful personality or the ability to hold a
lot of liquor well than it was for an intellectual insight."
It could hardly escape the notice of these programmers that what they were
doing was beyond the abilities of the average person, and some programmers
began to think of themselves as a breed apart. Programmers of the 1950s,
Backus says, "began to regard themselves as members of a priesthood, guarding
skills and mysteries far too complex for ordinary mortals." Many programmers,
at least as late as 1954, were hostile and derisive toward any plan to make
programming accessible to a larger population.
There were not only priests but also high priests in this new computer
culture. One of those high priests was Howard Aiken, who designed the first
large-scale computer, the Automatic Sequence Controlled Calculator, or MARK I,
which was the predecessor to the SSEC and was actually built by Frank Hamilton
at IBM and presented to Harvard University in 1944. By 1946, Aiken and Grace
Hopper (known for, among many things, the invention of Cobol) were
collaborating on programs in the MARK I's complicated machine language.
"The digital computer was identified in Cambridge [Massachusetts] with Aiken's
MARK I computer," pioneering programmer Garrett Birkhoff recalls; and by 1950,
Howard Aiken was looked upon with the awe appropriate to a founder. In
addition to designing the MARK I, Aiken "had a very great gift for
anticipating computer applications." Then too, just Aiken's "colorful
personality" inspired a degree of awe; Birkhoff tells stories of table
pounding and says that the Harvard faculty committee supposedly empowered to
decide how the expensive MARK I should be used "never had any control whatever
on anything that [Aiken] did."


Compilers and "Compilers"


There were things called compilers in the early 1950s, but they weren't what
we today would call compilers: programs that accept programs written in a
human-readable high-level language and generate machine code.
Backus points out that it is hard to read "old" papers on programming--papers
written in the early 1950s--because familiar words sometimes had radically
different meanings. Legendary programmer Donald Knuth and Luis Pardo make the
same point, citing the example that what we now call "statements" were
variously referred to in the early 1950s as "formulas," "equations," and
"operations."
Backus makes the point specifically with respect to the word "compiler,"
citing three articles from the early 1950s that used the word compiler in
their titles, each with a different meaning, none of which is the modern
meaning. Those who did describe something like our modern concept of a
compiler used terms like "automatic coding." Backus himself, in his 1954
report that defined the Fortran language, never once used the word compiler.
Maurice Wilkes, who published a paper on one approach to compilation (really
macro expansion) in 1952, recalls, "I do not think that the term compiler was
[by 1954] in general use, although it had in fact been introduced by Grace
Hopper." (Actually, Hopper's "compilation" was also what we would today call
"macro expansion.") It wasn't until 1956 that the terms "compiler" and
"statement" became established.
Not only was the word compiler in common (albeit ill-defined) use, but the
concept of a program that could translate human-readable programs into machine
code was also completely familiar in the early 1950s. In fact, it dates back
to the prehistory of computers, to Charles Babbage, designer of the (never
built but valid) Analytical Engine, who wrote about the idea in 1836.
Knuth and Pardo point out that Konrad Zuse, the German computer pioneer whose
work was obscured by World War II and then languished in the poverty of
post-war Germany, designed a remarkably ambitious language, Plankalkül, in
1945, based on the propositional and predicate calculi. Zuse had every
intention of writing what we would today call an interpreter or a compiler for
Plankalkül; but, like Babbage, Zuse lacked the resources to make his plan a
reality.


False Starts


Between the invention of the digital computer in the 1940s and the early
1950s, there were a number of attempts at doing at least part of what a
compiler does.
In the late 1940s at the Moore School of Electrical Engineering at
Pennsylvania, where the ENIAC and EDVAC computers were invented, Herman
Goldstine, drawing on suggestions of John von Neumann, Adele Goldstine, and
Arthur Burks, mapped out a high-level approach to representing programs. It
was highly visual and led not to a high-level language and a compiler, but to
the concept of flowcharts.
Haskell Curry, at the Naval Ordnance Laboratory in Silver Spring, Maryland,
did some theoretical work on algorithms for converting general algebraic
expressions into machine code for a computer at about this same time.
Meanwhile, in Switzerland, Heinz Rutishauser described compilers for
hypothetical computers, and Corrado Böhm even defined a compiler in its own
language. But these were not compilers for implemented languages on real
machines.
The first high-level language, using the term loosely, actually to be
implemented was the Short Code, or short-order code, originally suggested in
the late 1940s by John Mauchly, the co-inventor of the ENIAC computer, and
implemented on various machines in the late '40s and early '50s. But the Short
Code was an interpreted language, and very simple.

Of these attempts, none was both successful and ambitious, and none was
particularly popular with the programming priesthood.
There was a great deal of resistance to the invention of the compiler, and it
wasn't all a matter of the priesthood trying to keep programming obscure,
although that was a factor.
Much of the resistance had to do with perfectly legitimate efficiency
concerns: specifically, the fact that attempts to produce automatic
machine-code generation from high-level languages were so pitifully
inefficient. The generated code ran slower and took more memory, which was the
greatest sin imaginable in those days, when you couldn't just plug in some
cheap RAM. Needing more memory basically meant that you needed a different
computer.
There were two approaches to the problem of automatic programming, as compiler
writing was sometimes called:
Most took the approach of starting from what was easy, or at least possible,
to implement. The limitations of the hardware and the expense of computer time
were so great that these efforts just didn't go very far.
Zuse, on the other hand, had designed an impressive, powerful, general-purpose
modern language. In doing so he placed himself so far ahead of the hardware
that his plans were impractical. (Some of Zuse's innovations were later
implemented in the specification for the language Algol.)
It took years for the hardware to reach the point where the two approaches
could converge; where a meaningfully powerful high-level language could be
implemented via a true compiler that could generate machine code fast enough,
and code that ran fast enough, to be useful.


The First Real Compiler


The first real compiler, in Don Knuth's judgement, was invented at Fort
Halstead, the Royal Armaments Research Establishment, in 1952. It was a real
compiler in the sense that: 
It took algebraic statements and turned them into machine code for an existing
machine (Aiken's MARK I, whose machine code was particularly hairy).
It was actually implemented.
It was actually used.
The compiler was called AUTOCODE, the first of many programs during the 1950s
to bear that name, but who actually gets the credit for inventing AUTOCODE is
a little murky. Knuth's detective work leads him to the conclusion that the
inventor was one Alick E. Glennie, a programmer at Fort Halstead.
When Glennie wrote AUTOCODE, there was already a tool for generating machine
code for machines like the MARK I, a tool created by none other than the
designer of the MARK I, the legendary and, um, colorful Howard Aiken. Aiken's
approach differed from Glennie's in at least one significant detail: Aiken's
approach was a piece of hardware.
AUTOCODE was being used on the MARK I at Manchester, England, in September
1952. Five months later, Glennie gave a lecture on compilation at Cambridge
University, in which he explained his reasons for writing the compiler.
"The difficulty of programming has become the main difficulty in the use of
machines," Glennie said. "Aiken has expressed the opinion that the solution of
this difficulty may be sought by building a coding machine, and indeed he has
constructed one."
And then Glennie made his heresy apparent: "There is no need to build a
special machine for coding, since the computer itself, being general purpose,
should be used."
Not "could be used," but "should be used." Take that, Howard Aiken.
The story could end here, with the brave programmer challenging the
establishment and bringing in the new age of compiled programs. But history
isn't that simple.
AUTOCODE may have been the first true compiler, but it was still very close to
machine language. This allowed it to generate code that was, by Glennie's
estimate, 90 percent as efficient as human-written machine code; but even if
that was accurate, it wasn't enough for the programmers of the time, who
needed to squeeze every cycle out of the machines. AUTOCODE was used, but not
for the most important work, and it had little direct influence on programming
at Manchester or elsewhere. It would take another four years and the invention
of a more ambitious and efficient compiler to convince the programming
priesthood that compilers were the wave of the future.


Primitives versus Space Cadets


Between January and November, 1954, John Backus and a small group of
programmers worked on and produced a specification for a new language. That
was Fortran. Their effort "was met with the usual indifference and skepticism
of the priesthood," Backus says. This despite the fact that the first Fortran
compiler embodied a level of optimization of object code not to be seen in
compilers again until the late 1960s, including identifying unnecessary code,
moving calculations outside loops, and register-usage optimization based on
expected frequency of execution of various parts of the program.
By this time the legitimate efficiency concerns had become less relevant. It
was clear to Backus that the old paradigm had to go and that compilers were
necessary for the advancement of the state of the art in programming. But it
wasn't yet clear to everyone. "There was little awareness even as late as 1955
and 1956," Backus says, "that programming methods of that era were the most
time-consuming and costly roadblock to the growth of computing."
And at least as late as 1954 the priesthood was still trying to keep
programming obscure, or at least was unwilling to admit that it could be made
less obscure. But things were changing. Programmer John Carr delivered a
lecture that year in which he categorized programmers into the primitives, who
believed that all instructions should be written in octal, and the space
cadets, "who saw themselves as pioneers of a new age," Wilkes recalls. Wilkes
immediately enrolled himself as a space cadet.
Not everyone did. Wilkes tells a poignant story about computing legend Alan
Turing. Watching Turing lecture at a blackboard one day, Wilkes became
terribly confused. All Turing was doing was multiplying two numbers together
to illustrate a point about checking a program, but Wilkes couldn't follow his
math at all. The rest of the audience seemed equally puzzled.
Finally Wilkes realized that Turing had written the decimal numbers backward,
with the least significant digit on the left, and hadn't bothered to mention
the fact to his audience. He wasn't trying to be cute, Wilkes insists: "It
was simply that he could not appreciate that a trivial matter of that kind
could affect anybody's understanding one way or the other." It was a perfect
example of the primitive mindset.
Turing, Wilkes sadly concludes, would have been on the side of the primitives.
Ironic, considering the paradigm shift that Turing himself was responsible
for. In fact, we should consider that paradigm shift, because it is a
fascinating story.
Sounds like a good topic for a future "Programming Paradigms" column.


Sources


A History of Computing in the Twentieth Century. N. Metropolis, J. Howlett,
and Gian-Carlo Rota, eds. San Diego, CA: Academic Press, 1980.
ACM Turing Award Lectures: The First Twenty Years: 1966--1985. Reading, MA:
Addison-Wesley, 1987.
Cortada, James M. An Annotated Bibliography on the History of Data Processing.
Westport, CT: Greenwood Press, 1983.
Heims, Steve J. John von Neumann and Norbert Wiener: From Mathematics to the
Technologies of Life and Death. Cambridge, MA: MIT Press, 1980.
Hodges, Andrew. Alan Turing: The Enigma. New York, NY: Simon and Schuster,
1983.
Toole, Betty. Ada, The Enchantress of Numbers. Mill Valley, CA: Strawberry
Press, 1992.
















C PROGRAMMING


Building the Text Engine Database




Al Stevens


I just returned from London where Dr. Dobb's Journal hosted a one-day seminar
for the British. Several editors and our venerated publisher, Peter
Hutchinson, brought the good news from the colonies. We discussed the
information superhighway, visual programming, interoperable component objects,
and the state of C++, and gave a Newton demo. During Q&A at the end of my
talk, an attendee asked if I thought that Borland would soon abandon OWL and
embrace MFC. Figuring that the British audience would be unfamiliar with our
sitcoms, I decided to steal a joke from "The Nanny." I told him that he had
as much chance of seeing that as Tonya Harding had of being on a Wheaties box.
No one laughed. Blank stares. Fortunately, my bomb was forgotten when, in a
moment of inspiration, I announced that we would break for lunch. Later, I
asked someone if they had ever heard of Tonya Harding. Of course they had, but
they didn't know what a Wheaties box was.


Text-Engine Database


This month we continue the static text-search-engine project by building the
database. The data analysis is complete. We've built the common word list and
determined the hierarchical organization of the database. Next, we must
convert data and build word indexes.
To refresh your memory: The project implements a fast search engine for the
text of the King James version of the Bible. A Visual Basic front end accesses
the text engine through a C-language DLL. The C engine can be compiled as a
stand-alone DOS program with a stubbed user interface to test the engine. The
text-conversion and database-build programs discussed this month are all
written in C.


The Raw-Text Data


The raw-text data for this project comes from a public-domain copy of the King
James version of the Bible distributed on diskette in 1987 by Thomas Cox of
Easley, South Carolina. The text is organized into 98 ASCII text files, with
each book of the Bible represented in one or more files. Cox split large books
into multiple files so that users could read them with the word processors and
text editors of the day. Each file's name identifies the book that it
contains. For example:
...
EZRA.DOC
NEHEMIAH.DOC
ESTHER.DOC
JOB.001
JOB.002
PSALMS.001
PSALMS.002
PSALMS.003
...
Files that contain a complete book have .DOC file extensions. The others use
.001, .002, and so on. Figure 1, from the book of Esther, shows how Cox
organized the text. His choice of formats was fortunate. It lends itself well
to parsing the individual documents (verses) of the database.


Book Names


Due to the highly structured organization of the data, the database does not
need the book/chapter/verse tokens that Mr. Cox uses. The book names can be
maintained in program tables. There are 66 books and two tables. Figure 2
shows a portion of the Visual Basic and C tables that identify the book names.
There are two tables because the engine is implemented two ways--first as a
DOS stand-alone C program that tests the engine; then as a DLL that supports
the Visual Basic front end. I'll discuss this approach in more detail next
month.


Converting the Raw Text


Each book is organized into chapters numbered serially, beginning with chapter
one. Each chapter is organized into verses, also numbered serially beginning
with one. The program can internally address documents by
book/chapter/verse converted into document numbers. Therefore, the first step
in building the database text consists of converting Cox's text into text
files with verse text only, as Figure 3 shows.
Each verse is separated by a newline character. Each chapter is separated by
an extra newline character. Each book is in a single ASCII text file named as
shown here:
GENESIS.BBK
EXODUS.BBK
LEVITICU.BBK
...
The program that converts the raw text into the .BBK files is called BLD.C. (I
am not publishing all the conversion source code in the magazine. The BLD.C
program, for example, is unique to this particular database and would not
apply to a text project of your own design. You can, however, get the program
from a download or Careware source. See the end of this discussion for
details.) The program reads a file named BOOKS.LST. I manually prepared the
file with a text editor. It lists the files in Mr. Cox's database in the order
that they appear in the Bible. BLD.C uses BOOKS.LST to identify the files in
the raw-text database. A shareware version of the program included only the
New Testament, and BOOKS.LST was modified to list only those books. Besides
writing the converted files, BLD.C writes a test file named BOOKS.DOC, which
lists the converted .BBK files. Later programs in the conversion task use this
file to identify the .BBK files in their proper order. The BOOKS.DOC file and
the .BBK text files are used to build both the index and the data files.



Building the Data Files


The data part of the database is built as two disk files, BIBLE.SQZ and
BIBLE.BCV. The text part of the database is compressed and concatenated into
BIBLE.SQZ, which contains a Huffman decompression tree followed by the
compressed text. Each verse is a newline-terminated string. Genesis 1:1 is the
first string; Revelation 22:21, the last. There are 31,102 strings in the
database. Therefore, there are 31,102 documents.
The BIBLE.BCV file is a table of offsets into the text with one table entry
per document. A document number from 1 to 31,102 provides an offset into the
table. Each table entry consists of a byte and bit offset into the compressed
text. (I adopted the same Huffman compression logic that I used in the D-Flat
help database a while back.) Therefore, given a document number, the text
engine can retrieve the text of a single verse by seeking to the byte offset
in the text data and decompressing characters starting at the bit offset
within that byte. The verse is retrieved when the engine has decompressed the
terminating newline character.
The program called HUFFC.C reads BOOKS.DOC to see what .BBK text files to use
in the database. The program builds two files, BIBLE.NDX, which is a temporary
file with book, chapter, and verse and the offset table data, and BIBLE.SQZ,
which contains the Huffman tree and compressed data. A program called
BLDINDEX.C reads BIBLE.NDX and builds BIBLE.BCV, which is the database's byte/
bit offset table.


Building the BCVtable Source Code


The text engine, which we will discuss next month, addresses documents by
document numbers 1 to 31,102. The user interface, however, uses book, chapter,
and verse to identify documents. The engine must be able to convert from book,
chapter, and verse to document number and back. Why back? When the user
navigates forward or backward through the documents, the engine must report
when the navigation changes to the next or previous book or chapter with
proper wrapping of first or last chapter and verse numbers. To make these
conversions, the engine needs to know how many chapters are in each book and
how many verses are in each chapter. Rather than add a data table to the
database, the engine uses an initialized, unsigned char array, named BCVtable.
The array contains variable-length entries for each book. Each entry begins
with an entry that records the number of chapters in that book followed by an
entry for each chapter with the number of verses in the chapter. The engine
uses this table to compute the document number from the book and chapter, and
to compute the book, chapter, and verse from a document number.
A program named BLDTABLE.C reads the BIBLE.NDX file created by BLDINDEX.C and
writes the BCVtable definition to stdout. One of the engine's source-code
files includes a file named TABLE.C, and the conversion task builds that file
by redirecting BLDTABLE's output to TABLE.C.


Extracting Words


The text engine uses a word index into the database. Given a word, the engine
will deliver all the document numbers that contain that word. (If the word is
from the common word list, determined from last month's analysis, the engine
delivers all 31,102 documents in the database.) The index contributes to the
engine's phrase and Boolean queries by searching each word in the query and
logically combining the results of all those searches into a final document
list.
Listing One, WORDS.C extracts all the words from the .BBK files into a file
named WORDS.LST. Each record in WORDS.LST contains a document number and a
word. WORDS.LST is, therefore, a really big file with one record for every
word in the database, excepting words in the common list. The next step sorts
that file into word, document number, and sequence. The programs SORTW.C and
MERGEW.C do the sort.


Sorting Words


Always eager to recycle software, I used the sort utility program that I
published in this column several years ago. That program was a general-purpose
utility that sorted fixed-length records. Input parameters specified the
record length and sort-field positions and lengths. The records in WORDS.LST
are of variable length. The first field is the integer document number, and
the second is a null-terminated string with the word. I had to modify the sort
program to handle this format.
Sorting large files consists of two major passes. The first pass reads records
into large memory buffers. When a buffer is full, the pass sorts the records,
adds a terminal record, and writes the records to a work file. There are two
work files. Each time the program writes a buffer, the program alternates work
files. Each buffer is called a "sequence." SORTW.C is the first sort pass. It
builds the two files and names them WORDS.WK0 and WORDS.WK1. Listing Two is
SORTW.C. It is a throwback to the old, small-memory models of 16-bit compilers
and has a relatively small sequence buffer. You could speed up the process and
perhaps eliminate the second major pass by using a 32-bit compile and all the
available memory in a contemporary machine.
The second major pass consists of some number of secondary merge passes that
merge pairs of sequences from the two files into single sequences on a second
pair of files. Each merge pass halves the number of sequences. When only one
sequence remains, the file sort is completed and the remaining work file is
the output. Listing Three, MERGEW.C, is the merge program. It merges WORDS.WK0
and WORDS.WK1 into WORDS.WK2 and WORDS.WK3. Then it merges WORDS.WK2 and
WORDS.WK3 back into WORDS.WK0 and WORDS.WK1, repeating this operation until
only one sequence remains, in either WORDS.WK0 or WORDS.WK2. Then it renames
that file WORDS.SRT, which is the input to the next step in building the index.


Building the Word Index


A program named BLDLIST.C (Listing Four) reads WORDS.SRT and builds three
files, BIBLE.BNT, BIBLE.IDX, and BIBLE.DLS. These three files constitute the
word index into the database. Given a word, they return a list of document
numbers where that word appears.
BIBLE.BNT is an ordered array of offsets into a table of words. BIBLE.IDX is
that table of words. The offset array is ordered so that the first offset
points to the first word in the order of the index, which happens to be word
order. "Aaron" is first, for example. The search engine uses BIBLE.BNT to
perform a binary search of the words in BIBLE.IDX. The offsets in the array
are of fixed length, facilitating the fast binary search. Each entry in
BIBLE.IDX contains the null-terminated text of the word and an offset to a
document list in BIBLE.DLS. When the search engine finds a matching word in
BIBLE.IDX, it uses the offset to point to the document list in BIBLE.DLS. Each
document list is a variable-length array of integers. The first integer is the
number of documents in the list. The others are document numbers. The engine
can use these document numbers to retrieve the text of the matching documents
in the database.
The index architecture that I just described is fast and space efficient. It
supports fast retrievals but would not readily support efficient word-document
inserts and deletes. It is a good architecture for large, static databases. 


Combining the Files into a Database


At this point, the database-build task has built five database files:
BIBLE.BNT, BIBLE.IDX, BIBLE.DLS, BIBLE.BCV, and BIBLE.SQZ. It has also built
TABLE.C to be used in the source-code build of the text engine. The five
database files must now be combined into one file. The easiest way to do that
is to use the DOS copy command, like this:
copy /B bible.bnt+bible.idx+bible.dls+bible.bcv+bible.sqz /B bible.dat
This command concatenates the five files into one file named BIBLE.DAT, and
that file is
the database. We are not done with the others, however. We need to know their
sizes. The DOS DIR command provides that information:
BIBLE.BNT 50,012
BIBLE.IDX 151,727
BIBLE.DLS 666,030
BIBLE.BCV 155,510
BIBLE.SQZ 2,276,832
The engine needs to know the size of each of the database components so that
it can seek to them when it reads the one file named BIBLE.DAT. Figure 4 shows
how the engine uses the lengths at compile time to compute the seek offsets.
The programs discussed this month build the text-engine database. Your
text-engine projects will involve similar steps, and you might use versions of
these programs and others adapted to the requirements for your database. Next
month, we'll discuss the text engine itself, concentrating on the C code that
implements the DLL and the DOS stand-alone test version.


Getting the Source Code and Database


The database and Visual Basic and C source code for the text engine are free.
You can download them from the DDJ Forum on CompuServe and on the Internet by
anonymous ftp; see "Availability," page 3.
There are several archived files containing the database (almost 1.5 Mbytes'
worth), the Visual Basic front-end source code, the text-engine DLL source
code, the database-build source code, and the Windows Help database. We
discussed the database-build source code in this column.
If you cannot get to one of the online sources, send two high-density,
3.5-inch diskettes and a stamped, addressed mailer to me at Dr. Dobb's
Journal, 411 Borel Avenue, San Mateo, CA 94402, and I'll send you a copy of
the source code and database. It's free, but if you care to support my
Careware charity, include a dollar for the Brevard County Food Bank. They
support some of the needs of our hungry and homeless citizens.
Figure 1: Raw text.
EST 1:1 Now it came to pass in the days of Ahasuerus, (this is Ahasuerus which
reigned, from India even unto Ethiopia, over an hundred and seven and twenty
provinces:)

EST 1:2 That in those days, when the king Ahasuerus sat on the throne of his
kingdom, which was in Shushan the palace,
EST 1:3 In the third year of his reign, he made a feast unto all his princes
and his servants; the power of Persia and Media, the nobles and princes of the
provinces, being before him:
Figure 2: Book-name tables.
Global BookNames(66) As String
BookNames(0) = "Genesis"
BookNames(1) = "Exodus"
BookNames(2) = "Leviticus"
BookNames(3) = "Numbers"
 ...
static char *BookName[] = {
 "Genesis",
 "Exodus",
 "Leviticus",
 "Numbers",
 ...
};
Figure 3: Converted text.
Now it came to pass in the days of Ahasuerus, (this is Ahasuerus which
reigned, from India even unto Ethiopia, over an hundred and seven and twenty
provinces:)\n
That in those days, when the king Ahasuerus sat on the throne of his kingdom,
which was in Shushan the palace,\n
In the third year of his reign, he made a feast unto all his princes and his
servants; the power of Persia and Media, the nobles and princes of the
provinces, being before him:\n
Figure 4: File sizes and offsets.
#define BNTLENGTH 50012L
#define IDXLENGTH 151727L
#define DLSLENGTH 666030L
#define BCVLENGTH 155510L
#define IDXOFFSET BNTLENGTH
#define DLSOFFSET (IDXOFFSET+IDXLENGTH)
#define BCVOFFSET (DLSOFFSET+DLSLENGTH)
#define DATOFFSET (BCVOFFSET+BCVLENGTH)

Listing One 

#include <stdio.h>
#include <dir.h>
#include <dos.h>
#include <process.h>
#include <ctype.h>
#include <string.h>
#include <stdlib.h>

#define DOS_VERSION

#include "htree.h"

int BinarySearch(char *word, char **cp, int wdct);

int isCommon(char *word);
unsigned int docno;
static FILE *fo;

static char *wds[500];
static int wdctr;

static int inList(char *word)
{
 int i;
 for (i = 0; i < wdctr; i++)
 if (strcmp(word, wds[i]) == 0)
 return 1;

 return 0;
}
static void AddList(char *word)
{
 wds[wdctr] = malloc(strlen(word)+1);
 strcpy(wds[wdctr++], word);
}
static void ClearList(void)
{
 int i;
 for (i = 0; i < wdctr; i++)
 free(wds[i]);
 wdctr = 0;
}
static void dowords(char *fn)
{

 char verse[600];
 char *cp, *wd;
 char word[30];
 FILE *fp = fopen(fn, "rt");
 if (fp != NULL) {
 while (fgets(verse, 600, fp) != NULL) {
 if (*verse == '\n')
 continue;
 docno++;
 cp = verse;
 while (*cp && *cp != '\n') {
 wd = word;
 // extract a word
 while (!isalpha(*cp)) {
 if (!*cp || *cp == '\n')
 break;
 cp++;
 }
 while (isalpha(*cp)) {
 *wd++ = tolower(*cp);
 cp++;
 }
 *wd = '\0';
 if (*word && *(word+1)) {
 if (!isCommon(word)) {
 if (!inList(word)) {
 fwrite(&docno, sizeof docno, 1, fo);
 fwrite(word, strlen(word)+1, 1, fo);
 AddList(word);
 }
 }
 }
 }
 ClearList();
 }
 fclose(fp);
 }
}
void main()
{
 char fname[15];
 FILE *fp;

 fo = fopen("words.lst", "wb");
 if (fo != NULL) {
 if ((fp = fopen("books.doc", "rt")) != NULL) {
 while (fgets(fname, 15, fp) != NULL) {
 printf(fname);
 fname[strlen(fname)-1] = '\0';
 dowords(fname);
 }
 fclose(fp);
 }
 fclose(fo);

 }
}



Listing Two 

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <process.h>

#define MAXWORDS 2048

char workfile[] = "words.wk0";
FILE *fo, *wi, *wo;

static int sequences;

struct sortword {
 unsigned int docno;
 char *word;
} wds[MAXWORDS];

static int buffct;
static int wordct;

static int wdcmp(const void *c1, const void *c2)
{
 int rtn = strcmp(((struct sortword *)c1)->word,
 ((struct sortword *)c2)->word);
 if (rtn == 0)
 rtn = ((struct sortword *)c1)->docno -
 ((struct sortword *)c2)->docno;
 return rtn;
}
void bufferout(void)
{
 int i;
 unsigned int zero = 0;
 char eos[] = "{end of sequence}";
 qsort(wds, wordct, sizeof(struct sortword), wdcmp);
 fo = fopen(workfile, "ab");
 for (i = 0; i < wordct; i++) {
 fwrite(&wds[i].docno, sizeof(unsigned int), 1, fo);
 fwrite(wds[i].word, strlen(wds[i].word)+1, 1, fo);
 free(wds[i].word);

 }
 fwrite(&zero, sizeof(unsigned int), 1, fo);
 fwrite(eos, sizeof eos, 1, fo);
 fclose(fo);
 printf("\r%d sequences ", ++sequences);
 workfile[8] ^= 1;
 wordct = 0;

 buffct++;
}
void insertsort(struct sortword sw)
{
 if (wordct == MAXWORDS)
 bufferout();
 wds[wordct++] = sw;
}
void main()
{
 FILE *fp = fopen("words.lst", "rb");
 struct sortword sw;
 char word[50];
 int i, c;

 unlink(workfile);
 workfile[8] ^= 1;
 unlink(workfile);
 workfile[8] ^= 1;
 if (fp != NULL) {
 while (!feof(fp)) {
 if (fread(&sw.docno,sizeof(unsigned int),1,fp) != 1)
 break;
 i = 0;
 while ((word[i++] = fgetc(fp)) != 0)
 if (feof(fp))
 break;
 if ((sw.word = malloc(strlen(word)+1)) == NULL) {
 fprintf(stderr, "OM!\a");
 exit(1);
 }
 strcpy(sw.word, word);
 insertsort(sw);
 }
 fclose(fp);
 if (wordct)
 bufferout();
 spawnl(P_WAIT, "mergew.exe", "mergew.exe", NULL);
 }
}



Listing Three

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

char *fn[] = {
 "words.wk0",

 "words.wk1",
 "words.wk2",

 "words.wk3"
};
int sequences = 99;

void writeword(FILE *fo, unsigned *dn, char *wd)
{
 fwrite(dn, sizeof(unsigned int), 1, fo);
 fwrite(wd, strlen(wd)+1, 1, fo);
}
void readword(FILE *fp, unsigned *dn, char *wd)
{
 if (fread(dn, sizeof(unsigned int), 1, fp) == 1) {
 int i = 0;
 while ((wd[i++] = fgetc(fp)) != 0)
 if (feof(fp) || i == 100)
 break;
 wd[i] = '\0';
 }
}
void merge(FILE *f0, FILE *f1, FILE *f2, FILE *f3)
{
 char wd0[101] = "", wd1[101] = "";
 unsigned dn0 = 0, dn1 = 0;
 int cr;
 FILE *fo = f2;

 sequences = 0;
 while (!feof(f0) || !feof(f1)) {
 cr = strcmp(wd0, wd1);
 if (cr == 0)
 cr = dn0 - dn1;
 if (cr == 0) {
 if (*wd0)
 writeword(fo, &dn0, wd0);
 if (*wd0 == '{') {
 sequences++;
 printf("\r%d sequences ", sequences);
 // flip output files
 if (sequences & 1)
 fo = f3;
 else 
 fo = f2;
 }
 else if (*wd1)
 writeword(fo, &dn1, wd1);
 readword(f0, &dn0, wd0);
 readword(f1, &dn1, wd1);
 }
 else if (cr < 0) {
 writeword(fo, &dn0, wd0);
 readword(f0, &dn0, wd0);
 }

 else {
 writeword(fo, &dn1, wd1);
 readword(f1, &dn1, wd1);

 }
 }
}
void main()
{
 int pass = 0;
 FILE *f0, *f1, *f2, *f3;
 int in0 = 0, in1 = 1, out0 = 2, out1 = 3;
 while (sequences > 1) {
 f0 = fopen(fn[in0], "rb");
 f1 = fopen(fn[in1], "rb");
 f2 = fopen(fn[out0], "wb");
 f3 = fopen(fn[out1], "wb");
 printf("\nPass %d: Merging %s and %s to %s and %s\n",
 ++pass, fn[in0], fn[in1], fn[out0], fn[out1]); 
 merge(f0, f1, f2, f3);
 printf("\r%d sequences ", sequences);
 fclose(f0);
 fclose(f1);
 fclose(f2);
 fclose(f3);
 if (sequences > 1) {
 in0 ^= 2;
 in1 ^= 2;
 out0 ^= 2;
 out1 ^= 2;
 }
 }
 remove(fn[in0]);
 remove(fn[in1]);
 remove(fn[out1]);
 rename(fn[out0], "words.srt");
}



Listing Four

// -------- bldlist.c

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static unsigned doclist[10000];
static unsigned docct;
static long wdct;

FILE *f1;
FILE *f2;
FILE *f3;

void dumpbuffer(char *word)
{
 if (docct) {
 long offset = ftell(f1);
 // --- write binary tree file ( -> word list )
 fwrite(&offset, sizeof(long), 1, f3);
 // --- write word list file ( -> document list )

 offset = ftell(f2);
 fwrite(word, strlen(word)+1, 1, f1);
 fwrite(&offset, sizeof(long), 1, f1);
 // ---- write document list file
 fwrite(&docct, sizeof(unsigned), 1, f2);
 fwrite(doclist, sizeof(unsigned), docct, f2);

 docct = 0;
 }
 wdct++;
 if ((wdct % 100) == 0)
 printf("\r%ld words", wdct);
}
void main()
{
 unsigned docno;
 char word[101];
 char pword[101] = "";
 int len, dn;
 FILE *fp = fopen("words.srt", "rb");
 if (fp != NULL) {
 f1 = fopen("bible.idx", "wb");
 f2 = fopen("bible.dls", "wb");
 f3 = fopen("bible.bnt", "wb");
 while (fread(&docno, sizeof(unsigned), 1, fp) == 1) {
 int i = 0;
 while ((word[i++] = fgetc(fp)) != 0)
 if (i == 100 || feof(fp))
 break;
 word[i] = '\0';
 if (strcmp(word, pword)) {
 // ---- finish off the old word
 dumpbuffer(pword);
 strcpy(pword, word);
 }
 doclist[docct++] = docno;
 }
 dumpbuffer(pword);
 fclose(f1);
 fclose(f2);
 fclose(f3);
 fclose(fp);
 printf("\r%ld words", wdct);
 }
}




ALGORITHM ALLEY


Faster FFTs




J.G.G. Dobbe


Iwan is a hardware and software developer in the department of medical
bio-engineering at the Academic Medical Center in Amsterdam, Holland.


Introduction 
by Bruce Schneier
The Fourier transform is an essential tool of modern applied mathematics.
Named after the French mathematician and physicist Jean-Baptiste Joseph
Fourier (1768--1830), it has found uses in all areas where information or
signals are transmitted or analyzed: concert-hall acoustics, computer vision,
speech analysis, aerodynamics, fluid control, broadcasting, and process
control.
What Fourier devised is a way to express periodic functions as an infinite
series of sine and cosine functions. These sorts of expressions are a way of
putting all periodic functions within the same reference. Even though they are
all different, you can think about them in the same way. The Fourier transform
is the method for deriving the various coefficients for the sine and cosine
functions.
The Fourier transform assumes that the function is periodic even though in the
real world, functions are most often not. However, most applications only care
about the behavior of a function within a certain interval. The trick is to
assume that the function is periodic outside that interval, then just not
worry about it. The Fourier transform of a sampled function that is assumed
periodic outside the interval is called a "Discrete Fourier Transform" (DFT).
Getting a computer to do a DFT requires an incredible number of calculations
and is generally not feasible. Consequently, mathematicians have invented the
"Fast Fourier Transform" (FFT)--a smarter, divide-and-conquer algorithm that
skips most of the DFT's unnecessary calculations and makes all of this stuff
accessible. Think of the Fourier transform as a mathematical tool, and the FFT
as a computational tool for mathematical applications.
In this month's "Algorithm Alley," Iwan describes two implementations of a
radix 2 FFT algorithm, one written in C and the other in assembler. The
assembler version can be linked to C or Pascal programs and can run a
1024-point FFT on a 486/66 in 36 msec, or in just 12 msec on a 60-MHz Pentium.
This performance is comparable to what you can get from a dedicated digital
signal processing (DSP) board--a 1024-point FFT in 1--4 msec--and is much
cheaper. Figure 1 provides various performance results on a variety of
processors.
When computers are involved in signal processing, an analog-to-digital
converter must be used to digitize the incoming analog signal. This results in
an array of samples representing the original analog signal. It is possible to
perform a Discrete Fourier Transform (DFT) on the array of samples resulting
in the discrete-density spectrum in Example 1, where fn is the nth sample in
the time domain. Because the number of complex spectral components (N) in the
frequency domain equals the number of signal samples, the same information is
available in both domains, and direct inverse transformation is permissible,
regaining the original samples. This also eases programming because one array
can be used for both time and frequency samples.
When computing an N-point DFT, N^2 multiplications are required. Some of those
factors appear several times throughout the calculation. Using a smarter
computation method called the "Fast Fourier Transform" (FFT), it is possible
to skip duplicate calculations, saving a lot of execution time. 
The secret behind the FFT algorithm is splitting an N-point DFT into two
N/2-point DFTs. Computing an N/2-point DFT takes N^2/4 multiplications, so two
N/2-point DFTs can be calculated with N^2/2 multiplications. Note that this is
half the number of calculations required to perform an N-point DFT. While some
combining of the N/2-point DFTs is required to obtain the original N-point
DFT, computation time is reduced almost by a factor of 2. By continuing this
halving process, eventually only 2-point DFTs remain to be calculated. When
the combination method is disregarded, FFT execution time is the result of the
equation in Example 2. F(k) can be written as a combination of two N/2-point
DFTs, F1(k) and F2(k), holding the even and odd samples of F(k), respectively;
see Example 3. Note that the sequence of Q-factors is the same for both
equations of F(k). This characteristic is used to reduce execution time. The
equation for finding F(k) for one Q-factor is graphically represented by the
so-called FFT "Butterfly" in Figure 2. 
The computation of an N-point FFT can be graphically represented as a whole
network of butterflies for calculating an 8-point FFT, as in Figure 3. The
network in Figure 3 consists of sections that combine two N/2-point FFTs. If
you read from right to left, the butterflies become smaller, resulting in the
following 2-point DFTs: 
F(0)=f(0)+f(1)
F(1)=f(0)-f(1)
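That 2-point base case is trivial to code. A minimal sketch (the helper name is mine, not from the article's listings):

```c
/* The 2-point DFT that terminates the halving process:
   F(0) = f(0) + f(1),  F(1) = f(0) - f(1). */
void dft2(float f0, float f1, float *F0, float *F1)
{
    *F0 = f0 + f1;
    *F1 = f0 - f1;
}
```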


Shuffle Samples


While splitting the N-point DFT into two N/2-point DFTs, one DFT consists of
the even samples and the other of the odd samples. When each N/2-point DFT is
split into two N/4-point DFTs, the first N/4-point DFT consists of the even
samples of the N/2-point DFT, while the odd samples give the second N/4-point
DFT.
By shuffling the FFT input samples in a structured way, the output data is
ordered after performing the FFT. When N is a power of two, this shuffle
algorithm is relatively simple. The index of an array sample in binary format
can be reversed to obtain the index of the array sample for exchange. When the
first half of the array is shuffled (index 0..3), the other half automatically
comes in the right place; see Table 1. To find the array index for exchanging
elements, a compact and fast piece of code like Example 4 can be written in
assembler in which AX holds the array index for exchange with the original
element (OldIndex). (The file PCFFT.ASM includes code for calculating shuffle
indexes and is available electronically; see page 3.)
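For readers following along in C rather than assembler, the same bit reversal can be sketched as follows (the function name is mine; the ShuffleIndex function in PCFFT.C, Listing One, does the equivalent work):

```c
/* Reverse the low 'bits' bits of 'index' to obtain the shuffle
   destination. For an 8-point FFT (bits = 3), index 3 (binary 011)
   maps to 6 (binary 110), matching Table 1. */
unsigned bitrev(unsigned index, int bits)
{
    unsigned rev = 0;
    while (bits-- > 0) {
        rev = (rev << 1) | (index & 1);  /* shift LSB of index into rev */
        index >>= 1;
    }
    return rev;
}
```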


Speeding Up FFT Performance


The FFT function prototype void Fft(float *Re, float *Im, int Pwr, int Dir);
(where Re and Im are the base addresses of the real and imaginary arrays)
shows the complex array split into its real and imaginary parts. During the
FFT execution, elements in the arrays are reached by adding the right offset
to the base addresses. By splitting the complex array in two, the same offset
applies for indexing an element in both real and imaginary arrays. This
results in saving execution time, both during FFT calculation and when
calculating the power spectrum from the real and imaginary elements.


Size of Complex Numbers


An FFT can be performed on any type and size of variables. Integer numbers are
often small in size, so less time is spent on memory access. Using
floating-point numbers improves on the accuracy of integer FFTs, at the cost
of execution time. Coprocessors can work with IEEE floating-point numbers of
different size--short-real (32 bits), long-real (64 bits), and a so-called
"temp-real" format (80 bits). However, the more bits involved, the longer it
will take the coprocessor to access memory. In short, when working with
floating-point numbers, the fastest algorithm can be obtained with the
short-real type. Note that some Pascal compilers convert the REAL type into a
48-bit (6-byte) variant not supported by the coprocessor. In such cases, the
coprocessor is emulated in software; however, execution slows down in the
process.


Improving the Goniometry


When the computer executes an FFT, it must calculate the factor
Qk=cos(ak)+i*sin(ak), with a=-2*pi/N and k=0,1,...,N-1, over and over again.
Because sin and cos are time-consuming instructions (even for coprocessors),
they slow down FFT performance. It is possible, however, to use a table of
constants holding start values a for each N-point FFT and derive cos(ak) and
sin(ak) from the start value by faster multiplication and subtraction; see
Example 5. Therefore, Qk+1 can be calculated from Qk and the table values for
cos(a) and sin(a). Note that cos(a) and sin(a) for finding the first Qk for
k=0 can also be found in the tables.
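In C, the recurrence of Example 5 reduces to one complex multiplication per step. A minimal sketch (assuming c and s hold the tabulated cos(a) and sin(a); the function name is mine, though the FFT loop in Listing One performs the same update inline):

```c
#include <math.h>

/* Advance Qk = cos(ak) + i*sin(ak) to Q(k+1) without calling sin/cos:
   cos(a(k+1)) = cos(ak)*cos(a) - sin(ak)*sin(a)
   sin(a(k+1)) = sin(ak)*cos(a) + cos(ak)*sin(a) */
void next_q(float *Qr, float *Qi, float c, float s)
{
    float t = *Qr;              /* save cos(ak) before overwriting */
    *Qr = *Qr * c - *Qi * s;
    *Qi = *Qi * c + t   * s;
}
```

Starting from Q0 = 1 + 0i, repeated calls walk around the unit circle in steps of a, which is exactly the sequence of Q-factors the butterflies consume.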


Using a Coprocessor 



Math coprocessors are typically stack-based. The operands must be loaded onto
the stack first, before operations can be released to them. You can reduce FFT
execution time by reducing the number of bus accesses. When writing
coprocessor assembler code, you can leave numbers on the coprocessor stack
that can be used later on during the calculation. Not all compilers are this
farsighted. The values sin(a(k+1)) and cos(a(k+1)) can be computed
with a very compact piece of code like that in PCFFT.ASM. With languages such
as C or Pascal, the efficiency of the code is at the mercy of the compiler and
often is not easy to see through and improve.


Faster FFT Code


I've implemented the FFT algorithm described here in both C and assembler.
PCFFT.C (Listing One) is the C version, while PCFFT.ASM (available
electronically) is written in assembler. Listing Two is the header file for
PCFFT.C. Both C and assembler versions use the same function prototype: void
Fft(float *Re, float *Im, int Pwr, int Dir);, where Re, Im are pointers to
arrays of (32-bit) floating-point numbers holding the real and imaginary part
of the input, respectively; Pwr holds the size of the arrays as a power of two
(for example, when a 1024-point FFT is to be calculated, Pwr should be equal
to 10); and Dir determines whether an FFT (Dir>=1) or an inverse FFT (Dir<=0)
should be performed.
PCFFTest.C (Listing Three) includes the PCFFT.H header file and shows how the
FFT function must be called. You can determine whether to use the C or
assembler version by linking the right object file.
As Figure 1 illustrates, the assembler version is the fastest and makes direct
use of your system's coprocessor. Set your compiler to generate the fastest
code (80x86); when using the assembler version, define the COPROC287 symbol at
the assembler command line--if your system has an 80x87 coprocessor--for
maximum performance.
The assembler version can also be used to link with Borland Pascal, by
defining the PSCL symbol at the assembler command line. When this is done, the
correct parameter passing sequence is applied. With Borland Turbo Assembler
(TASM 3.0), use the command lines in Example 6 to generate the object file
(PCFFT.OBJ) of your choice.


References


Kaldewaij, A. and J. van Tiel. Voortgezette Wiskunde; Fourier-theorie en
systeemtheorie. Utrecht, The Netherlands: Scheltema & Holkema, 1983.
Nieland, H.M. "Fourier Analyse." Natuur en Techniek (no. 3, 1990).
Rabiner, L.R., and B. Gold. Theory and Application of Digital Signal
Processing. Englewood Cliffs, NJ: Prentice Hall, 1975.
Pettit, F. Fourier Transforms in Action. Chartwell-Bratt Ltd. (Publishing and
Training), Bromley, U.K., 1985.
Press, William et al. Numerical Recipes in C, Second Edition. Cambridge:
Cambridge University Press, 1992.
Example 1 Performing a DFT on an array of samples results in the discrete
density spectrum.
Example 2 When the combination method is disregarded, FFT execution time is
the result of this equation.
Example 3 F(k) can be written as a combination of two N/2-point DFTs, F1(k)
and F2(k), holding the even and odd samples of F(k), respectively.
Figure 1 FFT execution time on PC using Borland C++ 3.1 and assembler. The
assembler code reduces execution time by about another 35 percent.
Figure 2 FFT "Butterfly," where the "hub" represents a summation/ subtraction
point, while the arrow represents a multiplication.
Table 1: Before performing the FFT, array elements are shuffled. The shuffle
index for element X is found by placing its binary format in reversed order.
 Old Index   Binary Format   Reversed Format   New Index
     0            000              000              0
     1            001              100              4
     2            010              010              2
     3            011              110              6
     4            100              001              1
     5            101              101              5
     6            110              011              3
     7            111              111              7
Figure 3 8-point FFT in "Butterfly" notation. Reading the graph from left to
right shows how N/2-point FFTs are combined. Reading from right to left shows
how an N-point FFT is reduced to the calculation of 2-point DFTs.
Example 4: Assembler code to find shuffle index.
 MOV BX, OldIndex ; BX = Old index
 MOV AX, 0000H ; AX = 0
 MOV CX, WordLength ; CX = Wordlength in bits
 CLC ; Clear carry flag
NextBit:
 RCR BX,1 ; Rotate BX through carry right 1 bit
 RCL AX,1 ; Rotate AX left using carry 1 bit
 LOOP NextBit ; Repeat till whole word reversed
Example 5 Deriving cos (a(k+1)) and sin (a(k+1)) from a table of constants.
Example 6: Command lines for (a) C-linkable code and (b) Pascal-linkable code.
(a)
For XT: TASM pcfft.asm
For AT: TASM /dCOPROC287=1 pcfft.asm
(b)
For XT: TASM /dPSCL=1 pcfft.asm
For AT: TASM /dPSCL=1 /dCOPROC287=1 pcfft.asm

Listing One 

/* PCFFT.C -- by J.G.G. Dobbe -- Performs an FFT on two arrays (Re, Im) of

 type float (can be changed). This unit is written in C and
 doesn't call assembler routines.
*/

/* --------------------- Include directive ------------------------ */
#include "pcfft.h"

/* --------------------- Local variables -------------------------- */
static float CosArray[28] =
{ /* cos{-2pi/N} for N = 2, 4, 8, ... 16384 */
 -1.00000000000000, 0.00000000000000, 0.70710678118655,
 0.92387953251129, 0.98078528040323, 0.99518472667220,
 0.99879545620517, 0.99969881869620, 0.99992470183914,
 0.99998117528260, 0.99999529380958, 0.99999882345170,
 0.99999970586288, 0.99999992646572,
 /* cos{2pi/N} for N = 2, 4, 8, ... 16384 */
 -1.00000000000000, 0.00000000000000, 0.70710678118655,
 0.92387953251129, 0.98078528040323, 0.99518472667220,
 0.99879545620517, 0.99969881869620, 0.99992470183914,
 0.99998117528260, 0.99999529380958, 0.99999882345170,
 0.99999970586288, 0.99999992646572
};
static float SinArray[28] =
{ /* sin{-2pi/N} for N = 2, 4, 8, ... 16384 */
 0.00000000000000, -1.00000000000000, -0.70710678118655,
 -0.38268343236509, -0.19509032201613, -0.09801714032956,
 -0.04906767432742, -0.02454122852291, -0.01227153828572,
 -0.00613588464915, -0.00306795676297, -0.00153398018628,
 -0.00076699031874, -0.00038349518757,
 /* sin{2pi/N} for N = 2, 4, 8, ... 16384 */
 0.00000000000000, 1.00000000000000, 0.70710678118655,
 0.38268343236509, 0.19509032201613, 0.09801714032956,
 0.04906767432742, 0.02454122852291, 0.01227153828572,
 0.00613588464915, 0.00306795676297, 0.00153398018628,
 0.00076699031874, 0.00038349518757
};

/* --------------------- Function implementations ----------------- */
/* --------------------- ShuffleIndex ----------------------------- */
static unsigned int ShuffleIndex(unsigned int i, int WordLength)

/* Function : Finds the shuffle index of array elements. The array length
 must be a power of two; The power is stored in "WordLength".
 Return value : With "i" the source array index, "ShuffleIndex"
 returns the destination index for shuffling.
 Comment : -
*/
{
 unsigned int NewIndex;
 unsigned char BitNr;
 NewIndex = 0;
 for (BitNr = 0; BitNr <= WordLength - 1; BitNr++)
 {
 NewIndex = NewIndex << 1;
 if ((i & 1) != 0) NewIndex = NewIndex + 1;
 i = i >> 1;
 }
 return NewIndex;
}

/* --------------------- Shuffle2Arr ------------------------------ */
static void Shuffle2Arr(float *a, float *b, int bitlength)
/* Function : Shuffles both arrays "a" and "b". This function is called 
 before performing the actual FFT so the array elements
 are in the right order after FFT.
 Return value : -
 Comment : -
*/
{
 unsigned int IndexOld, IndexNew;
 float temp;
 unsigned int N;
 int bitlengthtemp;

 bitlengthtemp = bitlength; /* Save for later use */
 N = 1; /* Find array-length */
 do
 {
 N = N * 2;
 bitlength = bitlength - 1;
 } while (bitlength > 0) ;
 /* Shuffle all elements */
 for (IndexOld = 0; IndexOld <= N - 1; IndexOld++)
 { /* Find index to exchange elements */
 IndexNew = ShuffleIndex(IndexOld, bitlengthtemp);
 if (IndexNew > IndexOld)
 { /* Exchange elements: */
 temp = a[IndexOld]; /* Of array a */
 a[IndexOld] = a[IndexNew];
 a[IndexNew] = temp;
 temp = b[IndexOld]; /* Of array b */
 b[IndexOld] = b[IndexNew];
 b[IndexNew] = temp;
 }
 }
}
/* --------------------- Fft -------------------------------------- */
void Fft(float *Re, float *Im, int Pwr, int Dir)

/* Function : Actual FFT algorithm. "Re" and "Im" point to start of real 
 and imaginary arrays of numbers, "Pwr" holds the array sizes
 as a power of 2 while "Dir" indicates whether an FFT (Dir>=1)
 or an inverse FFT must be performed (Dir<=0).
 Return value : The transformed information is returned by "Re"
 and "Im" (real and imaginary part respectively).
 Comment : -
*/
{
 int pwrhelp;
 int N;
 int Section;
 int AngleCounter;
 int FlyDistance;
 int FlyCount;
 int index1;
 int index2;
 float tempr, tempi;
 float Re1, Re2, Im1, Im2;
 float c, s;

 float scale;
 float sqrtn;
 float temp;
 float Qr, Qi;

 Shuffle2Arr(Re, Im, Pwr); /* Shuffle before (i)FFT */
 pwrhelp = Pwr; /* Determine size of arrs */
 N = 1;
 do
 {
 N = N * 2;
 pwrhelp--;
 } while (pwrhelp > 0) ;

 if (Dir >= 1) AngleCounter = 0; /* FFT */
 else AngleCounter = 14; /* Inverse FFT */
 Section = 1;
 while (Section < N)
 {
 FlyDistance = 2 * Section;
 c = CosArray[AngleCounter];
 s = SinArray[AngleCounter];
 Qr = 1; Qi = 0;
 for (FlyCount = 0; FlyCount <= Section - 1; FlyCount++)
 {
 index1 = FlyCount;
 do
 {
 index2 = index1 + Section;
 /* Perform 2-Point DFT */
 tempr = 1.0 * Qr * Re[index2] - 1.0 * Qi * Im[index2];
 tempi = 1.0 * Qr * Im[index2] + 1.0 * Qi * Re[index2];

 Re[index2] = Re[index1] - tempr; /* For Re-part */
 Re[index1] = Re[index1] + tempr;
 Im[index2] = Im[index1] - tempi; /* For Im-part */
 Im[index1] = Im[index1] + tempi;

 index1 = index1 + FlyDistance;
 } while (index1 <= (N - 1));

 /* k */
 /* Calculate new Q = cos(ak) + j*sin(ak) = Qr + j*Qi */
 /* -2*pi */
 /* with: a = ----- */
 /* N */
 temp = Qr;
 Qr = Qr*c - Qi*s;
 Qi = Qi*c + temp*s;
 }
 Section = Section * 2;
 AngleCounter = AngleCounter + 1;
 }
 if (Dir <= 0) /* Normalize for */
 { /* inverse FFT only */
 scale = 1.0/N;
 for (index1 = 0; index1 <= N - 1; index1++)
 {
 Re[index1] = scale * Re[index1];

 Im[index1] = scale * Im[index1];
 }
 }
}
/* ---------------------------------------------------------------- */



Listing Two

/* PCFFT.H -- by J.G.G. Dobbe -- Headers for PCFFT.C */

#ifndef PCFFT_H /* If not defined yet, use this file */
#define PCFFT_H

/* --------------------- External function ------------------------ */
void Fft(float *Re, float *Im, int Pwr, int Dir);
/* ---------------------------------------------------------------- */
#endif



Listing Three

/* PCFFTest.C -- by J.G.G. Dobbe -- Test program written in Turbo C/Borland C 
 that uses the C or Assembler version of FFT. It depends on 
 the type of FFT object file whether the C- or ASM-version is 
 linked in. In both cases, the same FFT.H file is used.
*/

/* --------------------- Include directives ----------------------- */
#include <stdio.h>
#include <conio.h>
#include "pcfft.h"

/* --------------------- Constant definition ---------------------- */
#define SIZE 16 /* Size of data arrays (re, im) */

/* --------------------- Variable definitions --------------------- */
float re[SIZE]; /* Array holding Real part of data */
float im[SIZE]; /* Array holding Imaginary part of data */

/* --------------------- Function implementations ----------------- */
/* --------------------- DispArr ---------------------------------- */
void DispArr(char *Txt)
{
 int i; /* Loop counter */
 clrscr(); /* Clear screen */
 printf("\n%s:\n", Txt); /* Display header */
 for (i = 0; i < SIZE; i++) /* Display data points */
 printf("i = %4d: Re = %8.2f, Im = %8.2f\n", i, re[i], im[i]);
 printf("Press <ENTER> to continue\n"); /* Display message */
 getch(); /* Wait for keystroke */
}
/* --------------------- main ------------------------------------- */
int main()
{
 int i;
 for (i = 0; i < SIZE; i++) /* Clear data arrays */

 {
 re[i] = 0.0;
 im[i] = 0.0;
 }
 re[0] = 100.0; /* Fill array with pulse signal */
 DispArr("Input Time Data"); /* Display time data (pulse signal) */
 Fft(re, im, 4, 1); /* FFT on data points */
 DispArr("Frequency Response (after FFT)"); /* Display freq data */
 Fft(re, im, 4, -1); /* Inverse FFT on freq data */
 DispArr("Time Response (after inverse FFT)");
 /* Display time data (pulse signal) */
 return 0;
}
/* ---------------------------------------------------------------- */








PROGRAMMER'S BOOKSHELF


SOM/DSOM and Object-Oriented Programming




Steve Gallagher


Steve is president of G&A Consultants, an OS/2 consulting firm located in
Research Triangle Park, NC. He can be contacted at sgallagher@delphi.com.


There's little question that the push for distributed component objects will
accelerate over the coming months and years. As Mark Betz pointed out in his
article, "Interoperable Objects" (DDJ, October 1994), the goal of component
objects is to encapsulate all the ugly stuff--platform interfaces,
communication protocols, language specifics, and addressing--to the point that
it's not only transparent to the user, but to the programmer as well. The
individual language-specific object model becomes irrelevant in this paradigm,
thanks to a language-independent Interface Definition Language (IDL). In my
opinion, IBM's System Object Model (SOM/DSOM) packs the gear to deliver what
is needed now, as opposed to approaches such as Microsoft's Common Object
Model and Object Linking and Embedding (COM/OLE), which are hobbled by the
absence of distributed capability and limited to Windows. Among other
benefits that SOM/DSOM provides are: 
No more recompiling. As things stand now, C++ programmers must recompile every
time the header file for the class library they're using is changed. With SOM,
classes can undergo major structural changes, but as long as no changes to the
source are needed, your application can continue to use the changed class
library without a recompile.
Language neutrality. By providing a language-neutral object model, SOM enables
classes built in Smalltalk to be used in a C++ program, and vice versa. The
potential for code reuse and object sharing across language barriers is
exciting to anyone who has had to code up some classes from ground zero when a
perfectly outstanding set of classes to do the job existed--in another
language.
Address space no longer matters. The objects you use can exist in a totally
different address space, and objects deployed in a single environment are
available to other machine environments.
CORBA++. With the latest version of SOM, IBM dropped OIDL, its object-based
Interface Definition Language, in favor of full compliance with CORBA's IDL.
IBM also added private methods, implementation statements, and instance
variables.
Using other people's DLLs is nothing new; the idea of using someone else's
classes--possibly written in a different language, running on a different
machine, riding on top of a different operating system--takes some getting
used to. But get used to it we must. The exponential increases in
software-development complexity demand SOM, or something like it. 


Object-Oriented Programming Using SOM and DSOM


Object-Oriented Programming Using SOM and DSOM, by Christina Lau, is both an
enjoyable and enlightening introduction to SOM. Lau makes sense out of the
often bewildering array of acronyms associated with the subject--DSOM, PSOM,
IR, and the like--in a style that is simple, direct, and forthright. The book
assumes a working knowledge of C and C++ and a grasp of object-oriented
concepts. Given this as a base, you will be taken at a reasonable pace through
SOM's various flavors. 
Lau begins with a solid introduction that gives you a one-page answer to the
question, "What is SOM?". IBM should make photocopies of this page and mail it
to every programmer on the planet. The rest of the introductory chapter
provides details that include concise explanations for the concepts of
language neutrality, the IDL, SOM kernel, various SOM Frameworks, and
collection classes. You also learn the CORBA baseline and the ways in which
SOM extends the base. After developing a "Hello World" program using SOM, Lau
takes you into the world of SOM objects. If you are familiar with OO concepts,
you won't be intimidated when you meet SOMClass ("the mother of all classes"),
the detailed explanation of the difference between parent classes and
metaclasses, and SOM's various schemes for dealing with method resolution. 
After explaining the advantages of SOM over stand-alone C++ and providing an
excellent tutorial on DSOM concepts, Lau uses a "Daily Planner" applet to get
to the heart of the matter. The requirements for the Planner are that it allow
addition/deletion of items by multiple users on a network and that changes to
any user's instance of the planner show up in real time on all other users'
Planners. This is an ideal test project for working through SOM's various
Frameworks. Using traditional programming methods to build a stand-alone
Planner is boring; distributing or sharing that same Planner, however, can be
Byzantine. The SOM Frameworks make it relatively painless. Once you have built
the base Planner using SOM, you find that by using DSOM you are able to
register multiple Planner servers to enable distributed access across a
network. Compared to traditional methods, DSOM enables this capability with
surprisingly little code.
Still, when the server process terminates, the states of the various Planner
objects are not preserved. The Persistent SOM Framework (PSOM) comes to the
rescue by allowing you to preserve the state of an object, even after the
process that created the object terminates. You are able to store the object
to a file, or some other method of persistent storage. PSOM requires that all
Persistent objects be derived from a Persistent class, and the Persistent data
must be explicitly spelled out. This is in contrast to DSOM, where a given
object does not have to know in advance that it is going to be distributed.
Once you derive your Persistent object from the SOMPersistentObject class, the
rest is surprisingly straightforward.
If you have multiple users sharing a Planner, any change one user makes to a
Planner is not automatically reflected on other users' Planners. The
Replicator SOM (RSOM) Framework is designed to propagate an update to a given
object to all other replicated objects. Updates are propagated directly to the
other objects, without the use of secondary storage. RSOM is the ideal
Framework for groupware applications, such as real-time creation of compound
documents, without the traditional bottleneck of a database server. In common
with the PSOM Framework, RSOM objects are required to be RSOM-aware objects
derived from the SOMRreplb class. After reviewing several important rules that
replicated objects need to adhere to, Lau declares the Daily Planner applet
complete. Different design decisions you could make in real-world situations
are then reviewed: For example, should you use DSOM or RSOM? If you are
programming to a single operating environment, RSOM's complete
object-replication capability is ideal. Unfortunately, RSOM does not talk
across different operating environments; in a situation where this is
required, DSOM is the logical choice.
Lau concludes with an insider's view of the future direction of SOM. Plans are
afoot to port SOM to Windows, System 7, HP UNIX, and OS/400. Additionally, IBM
is furiously developing SOM relationships with vendors as diverse as
Hewlett-Packard, Digitalk, and Watcom. In the words of the Timbuk 3 song, "The
future's so bright, I gotta wear shades."


Objects for OS/2


As a consultant with a need to keep abreast of the latest technology, I found
Object-Oriented Programming Using SOM and DSOM to be the most solid
"platform-neutral" introduction I've read. As a working programmer who cranks
out OS/2 Presentation Manager code for a living, Objects for OS/2 was the
perfect follow-up that allowed me to take the generic SOM concepts and ground
them in the OS/2 world I live in every day. Authors Danforth, Koenen, and Tate
take you on a bit-twiddler's dream tour of OS/2's innards, tying it neatly
into the application of SOM to real-world Presentation Manager (PM)
programming scenarios. Objects for OS/2 begins with a competent, basic
tutorial on object-oriented programming, written on a level that will gently
lure in the novice without putting the battered veterans to sleep. In several
chapters crammed with useful source code, you are brought up to speed on the
OS/2 kernel, the PM API, and GPI. This is by no means a definitive reference
on these aspects of OS/2 programming, but it does lay the groundwork and serve
as a solid overview.
From here, you move into the realm of OS/2 C++, and the authors begin
developing their persuasive arguments as to why they consider SOM to be such
an important technology as it relates to C++. This exposition leads naturally
into the three big chapters devoted exclusively to learn-by-example SOM
programming under OS/2. The samples begin at a basic level so as not to
intimidate the novice, then move down into the nuts-and-bolts level of
hard-core OS/2 SOM programming. The subjects of SOM multiple inheritance and
metaclasses are dissected in depth, followed by an outstanding summary of the
entire SOM API. By this point you should be comfortable with SOM concepts and
the API and be fully prepared to grapple with the next large portion of the
book, devoted to working up various coding projects with SOM. Starting with an
amusing variation on the traditional "Hello World" program, you rapidly
progress to the fun stuff--the powerful combination of SOM and PM. The authors
build a solid framework combining the flexibility of PM and the elegance of
SOM, including an outstanding chapter on "wrapping" PM controls within SOM.
This extensive tutorial, along with the code diskette that ships with the
book, is worth enough on its own to justify the price of the book. 
Objects for OS/2 delivers on its promise of getting the OS/2 programmer up and
running with SOM, although I would have preferred a bit more depth on DSOM and
possibly a section describing the innards of the biggest SOM project to
date--the Workplace Shell. Nonetheless, Objects for OS/2 is broad enough and
solid enough that I expect it will develop its own large following of
devotees, much the way Petzold's PM book and Orfali and Harkey's client/server
book (Client/Server Survival Guide, Van Nostrand Reinhold, 1994) already have.


Wrapping It Up


As you may have guessed, these two books complement rather than compete with
each other. You should read Object-Oriented Programming with SOM and DSOM for
the best understanding of the concepts you're likely to find. Then you should
turn to Objects for OS/2 to enable you to take those concepts and apply them
to real-world OS/2 programming problems.
I don't normally read author biographies until after I've finished a book, so
I was impressed to find that both books were written by IBMers, thus proving
that IBM does have talented and articulate people who can put pen to paper and
explain often-difficult concepts in ways we regular folks can both comprehend
and enjoy. It's dang good stuff for the industry that IBM is finally making it
easier for its "wild ducks" to come out and play with the rest of us kids.
Object-Oriented Programming with SOM and DSOM
Christina Lau
VNR, 1994, 272 pp., $36.95
ISBN 0-442-01948-3
Objects for OS/2
Scott Danforth, Paul Koenen, and Bruce Tate 
VNR, 1994, 446 pp., $36.95 
ISBN 0-442-01738-3







SWAINE'S FLAMES


My Religious Quest


I'm thinking about getting religion. I've always found the evidence for the
Big Man in the Sky slightly less convincing than that for the Big Man in the
Sleigh. To tell the truth, I've always had trouble telling those two guys
apart. They seem redundant, if you know what I mean. Possibly you don't.
But what with the United States Congress embracing the Republican agenda for
solving America's problems (pass a balanced-budget amendment and pray) and the
President of the United States allowing as how he might be open to amending
that pesky First Amendment's religious freedom language (more freedom for and
less freedom from), I can see the handwriting on the wall.
The only question is, which religion? There are so many to choose from.
Right off, I have to eliminate the trendy heavy-armament and poisoned-Kool-Aid
denominations. I'm just not cut out for violence. And Mormonism is out,
because those people just can't make good Macintosh software.
I might be able to get behind the dominant American sect, Free-Market
Economics, since I do believe that a free market would be a fine thing and
that it's as likely as the Second Coming of Elvis. And I don't understand the
language in which the services are conducted, a big plus. My only problem with
economic religion is that its prophets all dress so well. As I see it, poor
people live with the economy and rich people only play with it.
I guess I could worship Microsoft. It's big enough and it's scary enough and I
gather that one's religious beliefs don't have to bear any relation to one's
short-term purchase plans. The trouble with Microsoft worship is that I don't
know where Microsoft will be two years from now. Naming Chicago "Windows 95"
set a daunting precedent, and doesn't Microsoft now have to release a Windows
96 in 1996 or abandon it completely? Add to that Microsoft's completely
missing the boat on the commercialization of the Internet, and this idol has
feet of clay.
A lot of my friends used to adhere to the faith that Apple was god and IBM was
the devil. Imagine how confused they are today. I don't think I'll convert to
that sect.
What does that leave? Intel? In its favor, Intel has omnipotence. There is no
problem that could possibly confront Intel that it can't solve by throwing
enough money at it, and Intel has more money than Kuwait. But omnipotence is
kind of boring.
Of course, when it's politically expedient, I profess the dominant programming
religion, C, or its C++ sect. But I am a C-ist in the same politically correct
sense that Thomas Jefferson was a Deist. That fools nobody.
Maybe I'll worship the Internet. As it grows into the utterly ubiquitous
environment in which we all work, play, socialize, and meditate, Internet
worship would be rather like Nature worship, it seems to me, except totally
unnatural.
Okay, flame off. If you're religious and have read this far, I admire your
tolerance. I hope you realize that I'm just having some fun and don't mean to
offend anyone. Despite being a godless heathen, I'm actually a pretty decent
guy. My religious beliefs (or lack of them) have nothing to do with the sort
of person I am.
Any more than yours do. (Oops. Flame on.)
Michael Swaine, editor-at-large
MikeSwaine@eworld.com



OF INTEREST
MetaWare's recently released High C/C++ Version 3.3 compiler fully implements
C++ exception handling (including nested exceptions), C++ namespaces for
compatibility with third-party class libraries, and new-style casting
notation. The C++ namespace capabilities let you avoid name clashes with
third-party libraries and run-time type information (RTTI). With RTTI and
dynamic_cast (new cast notation), you can write applications that perform
dynamic, object-specific operations based on an object's size, type, or
methods. New style-casting notation provides safer typecasts than the old C
notation, and it's versatile and easy to identify in program code.
Version 3.3 also provides OMF common support for templates, virtual function
tables, RTTI, and inline functions. A 32-bit source-level debugger is
included. 
Version 3.3 requires a 386, 486, or Pentium-processor-based PC with a minimum
of 8 Mbytes RAM. Fifteen Mbytes of disk space are required for compiler
installation. Programs created with High C/C++ can be run in Phar Lap's TNT
DOS-Extender SDK run-time environment, or in Windows 3.x 386 Enhanced mode.
In a related announcement, MetaWare announced the availability of Version 3.3
of the SPARC to DOS Cross Compiler. The cross compiler enables applications
developed on Sun SPARCstations to be distributed on Intel 386, 486, or Pentium
architectures running Windows or DOS.
High C/C++ 3.3 sells for $795.00, while the cross compiler sells for $2895.00.
Reader service no. 20.
MetaWare Inc.
2161 Delaware Ave.
Santa Cruz, CA 95060-5706
408-429-6382
SQA has started shipping SQA TeamTest Version 3.0, an automated GUI
client/server testing tool for Windows. SQA TeamTest 3.0 is built on a
network repository that integrates test planning, test development, test
execution, results analysis, defect tracking, and summary reporting and
analysis. 
SQA TeamTest 3.0 lets you configure the tool to recognize (via record and
playback) GUI objects from among a variety of options. You can also test the
states of Window objects such as pushbuttons, check boxes, and radio buttons.
Furthermore, the testing tool includes a 3-D graphics engine combined with a
customizable report writer that lets you design your own reports with your
choice of fonts, sizes, style, colors, and the inclusion of bitmap images. All
test-repository information can be extracted and reported. SQA TeamTest 3.0
sells for $2495.90 per installation. Reader service no. 21.
SQA Inc.
10 State Street
Woburn, MA 01801
617-932-0110
3Dlabs has begun sampling its GLINT 300SX 3-D processor that, the company
claims, enables OpenGL applications running on GLINT-accelerated Pentium PCs
to outperform high-priced workstations. The GLINT processor incorporates the
equivalent of a high-end workstation graphics-board chip set in a single chip.
Target platforms include desktop PCs, workstations, and embedded systems.
GLINT is capable of processing 300,000 shaded, depth-buffered, and
anti-aliased polygons/second. The chip provides complete 32-bit color; 2-D and
3-D acceleration; and an on-chip, PCI-compliant, localbus interface and
integrated LUT-DAC control, making a complete graphics subsystem possible with
minimal chip count. GLINT implements all the 3-D rendering operations of
OpenGL in silicon, including Gouraud shading, depth buffering, anti-aliasing,
and texture mapping. The GLINT 300SX costs $150.00 in volume, with full-volume
shipments expected in the first quarter of 1995. Reader service no. 22.
3Dlabs Inc.
210 N. 1st Street, #403
San Jose, CA 95131
408-436-3456
Microsoft has started shipping its Visual C++ Version 2.0 Cross-Development
Edition for Macintosh, designed for developers who want to port Windows-based
applications to the Macintosh. The development environment is a
Windows-hosted/Macintosh-target system. 
Visual C++ 2.0 Cross-Development Edition for Macintosh is an add-on to Visual
C++ 2.0 running Windows NT on an Intel processor. The toolset includes the
Windows Portability Libraries (WPL), an implementation of the Win32 API and
architecture for System 7.x that allows MFC and Win32-based applications to
run largely unchanged on the Macintosh. These libraries automatically address
user-interface differences so the resulting application incorporates a native
Macintosh look. Developers can also access System 7.x features to implement
Macintosh-system specific features. Visual C++ 2.0 Cross-Development Edition
for Macintosh also includes: a version of Microsoft Foundation Classes 3.0 for
the Macintosh; a high-performance optimizing C/C++ 680x0 cross compiler
seamlessly integrated into the Visual C++ development environment; C++
language support for templates and exception handling; C++ debugging support
for remote debugging of applications for the Macintosh; new wizard and
project-management features tailored to the development of applications for
the Macintosh; tools for building Macintosh-specific menu items and dialog
boxes; and a porting tool for identifying nonportable constructs in
Windows-based applications.
Macintosh apps developed with Microsoft Visual C++ 2.0 Cross-Development
Edition for Macintosh can be distributed royalty free. You can license the
toolset for approximately $2000.00. Since the tools were actually developed by
the Microsoft applications group to port Word and Excel to the Mac, developers
of general-purpose word processors and spreadsheets will have to work out
special licensing terms with Microsoft. Reader service no. 23.
Microsoft 
One Microsoft Way
Redmond, WA 98052-6399
206-882-8080
The SOS Application Profiler, from Solid Oak Software, is a development tool
that lets you selectively monitor file-access functions during and after
development. Essentially, the tool is a 13-Kbyte TSR monitor that you install
when file-access problems are encountered. A DLL is supplied that allows
custom configuration of logging functions from the calling application. The
profiler can be used with Visual Basic, Visual C++, Borland Pascal, Clipper,
or any other language supporting DLL access or OBJ linking. The tool sells for
$99.00 and includes an unlimited run-time distribution license. Reader service
no. 24.
Solid Oak Software
P.O. Box 6826
Santa Barbara, CA 93160
805-967-9853
CIEX Version 2.3.5, a very high-level language (VHLL) for 386/486 PCs, can be
downloaded from New Line Software. CIEX is a command-line interpreter and
scripting language tailored for text and data processing. It supports DOS
commands, math functions, text parsing, replacement using wildcards,
multidimensional data arrays and records, and screen-buffered text-graphics
output. 
If you don't have access to America Online, you can request a disk at no cost
from New Line. A 300-page manual costs $40.00. A development environment,
which includes online help, interactive debugger, script libraries, syntax
checkers, and sample scripts, sells for $35.00. Reader service no. 25.
New Line Software
7348 S. Alton Way, #1
Englewood, CO 80112
800-441-2931
Dialogic has announced the release of the SCSA Telephony Application Objects
(TAO) Framework API Version 3.0, the software portion of the Signal Computing
System Architecture. The SCSA APIs provide a hardware- and vendor-independent
interface that simplifies development of multitechnology computer-telephony
applications and makes developed applications more scalable and portable.
Because both locally hosted and remote applications can control the underlying
server using the same APIs, they are ideal for client/server environments in
which multiple computer-telephony applications can share the same server
resources.
SCSA APIs are independent of the underlying hardware platform so that
application-software developers no longer need to directly control hardware-
and software-technology resources by means of their physical location; their
applications can be implemented on top of various hardware platforms. 
The SCSA APIs form the top layer of the SCSA TAO Framework, a
hardware-independent software architecture that simplifies developing
computer-telephony applications. The SCSA TAO Framework consists of: SCSA
APIs, a standard set of function calls that allow applications to easily
control system resources and server-management functions; SCSA System
Services, middleware for controlling various server-management tasks; and SCSA
Service Provider Interfaces, a standardized means for communication between
the various system providers in an SCSA Server. 
A white paper describing the SCSA Telephony Application Objects Framework is
also available at no charge from Dialogic. Reader service no. 26.
Dialogic 
1515 Route 10
Parsippany, NJ 07054
201-993-3030
MobileWare has announced availability of its MobileWare Version 1.1 software
for delivering wireless data communications for large cc:Mail installations
and enabling the development of custom mobile applications. Also included are
systems-management features that streamline support of mobile workers.
With MobileWare 1.1, mobile workers can send and receive cc:Mail messages,
print documents, send faxes, and transfer files to and from their NetWare
network. Mobile workers can use cellular telephones and regular phone lines
for remote data communications. MobileWare 1.1 supports industry-standard
messaging APIs, and apps can be developed using PowerBuilder, Visual Basic,
Microsoft Access, or Visual C++. The software sells for $280.00 per user.
Reader service no. 27.
MobileWare 
2425 N. Central Expressway, Suite 1001
Dallas, TX 75080-2748
800-260-7450
Apple Computer has begun shipping QuickTime 2.0 for Windows, a cross-platform
tool for creating, using, and sharing multimedia information between Macintosh
and Windows. QuickTime for Windows supports full-motion/full-screen video at
30 frames-per-second. The system lets you integrate and synchronize photos,
music, animation, text, audio, and the like. Built-in compression makes it
possible to store a 20-slide multimedia presentation--including images, music,
and text--on a single 1-Mbyte floppy diskette. The QuickTime 2.0 for Windows
SDK will sell for $195.00. License fees for the product start at $300.00 per
year, per title. Reader service no. 28.
Apple Computer
20525 Mariani Ave.
Cupertino, CA 95014
408-996-1010
The Easy-CD Developers Toolkit from incatsystems is a formatting and mastering
toolkit for integrating CD-R functions into your application programs. With
the toolkit, a single DLL called by your program allows you to build an ISO
9660 image from your data. Drivers for all CD-R recorders are included. You
can write discs in any format and mode, including Mode 1, Mode 2, multivolume
and multisession, ISO 9660, and already-prepared CD images on hard disk. The
package includes sample apps written in C and Visual Basic. The toolkit sells
for $995.00. Reader service no. 29.
incatsystems 
1684 Dell Ave.
Campbell, CA 95008
408-379-2400
The MIDI Programmers Toolkit for Windows, available from Music Quest, lets
multimedia and music-program developers create applications ranging from
sequencing, music notation, and music instruction to live performance using
MIDI instruments and sound cards. The toolkit hides much of the Windows API
while providing a complete library of functions. The library allows content
developers to read and write songs in Standard MIDI File form, receive and
transmit MIDI events to and from MIDI instruments and sound cards, filter
events, and synchronize to either a MIDI clock, internal time base, or SMPTE
time code. The library is provided as a DLL and supports Microsoft and Visual
C++, Borland C/C++, and Visual Basic. The MIDI Programmers Toolkit for Windows
sells for $99.95. Reader service no. 30.
Music Quest
1700 Alma Drive, Suite 300
Plano, TX 75075
214-881-7408
Borland has begun shipping Paradox 5.0 for Windows. The new release contains
support for OLE 2.0 and ODBC, interface enhancements for first-time users, and
support for seven new data types: long integer, binary coded decimal, single
byte, logical, time, time stamp, and autoincrement.
Paradox 5.0 for Windows also offers an enhanced debugger for ObjectPAL that is
modeled on Borland's Windows-hosted C++ debugger. In addition, ObjectPAL has
been extended with more than 100 new methods and 200 properties. Finally,
Borland has added comprehensive context-sensitive help for ObjectPAL,
accessible through an integrated-development environment. The package sells
for $495.00, with upgrades at $199.95. Reader service no. 31.
Borland International
100 Borland Way, 
Scotts Valley, CA 95067 
800-233-2444
Curious about the new Visual C++ 2.0 compiler but don't want to upgrade your
hardware to accommodate Windows NT 3.5? Then you may be interested in WinHost
for Visual C++ 2.0 from Phar Lap. WinHost, a tool that allows developers
working in a 16-bit environment to target 32-bit Windows applications,
additionally allows programmers to use the Visual C++ compiler and
command-line tools from either DOS or Windows 3.1. WinHost is an addition to
Phar Lap's desktop, FrontRunner, which integrates the DOS and Windows
environments by providing a DOS box within a Windows shell. FrontRunner
provides cut and paste capability between DOS and Windows, includes a
programmable status bar and customizable Launch bar, and maintains a screen
history of 16,000 lines within the DOS box. FrontRunner with the WinHost for
Visual C++ 2.0 addition retails for $139.00. Reader service no. 32. 
Phar Lap Software
60 Aberdeen Avenue
Cambridge, MA 02138
617-661-1510



EDITORIAL


Logo No-Go


One way to go about defining the future is to present a vision so crystalline,
so compelling that anyone encountering it is moved to action. Over 20 years
ago, I listened to Buckminster Fuller, author of Operating Manual for
Spaceship Earth and inventor of the geodesic dome, speak of the future as he
envisioned it. Although I didn't move into a dome that night, what he said has
served as a beacon for me ever since.
Other approaches to defining the future are more heavy-handed. A case in point
is Microsoft's move to sculpt the face of computing. From compilers to comm
ports, Microsoft wants to dictate the PC platforms we'll be using in the years
to come, and it's dangling the Windows 95 logo as bait. 
Microsoft opened the logo-licensing door years ago by making available, under
relatively relaxed terms, a variety of logos denoting Windows-compatible
products. Why? Because both Microsoft and licensees benefited from what
amounted to a business partnership. Claiming customer confusion, Microsoft has
instituted a new program that lumps all the old logos under a single Win95
flag, but with some very specific licensing terms.
For starters, an application developer wanting to promote a Win32-targeted
word processor (or any other application capable of opening, saving, or
closing a file) must create the app with a 32-bit compiler that generates a PE
format executable, register 16x16 and 32x32 icons, use system metrics for
sizing, use the right mouse button for context menus, and so forth. Although
extensive, these requirements are reasonable since they do lead to
well-behaved and consistently presented applications, something end users
doubtlessly appreciate. 
But wait, there's more. Win95 applications must also run on Windows NT, use
long filenames, support universal naming-convention pathnames, support OLE 2.0
features such as drag-and-drop and container/objects, and be mail enabled via
the Common Messaging Call API. 
Compilers and other development tools must support OLE drag-and-drop, provide
point-and-click means of creating apps with OLE-container/object support, and
support OLE Automation (although OLE Automation itself isn't required).
Furthermore, tool vendors must provide class libraries that support OLE
functionality, including container/object, drag-and-drop, compound files, and
Automation.
Hardware manufacturers aren't off the hook either. System vendors have to
provide Plug and Play BIOS 1.0a (or better), permanent icon labels on the PC
case, option ROMs with Plug and Play header format, an IEEE-P1284-I parallel
port, a 16550A serial port, and so on. For their part, subsystem vendors
hawking everything from network adapters to floppy-disk controllers have their
own set of requirements, down to the I/O addresses and IRQs they can use. PC
Week reports that OEM customers are even being offered discounts on licensing
fees at below Win3.1 levels if they sign logo-licensing contracts prior to
March 1, 1995 and install Win95 on at least 50 percent of systems shipped
within one month after Win95 ships.
But what if your app doesn't really need NT, OLE, or mail support? Write it
anyway, but don't expect Microsoft to let you use its logo to promote your
software. In short, it isn't enough that hardware and software systems be
Win95 compatible. Microsoft will only grant logo licenses for products that
are specifically designed from the ground up to meet Microsoft's requirements
for Win95. In fairness, Microsoft is showing some flexibility. If hardware
architectures dictate non-Win95 compliant features (such as device support),
the company will still grant a license.
The basic assumption in all this is that the Win95 logo--like the now familiar
"Intel Inside" label--has value in the first place. Microsoft obviously thinks
so, stating flat out that "We feel strongly that the Windows Logo [sic] is a
valuable marketing tool...." In truth, the logo does have value, particularly in
a market dominated by less-sophisticated users. If two word processors are
side-by-side on the store shelf and one has a perceived Microsoft
seal-of-approval, which do you think the first-time buyer of a Packard Bell
multimedia PC is going to pick up? 
Adopting a Newt Gingrich-like stance when defending the logo program in an
Infoworld interview, Bill Gates said: "We're not forcing [ISVs] to do
anything.... It's our logo. We own the logo, but they don't need the logo. They
can sell their software without the logo. So what's the big deal? You can be a
player [in the Win95 market]. Here's how you play: You write good software.
You sell it."
Uhh, what's the big deal? Well for one thing, some companies may not have the
resources or desire to add the extra baggage of OLE, mail, or NT support if
they don't need it. Secondly, third-party developers are being used as cannon
fodder in a marketing war that pits Microsoft object models (COM) and
compound-document architectures (OLE) against competitors, such as SOM or
OpenDoc, respectively. Furthermore, the logo program appears to be an attempt
promote Microsoft development tools which "automatically" generate the
necessary code. And, if nothing else, the policy is a means of forcing
developer support for a relatively small NT market. 
Microsoft's scheme calls for developer self-testing of every product before
submitting it to an independent testing house for another round of evaluation.
If new-product confidentiality is of concern, test results will be held in
escrow until a vendor-specified date. In any event, the results will
eventually go to Microsoft's logo-licensing group. Developers will then have
to choose between putting a new product on the shelf without the logo, or
restructuring their marketing plans to accommodate a clumsy licensing process.
Again in fairness, Microsoft's own apps must go to the independent testing lab
along with third-party products.
To enforce proper use of the logo, Microsoft's new logo police (the
"gestaplogo?") will routinely pull software and hardware off the retail shelf
to check compliance and ensure that papers are in order. 
In the past, I've lauded Microsoft for its willingness to hang in there with
projects it believes in, without regard to next quarter's return on
investment. Windows is the best example of this tenaciousness. However,
there's nothing admirable about Microsoft's Win95 logo program. Windows is a
success because of the thousands of third-party applications that run on it,
not because of its technical superiority. Maybe it's time Microsoft remembered
that it's sometimes best, as the old saying goes, to dance with those who
brung ya' and start treating third-party developers as partners, rather than
serfs.
Jonathan Erickson, editor-in-chief



LETTERS


DAN Revisited


Dear DDJ,
I just read "DAN," by Reginald B. Charney (DDJ, August 1994), and I enjoyed it
greatly. The code that would result is wonderfully clear, but...
Wouldn't using DAN discourage the reuse of code? In Reg's example, the basics
of the classes Cost and List would be the same. In fact, it is conceivable
that the code modules would be identical except for a Search-and-Replace of
all instances of Cost with List.
The first answer that I see to that is "inheritance." However, although I've
programmed in C++ classes for the last three years, I've never had to be
concerned with speed and only a little concerned with size, so I've never
checked whether or not inheritance (without virtual functions) adds any
negative effects. I know virtual functions add space (the V-Table) and time
(lookup in the table).
Darren 
Germany
Reg responds: Thank you for the compliment and questions. Both are very much
appreciated. I will try to answer the two questions you raised.
First, "Does DAN discourage reuse of code?"; and second, "Does inheritance
without virtual functions add any overhead?"
It is my experience and belief that DAN improves the reuse of code. Code is
re-used when its purpose is clear and its interface is clean. Since DAN maps
closely to your own requirements, it is easy to use a DAN class as a base for
further specialization. Also, a DAN class's member functions tend to be simple
and intuitive since most are based on the attributes of the class. For
example, let us assume you sell a line of software and you intend to extend
the line to include a "lite" and "pro" version. Let the class SKU (Stock
Keeping Unit) be the base for all your products, and let us say it is defined
as in Example 1.
As you can see, the base DAN class SKU has been used to build the derived DAN
class CostedSKU. This simple example does not use virtual member functions,
but in real life it probably would.
I was delayed in answering your question by a "problem" that I perceived with
this solution. I needed to duplicate the operators << and >> from the base
class SKU in the derived class CostedSKU. This seemed ugly. However, it is
needed for two reasons. 
The first reason is that while the operators << and >> are defined for base
class B, they must return values of the type D&, where D is the derived-class
type. For example, Example 2 would be invalid for two reasons. First, the
result of d<<x1 would be a B& (not a D&, as desired); and second, the operator
<< in the preceding example is a member function, and the type of the first
operand of the operator << would not be converted from a D& to a B&; thus, the
operator B::operator <<() would not be invoked and we would have an undefined
operation. (Making the operator<< a friend function is possible and at first
glance overcomes the conversion problem; however, it involves so much extra
work for a limited return that it was not viewed as a viable solution.)
The second reason for the "problem" is philosophical. Should the attributes of
a base class be implicitly accessible in a derived class? While mechanically
we may want to say yes, in general we want to say no, since the attributes
making up the base class are usually hidden by the derived-class
encapsulation. In the preceding example, is it normal to want to know the
packing list on a CostedSKU as opposed to a basic SKU? If a car is composed of
an Engine and Body, is it normal to ask the number of Cylinders the car has
without first referring to the Engine of the Car? That is, while the
expression Cylinders(Car) is possible, the expression Cylinders(Engine(Car))
makes more sense. If the former is desired, then an explicit definition can be
provided to convert a Car instance to the number of Cylinders the car engine
has.
You also asked if classes do not contain virtual functions, is there any
overhead? Under ideal conditions, the answer is no. However, this translates
into a quality-of-implementation issue, and different compilers may produce
slightly different results. However, it is my belief that eventually,
optimized code for all major vendors will produce derived classes with no
overhead if no virtual inheritance is used. (Compiling debug information
into a module may result in more overhead for derived classes, regardless of
the presence or absence of virtual functions.)
Dear DDJ,
I just read the article, "Data Attribute Notation Relationships," by Reginald
Charney (DDJ, January 1995), which is a follow-up on a prior article
describing DAN. Neither article explores the downside to DAN's Generating
Encapsulation Redundancy, or "DANGER." 
As for data, a "root" class (or attribute class) contains a single data
attribute. Non-root classes are used to provide additional data attributes,
but no data attributes are allowed in nonroot classes. Therefore, a root class
is defined (or redefined in this case) as a data attribute. Each nonroot
Object must instantiate all of its encapsulated Objects, and each subsequent
encapsulated Object then instantiates all of its Objects until a root Object
is encountered. This could create quite a call stack for "actual" applications
and would definitely create longer execution times.
Next, the original article stated a DAN benefit of not polluting the namespace
as badly as with access functions. First, access functions are encapsulated
within a class (if properly designed) and are available only to objects as
defined in the interface, but classes are "global." Second, DAN pollutes the
environment with classes which jeopardizes inheritance and reusability, not to
mention conflicting class names for libraries and programs. For example, (DAN)
Class X is the x coordinate of a point defined as a FLOAT, and by itself means
what? Furthermore, defining Class X as an INTEGER doesn't make it more/less
meaningful. (Non-DAN) Class X defined as the x chromosome does have meaning by
itself. Continuing, (DAN) Class XYPoint is a class encapsulating two classes,
Class X and Class Y, rather than containing two attributes, FLOAT x and FLOAT
y. Of course, Class X and Class Y are entirely redundant as defined (to show
the Class X conflict), so we change (DAN) Class X/Class Y to Class Point, and
XYPoint now encapsulates Class Point x and Class Point y.
Some of the disadvantages to using DAN are: 
If the data attribute of Class Point is inappropriate, we cannot overload the
class name to implement a different data type (that is, DOUBLE LONG).
DAN is linear (flat) in nature, so inheritance does not make much sense. After
all, all data attributes are in attribute classes and multiple inheritance may
be obscured.
Operators (>> and <<) rely on the internals of the classes, violating
encapsulation (and increasing maintenance costs) for simple access to data
attributes.
Overloading an access function (such as calcSalary()) to calculate a salary
for various employees is not possible; there are no "real" access functions.
DAN does support additional classes to simulate access functions, but again,
more classes mean more objects, additional memory consumption, and longer
execution times. Consider this example: A small application contains 42
user-defined classes. An average of 4 data attributes for 30 of the classes
gives us 120 attribute classes plus additional classes for "Boolean functions"
such as isTrainServiced(), plus nonattribute classes which amount to 178
additional classes. Total: 298 classes versus 42. Or, (DAN) Class List is an
attribute class containing a DOUBLE. Wouldn't you consider the class List to
be a container-type class and not a variable? What about class Cost? Think of
the problems that arise if any of the attribute classes contains a class
variable, and a pure virtual attribute class cannot exist. Notation should not affect
software design or execution time; however, DAN certainly does. Data
abstraction is but one element of the OO puzzle (inheritance, polymorphism,
data abstraction, encapsulation, reusability, and so on), and no element can be
isolated. Therefore, a complete understanding of each element is required
along with an understanding of how all the elements are put together to create
OO programs.
Timothy D. Nestved
Orlando, Florida
Example 1: Test DAN and inheritance.
// test.cpp - test DAN and inheritance
#include <iostream.h>
#include <string.h>
class PartNumber {
 long n;
public:
 PartNumber(long nn=0) : n(nn) { }
 operator long() const { return n; }
};
class Media {
 char *cp;
public:
 Media(char* ccp=0) : cp(ccp) { }
 operator const char*() const { return cp; }
 // ... should also define copy constructor & assignment op
};
class PackingList {
 char *cp;
public:
 PackingList(char* ccp=0) : cp(ccp) { }
 operator const char*() const { return cp; }
 // ... should also define copy constructor & assignment op
};
class SKU { // simplified Stock Keeping Class
protected:

 PartNumber pn;
 Media m;
 PackingList pl;
public:
 operator const PartNumber&() const { return pn; }
 operator const Media&() const { return m; }
 operator const PackingList&() const { return pl; }
 SKU& operator <<(const PartNumber& ppn){ pn = ppn; return *this; }
 SKU& operator <<(const Media& mm) { m = mm; return *this; }
 SKU& operator <<(const PackingList& ppl){ pl = ppl; return *this; }
 SKU& operator >>(PartNumber& ppn) { ppn = pn; return *this; }
 SKU& operator >>(Media& mm) { mm = m; return *this; }
 SKU& operator >>(PackingList& ppl) { ppl = pl; return *this; }
 friend ostream& operator <<(ostream& os, SKU& ss);
 // ... other public member functions
};
// Now let us define a CostedSKU class for use in sales and costing.
class Cost {
 double c;
public:
 Cost(double cc=0.0) : c(cc) { }
 operator double() const { return c; }
 friend ostream& operator<<(ostream& os, const Cost& cc);
}; // cost in some standard currency (dollars, DeutschMarks, etc.)
ostream& operator <<(ostream& os, const Cost& cc) { return os << cc.c; }
class SRP {
 double s;
public:
 SRP(double ss=0.0) : s(ss) { }
 operator double() const { return s; }
 friend ostream& operator<<(ostream& os, const SRP& s);
}; // Suggested Retail Price in some standard currency
ostream& operator <<(ostream& os, const SRP& ss) { return os << ss.s; }
class CostedSKU : public SKU {
 Cost c;
 SRP s;
public:
 operator const Cost&() const { return c; }
 operator const SRP&() const { return s; }
 operator SKU&() const { return (SKU&)*this; }
 CostedSKU& operator <<(const Cost& cc) { c = cc; return *this; }
 CostedSKU& operator <<(const SRP& ss) { s = ss; return *this; }
 CostedSKU& operator >>(Cost& cc) { cc = c; return *this; }
 CostedSKU& operator >>(SRP& ss) { ss = s; return *this; }
 CostedSKU& operator <<(const PartNumber& ppn){ pn = ppn; return *this; }
 CostedSKU& operator <<(const Media& mm) { m = mm; return *this; }
 CostedSKU& operator <<(const PackingList& ppl){ pl = ppl; return *this; }
 CostedSKU& operator >>(PartNumber& ppn) { ppn = pn; return *this; }
 CostedSKU& operator >>(Media& mm) { mm = m; return *this; }
 CostedSKU& operator >>(PackingList& ppl) { ppl = pl; return *this; }
 // ... other public member functions
};
int main()
{
 CostedSKU cs;
 cs << PartNumber(123);
 cs << Media("3.5");
 cs << PackingList("Users' Guide");
 cs << PartNumber(123) << Media("3.5") << PackingList("Users' Guide");

 cs << Cost(18.34);
 if (strcmp(Media(cs), "CD-ROM") == 0)
     cs << SRP(Cost(cs)+5.00);  // SRP based on cost of CD-ROM
 else
     cs << SRP(Cost(cs)+10.00); // diskettes are more expensive than CD-ROM
 cerr << "SRP of SKU is " << SRP(cs) << endl;
 return 0;
}
Example 2: DAN #2.
class X { /* ... */ };
class B {
 X x;
public:
 B& operator <<(X);
};
class D : public B { /* ... */ };
D d;
X x1, x2;
d << x1 << x2; // first << returns B&, so the second << also uses B's operator















































Dr. Dobb's Journal Excellence in Programming Award


Jonathan Erickson


In conjunction with the 20th anniversary of Dr. Dobb's Journal, we're proud to
announce an annual award recognizing achievement and excellence in the field
of computer programming. Selected by a special editorial committee of Dr.
Dobb's Journal, this year's recipients--Alexander Stepanov and Linus
Torvalds--are being honored for the significant contributions they have made
to the advancement of software development. 
In developing the C++ Standard Template Library (STL), Alexander has created a
body of work that in all likelihood will touch most mainstream programmers for
years to come. Likewise, Linus, in creating the Linux operating system, has
shown that powerful, sophisticated, innovative system software can be built
out of sheer will and raw talent, succeeding where many others have failed.
It's also significant that, in keeping with the spirit and philosophy that has
guided Dr. Dobb's Journal since its Tiny BASIC days, the development of both
STL and Linux was based on the principles of openness, cooperation, and
technical superiority: STL has been placed into the public domain, and from
its inception, Linux has been freely distributable.
In addition to being acknowledged at the Software Development '95 Conference
in San Francisco, Dr. Dobb's Journal and the Miller Freeman Community
Connection program are granting $1000 scholarships--in Linus and Alexander's
names--to university programs of their choice. At Alexander's behest, the
grant will be given to the Department of Mathematics at his alma mater, Moscow
State University, Russia, to a graduate student doing research in applying
mathematical methods to C++ software. For his part, Linus has designated that
a scholarship be awarded to a deserving computer-science student at the
University of Helsinki in Finland.
Alexander Stepanov currently heads up the Generic Programming Project at
Hewlett-Packard Research Laboratories in Palo Alto, California. A graduate of
Moscow State University, Alexander studied mathematics, obtaining a diploma of
Teacher of Mathematics from Moscow District Educational Institute. However, it
wasn't until 1972, when he was designing computers for controlling
hydroelectric power stations at the Research Institute of Complex Automation
(TzNIIKA), that Alexander became enamored with computers and computer
programming. 
Upon emigrating to the U.S., Alexander joined the computer-science staff at
General Electric's Research Center in Schenectady, New York, where he
established an ongoing research collaboration with Dave Musser and worked with
Vladimir Lumelsky on path-planning algorithms for mobile robots.
After teaching at Polytechnic University in Brooklyn, New York (where he
developed Ada Generic Library), Alexander found himself at AT&T Bell
Laboratories, learning C/C++ from Andrew Koenig and Bjarne Stroustrup. From
there, Alexander moved to HP Labs in Palo Alto, where, along with Meng Lee, he
developed the C++ Standard Template Library. Alexander's current projects
include coauthoring, with Stroustrup, a paper on the language foundations of
STL, another
paper (with Mary Loomis) on generic programming as a programming methodology,
and a book on STL with P.J. Plauger, Meng Lee, and Dave Musser.
As Alexander explains in an in-depth interview with Al Stevens beginning on
page 115 of this issue, the Standard Template Library is a generic
implementation of a suite of template containers which has been adopted by the
ANSI C++ committee as a model for virtually the entire C++ library. STL
implements a programming model which provides an orthogonal view of data
structures and algorithms, as opposed to object-oriented encapsulation.
Although the ideas behind STL are not new, it took someone with Alexander's
vision, perseverance, and experience--along with the new generation of C++
tools--to turn the promise of generic programming into reality.
Linus Torvalds is the force behind Linux, a UNIX-like, 32-bit, protected-mode,
preemptive multitasking operating system that runs on 386/486 PCs. Although a
full-time student and part-time instructor at the University of Helsinki,
Linus continues to manage the Linux project--which involves hundreds of
programmers worldwide, all of whom are donating their time and efforts in the
further development of Linux. Currently, most of Linus's time is devoted to
handling the kernel development and shepherding the overall project.
Interestingly, Linux didn't start out to be an operating system, as Linus
tells us in a May 1994 DDJ interview. Upon buying his first 386-based PC in
early 1991, Linus didn't want to run MS-DOS and wasn't satisfied with Minix, a
small UNIX-like system developed by Andrew Tanenbaum. Consequently, he began
tinkering with the 386's memory management and process switching, ultimately
realizing that what he was developing was looking more and more like a real
operating system. 
After switching development from assembly language to C, Linus made an
"unofficial" release of Linux 0.01 in September of 1991. This was followed in
early October 1991 with the first "official" release to the Internet of Linux
0.02, which could run the GNU Bourne Shell and GNU C compiler. Shortly
thereafter, Version 0.03 arrived, then Linux 0.10. By this time, programmers
around the world realized that something special was happening, as evidenced
by the exchanges on comp.os.minix, and later, the comp.os.linux hierarchy.
Before long, Version 0.95 was released, then 1.0 in March 1994, and finally,
today's Linux 1.1, patch level 72.
The current implementation of Linux is quite powerful, running everything from
the X Window System and TCP/IP to Emacs, UUCP, mail, and news. The full
distribution consists of kernel sources, C and C++ compilers, man pages, basic
utilities, networking support, X Window, XView/OpenLook, DOS emulators, and
more. 
In addition to being a worldwide development project, Linux has spawned a
veritable cottage industry of programming tools and resources. The operating
system is distributed both at ftp sites (such as MCC in England and SLC in
Canada) and on CD-ROMs published by companies such as Slackware, Yggdrasil,
InfoMagic, and Walnut Creek. A number of books, including Running Linux, by
Andy Oram and Matt Welsh, and magazines such as the Linux Journal have been
published about the system, with many more on the way.
Still, the real significance of Linus's work is that almost single-handedly,
he was able to implement true innovation in kernel design (particularly when
it comes to features such as on-demand loading of system services) while
achieving 100 percent UNIX System V compatibility when no other systems could.
It is also significant that Novell (which acquired UNIX from AT&T) has, in a
way, validated Linus's work by evaluating Linux as the foundation for its
proposed new-generation "Corsair" project. 
Please join us in congratulating both Alexander and Linus. In developing their
respective projects, they've demonstrated that individual programmers can make
a difference, and innovation can flourish in a cooperative
environment--reminding us once again of why we got into this business in the
first place.
Figure 1 Linus Torvalds (Linus Torvalds' photograph courtesy of the Linux
Journal)
Figure 2 Alexander Stepanov






































Cross-Platform Communication Classes


C++ semaphore classes for OS/2, AIX, and Windows NT




Richard B. Lam


Dick is a member of the research staff at IBM's T.J. Watson Research Center.
He can be contacted at rblam@watson.ibm.com.


Numerous C++ libraries are available that ease the burden of programming
graphical user interfaces (GUIs) in a cross-platform environment. However, few
libraries address the interprocess communications (IPC) facilities built into
today's sophisticated PC and workstation operating systems. Interprocess
communication is a vital part of client/server computing, and processes and
threads running on the same system can communicate in many ways other than the
familiar clipboard, DDE, and OLE standards.
In this article, I'll summarize the common techniques for IPC and present one
way to build cross-platform C++ libraries. In doing so, I'll write an example
library that implements semaphores in a platform-independent manner, allowing
processes and threads to signal one another and to control shared resources.
Implementations are presented for OS/2, AIX, and Windows NT. 


Communication Mechanisms


Clipboard, DDE, and OLE facilities are familiar to many programmers because
they are supported by Windows. However, 16-bit Windows does not support the IPC
capabilities built into 32-bit operating systems such as AIX, OS/2, and
Windows NT. These powerful APIs enable sophisticated access to and sharing of
information between processes and threads running in the same environment. The
most common of these communication mechanisms are semaphores, shared memory,
queues, and named pipes. 
A semaphore is really just a flag indicating that an event has occurred or
that data is locked and should not be changed by another process. While you
can certainly define your own flag as a global variable in a DLL that other
processes can use for signaling or resource locking, problems can arise. For
instance, in a preemptive-multitasking environment, a task may be preempted at
any time, even while trying to access or change a flag. This can cause
synchronization problems unless the flag is controlled at the kernel level of
the operating system in cooperation with the task scheduler. Thus, operating
systems provide a semaphore API for creating and controlling semaphores.
There are two basic types of semaphores: mutex (mutual exclusion) and event.
Mutex semaphores are used primarily to control access to some kind of shared
resource: shared memory, database records, files, and the like. Event
semaphores signal that an event has occurred, allowing other processes waiting
on that event to continue. This facilitates synchronizing tasks that need to
cooperate or allowing a process to wait until some kind of required
initialization is complete.
Shared-memory address space is a block of memory created by one process or
thread and made available to other running processes. Any process or thread
that has access to the memory can use it just as if the memory were a part of
the application's own address space. Memory is usually accessed through a
unique name known to all processes needing to use it. Obviously, if two or
more processes are writing to the shared-memory space simultaneously, the
resulting data could be garbled, so a mutex semaphore is typically used to
ensure that access is sequential.
Figure 1 schematically shows how two processes share the same address space.
Shared memory typically offers the best performance in sharing data among
applications, but other techniques may be more appropriate for client/server
programming.
Message queues are familiar to GUI programmers who use event-driven message
loops to dispatch messages to window procedures. Windowing systems typically
provide functions for posting messages to windows in the same or other
applications, but these functions do not work for character-mode programs.
However, a message-queue communication mechanism exists that is valid for both
character-mode and window-mode applications: A queue may be created by a
server application, which then reads messages from the queue and processes
them one at a time. Multiple client applications can then access the queue and
write messages to the server, and the operating system synchronizes the queue
so that messages originating from different sources will not intermingle; see
Figure 2.
The server can examine the queue to see if any messages are waiting and can
optionally purge all messages from the queue. Because this queue API is
separate from the underlying window-system API, both window-message loops and
queue-message loops may be running in the same application, possibly in
different threads.
Pipes are buffer areas which allow processes to communicate with each other
just as if they were reading and writing files. Pipes can be either named or
unnamed. Some operating systems (such as OS/2) allow named-pipe communications
not only between applications on the same system, but also between
applications running on different systems connected via a network. A server
process creates a named pipe, and one or more client processes open it and
establish connections; see Figure 3. Once the connection is established, the
pipe is treated as a regular file handle, and the standard file I/O methods
are used to transfer data across the pipe.


Writing Cross-Platform Classes


OS/2, AIX, and Windows NT implement the IPC features described up to this
point. Unfortunately, there is no standard for these features' functional
interface, so each vendor provides a custom API. What programmers need is an
API that is consistent across all platforms to minimize the
operating-system-dependent part of their code.
The preprocessor definitions, macros, and data types for each operating system
are all different. To avoid having users of your communication library deal
with this at the API level, you can separate the library-class APIs from the
implementation of the classes, which must necessarily be different on each
platform. While there are many ways of doing this in C++, I'll undertake the
separation for this library by defining a single interface class
(InterfaceClass) that contains an instance of the operating-system-specific
implementation class (ImplementationClass).
InterfaceClass is packaged as a class declaration (in ifclass.h) as in Example
1(a). The file ifclass.C contains method invocations which call the
corresponding method in the ImplementationClass instance; see Example 1(b).
Now the implementation-class header (defined in imclass.h) can still be
operating-system independent (or #ifdefs can be included to encapsulate the
dependent parts); see Example 1(c). Furthermore, as Example 1(d) shows, you
can define three separate implementations of ImplementationClass to handle the
individual operating systems (see os2impl.C, aiximpl.C, and winimpl.C). 
To package the class library for distribution, compile the ifclass.C module
and the appropriate implementation module for the current operating system
(for OS/2, compile os2impl.C). You then either link the modules as a DLL or
make a static library. The distribution files are simply the ifclass.h file
and the library module for each operating system. Each platform uses the same
header file containing the class declaration, so platform-independent
applications can be written using the InterfaceClass. 


Semaphores


To illustrate, I'll implement named semaphores because AIX requires a name for
the ftok() function I'll use later. Also, you may need to query for the name
and type of semaphore and perhaps access the semaphore id created by the
underlying operating system. You can implement all of these generic methods in
an abstract base class, and derive specialty classes for the mutex and event
semaphores from the base class.


The Abstract Base Class


Listing One shows the class declaration for the abstract base class
ipcSemaphore. All of the interface classes and data types are given an ipc
prefix to avoid possible namespace collisions when linking with other
libraries. Two enumerated types are defined in this file: ipcSemaphoreType
holds the type of semaphore (mutex or event), and ipcSemaphoreOp defines
whether the semaphore is being created by the owner of the semaphore
(semcreate) or already exists and is being accessed (semaccess).
The base-class constructor takes arguments corresponding to a unique name for
the semaphore, type, and operation. Processes or threads that access the
semaphore only need to know the name and type of the semaphore to use it.
Methods are also defined for returning the semaphore name, underlying
operating-system semaphore id (as an unsigned long), semaphore type, and
whether the current process or thread is the owner (creator) of the semaphore.
The last method, Query(), is a pure virtual method that queries the semaphore
and returns the number of requests pending on it.
The last member is a pointer to the osSemaphore class. This friend class has
its declaration in ossem.h (Listing Two). osSemaphore is the
operating-system-independent implementation class corresponding to the
imclass.h mentioned earlier.
Listing Three is the interface class. The ipcSemaphore constructor simply
creates an instance of osSemaphore that is deleted by the destructor. The
other member functions use this instance of the implementation class to call
the corresponding function in that class. For example, the
ipcSemaphore::Name() method calls myImpl->Name().


Mutex Semaphores



Mutex semaphores control access to some resource. They have two basic
functions which our class must implement--request and release. When a process
(or thread) requests a mutex semaphore, the process either obtains "ownership"
of the semaphore immediately or is blocked until another process releases the
semaphore. 
The semaphore is then "owned" by the requesting process, and requests for the
semaphore by other processes are blocked until the "owner" releases the
semaphore.
Listing Four shows the class declaration for ipcMutexSemaphore, which is
derived from our abstract base class. The Query() method provides an
implementation for the pure virtual method in the base class, and Request()
and Release() methods for the allowed operations on mutex semaphores.
The implementation of this class is given in ipcmutex.C (Listing Five). In the
constructor, the ipcSemaphoreOp operation is checked and used to call either
CreateMutex() or OpenMutex() in the osSemaphore class instance myImpl (defined
as "protected" in the base class). The destructor calls the
myImpl->CloseMutex() method, and the other three methods call their
corresponding methods in myImpl. Listing Two shows prototypes for these five
member functions included in the public section of osSemaphore.


Event Semaphores


Event semaphores synchronize operations between processes or threads. The
terminology is different from mutex semaphores, and there are three basic
operations instead of two. For example, if application A wants to start
application B and wait for B to perform some operation before continuing, A
first creates and resets an event semaphore. B is then started, and A does a
wait on the event semaphore. When B is done with its operation, it accesses
the event semaphore and posts it, causing A to unblock and continue execution.
Listings Six and Seven show the declaration and implementation for
ipcEventSemaphore. This closely parallels the structure of ipcMutexSemaphore,
except that the Request() and Release() methods are replaced by Reset(),
Wait(), and Post(). The constructor also tests the operation flag argument and
calls either CreateEvent() or OpenEvent(), and the destructor calls
CloseEvent(). Note that all of these functions are also included in the public
section of the declaration for osSemaphore. Now that the interface to the two
semaphore types has been defined, we are ready to look at the actual
implementations on each operating system.


The osSemaphore Implementations


The modules that implement the osSemaphore class for OS/2 (os2sem.C), AIX
(aixsem.C), and Windows NT (winsem.C) are available electronically; see
"Availability," page 3. Each module includes the header files specific to that
operating system, plus ossem.h, the class declaration for osSemaphore. Because
the operating-system dependencies are buried here, users of these classes need
not compile with complicated preprocessor definitions and include files that
specify the target operating environment.
The constructors for all three implementations are almost identical. OS/2
requires system semaphores to have a pathname such as \SEM32\name, and AIX
requires a name for the ftok() function, which returns a key required to
obtain UNIX interprocess-communication identifiers. Therefore, both os2sem.C
and aixsem.C define a semPath string constant at the top of the module, and
semPath and the name argument are used to form the full semaphore name stored
in the myName instance variable. Also, a file with the name of the semaphore
must exist on AIX, and it is created in aixsem.C if the ipcSemaphoreOp type is
semcreate.
Note that the constructor takes a pointer to the ipcSemaphore interface
instance that creates the osSemaphore. Because osSemaphore is also declared as
a friend of ipcSemaphore, the osSemaphore methods can change the myState
variable (declared in ipcsem.h). The user can then call rdstate() (or use the
! operator) to get information on the state of the semaphore.
The destructors delete the memory allocated in the constructors for the full
semaphore pathname, and the AIX destructor also deletes the file and removes
the semaphore using a call to semctl(). The other methods for returning the
name, id, type, and owner are all the same in each file, and could arguably be
put in the interface base-class implementation. However, this would require
exposing the protected variables representing these values to the user of the
semaphore classes. You could have also placed the common code in an abstract
implementation base class and derived specific implementation classes from it,
but the current structure is adequate for our needs.
CreateMutex(), OpenMutex(), and the like are the remaining methods, and they
simply call the API functions appropriate for each operating system. For
example, DosCreateMutexSem() is the OS/2 function that creates a mutex
semaphore, while the ftok() and semget() functions are required under AIX. For
Windows NT, macros are defined in winbase.h that must be undefined to avoid
conflicts with the CreateMutex(), OpenMutex(), CreateEvent(), and OpenEvent()
methods. Then the Win32 functions ::CreateMutexA(), ::OpenMutexA(), and the
like are called directly.


Conclusion


I have defined a cross-platform implementation of mutex and event semaphores
with an interface class that has no operating-system dependencies. You only
need to compile the modules on the appropriate system and distribute the
object module or DLL and the three header files ipcsem.h, ipcmutex.h, and
ipcevent.h.
The test programs semtest1.C and semtest2.C (available electronically) can be
compiled on all three systems using the code presented in this article. These
programs illustrate how the mutex and event semaphores can synchronize two
processes.
Figure 1 Shared memory between two processes (P1 and P2).
Figure 2 A message queue with a server and two client processes.
Figure 3 A named-pipe connection between two processes.
Example 1: (a) InterfaceClass packaged as a class declaration; (b) method
invocations which call the corresponding method in ImplementationClass; (c)
the implementation-class header can still be operating-system independent; (d)
three separate implementations of ImplementationClass, one per operating
system.
(a)
// forward declaration
class ImplementationClass;
// interface class declaration
class InterfaceClass {
friend class ImplementationClass;
public:
 // constructor and destructor
 InterfaceClass();
 virtual ~InterfaceClass();
 void SomeMethod(); // invoked by class user
protected:
 ImplementationClass *myImpl; // pointer to implementation
};

(b)
void InterfaceClass::SomeMethod()
{
 myImpl->SomeMethod();
}

(c)
class ImplementationClass {
public:
 // constructor and destructor

 ImplementationClass();
 virtual ~ImplementationClass();
 void SomeMethod(); // invoked by InterfaceClass
};

(d)
// os2impl.C
#define INCL_DOS
#include <os2.h>
void ImplementationClass::SomeMethod()
{
 // call OS/2 specific functions
}
// aiximpl.C
#include <sys/ipc.h>
void ImplementationClass::SomeMethod()
{
 // call AIX specific functions
}
// winimpl.C
#include <windows.h>
void ImplementationClass::SomeMethod()
{
 // call Windows NT specific functions
}

Listing One 

// ****************************************************************************
// Module: ipcsem.h -- Author: Dick Lam
// Purpose: C++ class header file for ipcSemaphore
// Notes: This is an abstract base class. It is the interface class for
// semaphores used in signalling between processes and threads.
// ****************************************************************************

#ifndef MODULE_ipcSemaphoreh
#define MODULE_ipcSemaphoreh

// semaphore type designation and operation type
enum ipcSemaphoreType { unknown = 0, mutex = 1, event = 2 };
enum ipcSemaphoreOp { semcreate = 0, semaccess = 1 };

// forward declaration
class osSemaphore;

// class declaration
class ipcSemaphore {

friend class osSemaphore;

public:
 // constructor and destructor
 ipcSemaphore(const char *name, // unique name for semaphore
 ipcSemaphoreType type, // mutex or event
 ipcSemaphoreOp operation); // create or access the semaphore
 virtual ~ipcSemaphore();

 // methods for getting semaphore parameters [name, semaphore id, type of
 // semaphore (mutex or event), and whether this is the owner (creator)
 // of the semaphore]
 char *Name() const;
 unsigned long ID() const;
 ipcSemaphoreType Type() const;
 int Owner() const;

 // pure virtual query method for number of requests made for the semaphore
 // (must be redefined in derived classes)
 virtual unsigned long Query() = 0;

 // class version and object state data types
 enum version { MajorVersion = 1, MinorVersion = 0 };
 enum state { good = 0, bad = 1, badname = 2, notfound = 3 };

 // methods to get the object state
 inline int rdstate() const { return myState; }
 inline int operator!() const { return(myState != good); }
protected:
 osSemaphore *myImpl; // implementation
 state myState; // object state (good, bad, etc.)
private:
 // private copy constructor and operator= (define these and make them
 // public to enable copy and assignment of the class)
 ipcSemaphore(const ipcSemaphore&);
 ipcSemaphore& operator=(const ipcSemaphore&);
};
#endif



Listing Two

// ****************************************************************************
// Module: ossem.h -- Author: Dick Lam
// Purpose: C++ class header file for osSemaphore
// Notes: This is a base class. It contains general implementation methods
// for semaphores used in signalling between processes and threads.
// ****************************************************************************

#ifndef MODULE_osSemaphoreh
#define MODULE_osSemaphoreh
#include "ipcsem.h"
// class declaration
class osSemaphore {

public:
 // constructor and destructor
 osSemaphore(ipcSemaphore *interface, const char *name, 
 ipcSemaphoreType type, ipcSemaphoreOp operation);
 virtual ~osSemaphore();
 // methods for getting semaphore parameters [name, semaphore id, type of 
 // semaphore (mutex or event) and whether this is the owner (creator)
 // of the semaphore]
 char *Name() const;
 unsigned long ID() const;
 ipcSemaphoreType Type() const;
 int Owner() const;

 // mutex semaphore methods

 void CreateMutex();
 void OpenMutex();
 void RequestMutex();
 void ReleaseMutex();
 unsigned long QueryMutex();
 void CloseMutex();

 // event semaphore methods
 void CreateEvent();
 void OpenEvent();
 void PostEvent();
 void ResetEvent();
 void WaitEvent();
 unsigned long QueryEvent();
 void CloseEvent();
protected:
 ipcSemaphore *myInterface; // pointer to the interface instance
 char *myName; // semaphore name, id and type
 unsigned long myID;
 ipcSemaphoreType myType;
 int isOwner; // flag indicating whether this is owner
private:
 // private copy constructor and operator= (define these and make them
 // public to enable copy and assignment of the class)
 osSemaphore(const osSemaphore&);
 osSemaphore& operator=(const osSemaphore&);
};
#endif



Listing Three

//****************************************************************************
// Module: ipcsem.C -- Author: Dick Lam
// Purpose: C++ class source file for ipcSemaphore
// Notes: This is an abstract base class. It is the interface class for
// semaphores used in signalling between processes and threads.
//****************************************************************************
#include "ipcsem.h"
#include "ossem.h"
//****************************************************************************
// ipcSemaphore - constructor
ipcSemaphore::ipcSemaphore(const char *name, ipcSemaphoreType type,
 ipcSemaphoreOp operation)
{
 // init instance variables
 myState = good;
 myImpl = new osSemaphore(this, name, type, operation);
 if (!myImpl)
 myState = bad;
}
//----------------------------------------------------------------------------
// ~ipcSemaphore - destructor
ipcSemaphore::~ipcSemaphore()
{
 delete myImpl;
}
//----------------------------------------------------------------------------

// Name - returns the name of the semaphore
char *ipcSemaphore::Name() const
{
 if (!myImpl)
 return 0;
 return myImpl->Name();
}
//----------------------------------------------------------------------------
// ID - returns the semaphore id
unsigned long ipcSemaphore::ID() const
{
 if (!myImpl)
 return 0L;
 return myImpl->ID();
}
//----------------------------------------------------------------------------
// Type - returns the type of semaphore
ipcSemaphoreType ipcSemaphore::Type() const
{
 if (!myImpl)
 return unknown;
 return myImpl->Type();
}
//----------------------------------------------------------------------------
// Owner - returns 1 if this is the owner (creator), and 0 otherwise
int ipcSemaphore::Owner() const
{
 if (!myImpl)
 return 0;
 return myImpl->Owner();
}



Listing Four

//****************************************************************************
// Module: ipcmutex.h -- Author: Dick Lam
// Purpose: C++ class header file for ipcMutexSemaphore
// Notes: This class is derived from ipcSemaphore. It is an interface class
// for mutex semaphores that can be used to control access to a shared
// resource across processes or threads.
//****************************************************************************

#ifndef MODULE_ipcMutexSemaphoreh
#define MODULE_ipcMutexSemaphoreh
#include "ipcsem.h"
// class declaration
class ipcMutexSemaphore : public ipcSemaphore {

public:
 // constructor and destructor
 ipcMutexSemaphore(const char *name, ipcSemaphoreOp operation = semcreate);
 virtual ~ipcMutexSemaphore();

 // query method for number of requests made
 virtual unsigned long Query();

 // request and release methods (to lock and unlock resources)

 virtual void Request();
 virtual void Release();
private:
 // private copy constructor and operator= (define these and make them
 // public to enable copy and assignment of the class)
 ipcMutexSemaphore(const ipcMutexSemaphore&);
 ipcMutexSemaphore& operator=(const ipcMutexSemaphore&);
};
#endif



Listing Five

//****************************************************************************
// Module: ipcmutex.C -- Author: Dick Lam
// Purpose: C++ class source file for ipcMutexSemaphore
// Notes: This class is derived from ipcSemaphore. It is an interface class
// for mutex semaphores that can be used to control access to a shared
// resource across processes or threads.
//****************************************************************************
#include "ipcmutex.h"
#include "ossem.h"
//****************************************************************************
// ipcMutexSemaphore - constructor
ipcMutexSemaphore::ipcMutexSemaphore(const char *name, ipcSemaphoreOp operation)
 : ipcSemaphore(name, mutex, operation)
{
 // check the state of the object
 if (myState != good)
 return;
 // create or open the semaphore
 if (operation == semcreate)
 myImpl->CreateMutex();
 else if (operation == semaccess)
 myImpl->OpenMutex();
}
//----------------------------------------------------------------------------
// ~ipcMutexSemaphore - destructor
ipcMutexSemaphore::~ipcMutexSemaphore()
{
 // close the semaphore
 if (myState == good)
 myImpl->CloseMutex();
}
//----------------------------------------------------------------------------
// Query - returns the number of requests made of the semaphore
unsigned long ipcMutexSemaphore::Query()
{
 if (myState == good)
 return myImpl->QueryMutex();
 return 0L;
}
//----------------------------------------------------------------------------
// Request - requests the semaphore
void ipcMutexSemaphore::Request()
{
 if (myState == good)
 myImpl->RequestMutex();
}
//----------------------------------------------------------------------------
// Release - releases the semaphore
void ipcMutexSemaphore::Release()
{
 if (myState == good)
 myImpl->ReleaseMutex();
}



Listing Six

//****************************************************************************
// Module: ipcevent.h -- Author: Dick Lam
// Purpose: C++ class header file for ipcEventSemaphore
// Notes: This class is derived from ipcSemaphore. It is an interface class
// for event semaphores that can be used to signal events across
// processes or threads.
//****************************************************************************

#ifndef MODULE_ipcEventSemaphoreh
#define MODULE_ipcEventSemaphoreh

#include "ipcsem.h"

// class declaration
class ipcEventSemaphore : public ipcSemaphore {

public:
 // constructor and destructor
 ipcEventSemaphore(const char *name, ipcSemaphoreOp operation = semcreate);
 virtual ~ipcEventSemaphore();
 // query method for number of requests made
 virtual unsigned long Query();
 // post, reset and wait methods
 virtual void Post();
 virtual void Reset();
 virtual void Wait();
private:
 // private copy constructor and operator= (define these and make them
 // public to enable copy and assignment of the class)
 ipcEventSemaphore(const ipcEventSemaphore&);
 ipcEventSemaphore& operator=(const ipcEventSemaphore&);
};
#endif



Listing Seven

//****************************************************************************
// Module: ipcevent.C -- Author: Dick Lam
// Purpose: C++ class source file for ipcEventSemaphore
// Notes: This class is derived from ipcSemaphore. It is an interface class
// for event semaphores that can be used to signal events across
// processes or threads.
//****************************************************************************
#include "ipcevent.h"

#include "ossem.h"
//****************************************************************************
// ipcEventSemaphore - constructor
ipcEventSemaphore::ipcEventSemaphore(const char *name, ipcSemaphoreOp operation)
 : ipcSemaphore(name, event, operation)
{
 // check the state of the object
 if (myState != good)
 return;
 // create or open the semaphore
 if (operation == semcreate)
 myImpl->CreateEvent();
 else if (operation == semaccess)
 myImpl->OpenEvent();
}
//----------------------------------------------------------------------------
// ~ipcEventSemaphore - destructor
ipcEventSemaphore::~ipcEventSemaphore()
{
 // close the semaphore
 if (myState == good)
 myImpl->CloseEvent();
}
//----------------------------------------------------------------------------
// Query - returns the number of requests made of the semaphore
unsigned long ipcEventSemaphore::Query()
{
 if (myState == good)
 return myImpl->QueryEvent();
 return 0L;
}
//----------------------------------------------------------------------------
// Post - posts the semaphore
void ipcEventSemaphore::Post()
{
 if (myState == good)
 myImpl->PostEvent();
}
//----------------------------------------------------------------------------
// Reset - resets the semaphore
void ipcEventSemaphore::Reset()
{
 if (myState == good)
 myImpl->ResetEvent();
}
//----------------------------------------------------------------------------
// Wait - waits for a semaphore event to be posted
void ipcEventSemaphore::Wait()
{
 if (myState == good)
 myImpl->WaitEvent();
}
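The listings above declare only the interface class and its platform shim; the OS-specific osSemaphore implementation is elsewhere, so they cannot be compiled as shown. As a rough, self-contained illustration of the same interface/implementation split, here is a minimal sketch using std::mutex in place of the OS layer. The names (DemoImpl, DemoMutexSemaphore) are ours, not part of the listings:

```cpp
#include <mutex>
#include <string>

// Stand-in for the OS-specific implementation class (osSemaphore's role).
class DemoImpl {
public:
    void Request() { myMutex.lock(); ++myRequests; }
    void Release() { myMutex.unlock(); }
    unsigned long Query() const { return myRequests; }
private:
    std::mutex myMutex;
    unsigned long myRequests = 0;
};

// Stand-in for the interface class (ipcMutexSemaphore's role): it owns a
// pointer to the implementation and forwards every operation to it.
class DemoMutexSemaphore {
public:
    explicit DemoMutexSemaphore(const char *name)
        : myName(name), myImpl(new DemoImpl) {}
    ~DemoMutexSemaphore() { delete myImpl; }
    const std::string &Name() const { return myName; }
    void Request() { myImpl->Request(); }
    void Release() { myImpl->Release(); }
    unsigned long Query() const { return myImpl->Query(); }
private:
    // copying disabled, as in the listings
    DemoMutexSemaphore(const DemoMutexSemaphore&);
    DemoMutexSemaphore& operator=(const DemoMutexSemaphore&);
    std::string myName;
    DemoImpl *myImpl;
};
```

A caller creates the semaphore by name, brackets the critical section with Request/Release, and uses Query to count requests, mirroring the ipcMutexSemaphore interface.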











A Portable Font Specification


Here's how one cross-platform tool solves the problem




Ronald G. White and John Biard


Ron is a consultant currently working with XVT Software on its next release.
He can be reached at rwhite@tesuji.com or on CompuServe at 71500,3566. John is
a member of engineering at XVT Software and the primary designer of the font
code. He can be reached at jrb@xvt.com.


When it comes to portability, fonts present a variety of problems. For one
thing, available fonts differ from platform to platform. The Macintosh comes
with fonts such as Chicago, Geneva, Monaco, and New York. Windows, on the
other hand, has Arial, Courier, Serif, and Times New Roman. Moreover,
thousands of fonts are available from third-party vendors. To further
complicate matters, it is even possible to uninstall original fonts. The end
result is that you can make very few assumptions about what fonts will be
available on any system.
Even if you are developing for a single GUI system, you can't assume that a
particular font is available on the system where your program runs. While it's
safe to assume that at least one font is available (though not what that font
might be), you cannot assume that the fonts present when your program is
installed will still be available later--or that new fonts won't be added
afterward.
Furthermore, similar fonts may differ on different platforms or even on the
same platform. If the font is implemented on two platforms as a bit-mapped
font, the available sizes may not be the same. The font might be bit-mapped on
one platform and scalable on another, making some sizes unavailable on the
first platform. Also, at some sizes, there may be subtle differences in the
appearance of the font--even if available on both platforms. Even on one
platform, the same font family might be available from different companies in
different formats such as bit-mapped, TrueType, or PostScript, with
significant differences in appearance or available attributes. 
Nor are fonts specified in the same way on different platforms. The Macintosh
represents a font by a family name, a font size, and some combination of the
font styles bold, italic, outline, and shadow. X Window-based systems add more
attributes by specifying foundry, weight, width, spacing, and more. Also,
fonts have different physical implementations and internal formats on
different systems and from different vendors. Unfortunately, no standard has
emerged for specifying fonts across different platforms. 


What Developers Want


To develop for multiple platforms, you need a standard way to specify fonts
such that they will have the same characteristics both across platforms and
between the screen and printers on the same platform. This is a hard nut to
crack. Taking the lowest-common-denominator approach, only those font
attributes available on all systems (font family, size, and a limited subset
of styles) would be used. This would make many fonts (or attributes of fonts)
unavailable on most systems, which precludes another feature developers want:
access to all fonts, printer and screen, currently installed on a system.
Although some programs might be satisfied with a small subset of fonts, others
(word-processing programs, for example) need to be able to pick and access any
font the user has installed. Allowing fonts to be chosen via some limited set
of attributes will not work. On the other hand, trying to use the superset of
all possible attributes doesn't work either. Besides creating an unmanageably
large set of attributes to work with, sometimes a font attribute that seems to
be the same on two different platforms will, on closer inspection, differ in
subtle ways. For example, on some systems the weight attribute is limited to
two values (normal and bold) while on other systems new weights can be added
by font designers. In addition, attributes are often not orthogonal and will
interact with each other, or one attribute on one platform will only partly
overlap with one or more attributes on a different platform. For example,
sometimes italic is considered part of the font family name and other times it
is merely an attribute of a font family. 
Developers would also like to have font operations repeatable, both across
platforms and on the same platform across invocations of a program. In a word
processor, for example, if a user picks a font for part of a document and
saves the document, the next time the document is retrieved, the user expects
to get the same font. If the program runs on multiple platforms, the user
expects, if not the same font, at least a similar font when the same document
is viewed or printed on different platforms. The font information saved with
the document must be complete enough to support this.
This leads to the question of font mapping. When a document is created on one
system and moved to another, the fonts used on the first system may not exist
on the second. It then becomes necessary to "map" the unavailable fonts to new
fonts (that hopefully have similar characteristics). Font mapping is
difficult, and, because different applications have different needs, it may
not be achieved by a single method. You may need only a simple, built-in
solution that is reasonably accurate, or you may want the ability to take over
the font mapping with your own code.


One Approach to Font Handling


XVT supplies cross-platform development tools for GUI environments such as
Windows, Macintosh, and Motif. We have grappled with the problem of portable
fonts since our earliest releases. Our current release (4.0) represents a
major rewrite of the font-handling code. 
We approach the problem of portable fonts by using handles to opaque objects
that represent fonts in application software, a technique that simplifies the
handling of fonts inside a program. This approach also allows us to modify or
add information in the font's internal representation without requiring
changes to existing code. Internally, a structure is used to store the font
information. Listing One shows a simplified version of this structure and the
definition of the font object's handle, XVT_FNTID. The family field is a
pointer to a string that can have any value; size is the font size in points;
and style is a bit mask for style flags such as bold, italic, underline,
outline, shadow, inverse, blink, and strikeout. This is not a complete list of
all possible styles, and not all XVT font styles are available on all systems.
This set of styles is a compromise that covers the more-common styles that can
be represented by a single bit. We made this mask large enough for future
expansion. native_desc is a string that gives enough information about the
font, as represented natively on a particular system, to uniquely identify the
font and locate it. (See the text box "Native Font Descriptors.") is_print is
a Boolean that indicates whether the font is available on a printer instead of
the screen; unfortunately, on some systems the screen fonts and print fonts
are different and have to be handled differently. is_mapped is a Boolean that
indicates whether this font has been mapped. Finally, plat_font_data points to
platform-specific information. 
The font specification has two major parts: the logical (portable) part and
the native (nonportable) part. The family, size, and style fields are the
logical font. These can be set independent of a particular system and are an
attempt to represent a portable font specification. The native_desc field
contains the nonportable information necessary to identify and locate a
specific font on a specific platform. native_desc is an ASCII string with
platform-specific fields that describes the physical font that exists on the
system. Several font-mapping mechanisms establish the links between the
logical font, the native font description, and the physical font.
A font is said to be "mapped" when the native-descriptor string has been set
and the XVT font code has associated the font object with a particular
physical font. Application code can force this to take place (via a function
call with the font object as a parameter) if it needs to get information about
the font that can only be obtained once the actual physical font is known. The
font is also implicitly mapped by the XVT font code, if it is not already
mapped, before it is used to display (or print) any text.
The mapping process is multistepped, controllable by the application, and
guaranteed to result in a mapping to some (though not necessarily the best)
physical font; see Figure 1. If the application has provided only the
logical-font specification, the mapping process sets the native-descriptor
string. The application can archive a persistent copy of the logical
specification and the native descriptor between document sessions. If the
application has provided a native-descriptor string for the font (as it might
if it restored this font from a document), the mapping process uses the native
font descriptor to do the mapping.
The XVT font code provides a default, "last chance" mapper that tries to do a
reasonable job of mapping a font specification onto a physical font. It will,
if necessary, do an unreasonable job, but it's guaranteed to map the font to
some physical font. "Reasonable" is a relative term and is dependent on the
application. Some applications might not care what font family is picked as
long as the size is correct. Different applications may demand that the font
family be exact or at least related (whatever that means), but not care about
other style attributes. Writing a single mapper that would please everyone
would be a formidable, if not impossible, task; we didn't try. Instead we
provide several methods that allow applications to control the mapping.
One means for controlling the mapping is via a mapping table in the resource
file. This table specifies a correspondence between the logical font (family,
size, and styles) and a native-descriptor string. By allowing certain fields
in both sides of the correspondence to be wildcarded and having the table
entries ordered, complicated mapping schemes can be set up.
If the resource-mapping table is not flexible enough, the application can
install its own mapper. This mapper can use XVT-supplied, portable utility
functions or can be written to use nonportable code and functions. When a font
needs to be mapped, the XVT font code calls the application-supplied mapper
(if available) first. If this mapper fills in the native-descriptor string
(using information in the font object and whatever other information it can
access) and this string can be mapped to a physical font, then the XVT font
code marks the font as "mapped." If the font is not successfully mapped after
this (the application-supplied mapper is not obligated to attempt a mapping),
then the XVT font code checks for a mapping via the resource-mapping table, if
one is available. If this too fails, then the mapping process calls the
default mapper.
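The three-stage fallback just described (application-supplied mapper, then resource-mapping table, then last-chance default mapper) can be sketched as follows. The function and type names are illustrative assumptions, not XVT's actual API:

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// A mapper takes a logical family name and returns a native descriptor,
// or "" if it declines to map the font.
using Mapper = std::function<std::string(const std::string &family)>;

std::string map_font(const std::string &family,
                     const Mapper &appMapper,    // optional; may return ""
                     const std::vector<std::pair<std::string, std::string>> &table,
                     const Mapper &defaultMapper) // must always succeed
{
    // 1. Application-supplied mapper gets first chance (not obligated to map).
    if (appMapper) {
        std::string desc = appMapper(family);
        if (!desc.empty())
            return desc;
    }
    // 2. Resource-mapping table: ordered entries, "*" acting as a wildcard.
    for (const auto &row : table)
        if (row.first == family || row.first == "*")
            return row.second;
    // 3. Last-chance default mapper guarantees some mapping.
    return defaultMapper(family);
}
```

Because the default mapper always returns something, the chain as a whole is guaranteed to map every font, just as the text describes.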
A standard font-selection dialog allows the user of an XVT application access
to all fonts on a system; see Figure 2. Though this dialog is platform
specific, the programmatic interface to it is portable so the application need
not be aware of any differences. (The application can also supply its own
dialog, or other means of selecting fonts, if the standard dialog does not
meet its needs.) If the user selects a font via this dialog, a font object is
returned that has both the logical-font fields and the native-descriptor set.
Because font mapping can be expensive, the font code caches the most-recently
mapped fonts. The application can set the cache size according to its own
needs. Apart from word processors, many applications may never need more than
a half dozen fonts. With a properly sized cache, the overhead of font mapping
can be kept to the minimum--only once per font.
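A most-recently-used cache of the kind described might look like the following sketch. XVT's internal cache is not public, so the class name, interface, and eviction policy here are all assumptions:

```cpp
#include <list>
#include <string>
#include <utility>

// Caches logical-font -> native-descriptor mappings, evicting the
// least-recently-used entry once the application-chosen capacity is reached.
class FontCache {
public:
    explicit FontCache(std::size_t capacity) : myCapacity(capacity) {}

    // Returns the cached native descriptor, or "" on a miss.
    std::string lookup(const std::string &logical) {
        for (auto it = myEntries.begin(); it != myEntries.end(); ++it)
            if (it->first == logical) {
                // Move the hit to the front so it becomes most recent.
                myEntries.splice(myEntries.begin(), myEntries, it);
                return it->second;
            }
        return "";
    }

    void insert(const std::string &logical, const std::string &native) {
        myEntries.emplace_front(logical, native);
        if (myEntries.size() > myCapacity)
            myEntries.pop_back();   // evict least-recently-used entry
    }

private:
    std::size_t myCapacity;
    std::list<std::pair<std::string, std::string>> myEntries;
};
```

With the capacity set to the handful of fonts an application actually uses, each font pays the mapping cost only once, as the text notes.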


More on Font Mapping


The default font mapper supports four "standard" families. This is done for
backward compatibility but also provides some measure of font portability for
applications with simple font requirements. Although these font families
(System, Courier, Times, and Helvetica) may not be available on every system,
the default mapper recognizes these names and makes an effort to match them to
the best physical font on the system. Other font families are more
problematic, and the default mapper may or may not be able to find reasonable
matches.
To see how default mapping works, we'll examine the default mapper for Motif,
which is divided into two major parts. The first part is table driven and
similar to the resource-mapping table discussed earlier. This table is used to
map the four standard font families. One difference from the resource-mapping
table is that each logical font in the table can correspond to more than one
physical font. During a search of this table, if a match is found on the
logical font, a search is performed for each of the corresponding physical
fonts. If one is found, the search ends and the font is successfully mapped.
The reason for having multiple physical fonts is that the Motif toolkit runs
on many different systems from different companies and has no standard set of
fonts. From experience, XVT has accumulated a list of the fonts most likely to
be available. For example, two common Courier fonts are those from ITC and
Adobe, so we include both.
Listing Two is the second part of the Motif default mapper. This function uses
the font family, size, and some style attributes to find the best match to an
existing physical font. First it builds an X Logical Font Descriptor (XLFD)
with all fields wildcarded except the font family, which it copies from the
logical-font family. The Xlib function XListFonts is called to find the
available physical fonts that have this family name. For each font found, a
score is calculated based on the font weight (bold or medium), slant (italic
or not), size (the score is based on how close the size is), and whether the
font is iso8859 (to avoid fonts with non-ANSI encodings).
A "perfect" match (based on the given criteria) causes an early exit from the
loop; otherwise, the current font is marked as the best match so far if its
score exceeds the previous best score. After the loop, some font has been picked as the
best match (even if no perfect match was found), and its XLFD is converted
into a native descriptor and put into the font object. If no physical font was
found with the same family name, the font is mapped to a standard font
("system") to guarantee some mapping.


Conclusion


The problem of portable fonts is complex. Some applications attempt to solve
it by supplying the same fonts on all systems on which they run. This, of
course, means limiting the number of available fonts. A newer approach is to
put enough information into the document about the font to allow a program to
reconstruct the font if it doesn't exist on the system. Although this may
ultimately be the best solution, it is not currently supported by any
standards; it is therefore proprietary and requires a major commitment of
resources to implement.

In the approach described here, a font contains both a portable and a
nonportable part and multiple levels of mapping are possible. This provides a
reasonable solution for applications that don't need much control over fonts
and a flexible and extensible solution for applications that do.

Native Font Descriptors

The XVT native font-descriptor string borrows conceptually from the X Logical
Font Descriptor (XLFD) string. The XLFD consists of fields separated by dashes
(-); unspecified fields can be given as asterisks (*). XVT's descriptor
consists of fields separated by slashes (/); unspecified fields can be given
as asterisks (*).
The first field of an XVT native descriptor is an encoded value that gives the
window system and a font-descriptor version number. This makes it possible for
the font code to determine if the descriptor is being used in the same
environment in which it was created. Although the other fields are platform
specific, the format is consistent: attribute fields separated by slashes. On
Motif, for example, the fields are the same as those of the XLFD; see Example
1(a). The Macintosh, on the other hand, has a much simpler format and is shown
in Example 1(b).
--J.B. and R.W.
Example 1: (a) Attribute fields on Motif; (b) attribute fields on Macintosh.
(a)
"X1101/<foundry>/<family>/<weight>/<slant>/<sWdth>/
<adStyl>/<pixel_size>/<point_size>/<X_res>/<Y_res>/
<spacing>/<average_height>/<registry>/<encoding>"
(b)
 "MAC01/<family>/<face>/<size>"
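Composing and splitting a slash-separated descriptor in the Example 1(b) format is straightforward; the helper names below are ours, not part of the XVT API:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Build a Macintosh-style native descriptor: "MAC01/<family>/<face>/<size>".
std::string make_mac_descriptor(const std::string &family,
                                const std::string &face, int size) {
    std::ostringstream out;
    out << "MAC01/" << family << '/' << face << '/' << size;
    return out.str();
}

// Split any XVT-style descriptor back into its fields. An unspecified field
// would appear here as "*".
std::vector<std::string> split_descriptor(const std::string &desc) {
    std::vector<std::string> fields;
    std::istringstream in(desc);
    std::string field;
    while (std::getline(in, field, '/'))
        fields.push_back(field);
    return fields;
}
```

The leading "MAC01" field is the encoded window-system/version tag described above, which lets the font code reject a descriptor created in a different environment.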
Figure 1 XVT font mapping.
Figure 2 Windows font-selection dialog.

Listing One 

typedef struct {
 char* family; /* font family */
 long size; /* font size in points */
 unsigned long style; /* font style bitfield */
 char* native_desc; /* native font descriptor */
 char is_print; /* window is a print window */
 char is_mapped; /* mapped flag */
 void* plat_font_data; /* pointer to platform font info */
 .
 .
} FONT_OBJ;

typedef struct {
 long far * dummy; /* cast internally to (FONT_OBJ *) */
} *XVT_FNTID;



Listing Two

static void font_fabricate_map(XVT_FNTID font_id)
{
 int i;
 int count = 0;
 const char* family; 
 unsigned long style; 
 long size;
 char **mlist = NULL;
 char* native_desc = NULL;
 int best_fit = -1;
 int best_score, cur_score;
 char *X_attr;
 int attr_length;
 long ptsize;
 char result[MAX_FONTNAME_SIZE];
 char xlfd[MAX_FONTNAME_SIZE];
 BOOLEAN scalable = FALSE;
 PLAT_FONT_DATA *plat_font_data;

 family = xvtv_font_get_family(font_id); 
 style = xvtv_font_get_style(font_id); 
 size = xvtv_font_get_size(font_id);

 plat_font_data = FONT_GET_PLAT_DATA(font_id);

 /* Start with a score of 0 indicating no match */
 best_score = 0;

 /* build an XLFD that brings up the font families that match */
 strcpy(xlfd, "-*-");
 strcat(xlfd, family);
 strcat(xlfd, "-*-*-*-*-*-*-*-*-*-*-*-*");

 if ((mlist = XListFonts(xvt_display, xlfd, 100, &count)) != NULL) {

 /* loop through the list of XLFD's, weighing the components of 
 * size, boldness, and italics. */
 for (i = 0; i < count; i++) {
 /* Start at 200 points for matching on family */
 cur_score = 200;
 /* Measure XLFD weight against font style. Note that
 * the X weight can be "bold" or "demibold"*/
 X_attr = get_xlfd_token(mlist[i], X_TOKEN_WEIGHT, &attr_length);
 if (style & XVT_FS_BOLD) {
 if (strstr(X_attr, "bold"))
 cur_score += 100;
 } else {
 if (strstr(X_attr, "medium"))
 cur_score += 100;
 }
 /* Measure XLFD slant against font style. Note that
 * the X slant can be "o" or "i" */
 X_attr = get_xlfd_token(mlist[i], X_TOKEN_SLANT, &attr_length);
 if (style & XVT_FS_ITALIC) {
 if (strstr(X_attr, "o") || strstr(X_attr, "i"))
 cur_score += 100;
 } else {
 if (strstr(X_attr, "r"))
 cur_score += 100;
 }
 /* Measure XLFD size against font size. If size match 
 * is exact, add 100, else add less than 100. */
 X_attr = get_xlfd_token(mlist[i], X_TOKEN_POINTSIZE, &attr_length);
 ptsize = (strtol(X_attr, NULL, 10)) / 10;
 if (ptsize == 0 || ptsize == size)
 cur_score += 100;
 else
 cur_score += (min((int)size, (int)ptsize) * 100 /
 max((int)size, (int)ptsize));
 /* Subtract points if this is not an iso8859 font (helps avoid
 * jis & other fonts). Use 150 to weight matching ISO fonts higher
 * than, say, an exact size match. */
 X_attr = get_xlfd_token(mlist[i], X_TOKEN_REGISTRY, &attr_length);
 if (!strstr(X_attr, "iso8859"))
 cur_score -= 150;
 /* Is this the best font fit we have seen? */
 if (cur_score > best_score) {
 best_score = cur_score;
 best_fit = i;
 if (ptsize == 0)
 scalable = TRUE;
 else
 scalable = FALSE;
 }
 /* If we have found the perfect match, quit looking */
 if (best_score == 500)
 break;
 }
 /* Slip the correct point size in if font is scalable */
 if (scalable) {
 make_font_name(mlist[best_fit], (int)size, result);
 native_desc = xvtk_font_XLFD_cvt(result);
 } else
 native_desc = xvtk_font_XLFD_cvt(mlist[best_fit]);
 XFreeFontNames(mlist);
 } else {
 /* There was not even a native font match against the font 
 * family. Use the system font */
 if (scratch_font_id == NULL_FNTID) {
 scratch_font_id = xvtv_font_create();
 xvtv_font_set_family(scratch_font_id, "system");
 font_database_map(scratch_font_id);
 }
 native_desc=xvtv_font_get_native_desc(scratch_font_id, FALSE);
 }
 xvtv_font_set_native_desc(font_id, native_desc);
 return;
}





































Cross-Platform Database Programming


Here's how one developer supports more than 100 platforms




William Fairman and Randal Hoff


William is the founder of FairCom Corp. and senior developer of c-tree, c-tree
Plus, r-tree, and the FairCom Server. Randal is FairCom's director of
technical operations. They can be contacted on CompuServe at 71333,72.


How many different combinations of operating systems and hardware platforms
are used today? Fifty, a hundred--does anyone really know? How many are in the
mainstream? DOS, Windows, NT, OS/2, Solaris, UNIX--Intel, Motorola, DEC,
SunSPARC, IBM. With all of these choices, how are you to develop truly
portable applications? 
The good news is that if an application is approached correctly and with
foresight, writing portable code does not have to be a chore. In this article,
we'll discuss coding strategies for developing truly portable database
applications. In doing so, we'll focus on the strategies you can implement to
ease the movement of code and data between computer platforms. The topics
include code portability, function wrappers, size and alignment of data
objects, binary word order, and true multiplatform portability. All of the
techniques we'll cover here are real world--they're what FairCom uses to make
its c-tree Plus File Handler highly portable. c-tree Plus is a C-function
library of database calls designed from the ground up with portability in
mind. The c-tree family has been ported to well over 100 platforms ranging
from Cray supercomputers to embedded systems and virtually all machines in
between. 


Code Portability


The way you organize the modules that make up your application can greatly
affect the time required to port it. We suggest explicitly organizing your
application modules into two sets: one that is system independent and one that
is system dependent. For example, in c-tree Plus, about 98 percent of the code
resides in system-independent modules which are not changed when we port from
platform to platform. Not one line of code in these modules has to be touched.
The remaining code--the system-dependent modules--contains those aspects of
c-tree Plus which depend on system specifics. For c-tree, virtually all the
system-specific code relates to low-level file operations. 
To achieve this degree of separation, certain sections of a system-independent
module may depend on a configuration setting in a system-dependent header
file. However, these dependencies should reflect generic concepts, not
platform-specific issues. In c-tree Plus for instance, there are #ifdefs in
the system-independent modules which depend on the word order of binary
values. Each system-dependent configuration header specifies the type of word
order found on that platform; then the system-independent code need only have
#ifdefs for the word-order choices, not for each platform. 
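As a hedged sketch of this pattern (the macro names LOW_HIGH and HIGH_LOW mirror those used later in the test program, but the helper function is ours, not c-tree's), the system-dependent header defines one word-order macro, and the system-independent code branches only on that choice:

```c
/* Sketch only: in a real build, exactly one of LOW_HIGH or HIGH_LOW
   would come from the platform's system-dependent header. */
#ifndef HIGH_LOW
#define LOW_HIGH            /* assume an LSB-first platform for this sketch */
#endif

/* System-independent code needs one #ifdef per word-order choice,
   not one per platform. */
const char *word_order(void)
{
#ifdef LOW_HIGH
    return "LSB first";     /* e.g., Intel */
#else
    return "MSB first";     /* e.g., Motorola 68000 */
#endif
}
```

Adding a new platform then means writing one small configuration header, not editing the system-independent modules.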
To minimize unexpected problems when moving your C source code from one
platform to another, it is advisable to utilize a well-defined set of typedefs
for the basic computational objects as well as for application-specific
objects. For example, in c-tree Plus we use three different typedefs for
integers: COUNT, LONG, and NINT. They are, respectively, 2-byte integers,
4-byte integers, and the platform's natural integer. (Of course, we also
support unsigned versions of these integers.) Then on any platform, c-tree can
always rely on a COUNT to be two bytes and a LONG to be four bytes. This is
implemented in a manner typical of our portability strategy: Default typedefs
are supplied in a system-independent header module, and an optional entry in a
system-dependent module can override the default. For example, default
typedefs like Example 1 are found in a system-independent header file. In
those few platforms where a short int is not two bytes or a long int is not
four bytes, these typedefs can be specifically coded in the system-dependent
header file, and the #ifndef will be false in the system-independent module.
To shorten the porting time and avoid problems which are difficult to isolate,
we use the C program in Listing One , which tests each of the system-specific
dependencies found in c-tree Plus. By compiling and executing this module
(which does not require any of the c-tree API) you can determine, among many
other things, if your definition of COUNT really results in 2-byte integers,
or whether the memcmp function performs signed or unsigned byte-wise
comparisons. When we port to a new environment, executing this test program is
one of the first steps we take.
C lends itself to run-time library support. Many developers turn to
third-party libraries to assist with database I/O, report generation, and
other application necessities. When examining a third-party library, it is
important to investigate its portability. If the library is developed and used
properly, it can be a tremendous timesaver to the development and port of the
application. Of course, if the library is not portable or available on
different platforms, it may prove detrimental.


Function Wrappers


Another way to isolate your application from the specifics of the system or of
third-party run-time libraries is to use function wrappers. These act as a
layer between what your application needs to accomplish (say, adding a record
to a database) and the particular function which will perform the desired
action. 
By placing all the wrapper functions in one module, you can change the
underlying operations without affecting the many application modules which use
these functions. However, while C++ makes it easy to modify the parameters
used to invoke an action, C is more rigid. Therefore, to keep your application
well insulated from the underlying functions, you must carefully select the
parameters used in your wrapper functions. While you cannot ignore the
parameter requirements of the underlying functions, you must make sure that
the wrapper function parameters reflect the essential nature of your
application and the function being called. A wrapper should not simply repeat
the exact parameters used in the underlying function.
For example, in c-tree Plus AddRecord uses a small integer value to identify
the data file involved. You may wish to use symbolic names to refer to data
files. In this case, you would pass the symbolic name to the wrapper function,
which would in turn call your own function to translate the name to a c-tree
Plus file number. This same translation function would be used in many of the
wrapper functions which call the c-tree Plus library.
Carefully selecting a naming convention for your wrapper functions simplifies
the task of locating them if they must be modified. We would suggest, for
example, that all your database wrapper functions begin with dbw_, followed by
the desired action; say, dbw_AddRecord for the function to add a record.
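A hedged sketch of this layering follows. The lookup table and the stand-in for the underlying library call are ours for illustration; only the wrapper name dbw_AddRecord and the idea of translating symbolic names to c-tree Plus file numbers come from the text.

```c
#include <string.h>

#define MAXFILES 4

/* Application-level symbolic names for the data files; illustrative only. */
static const char *file_names[MAXFILES] = {
    "customer", "vendor", "invent", "orders"
};

/* Translate a symbolic name to a small file number, the vocabulary the
   underlying library expects. Returns -1 if the name is unknown. */
static int file_number(const char *name)
{
    int i;
    for (i = 0; i < MAXFILES; i++)
        if (strcmp(file_names[i], name) == 0)
            return i;
    return -1;
}

/* Stand-in for the underlying c-tree call, which takes a file number. */
static int AddRecord(int filno, const void *rec)
{
    (void)rec;                       /* record contents not needed here */
    return (filno >= 0) ? 0 : -1;    /* 0 == success */
}

/* The wrapper speaks the application's language (symbolic names), not
   the underlying function's (file numbers), so the library underneath
   can change without touching the many modules that call the wrapper. */
int dbw_AddRecord(const char *name, const void *rec)
{
    return AddRecord(file_number(name), rec);
}
```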


Size and Alignment of Data Objects


The three most pressing issues related to moving data across platforms are
structure alignment, size of data objects, and byte order of binary values. 
Different hardware architectures and different C compilers enforce different
alignment restrictions on various data types. An alignment restriction refers
to the legitimate addresses at which a data object can be referenced. For
instance, if a CPU can only address integers on even boundaries, integers are
"word-aligned." Attempting to reference an integer on an odd boundary (that
is, its beginning address is odd) would probably cause a system exception.
Generally, a data object is at most restricted to an address boundary no
larger than the data object itself. For instance, a 4-byte integer will at
most be required to be aligned on a 4-byte (double-word aligned) boundary
while a 2-byte integer on the same machine will, at worst, be restricted to a
2-byte boundary.
For information which only exists temporarily in memory, alignment
restrictions are not a concern. But if your data structures are not carefully
planned, then information stored on disk may not be usable across different
platforms: The position of members within a data structure will change between
platforms, and/or the size of the data structures will be different across
platforms. To avoid these dilemmas, we:
1. Create a set of constant size typedefs for basic data items (as discussed
earlier for COUNT and LONG).
2. Place members in structures to encourage "automatic" alignment, and use
explicit padding between members as necessary.
3. Add padding to the end of a structure, if necessary, to keep the size of
the structure a multiple of its largest-sized data type.
The first step implies that we discourage the use of natural integers as part
of data structures used for permanent storage. If moving your data across
platforms is not important, then this is not an issue. (Some application
developers will be more than satisfied if the application is portable, with no
regard to the portability of the data. They do not expect the data to be moved
from platform to platform.)
The second step implies that the largest data items be placed first in the
structure, or that shorter data objects be grouped together to form clusters
whose size matches the most restrictive alignment requirement. For example, if the
largest member of a data structure is a 4-byte integer, then the 4-byte
integers should be at the beginning of the structure. If you wish to place
shorter members at the beginning, then group them in clusters which are
multiples of four bytes. Note that character arrays are treated (along with
individual characters) as the smallest data types, and should occur at the end
of the structures.
The third step is necessary to ensure no size difference across platforms
regardless of whether padding was required between structure members.
Two good examples of proper alignment techniques are shown in Example 2. (Note
that UTEXT represents a 1-byte unsigned character and TEXT, a signed
character.)
If you do not follow this strategy, compilers on various platforms may be
forced to insert padding bytes in front of some structure members to force the
required alignment. Further, the size of the structures may vary from platform
to platform. The structure in Example 3 may result in an 8-byte structure on a
byte-aligned platform and a 12-byte structure on a double-word-aligned
platform. On a double-word platform, three bytes of padding would be inserted
before the customer_acc_rcv member and one byte of padding before the
customer_zone member. 
Finally, we strongly suggest omitting pointers to other structures within data
structures used for permanent disk storage. While the use of pointers within
structures is a very powerful and useful technique in C programming, we
discourage it for actual data-storage structures. The size of pointers varies
across platforms from as small as two bytes to as large as eight bytes, and
the values of address pointers lose their meaning once the structure is placed
on disk.
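A size check makes step 3 concrete. This sketch mirrors Example 2(b); mapping LONG to plain int is an assumption that happens to give four bytes on most current platforms (the override mechanism described earlier exists for the platforms where such defaults fail).

```c
/* Sketch only: the typedefs follow the article's conventions. Mapping
   LONG to int (4 bytes on most current platforms) is an assumption. */
typedef short int     COUNT;   /* 2-byte integer */
typedef int           LONG;    /* 4-byte integer, by assumption */
typedef unsigned char UTEXT;
typedef char          TEXT;

/* Mirrors Example 2(b): the first three members fill exactly four
   bytes, so vendor_acc_pay lands on a 4-byte boundary; the trailing
   padding keeps the size a multiple of 4 (2+1+1+4+58+2 = 68). */
typedef struct vendor_record {
    COUNT vendor_type;
    UTEXT vendor_status;
    UTEXT vendor_reserved;
    LONG  vendor_acc_pay;
    TEXT  vendor_name[58];
    TEXT  vendor_padding[2];
} VENDOR_RECORD;
```

If the structure were laid out carelessly, the compiler would insert its own invisible padding and the sizeof check would differ from platform to platform.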


Binary Word Order



CPUs differ in the manner in which integers and floating-point values are
stored in memory. On Little-endian machines, the lowest-order byte is stored
in the first byte of the integer, and the most significant byte is stored
last. Such CPUs include the Intel family of processors and the new DEC Alpha
processors. On Big-endian machines, the highest-order byte is stored in the
first byte, and the least-significant byte is stored last. These CPUs include
the Motorola 68000 family of processors and the IBM RS/6000 family. (In some
unusual circumstances, a binary value may be a mixture of these strategies.)
While most application code is totally independent of the internal word
ordering, this difference does pose a problem when moving application data
across platforms. Such a move results in invalid binary values if the binary
word ordering is different. c-tree Plus uses two different strategies to deal
with this problem. One is to store the binary data on disk in the same order
regardless of the platform's internal order. c-tree Plus uses this approach
for its nonserver implementations, and stores the data in the Little-endian
order (because of the great preponderance of Intel processors). The second
strategy, employed with client/server implementations of c-tree Plus, stores
the data in the server's native ordering. This places the burden for
transforming byte ordering onto the client processors, relieving the server
processor of this overhead.
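The first strategy can be sketched as follows. The function names are ours, not c-tree's: the point is that encoding and decoding byte by byte, arithmetically, gives the same on-disk layout on any host.

```c
#include <stdint.h>

/* Encode a host integer into a little-endian byte sequence,
   least-significant byte first. */
void put_le32(uint32_t v, unsigned char out[4])
{
    out[0] = (unsigned char)(v & 0xff);
    out[1] = (unsigned char)((v >> 8) & 0xff);
    out[2] = (unsigned char)((v >> 16) & 0xff);
    out[3] = (unsigned char)(v >> 24);
}

/* Decode back to a host integer. Because the bytes are combined
   arithmetically rather than by reinterpreting raw memory, the result
   is correct on either a Little-endian or a Big-endian CPU. */
uint32_t get_le32(const unsigned char in[4])
{
    return (uint32_t)in[0]
         | ((uint32_t)in[1] << 8)
         | ((uint32_t)in[2] << 16)
         | ((uint32_t)in[3] << 24);
}
```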
To permit c-tree Plus to automatically perform the byte-order transformations
on application data, we take advantage of c-tree's ability to store resources
in data files. c-tree Plus allows you to specify the field types of your data
records in a resource stored within the data file. When the data is accessed,
the field type information directs any necessary transformations. Also, if the
data file is moved, it is still possible to interpret the data properly.


Summary


Careful organization and isolation of your application code from user and
file-handling interfaces can significantly reduce the effort required to move
your application code from one platform to another. Creating a test program
sensitive to the platform-dependent elements of your application will further
reduce the time and problems encountered in moving the code. With each port,
you become more attuned to the issues of portability, and can further refine
your strategy.
By defining basic computational data objects which are size invariant across
platforms, and by constructing stable, well organized data structures, your
applications will even be able to share data across different platforms, or
use data stored on different platforms.
Example 1: Default typedefs like this are found in a system-independent header
file.
#ifndef INTEGER_OVERRIDE
 typedef short int COUNT;
 typedef long int LONG;
#endif
Example 2: Techniques for maintaining proper alignment.
(a)
typedef struct invent_record {
 LONG invent_id;
 LONG invent_level;
 LONG invent_reorder;
 COUNT invent_status;
 COUNT invent_bin;
} INVENT_RECORD;
(b)
typedef struct vendor_record {
 COUNT vendor_type; /* The first three members */
 UTEXT vendor_status; /* of this structure use */
 UTEXT vendor_reserved; /* precisely four bytes. */
 LONG vendor_acc_pay;
 TEXT vendor_name[58];
 TEXT vendor_padding[2]; /* Keep struct multiple of 4*/
} VENDOR_RECORD;
Example 3: The structure size in this example depends upon the byte alignment
of the platform.
typedef struct customer_record {
 UTEXT customer_status;
 LONG customer_acc_rcv;
 UTEXT customer_priority;
 COUNT customer_zone;
} CUSTOMER_RECORD;

Listing One 

/* Copyright (c) 1984 - 1994 FairCom Corporation. ALL RIGHTS RESERVED 
 * FairCom Corporation, 4006 West Broadway, Columbia, MO 65203.
 * 314-445-6833
 */

#include "ctstdr.h"
#include "ctoptn.h"

#define CTF " 1 3 "
#define S377 "'\\377'"

typedef struct {
 TEXT mb1;
 } MB;
TEXT getbuf[128];
TEXT *align[5] = {

 "strange: call FairCom (314) 445-6833",
 "byte",
 "word (2 bytes)",
 "strange: call FairCom (314) 445-6833",
 "double word (4 bytes)"
 };
#ifdef PROTOTYPE
main ()
#else
main ()
#endif
{
 TEXT buffer[8];
 TEXT t255,*tp;
 COUNT i,done,d[4],afactor;
 UCOUNT tu;

 TEXT *th,*tl;
 MB mb[2];
 struct {
 ctRECPT pa1;
 TEXT pa2;
 ctRECPT pp;
 ctRECPT pa3;
 ctRECPT pa4;
 } p;
 struct {
 ctRECPT ca1;
 TEXT ca2;
 COUNT cc;
 } c;
 struct {
 ctRECPT aa1;
 TEXT aa2;
 TEXT aa[3];
 } a;
 struct {
 ctRECPT ta1;
 TEXT ta2;
 TEXT tt;
 } t;

 if (SIZEOF(COUNT) != 2 ||
 SIZEOF(UCOUNT) != 2 ||
 SIZEOF(LONG) != 4 ||
 SIZEOF(VRLEN) != 4) {
 printf(
"\n\nBefore continuing with CTTEST be sure that the following types are");
 printf(
 "\ncorrectly sized. Make the necessary changes in CTPORT.H.\n\n");
 printf(" COUNT UCOUNT LONG VRLEN\n");
 printf(" ------- ------- ------- -------\n");
 printf(" Should be: 2 bytes 2 bytes 4 bytes 4 bytes\n");
 printf("Actual size: %d %d %d %d\n\n",
 SIZEOF(COUNT),SIZEOF(UCOUNT),SIZEOF(LONG),SIZEOF(VRLEN));
 exit(0);
 } else {
 printf("\n\nCOUNT, UCOUNT, LONG & VRLEN are properly sized.");
 tu = 40000;

 if (!(tu > 0)) {
 printf(
"\n\nBefore continuing with CTTEST, be sure that UCOUNT is an");
 printf("\nunsigned short integer. See CTPORT.H\n\n");
 exit(0);
 }
 }
 t255 = '\377';
 printf("\n\nC255 Test for CTCMPL.H:");
 printf("\n\tUse the following setup in CTCMPL.H - \t#define C255\t");
 if (t255 == -1)
 printf("%d",t255);
 else
 printf(S377);
 printf("\n\t\t\t Current setting - \t#define C255\t");
 if (C255 == 0x00ff)
 printf(S377);
 if (C255 == -1)
 printf("%d",C255);
 i = 0x0201;
 tp = (TEXT *) &i;
 printf("\n\nLOW HIGH Test for CTOPTN.H:");
 printf("\n\tUse the following setup in CTOPTN.H - \t#define ");
 if ((*tp & 0x00ff) > (*(tp + 1) & 0x00ff))
 printf("HIGH_LOW");
 else
 printf("LOW_HIGH");
 printf("\n\t\t\t Current setting - \t#define ");
#ifdef LOW_HIGH
 printf("LOW_HIGH");
#endif
#ifdef HIGH_LOW
 printf("HIGH_LOW");
#endif
 /* NULL size test */
 printf("\nNULL Size Test: ");
 if (SIZEOF(NULL) == SIZEOF(tp))
 printf("ok (%d bytes)",SIZEOF(NULL));
 else
 printf("inconsistent (NULL is %d bytes & ptr's are %d bytes)",
 SIZEOF(NULL),SIZEOF(tp));
 /* test of compar function for byte-wise comparisons */
 for (i = 0; i < 4; i++) {
 buffer[i] = 'A';
 buffer[i+4] = '\377';
 }
 tp = buffer;
#ifndef FASTCOMP
#ifdef ctDS
 done = ((COUNT) *tp & 0x00ff) - ((COUNT) *(tp + 4) & 0x00ff);
#else
 done = (*tp & 0x00ff) - (*(tp + 4) & 0x00ff);
#endif
 if (done >= 0) {
 printf(
"\n\n\nBefore continuing with CTTEST, call FairCom (314) 445-6833 concerning");
 printf(
"\nthe critical compar function in CTCOMP.C. Please report the following");
 printf(
"\nthree numbers to FairCom: %d %d %d\n",(*tp & 0x00ff),(*(tp + 4) & 0x00ff),
 done);
 exit(0);
 } else
#endif /* ~FASTCOMP */
 printf("\n\ncompar function (CTCOMP.C) test is successful.");
#ifndef ctNOMEMCMP
 done = ctrt_memcmp(tp,tp + 4,1);
 if (done >= 0) {
 printf(
"\n\n\nBefore continuing with CTTEST, add '#define ctNOMEMCMP' to ctcmpl.h.");
 printf(
"\nThis indicates that your memcmp function cannot be used in our high speed");
 printf(
"\nkey loading routine since it treats bytes as signed quantities.\n");
 exit(0);
 }
#endif
 /* PAUSE IN OUTPUT */
 printf("\n\nHit RETURN (or ENTER) to continue...");
 gets(getbuf);

 printf("\n\nAlignment test for help in computing key segment offsets.");

 th = (TEXT *) &p.pa4;
 tl = (TEXT *) &p.pa3;
 i = th - tl;
 if (i == 1) {
 printf(
"\n\n*** This machine addresses 32 bit words (not bytes). Call ***");
 printf(
 "\n*** FairCom at (314) 445-6833. STATUS & HDRSIZ must be changed. ***");
 afactor = 4;
 } else if (i == 2) {
 printf(
"\n\n*** This machine addresses words (not bytes). Add 2 to STATUS in ***");
 printf(
 "\n*** CTOPTN.H and add 4 to HDRSIZ in CTOPTN.H. Also each member ***");
 printf(
 "\n*** of a structure will be at least word aligned. In particular,");
 afactor = 2;
 } else
 afactor = 1;
 printf(" Members of\nstructures will be aligned as follows:\n\n");
 printf("\tMember Type Alignment\n");
 printf("\t----------- -----------------\n"); 

 th = (TEXT *) &p.pp;
 tl = &p.pa2;
 i = (th - tl) * afactor;
 if (i > 4) i = 0;
 printf("\t4 byte int %s\n",align[i]);

 th = (TEXT *) &c.cc;
 tl = &c.ca2;
 i = (th - tl) * afactor;
 if (i > 4) i = 0;
 printf("\t2 byte int %s\n",align[i]);
 if (i > 2)

 printf(
"\nCall FairCom (314) 445-6833 concerning 2 byte integer alignment.\n");

 th = (TEXT *) a.aa;
 tl = &a.aa2;
 i = (th - tl) * afactor;
 if (i > 4) i = 0;
 printf("\tchar array %s\n",align[i]);

 th = &t.tt;
 tl = &t.ta2;
 i = (th - tl) * afactor;
 if (i > 4) i = 0;
 printf("\tchar %s\n",align[i]);

 printf("\n\nStructure 'SIZEOF' Test: ");
 i = SIZEOF(MB);
 th = &mb[1].mb1;
 tl = &mb[0].mb1;
 if (i == (th - tl))
 printf(" OK");
 else
 printf(
"\nCall FairCom at (314) 445-6833 to report these two numbers: %d %d\n",
 i, (th - tl));
 printf("\n\nShort Integer Input Test for CTOPTN.H:");

 done = NO;
 d[1] = d[3] = 5;
 if (sscanf(CTF,"%h %h",d,d+2) == 2 &&
 d[0] == 1 && d[2] == 3 && d[1] == 5 && d[3] == 5) {
 printf(
"\n\tUse the PERC_H option in CTOPTN.H.\n");
 done = YES;
 }
 if (!done) {
 d[1] = d[3] = 5;
 if (sscanf(CTF,"%d %d",d,d+2) == 2 &&
 d[0] == 1 && d[2] == 3 && d[1] == 5 && d[3] == 5) {
 printf(
"\n\tUse the PERC_D option in CTOPTN.H.\n");
 done = YES;
 }
 }
 if (!done) {
 d[1] = d[3] = 5;
 if (sscanf(CTF,"%hd %hd",d,d+2) == 2 &&
 d[0] == 1 && d[2] == 3 && d[1] == 5 && d[3] == 5) {
 printf(
"\n\tUse the PERC_HD option in CTOPTN.H.\n");
 done = YES;
 }
 }
 if (!done)
 printf(
"\n\n*** COMPILER DOES NOT CONFORM TO KNOWN CONVENTIONS ***\n");

 printf("\tCurrent setting - ");
#ifdef PERC_H

 printf("PERC_H");
#endif
#ifdef PERC_D
 printf("PERC_D");
#endif
#ifdef PERC_HD
 printf("PERC_HD");
#endif
 /* PAUSE IN OUTPUT */
 printf("\n\nHit RETURN (or ENTER) to continue...");
 gets(getbuf);

 printf("\n\nCTOPTN.H SUMMARY -\n");

#ifdef FPUTFGET
 printf("\nFPUTFGET:\tnon-server, multi-user application");
#endif
#ifdef NOTFORCE
 printf("\nNOTFORCE:\tsingle-user or server based application");
#endif

#ifdef RESOURCE
 printf("\nRESOURCE:\tresources are supported");
#else
 printf("\nNO_RESOURCE:\tresources are NOT supported");
#endif

#ifdef CTBATCH
 printf("\nCTBATCH:\tbatch retrieval is supported");
#else
 printf("\nNO_BATCH:\tbatch retrieval is NOT supported");
#endif

#ifdef CTSUPER
 printf("\nCTSUPER:\tsuperfiles are supported");
#else
 printf("\nNO_SUPER:\tsuperfiles are NOT supported");
#endif

#ifdef LOW_HIGH
 printf("\nLOW_HIGH:\tLSB to MSB ordering (ala Intel 8086 family)");
#endif
#ifdef HIGH_LOW
 printf("\nHIGH_LOW:\tMSB to LSB ordering (ala Motorola 68000 family)");
#endif

#ifdef VARLDATA
 printf("\nVARLDATA:\tvariable length data records are supported");
#else
 printf("\nNO_VARLD:\tvariable length data records are NOT supported");
#endif

#ifdef PERC_H
 printf("\nPERC_H:\t\t%%h");
#endif
#ifdef PERC_D
 printf("\nPERC_D:\t\t%%d");
#endif
#ifdef PERC_HD

 printf("\nPERC_HD:\t%%hd");
#endif
 printf(" short integer format specification");

#ifdef VARLKEYS
 printf("\nVARLKEYS:\tkey compression supported");
#else
 printf("\nNO_VARLK:\tkey compression is NOT supported");
#endif

#ifdef PARMFILE
 printf("\nPARMFILE:\tISAM parameter files are supported");
#else
 printf("\nNO_PARMF:\tISAM parameter files are NOT supported");
#endif

#ifdef RTREE
 printf("\nRTREE:\t\tr-tree supported");
#else
 printf("\nNO_RTREE:\tr-tree is NOT supported");
#endif

#ifdef CTS_ISAM
 printf("\nCTS_ISAM:\tISAM functionality supported");
#else
 printf("\nNO_ISAM:\tISAM functionality is NOT supported");
#endif

#ifdef CTBOUND
 printf("\nCTBOUND:\tnon-server mode of operation");
#else
 printf("\nNO_BOUND:\tserver mode of operation");
#endif

#ifdef PROTOTYPE
 printf("\nPROTOTYPE:\tfunction prototypes are supported");
#else
 printf("\nNO_PROTOTYPE:\tfunction prototypes are NOT supported");
#endif
 printf("\n\nEnd Of CTTEST\n");
 exit(0);
}





















The BMP File Format, Part 1


Once only for Windows, it's now a common format for other platforms




David Charlap


David is a software engineer with Visix Software, makers of the Galaxy
Application Framework. He can be contacted via the Internet at
david@visix.com.


The standard format for storing bitmap images in Windows applications is the
bitmap (BMP) file format. Sometimes known as device-independent bitmap (DIB),
this format is also used by OS/2 programs and many DOS applications. Because
of its popularity with Windows applications, however, it is becoming a common
format for software on other operating systems as well. Unfortunately, many
different file formats claim to be BMP files, some of which are incorrectly
structured due to documentation errors in some early editions of the Windows
SDK.
Thus, it is common to find BMP files that an otherwise well-written
application cannot read. In this two-part article, I'll examine all of the
different ways a BMP file can be put together, so that an application can be
built that reads all known BMP file formats, plus future formats. The code
I'll present assumes nothing about the system it runs on, so it should work
equally well on DOS, OS/2, UNIX, VMS, or any other operating system. 
Throughout this article, I'll use data types such as INT16 and UINT32 instead
of the usual short, long, and int because the standard C data types do not
have specific sizes associated with them. Table 1 lists these types. In an
attempt to keep everything portable, I am not using structures from either the
Windows SDK or the OS/2 toolkit. Instead, I have a set of structures which are
supersets of both kits. This way, applications using bitmap files (DOS or UNIX
programs, for example) can be compiled and run without either toolkit. Table 2
lists my structures and the corresponding Windows and OS/2 structures. Note
that the BITMAPARRAYHEADER structure does not exist in the Windows SDK. 
In this first installment, I examine the four structures listed in Table 2.
Next month, I'll focus on reading and interpreting bits, and discuss how the
structures fit together.


Basic Concepts of BMP


Unlike other image-file formats like GIF (CompuServe's Graphics Interchange
Format) and JPEG (Joint Photographic Experts Group), the BMP file format
was not designed to be portable. Instead, it was designed to easily work with
the Windows API using the same structures that Windows applications use to
manipulate in-memory bitmaps. As the API changed, so did the BMP file format.
Windows now has three documented variants on the BMP file format: one each for
Windows 2.x, Windows 3.x, and Windows NT. Additionally, OS/2 has variants
designed to work easily with the OS/2 API: one each for OS/2 1.1, OS/2 1.2,
and OS/2 2.x, plus one to store its icon (ICO) and pointer (PTR) files.
Fortunately, the two sets of structures are nearly identical. With a little
work, you can produce a single set of structures to describe all bitmap files.
Figure 1 illustrates the BITMAPFILEHEADER structure; Figure 2, the
BITMAPARRAYHEADER structure; Figure 3, BITMAPHEADER; and Figure 4, the RGB
structure. 
All BMP formats, including the OS/2 icon and pointer variants, are built using
these four structures. All use the BITMAPFILEHEADER structure, various subsets
of the BITMAPHEADER structure (the size field indicating how much is being
used), and the RGB structure. OS/2-format bitmap files containing more than
one image also use the BITMAPARRAYHEADER structure.
There is one caveat: The oldest forms of the bitmap file format use 16-
instead of 32-bit integers for the width and height fields of the BITMAPHEADER
structure. Fortunately, it is easy to tell when such a file is being read,
since the size field is always 12 for those structures and greater than 12 for
the newer structures. Additionally, files based on the "new" format will add
an extra byte to the end of every RGB structure, in order to align it on a
4-byte boundary. 


Byte Ordering


The BMP format was designed to work with systems based on Intel processors
with their Little-endian byte-ordering scheme. This means that whenever a
multibyte field is read (such as a 16- or 32-bit integer), the first byte read
will be the least significant, and the last, the most significant. While this
isn't an issue when reading the files on Intel processors, it is critical on
systems like Sun workstations that use Big-endian byte ordering.
To get around these problems, it is a good idea to use functions that are not
dependent on byte order when reading and writing fields; see endian.c (Listing
One) and endian.h (Listing Two). These functions are used throughout the
sample code. They work properly without any conditional compilation
statements. They read the bytes individually, combining them arithmetically to
produce the required multibyte integer. The resulting integer will be
correctly formatted for the host processor. Similarly, writing is performed by
decomposing the integer arithmetically and writing the bytes individually,
least-significant byte first.
These functions also help work around problems arising from differences
between 16/32/64-bit processors. On a 64-bit processor (such as an Alpha-based
system), a long integer may be 64 bits instead of the usual 32.
Reading and writing the files on a byte-by-byte basis avoids problems that can
arise from these differences.
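The endian.c and endian.h listings carry the actual implementation; what follows is a hedged sketch of the same technique, with function names that are ours and may differ from the listings.

```c
#include <stdio.h>

typedef unsigned short UINT16;
typedef unsigned long  UINT32;

/* Read a 16-bit little-endian value one byte at a time; combining the
   bytes arithmetically makes the result correct on any host CPU, with
   no conditional compilation. Returns 0 on success, -1 on end of file. */
int readUINT16little(FILE *f, UINT16 *out)
{
    int b0 = getc(f);
    int b1 = getc(f);
    if (b0 == EOF || b1 == EOF)
        return -1;
    *out = (UINT16)(b0 | (b1 << 8));   /* least-significant byte first */
    return 0;
}

/* A 32-bit value is two 16-bit halves, low half first. */
int readUINT32little(FILE *f, UINT32 *out)
{
    UINT16 lo, hi;
    if (readUINT16little(f, &lo) || readUINT16little(f, &hi))
        return -1;
    *out = (UINT32)lo | ((UINT32)hi << 16);
    return 0;
}
```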


Reading the Structures


Each of the structures, BITMAPFILEHEADER, BITMAPARRAYHEADER, BITMAPHEADER, and
RGB, is involved in reading a bitmap file. Each structure may be read by
calling the appropriate Little-endian-value read function on each field. The
fields are ordered as they appear in the structure definition.
That's all that's required to read BITMAPFILEHEADER and BITMAPARRAYHEADER
structures. Both structures are of a fixed length, so reading them simply
involves reading each field.
The BITMAPHEADER and RGB structures, however, are more complicated. Both are
variable-length structures, and therefore require special handling when
reading. The BITMAPHEADER structure contains its length as the first field.
The RGB structure is often stored with an extra byte of padding when used in
color tables.
The readBitmapFileHeader, readBitmapArrayHeader, readBitmapHeader, and readRgb
functions in readbmp.h and readbmp.c show how to properly read these
structures. (These files are available electronically; see "Availability,"
page 3.) 


Reading the BITMAPHEADER Structure


The BITMAPHEADER structure contains all the information required to describe a
bitmap (see Figure 3). It is a 64-byte long structure containing many fields.
But most bitmap files do not use all of the fields, so how do you know how
many are used and how many are not?
The answer is the size field, which always contains the length (in bytes) of
the BITMAPHEADER structure. If only the first 16 bytes are used, then only
they are stored in the file, and the size field will contain the value 16. If
all 64 bytes are used, then all 64 bytes are stored in the file and the size
field will contain the value 64.
The smallest value for the size field is 12 bytes. This is a special
case--only the "old" format bitmap files will have a BITMAPHEADER structure 12
bytes long. When reading these old format bitmaps, the width and height fields
must be read as 16-bit integers instead of the usual 32-bit integers.
When reading the BITMAPHEADER structure, you must keep count of the number of
bytes read, since the structure can be any length (even greater than 64).
After reading the number of bytes indicated by the size field, the structure
has been completely read. Any fields in the structure that have not been read
should be initialized to 0. If the entire structure has been read and the
count is still less than the value contained in the size field, then there are
extra bytes on the disk that should be skipped, since the meaning of these
extra bytes is unknown.
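The size-field bookkeeping can be sketched as below. The real code is in readbmp.c (available electronically); these names are ours.

```c
/* The known portion of BITMAPHEADER is 64 bytes long. */
#define KNOWN_HEADER_MAX 64UL

/* Given the size field's value, report how many bytes of known fields
   to read and how many unknown trailing bytes to skip. Any fields
   beyond the bytes actually read should then be initialized to 0. */
void planHeaderRead(unsigned long size,
                    unsigned long *toRead, unsigned long *toSkip)
{
    if (size <= KNOWN_HEADER_MAX) {
        *toRead = size;                      /* only this prefix is stored */
        *toSkip = 0;
    } else {
        *toRead = KNOWN_HEADER_MAX;          /* all known fields present */
        *toSkip = size - KNOWN_HEADER_MAX;   /* meaning unknown: skip */
    }
}
```

A size of 12 signals the old format (16-bit width and height); anything larger uses the 32-bit fields.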


Color Tables



Immediately following every BITMAPHEADER structure is a color table. For most
bitmaps, it is an array of RGB structures. When reading the pixel values of
such a bitmap, each pixel's value will be an index into this array indicating
the color that value represents. The only exception is when BITFIELDS
compression is used; then the table has a special meaning.
For bitmaps with normal color tables (where the compression scheme is not
BITFIELDS encoding), the length of the color table is a function of the number
of colors the image can have. The number of possible colors is 2^(bit depth),
where bit depth is the number of color planes multiplied by the number of
bits per plane. If bit depth is 24, however, the length of the color table is
0, since each pixel will already contain the proper RGB value.
Each entry in a normal color table is one RGB structure. The first byte is the
blue value; the second, green; and the third, red. If this is a "new" format
bitmap (indicated by the image's BITMAPHEADER structure having a size greater
than 12 bytes), then an additional byte of padding between each RGB structure
must be skipped over before reading the next structure in the array.
The readColorTable function (see readbmp.h and readbmp.c, available
electronically) shows how to properly read a normal color table.
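The two rules above reduce to a pair of small helpers; this is a hedged sketch with names of our own choosing, not the readColorTable code itself.

```c
/* Number of color-table entries: 2^(planes * bitsPerPlane). A 24-bit
   image has no table, because each pixel carries its RGB value. */
unsigned long colorTableEntries(int planes, int bitsPerPlane)
{
    int depth = planes * bitsPerPlane;
    if (depth == 24)
        return 0;
    return 1UL << depth;
}

/* On-disk bytes per entry: 3 in the old (12-byte-header) format, 4 in
   the new format, which pads each RGB structure to a 4-byte boundary. */
int colorTableEntrySize(unsigned long headerSize)
{
    return headerSize > 12 ? 4 : 3;
}
```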


Understanding the BITMAPFILEHEADER Structure


The BITMAPFILEHEADER structure (Figure 1) always appears on disk immediately
before a BITMAPHEADER. It indicates what type of image the BITMAPHEADER
describes, points to the bitmap's pixel data, and (in the case of OS/2 pointer
files) indicates where the pointer's hot spot is. For a single-image file, the
first structure in the file will be a BITMAPFILEHEADER. For a multiple-image
file, the first structure will be a BITMAPARRAYHEADER, and each image will
have its own BITMAPFILEHEADER.
The first field, type, indicates what type of image the following BITMAPHEADER
describes. Table 3 contains the field's possible values. Note how each numeric
value can also be interpreted as two characters, which are a mnemonic
describing the image type.
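Because the field is stored least-significant byte first, the two characters on disk read back as the numeric value; a minimal sketch (the helper name is ours):

```c
/* Combine the two stored bytes of the type field LSB-first. The bytes
   "BM" on disk become the value 0x4D42 ('B' = 0x42, 'M' = 0x4D), and
   "BA" becomes 0x4142. */
unsigned typeFromBytes(unsigned char c0, unsigned char c1)
{
    return (unsigned)c0 | ((unsigned)c1 << 8);
}
```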
The next field of the BITMAPFILEHEADER, size, can be safely ignored. It is a
count of bytes used for the BITMAPFILEHEADER combined with the subsequent
BITMAPHEADER. Using this field for anything is dangerous, though. Windows and
OS/2 have different definitions for the field. Windows defines it as the size
of the entire bitmap file (in DWORDs), while OS/2 defines it as the size of
the BITMAPFILEHEADER, plus the size of the following BITMAPHEADER structure
(in bytes). Many bitmap files simply store 0 in this field. Fortunately, the
BITMAPHEADER contains its own size field, so this one can be ignored.
The next two fields are the x and y coordinates for the hot spot. For TYPE_BMP
images, this value is ignored (and those two fields are documented as
"reserved" in the Windows SDK, where this structure is only used for BMP
files). For the other four image types, these two values contain the position
where the hot spot should be if the image is used as a mouse pointer. The
coordinate is relative to the lower-left corner of the image.
The fifth and last field, offsetToBits, contains the byte offset (from the
start of the file) to the first byte of the image's color data. The
interpretation of this data varies depending on the contents of the
BITMAPHEADER that follows.


Understanding the BITMAPARRAYHEADER Structure


The BITMAPARRAYHEADER structure is used to establish a linked list of images
in a multiple-image bitmap file. Because only OS/2 supports the multiple-image
bitmap file, Windows applications often crash when a bitmap file using this
structure is loaded. A BITMAPFILEHEADER always follows immediately after every
BITMAPARRAYHEADER structure.
The first field of a BITMAPARRAYHEADER structure is the type field and always
contains the hexadecimal value 0x4142 ('BA'). This value has the symbolic name
TYPE_ARRAY.
Although it seems redundant with only one possible value, the type field
serves an important purpose. A file can begin with either a BITMAPARRAYHEADER
or a BITMAPFILEHEADER structure, and because the type field appears first in
both, examining it tells you which of the two structures you are dealing
with.
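Because the type field leads both structures, a reader can peek at the first
UINT16 before committing to either layout. A minimal sketch
(startsWithArrayHeader is a hypothetical helper, not from the listings):

```c
#include <stdio.h>

#define TYPE_ARRAY 0x4142   /* 'BA', stored low byte first */

/* Return 1 if the file opens with a BITMAPARRAYHEADER, 0 if with a
 * BITMAPFILEHEADER, -1 if the file is truncated. Rewinds the stream so
 * normal parsing can proceed from the start. */
int startsWithArrayHeader(FILE *f)
{
    int lo, hi;
    rewind(f);
    lo = fgetc(f);
    hi = fgetc(f);
    rewind(f);
    if (lo == EOF || hi == EOF)
        return -1;
    /* the type field is little-endian: low byte comes first on disk */
    return (((hi & 0xff) << 8) | (lo & 0xff)) == TYPE_ARRAY;
}
```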
The second field, size, serves a similar purpose to the size field in the
BITMAPFILEHEADER. In other words, it can be safely ignored.
The third field, next, is a pointer to the next BITMAPARRAYHEADER in the
linked list. Its value is the byte offset from the start of the file where the
next BITMAPARRAYHEADER begins. It will contain 0 if this is the last image in
the list.
The last two fields, screenWidth and screenHeight, define the device on which
this image should be displayed. If they contain 0s, then the image is device
independent; otherwise, it is intended for a screen with a height and width
(in pixels) that matches these fields. OS/2 multiple-image bitmaps usually
contain the same picture, rendered with different dimensions and/or colors.
When using a multiple-image file, proper procedure is to pick the image whose
screenWidth and screenHeight fields match the output device's size. If there
are no matches, the first image in the file is normally used.
Device-independent images (with screenWidth and screenHeight values of 0)
should always be the first images in a file.
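The selection rule above can be sketched over already-parsed headers
(pickImage is a hypothetical helper; it assumes the screenWidth and
screenHeight values have been collected while walking the next chain):

```c
/* Return the index of the image to use for a devW-by-devH device:
 * the first exact match on screenWidth/screenHeight, else image 0. */
int pickImage(const unsigned short *w, const unsigned short *h,
              int count, unsigned short devW, unsigned short devH)
{
    int i;
    for (i = 0; i < count; i++)
        if (w[i] == devW && h[i] == devH)
            return i;
    return 0;   /* no match: the first image is normally used */
}
```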


Understanding the BITMAPHEADER Structure


The BITMAPHEADER structure is at the core of every image in a bitmap file.
Although it contains 19 fields, many of them can be safely ignored. As a
matter of fact, most images will contain 0s (default values) for all but the
first five fields.
The first field, size, is used to determine the size of the structure on disk.
The next two fields, width and height, indicate the dimensions (in pixels) of
the image. It should be noted that height can be negative, in which case it
uses an inverted coordinate system (an upper-left origin instead of the usual
lower-left). In this case, the actual height of the image is the absolute
value of the height field. 
The next two fields, numBitPlanes and numBitsPerPlane, indicate color depth.
numBitPlanes is nearly always 1--Windows only supports single-plane bitmaps,
and all of OS/2's standard bitmap formats are single plane. numBitsPerPlane is
the bit depth of a single plane. Multiplying these two values together
yields the image's bit depth, usually 1, 4, 8, or 24. Windows 3.x supports bit
depths of only 1, 4, and 8. Windows NT supports 1, 4, 8, 16, 24, and 32.
OS/2's standard bit depths are 1, 4, 8, and 24, although any depth from 1
through 24 is theoretically valid.
The next field, compressionScheme, indicates whether the image's pixel data
has been compressed. Most bitmaps are uncompressed, but some use RLE
compression; Table 4 lists the possible values. Note that the value 3
indicates both modified Huffman compression and BITFIELDS encoding. OS/2
interprets the value as modified Huffman compression, while NT reserves it for
BITFIELDS. This is not a problem, however, since the BITFIELDS layout is only
used for 16- and 32-bit-per-pixel images and modified Huffman encoding is only
used for monochrome (1-bit-per-pixel) bitmaps.
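That disambiguation rule can be expressed directly (describeScheme3 is a
hypothetical helper for illustration):

```c
#include <string.h>

/* compressionScheme 3 is shared: modified Huffman applies only to 1-bpp
 * images, BITFIELDS only to 16- or 32-bpp images, so the bit depth
 * resolves the ambiguity. */
const char *describeScheme3(unsigned bitsPerPixel)
{
    if (bitsPerPixel == 1)
        return "modified Huffman";
    if (bitsPerPixel == 16 || bitsPerPixel == 32)
        return "BITFIELDS";
    return "invalid";   /* value 3 is undefined at other depths */
}
```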
The sizeOfImageData field is the number of bytes an image's pixel data
consumes. The start of the data is pointed to by the offsetToBits field in the
BITMAPFILEHEADER structure. This field is normally 0 if the compressionScheme
is COMPRESSION_NONE, in which case, the size is calculated from the width,
height, and bit depth of the image.
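When sizeOfImageData is 0 and the image is uncompressed, the size can be
recomputed from the header fields. This sketch assumes the standard BMP rule
that each pixel row is padded out to a 32-bit boundary (imageDataSize is a
hypothetical helper, not from the listings):

```c
/* Bytes of pixel data for an uncompressed image. Rows are padded to a
 * 32-bit (DWORD) boundary; a negative height (upper-left origin)
 * contributes its absolute value. */
unsigned long imageDataSize(long width, long height,
                            unsigned bitPlanes, unsigned bitsPerPlane)
{
    unsigned long bpp = (unsigned long)bitPlanes * bitsPerPlane;
    unsigned long rowBytes = (((unsigned long)width * bpp + 31) / 32) * 4;
    if (height < 0)
        height = -height;
    return rowBytes * (unsigned long)height;
}
```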
For most files, the rest of the structure will normally contain 0s. The
following fields can be safely ignored without distorting the images: 
- xResolution and yResolution, which contain the resolution (in pixels per
meter) of the image. If these values are nonzero, they can be used to compute
a scaling factor for printing the image at the proper size.
- numColorsUsed, which indicates the number of colors in the color table
actually used by the image; numImportantColors indicates the number of colors
required for proper rendering of the image. For both fields, 0 means "all of
them." If numColorsUsed is nonzero, the color table is only that
long--additional entries are undefined and may not exist in the file.
- resolutionUnits, which indicates what units are used for xResolution and
yResolution. This field always contains 0, meaning pixels per meter, and is
probably a placeholder for future expansion of the bitmap file format.
- padding, which is unused space that serves only to align the remaining data
on a 4-byte boundary.
- origin, which indicates the direction in which the bits fill in the bitmap.
It always contains 0, meaning the origin is the lower-left corner: Bits fill
in from left to right and bottom to top. This field, however, is not the only
thing that determines the origin of the image. Windows bitmaps do not use this
field (the Windows structures stop after resolutionUnits), and they may be
stored with an upper-left origin, indicated by a negative value in the image's
height field.
- halftoning, a flag that selects one of four halftoning algorithms; Table 5
lists the possible values. The use of these algorithms is undocumented, so I
cannot provide any additional information. halftoningParam1 and
halftoningParam2 are parameters used by the halftoning algorithms.
- colorEncoding, which describes the format of each entry of the color table.
It is always 0, indicating RGB encoding.
- identifier, the last field in the BITMAPHEADER structure, is not used in
describing bitmaps. It contains a 32-bit value for application use.
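The numColorsUsed rule above can be sketched as follows (colorTableEntries is
a hypothetical helper; it also assumes the standard BMP convention that
images deeper than 8 bits per pixel carry no color table):

```c
/* Number of RGB entries in the color table. 0 in numColorsUsed means
 * "all of them," i.e. 2^bitdepth entries. */
unsigned long colorTableEntries(unsigned bitsPerPixel,
                                unsigned long numColorsUsed)
{
    if (bitsPerPixel > 8)
        return 0;                   /* 16/24/32-bpp: no palette     */
    if (numColorsUsed != 0)
        return numColorsUsed;       /* table is only this long      */
    return 1UL << bitsPerPixel;     /* default: full-size table     */
}
```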


Understanding the RGB Structure


The RGB structure is rather simple compared to the other structures. It
contains three byte values--blue, green, and red, stored in that order, as
Figure 4 shows--used to describe a single color value. Color tables are an
array of these structures.


Until Next Month


That's it for now. In next month's installment, I'll focus on reading and
interpreting bits, and discuss how the structures fit together.
Figure 1: The BITMAPFILEHEADER structure.
typedef struct BITMAPFILEHEADER
{
 UINT16 type;
 UINT32 size;
 INT16 xHotspot;
 INT16 yHotspot;
 UINT32 offsetToBits;
} BITMAPFILEHEADER;
Figure 2: BITMAPARRAYHEADER structure.
typedef struct BITMAPARRAYHEADER
{
 UINT16 type;
 UINT32 size;
 UINT32 next;
 UINT16 screenWidth;
 UINT16 screenHeight;
} BITMAPARRAYHEADER;
Figure 3: BITMAPHEADER structure.
typedef struct BITMAPHEADER
{
 UINT32 size;
 INT32 width;
 INT32 height;
 UINT16 numBitPlanes;
 UINT16 numBitsPerPlane;
 UINT32 compressionScheme;
 UINT32 sizeOfImageData;
 UINT32 xResolution;
 UINT32 yResolution;
 UINT32 numColorsUsed;
 UINT32 numImportantColors;
 UINT16 resolutionUnits;
 UINT16 padding;
 UINT16 origin;
 UINT16 halftoning;
 UINT32 halftoningParam1;
 UINT32 halftoningParam2;
 UINT32 colorEncoding;
 UINT32 identifier;
} BITMAPHEADER;
Figure 4: RGB structure.
typedef struct RGB
{
 UINT8 blue;
 UINT8 green;
 UINT8 red;
} RGB;
Table 1: Basic data types.
Type Description 
INT8 Integer at least 8 bits wide
INT16 Integer at least 16 bits wide
INT32 Integer at least 32 bits wide
UINT8 Unsigned INT8
UINT16 Unsigned INT16
UINT32 Unsigned INT32
Table 2: Structures used in bitmap files. 
My Structure Windows Structures OS/2 Structures 
 
BITMAPFILEHEADER BITMAPFILEHEADER BITMAPFILEHEADER, 
 BITMAPFILEHEADER2 

BITMAPARRAYHEADER (N/A) BITMAPARRAYFILEHEADER, 
 BITMAPARRAYFILEHEADER2 
BITMAPHEADER BITMAPCOREINFOHEADER BITMAPINFOHEADER 
 BITMAPINFOHEADER BITMAPINFOHEADER2 
RGB RGBTRIPLE, RGBQUAD RGB, RGB2 
Table 3: Valid BITMAPFILEHEADER types.
Value Symbolic Name Description 
0x4D42 ('BM') TYPE_BMP Bitmap
0x4349 ('IC') TYPE_ICO OS/2 Icon
0x4943 ('CI') TYPE_ICO_COLOR OS/2 Color Icon
0x5450 ('PT') TYPE_PTR OS/2 Pointer
0x5043 ('CP') TYPE_PTR_COLOR OS/2 Color Pointer
Table 4: Compression schemes.
Value Symbolic name Description 
0 COMPRESSION_NONE No compression
1 COMPRESSION_RLE_8 8-bit-per-pixel RLE compression
2 COMPRESSION_RLE_4 4-bit-per-pixel RLE compression
3 COMPRESSION_HUFFMAN1D 1-bit-per-pixel modified Huffman compression
3 COMPRESSION_BITFIELDS 16- and 32-bit-per-pixel BITFIELDS encoding
4 COMPRESSION_RLE_24 24-bit-per-pixel RLE compression
Table 5: Halftoning algorithms.
Value Symbolic name Description 
0 HALFTONING_NONE No halftoning
1 HALFTONING_ERROR_DIFFUSION Error-diffusion halftoning
2 HALFTONING_PANDA Processing Algorithm for Noncoded
 Document Acquisition (PANDA)
3 HALFTONING_SUPER_CIRCLE Super-circle halftoning

Listing One 

/* These functions read and write our basic integer types from a little-endian
 * file. The endian and word-size of the host machine will not affect this
 * code. The only assumption made is that the C data type (char) is one byte
 * long. This should be a safe assumption.
 */

#include <stdio.h>
#include "bmptypes.h"
#include "endian.h"

/*****************************************************************************
* Read functions. All read functions take an open file pointer as the first
* parameter and a pointer to data as the second parameter. The return value
* will be 0 on success, and EOF on failure. If successful, the second
* parameter will point to the data read.
*/

/* The INT8 and UINT8 types are stored as a single byte on disk. The INT8
 * type is a signed integer with range (-128..127). The UINT8 type is an
 * unsigned integer with range (0..255).
 */
int readINT8little(FILE *f, INT8 *i)
{
 int rc;
 rc = fgetc(f);
 if (rc == EOF)
 return rc;
 *i = (rc & 0xff);
 return 0;
}
int readUINT8little(FILE *f, UINT8 *i)
{
 int rc;
 rc = fgetc(f);
 if (rc == EOF)
 return rc;
 *i = (rc & 0xff);
 return 0;
}
/* The INT16 and UINT16 types are stored as two bytes on disk. The INT16 type
 * is a signed integer with range (-32768..32767). The UINT16 type is an
 * unsigned integer with range (0..65535).
 */
int readINT16little(FILE *f, INT16 *i)
{
 int rc;
 INT16 temp = 0;
 
 temp = (fgetc(f) & 0xff);
 
 rc = fgetc(f);
 if (rc == EOF)
 return rc;
 temp |= ((rc & 0xff) << 8);
 *i = temp;
 return 0;
}
int readUINT16little(FILE *f, UINT16 *i)
{
 int rc;
 UINT16 temp = 0;
 
 temp = (fgetc(f) & 0xff);
 
 rc = fgetc(f);
 if (rc == EOF)
 return rc;
 temp |= ((rc & 0xff) << 8);
 *i = temp;
 return 0;
}
/* The INT32 and UINT32 types are stored as four bytes on disk. The INT32
 * type is a signed integer with range (-2147483648..2147483647). The UINT32
 * type is an unsigned integer with range (0..4294967295).
 */
int readINT32little(FILE *f, INT32 *i)
{
 int rc;
 INT32 temp = 0;
 
 temp = ((long)fgetc(f) & 0xff);
 temp |= (((long)fgetc(f) & 0xff) << 8);
 temp |= (((long)fgetc(f) & 0xff) << 16);
 
 rc = fgetc(f);
 if (rc == EOF)
 return rc;
 temp |= (((long)rc & 0xff) << 24);

 *i = temp;
 return 0;
}
int readUINT32little(FILE *f, UINT32 *i)
{
 int rc;
 UINT32 temp = 0;
 
 temp = ((long)fgetc(f) & 0xff);
 temp |= (((long)fgetc(f) & 0xff) << 8);
 temp |= (((long)fgetc(f) & 0xff) << 16);
 
 rc = fgetc(f);
 if (rc == EOF)
 return rc;
 temp |= (((long)rc & 0xff) << 24);
 *i = temp;
 return 0;
}
/*****************************************************************************
* Write functions. All write functions take an open file pointer as the first
* parameter and the data as the second parameter. The return value will be 0 on
* success, and EOF on failure. If successful, the second parameter will have
* been written to the open file.
*/
int writeINT8little(FILE *f, INT8 i)
{
 return (fputc(i, f) == EOF) ? EOF : 0;
}
int writeUINT8little(FILE *f, UINT8 i)
{
 return (fputc(i, f) == EOF) ? EOF : 0;
}
int writeINT16little(FILE *f, INT16 i)
{
 int rc;
 rc = fputc((i & 0xff), f);
 if (rc == EOF)
 return rc;
 return (fputc(((i >> 8) & 0xff), f) == EOF) ? EOF : 0;
}
int writeUINT16little(FILE *f, UINT16 i)
{
 int rc;
 rc = fputc((i & 0xff), f);
 if (rc == EOF)
 return rc;
 return (fputc(((i >> 8) & 0xff), f) == EOF) ? EOF : 0;
}
int writeINT32little(FILE *f, INT32 i)
{
 int rc;
 rc = fputc((i & 0xff), f);
 if (rc == EOF)
 return rc;
 rc = fputc(((i >> 8) & 0xff), f);
 if (rc == EOF)
 return rc;
 rc = fputc(((i >> 16) & 0xff), f);
 if (rc == EOF)
 return rc;
 return (fputc(((i >> 24) & 0xff), f) == EOF) ? EOF : 0;
}
int writeUINT32little(FILE *f, UINT32 i)
{
 int rc;
 rc = fputc((i & 0xff), f);
 if (rc == EOF)
 return rc;
 rc = fputc(((i >> 8) & 0xff), f);
 if (rc == EOF)
 return rc;
 rc = fputc(((i >> 16) & 0xff), f);
 if (rc == EOF)
 return rc;
 return (fputc(((i >> 24) & 0xff), f) == EOF) ? EOF : 0;
}
/* Formatting information for emacs in c-mode
 * Local Variables:
 * c-indent-level:4
 * c-continued-statement-offset:4
 * c-brace-offset:-4
 * c-brace-imaginary-offset:0
 * c-argdecl-indent:4
 * c-label-offset:-4
 * End:
 */



Listing Two

/* This is the header for endian.c - functions to read/write our
 * INT8, INT16 and INT32 types from/to a little-endian file.
 */

#ifndef __ENDIAN_H_INCLUDED__
#define __ENDIAN_H_INCLUDED__
 
/* Read the basic types as little-endian values. The return value will be
 * zero if successful, EOF otherwise.
 */
int readINT8little(FILE *f, INT8 *i);
int readINT16little(FILE *f, INT16 *i);
int readINT32little(FILE *f, INT32 *i);
int readUINT8little(FILE *f, UINT8 *i);
int readUINT16little(FILE *f, UINT16 *i);
int readUINT32little(FILE *f, UINT32 *i);

/* Write the basic types as little-endian values. The return value will be
 * zero if successful, EOF otherwise.
 */
int writeINT8little(FILE *f, INT8 i);
int writeINT16little(FILE *f, INT16 i);
int writeINT32little(FILE *f, INT32 i);
int writeUINT8little(FILE *f, UINT8 i);
int writeUINT16little(FILE *f, UINT16 i);
int writeUINT32little(FILE *f, UINT32 i);


#endif /* __ENDIAN_H_INCLUDED__ */

/* Formatting information for emacs in c-mode
 * Local Variables:
 * c-indent-level:4
 * c-continued-statement-offset:4
 * c-brace-offset:-4
 * c-brace-imaginary-offset:0
 * c-argdecl-indent:4
 * c-label-offset:-4
 * End:
 */


















































Building a SOM OpenDoc Part


No fancy tools or wizards, just the bare API




Robert Orfali and Dan Harkey


Bob and Dan are the authors of Client/Server Programming with OS/2 (VNR, 1993)
and the Essential Client/Server Survival Guide (VNR, 1994), coauthored with
Jeri Edwards. Bob and Dan have developed client/server systems for the last
eight years and are affiliated with IBM. They can be reached at
harkey@vnet.ibm.com.


Component-software technology holds great promise. One instance of this
technology is OpenDoc, which allows developers to build independently created
"parts" that can collaborate on the desktop, across networks, and across
operating-system platforms. At this writing, OpenDoc technology is in alpha,
and will be released later this year for Windows, Macintosh, OS/2, and,
eventually, UNIX. OpenDoc was originally designed at Apple, but is now being
promulgated by Component Integration Laboratories (CI Labs), a consortium
consisting principally of Apple, IBM, and WordPerfect. For a more complete
background on OpenDoc technology, see the article, "OpenDoc," by Jeff Rush
(Dr. Dobb's Special Report on Interoperable Objects, Winter 1994/95).
OpenDoc allows the user to open up "stationery" for a document container,
populate it with parts from a parts bin, lay out the parts in some visually
pleasing arrangement, create data links, and save the document. The document
can serve as the integration point for data from local or remote sources. The
parts can be linked to external data sources anywhere on the enterprise
through CORBA-compliant Object Request Brokers (ORBs). A scripting facility
allows parts to collaborate in customized arrangements. 
How much effort does it take to create one of these components? It depends on
which component technology you choose. For all practical purposes, the
component technology choices today have narrowed to OLE versus OpenDoc.
However, if OpenDoc lives up to its promise, you should be able to create an
OpenDoc part that's also an OLE container/server. WordPerfect has demonstrated
this technology on Windows since May 1994. In this article, we'll examine what
it takes to create an OpenDoc part, particularly as compared to an OLE Custom
Control (also known as OCX).


The Great Smiley Shoot-out


In his article, "Building Component Software with Visual C++ and the OLE
Custom Control Developer's Kit" (Microsoft Systems Journal, September 1994),
Eric Lang described how to create a Smiley-face OLE Custom Control using
Microsoft Visual C++, Microsoft Foundation Classes (MFC), and a beta version
of the Custom Control Developer's Kit (CDK). In this article, we'll implement
the same Smiley part using an alpha version of OpenDoc for OS/2.
Unfortunately, we can't use wizards and frameworks (these tools will come
later). Consequently, our Smiley is built with raw OpenDoc, as compared to
Eric's OCX, which was implemented with deluxe tools.
What does the Smiley part do? Figure 1(a) shows the part running in a test
container. Clicking on Smiley with the right mouse button turns it into the
sad face in Figure 1(b). Click again and the smile returns. This is in-place
editing;
as you can see, OpenDoc lets you interact with any visible part directly. The
mouse clicks demonstrate how OpenDoc parts handle events. Smiley is a
persistent OpenDoc part: It knows how to save itself when you close the
document. When you open the document again, everything looks the same (the
smiling/sad state and document position are preserved). To sum up, Smiley is a
simple OpenDoc part that knows how to visually embed itself in a container,
draw itself, process events in place, and save and restore its contents from a
Bento document file. 


What SOM Brings to the Party


With its most recent release, the OpenDoc developer's kit supports IBM System
Object Model (SOM), so we implemented Smiley as a SOM object. As a result, its
methods can be invoked using any language that supports SOM bindings (in our
case, IBM's C++). SOM allows objects written in different languages to
communicate in the same address space, across address spaces, and across
dissimilar operating systems over networks, through the services of a
CORBA-compliant ORB. (For more on SOM, see "Interoperable Objects," by Mark
Betz, DDJ, October 1994.) 
SOM lets you package OpenDoc parts in binary class libraries and ship them as
DLLs. In addition, SOM supports implementation inheritance, which means that
you can subclass OpenDoc parts and either reuse or override their method
implementations delivered in the DLL binaries. Also, by providing an external
language for defining interfaces, the OpenDoc part handler binaries can be
distributed independently of the client and, even more importantly, can be
modified or replaced without having to recompile the client code that
interacts with the part.
Using SOM, the typical OpenDoc part editor will inherit most of its behavior
from the ODPart base class--OS/2's OpenDoc provides a class called SimplePart
derived from ODPart to make it easier for you. In any case, you must override
the methods that need to be customized to provide your part's behavior. At a
minimum, your part must be able to allocate storage for its persistent data,
initialize its data from a persistent store, draw its contents inside an area
provided by its container, handle events, and externalize its data to the
persistent store when the document is closed.


To IDL or Not to IDL


In CORBA, an Interface Definition Language (IDL) is the means by which objects
tell their clients what interfaces are available and how to invoke them. Using
IDL, you can define the types of objects, the methods they export, and their
parameters. The IDL also lets you specify the parent classes. Like C++, CORBA
supports multiple inheritance. Note that CORBA IDL only specifies a class's
interface, not its implementation. You can think of IDL as a contract that
binds the providers of a component service to their clients.
Many C++ programmers hate dealing with IDL because it introduces an extra
step in class construction. On the other hand, programmers experienced in
client/server RPCs know that IDL is the cleanest way to specify program
services that live on different machines or in separate address spaces.
Classes in C++ are single-address-space constructs, so a mechanism like IDL is
needed to extend these across process boundaries. The good news for C++
programmers is that you don't have to deal with IDL if you don't want to.
Simply write your classes in C++ and let a direct-to-SOM C++ compiler (like
MetaWare's C++ or IBM's CSet++) generate the IDL for you by capturing the C++
class information from your headers. In our example, we'll use IDL explicitly,
even though our part is written in C++, to give you a feel for programming
with SOM and CORBA. 
The steps in creating the Smiley part are:
1. Define the interface for a SmileyPart class by creating the IDL source file
smiley.idl (see Listing One).
2. Run the SOM precompiler on the IDL file. It produces an implementation
template--smiley.cpp--of the SmileyPart class (shaded lines in Listing Two).
3. Add the body of the code (in C++) to the template of SmileyPart (non-shaded
lines in Listing Two).
4. Compile the class and create the part DLL--smiley.dll.


The Smiley IDL


In smiley.idl, we #include the file SimplPrt.idl, which is the interface
definition for the parent class. Our Smiley part is derived from class
SimplePart. Next, we define private data types for SmileyPart by enclosing them
with #ifdef __PRIVATE__ and #endif directives. The interface statement
specifies the name of the part, its parent classes, and the name of the part's
methods and their parameters. Smiley defines three new methods, all of which
are private.
CORBA IDL is a purely declarative language; it is used solely to describe the
interface to an object. SOM extends IDL with constructs that let you specify
helpful implementation information. This information is bracketed within
#ifdef __SOMIDL__ and #endif directives. The first line in the implementation
section is the prefix that SOM will prepend to the method names (and other
macros) it generates for that class. This is followed by the version number of
the class. The releaseorder allows you to add new functions to a class without
having to recompile the client programs that use it. It's a SOM feature that
helps you maintain backward binary compatibility. All you need to do is list
every method name introduced by the class, and not change the order. If you
need to introduce new methods in the future, simply add them to the end of the
list. However, if you decide to remove a method, you must still leave its name
on the list.
Smiley's functionality results from overriding six methods of its parent class
SimplePart, specified in the override section of the IDL. Notice that you
don't need to specify the parameters for these methods. The SOM precompiler
obtains the interface definitions for these methods from the parent's IDL. The
last section in the IDL contains declarations for private instance data that
is only of interest to the class members.


The SOM Precompiler's Output



The next step in creating the Smiley part is to run the IDL through the SOM
precompiler, which reads the IDL, creating a .cpp skeleton implementation (see
Listing Two). Notice that the precompiler introduces some include files and
creates stubs for all the methods you declared in the IDL. It also generates
stubs (with parameter declarations) of all the parent methods you are going to
override. The precompiler automatically prepends the prefix SmileyPart to all
your method names--it's trying to be helpful in maintaining unique names
within your class's implementation. Your clients don't have to know about the
prefix; to the outside world, the method calls are polymorphic.
The terms SOM_Scope and SOMLINK appear in the prototype of all the stub
methods. Ignore them. They're used by SOM to represent internal information.
Notice that the first parameter in each method is always somSelf, which is a
pointer to the target object--in this case, SmileyPart. Again, the precompiler
is being helpful because CORBA requires that the first parameter in each
method invocation be the target object. The precompiler also introduces a
second strange parameter: ev. This is a pointer to the CORBA environment
structure and is used to return error information.
The first statement in the method initializes a local variable, somThis, to
point to a structure representing the instance variables introduced by Smiley.
The second statement, SmileyPartMethodDebug, is a macro used for tracing. You
can turn it off by setting the SOM_TraceLevel flag to a nonzero value. If you
need to invoke another method that's introduced in your class, use the
notation somSelf-><methodName>. To access an instance variable created by
this object, use somThis-><variableName>, or simply precede the variable name
with an underscore.


Programming OpenDoc Style


The OpenDoc run time is a collection of objects which belong to a set of about
50 classes. You interact with these objects by invoking methods on them
whenever your part needs a service. Most of the time OpenDoc will call your
part's class (or object) when it needs something from you. The trick is to
figure out which of these methods will be called during the lifetime of your
part, and then write code for them. You provide the part behavior by
overriding the methods you're interested in.
Smiley will be called when it is first created or initialized in a container,
when it needs to draw its contents, when it receives some event, and finally
when the document is closed and Smiley is asked to save its contents in a
Bento file. We can do all of this by overriding six methods from the
SimplePart base class (see Listing Two). The complete listing is available
electronically; see "Availability," page 3.


Initialize Smiley


OpenDoc asks Smiley to initialize itself by invoking either InitPart, if
Smiley is being embedded for the first time in a container (for instance, a
document), or InitPartFromStorage, if an existing document is opened with
Smiley already embedded in it. Both methods invoke CommonInit--a private
method of our class--to perform some common initialization functions.
CommonInit first records the StorageUnit object that was passed to it. A
StorageUnit is the basic unit of persistent storage for a part. It contains a
list of properties, identified by a unique name within the storage unit. Each
property can have one or more values, which can be raw byte streams or
multiple data types. The StorageUnit class is an abstraction on top of the
Bento persistent-storage system (OpenDoc is layered over and does not expose
the Bento APIs). Figure 2 shows the OpenDoc storage model. A Bento document
can have multiple drafts or versions. Each draft contains multiple storage
units, which in turn contain properties and their values. These entities are
analogous to those in a conventional file system: A storage unit is like a
directory, properties are like filenames, values like byte streams. Values can
have pointers to other storage units. So, in essence, each OpenDoc part gets
to manage its own file system within a file. Storage units can also reside in
memory, on the clipboard, or as links--these are all just different instances
of the same class. 
In the Smiley code, the CommonInit method then records its "session" object,
which is an instance of an OpenDoc class that encapsulates access to OpenDoc
globals and maintains context for a root document. After that, it goes after a
"focus set." A focus set is a mechanism that OpenDoc provides to allow parts
to negotiate for resources atomically--you either get all the resources you're
asking for or nothing. This all-or-nothing proposition helps avoid deadlocks
and makes parts thread safe. 
Smiley is interested in grabbing the focus for mouse selection, menu, and
keyboard events. To accomplish this, CommonInit creates an instance of class
ODFocusSet, an OpenDoc class that will store the desired foci. CommonInit then
requests a session object to return some unique tokens for a particular
OpenDoc type; this is done because the elements of the set are tokenized
strings. Then, CommonInit adds the specified focus to the focus set via the
add method.
After invoking CommonInit, the InitPart method creates a new property,
KODPropSmile, in the StorageUnit received from OpenDoc. Notice the naming
convention OpenDoc uses for properties (KODPropSmile is a string constant
"SmileyPart:Property:Smile"). We assign to this property a value type of
Boolean (KODBoolean). This value will be used to store the persistent state of
our Smiley face. 
The InitPartFromStorage method does not have to create a property because the
property and its value are already there in the existing document. The method
simply needs to find the Smile property and read its value. To do that, we
must first invoke a Focus method on the StorageUnit object to get to the
specified property and value, then invoke GetValue to read the contents of the
value (that is, the stored Smiley state). At a later time, when the document
contents need to be saved, the state of the Smiley face is written to
persistent storage. This happens when OpenDoc invokes the Externalize method
(see Listing Two), which calls Focus on the storageUnit object to target the
KODPropSmile property, and then writes its value to storage using SetValue.


Drawing Smiley


When a part needs to render its contents, OpenDoc invokes the part's Draw
method and passes it a "facet" and a "shape." Facets are OpenDoc objects that
represent the visible area of a part's frame at run time. A visible part can
have one or more facets. Facets are constructed on-the-fly for the visible
frames when a document is opened. The draw method is called for each facet
object in a part (the facet object is passed as an input parameter of the draw
method). "Canvas" objects are platform-dependent presentation spaces or device
contexts. The canvas is where the facets of a part render themselves. The
shape is a description of space on a canvas. Shapes can be scaled, rotated,
and transformed without having to know what's inside of them. A part gets a
default amount of real estate when created but can later negotiate with its
container for more space. The container always wins in space negotiations.
OpenDoc provides the mechanisms for allowing parts to negotiate for space and
to seamlessly coexist within a common visual container. The actual rendering
is done by platform-specific API calls (in our case, OS/2 Presentation Manager
calls). 
The OpenDoc rendering model is shown in Figure 3. Every document must contain
at least one part--the root part--which initially owns all the document's
visual real estate in a single "frame." Frame objects are used for
space-layout negotiations between containing parts and embedded parts. An
OpenDoc part is not required to embed other parts, but most useful OpenDoc
parts will (Smiley does not). In Figure 3, root part A has two embedded parts:
B and C. Each part has its own frame. When a document is saved, the set of
frames it contains are made persistent. These frames contain the geometry
information that will be used to recreate the visual look of the document.
The Draw method implementation:
- Obtains a canvas area and platform-specific presentation space. This
requires obtaining the frame associated with this facet, which is returned by
the GetFrame method on the facet object. From the frame, we can get the
rectangular region that Smiley occupies (note that, with a little more code,
this region could be nonrectangular), and from that, the presentation space.
- Sets up the clipping region to make Smiley fit into its presentation space.
- Positions Smiley's presentation space within the canvas, using SetOrigin, a
private method of the SmileyPart class.
- Draws the Smiley face by invoking DrawSmileyFace--another private method of
the SmileyPart class. Even though the method is long, it does not introduce
any new OpenDoc constructs. We use ordinary Presentation Manager GPI calls to
draw either a smiling or sad face, depending on whether the value in the
_smile instance variable is true or false.
- Cleans up and returns. We must restore the picture and environment space to
avoid memory leaks.


Handling Events


The OpenDoc run time includes an event dispatcher which routes user interface
and semantic events to the correct part handler. The part handler helps out by
negotiating for resources using the FocusSet.
The semantic service also provides powerful APIs for event dispatch
resolution. The OpenDoc run time invokes a Handle-Event method on our part
when it wants us to do something. As you can see from the code, we can treat
these events like normal PM messages. The HandleEvent method processes two
events: WM_BUTTON1DOWN and WM_BUTTON2DOWN. The first of these activates the
part. Here, we acquire the FocusSet resources that were specified when the
part was first initialized. The second event is used to toggle the Smiley
face, accomplished by toggling the _smile instance variable and issuing an
invalidate method on the frame object.


Conclusion


Smiley is admittedly a minimalist OpenDoc part, but you can add more
interesting behaviors, such as drag-and-drop, clipboard support, linking,
embedding, irregular-shape support, multiple levels of undo/redo, reference
counting, extensions, menus, property editing, and scripting.
Figure 1: (a) The Smiley OpenDoc part; (b) the Smiley OpenDoc part with a sad
face.
Figure 2: OpenDoc document storage structure.
Figure 3: The OpenDoc rendering model.

Listing One 

/*-------- Smiley.IDL for Smiley OpenDoc part (Excerpted) -------------*/

#ifndef _SMILEYPRT_ 
#define _SMILEYPRT_ 
#ifndef _SIMPLPRT_ 
#include <SimplPrt.idl> 

#endif 
#ifdef __PRIVATE__ // Implementation Types
 typedef long Rect;
#endif
interface SmileyPart : SimplePart
{
#ifdef __PRIVATE__
 void CommonInit(in ODStorageUnit storageUnit);
 void DrawSmileyFace(in HPS hpsDraw, in Rect frameRect);
 void SetOrigin(in ODFacet facet);
#endif
#ifdef __SOMIDL__
 implementation
 {
 functionprefix = SmileyPart;
 majorversion = 1;
 minorversion = 0;
#ifdef __PRIVATE__
 releaseorder:
 CommonInit,DrawSmileyFace,SetOrigin;
#endif
 override:
 InitPart, InitPartFromStorage, Draw,
 Externalize, HandleEvent, RemoveDisplayFrame;
#ifdef __PRIVATE__ //Instance Variables
 ODSession session;
 ODStorageUnit storageUnit;
 ODFocusSet focusSet;
 ODTypeToken selectionFocus;
 ODTypeToken menuFocus;
 ODTypeToken keyFocus;
 ODBoolean smile;
#endif
 };
#endif
};
#endif // _SMILEYPRT_




Listing Two

/*---- Smiley example implemented in OpenDoc for OS/2. (Excerpted code.) ----*/
#include "os2.h"
#include "smiley.xih"
const ODPropertyName kODPropSmile = "SmileyPart:Property:Smile";
//---- InitPart (OD Method): Called when part first created ----
SOM_Scope void SOMLINK SmileyPartInitPart(SmileyPart *somSelf,
 Environment *ev,ODStorageUnit* storageUnit)
{ SmileyPartData *somThis = SmileyPartGetData(somSelf);
 SmileyPartMethodDebug("SmileyPart","SmileyPartInitPart");
 if (somSelf->IsInitialized(ev))
 return;
 SmileyPart_parent_SimplePart_InitPart(somSelf, ev, storageUnit);
 somSelf->CommonInit(ev, storageUnit);
 _smile = TRUE;
 storageUnit->AddProperty(ev, kODPropSmile)->AddValue(ev, kODBoolean);
}

//---- InitPartFromStorage (OD Method): Called during part internalization ----
SOM_Scope void SOMLINK SmileyPartInitPartFromStorage(SmileyPart *somSelf,
 Environment *ev,ODStorageUnit* storageUnit)
{ SmileyPartData *somThis = SmileyPartGetData(somSelf);
 SmileyPartMethodDebug("SmileyPart","SmileyPartInitPartFromStorage");
 if (somSelf->IsInitialized(ev))
 return;
 somSelf->InitPersistentObjectFromStorage(ev, storageUnit);
 somSelf->CommonInit(ev, storageUnit);
 storageUnit->Focus(ev, kODPropSmile,kODPosUndefined,
 kODBoolean,0,kODPosUndefined);
 storageUnit->GetValue(ev, sizeof(_smile), &_smile);
}
//-CommonInit (Private Method) Called by InitPart/InitPartFromStorage
SOM_Scope void SOMLINK SmileyPartCommonInit(SmileyPart *somSelf,
 Environment *ev,ODStorageUnit* storageUnit)
{ SmileyPartData *somThis = SmileyPartGetData(somSelf);
 SmileyPartMethodDebug("SmileyPart","SmileyPartCommonInit");
 _session = storageUnit->GetSession(ev); // Record session and
 _storageUnit = storageUnit; // storage unit for later
 // create and initialize the focus set
 _focusSet = new ODFocusSet();
 _focusSet->InitFocusSet(ev);
 _selectionFocus = _session->Tokenize(ev, kODSelectionFocus);
 _menuFocus = _session->Tokenize(ev, kODMenuFocus);
 _keyFocus = _session->Tokenize(ev, kODKeyFocus);
 _focusSet->Add(ev, _selectionFocus);
 _focusSet->Add(ev, _menuFocus);
 _focusSet->Add(ev, _keyFocus);
}
//---- Draw (OD Method): Called to Render Part -----
SOM_Scope void SOMLINK SmileyPartDraw(SmileyPart *somSelf,
 Environment *ev, ODFacet* facet, ODShape* invalidShape)
{ SmileyPartData *somThis = SmileyPartGetData(somSelf);
 SmileyPartMethodDebug("SmileyPart","SmileyPartDraw");
 // get presentation space
 HPS hpsDraw = facet->GetCanvas(ev)->GetPlatformCanvas(ev);
 GpiSavePS(hpsDraw);
 GpiResetPS(hpsDraw, GRES_ATTRS);
 // get frame and determine part rectangle
 ODFrame* displayFrame = facet->GetFrame(ev);
 HRGN frameRgn = displayFrame->GetFrameShape(ev)->GetRegion(ev);
 Rect frameRect;
 GpiQueryRegionBox(hpsDraw, frameRgn, &frameRect);
 // set up clipping
 HRGN saveClip;
 ODShape* clipShape = new ODShape;
 clipShape->CopyFrom(ev, facet->GetAggregateClipShape(ev));
 clipShape->Transform(ev, facet->GetContentTransform(ev));
 HRGN clip = clipShape->GetRegion(ev);
 GpiSetClipRegion(hpsDraw, clip, &saveClip);
 // set part origin
 somSelf->SetOrigin(ev, facet);
 // Draw the Smiley Face
 somSelf->DrawSmileyFace(ev, hpsDraw, frameRect);
 // Cleanup and return
 GpiRestorePS(hpsDraw, -1);
 GpiSetClipRegion(hpsDraw, 0, &saveClip);
 facet->GetCanvas(ev)->ReleasePlatformCanvas(ev);

 delete clipShape;
}
//--- DrawSmileyFace (Private Method): Called by Draw to render SmileyFace ---
SOM_Scope void SOMLINK SmileyPartDrawSmileyFace(SmileyPart *somSelf,
 Environment *ev,HPS hpsDraw,Rect frameRect)
{ SmileyPartData *somThis = SmileyPartGetData(somSelf);
 SmileyPartMethodDebug("SmileyPart","SmileyPartDrawSmileyFace");
 // determine part center and smiley radius
 POINTL ptlCenter = {frameRect.xRight/2, frameRect.yTop/2};
 LONG radius = min(frameRect.xRight/2, frameRect.yTop/2)*.97;
 // paint white background for smiley face part
 GpiSetColor ( hpsDraw, CLR_WHITE );
 POINTL ptlBox = {frameRect.xRight, frameRect.yTop};
 GpiBox(hpsDraw, DRO_FILL, &ptlBox, 0, 0);
 // Draw Smiley Face Background
 GpiSetColor ( hpsDraw, CLR_YELLOW );
 GpiSetCurrentPosition(hpsDraw , &ptlCenter);
 GpiFullArc(hpsDraw , DRO_OUTLINEFILL , MAKEFIXED (radius , 0 ) ) ;
 // Initialize Line Characteristics
 GpiSetColor ( hpsDraw, CLR_BLACK);
 GpiSetPattern(hpsDraw, PATSYM_SOLID);
 GpiSetLineWidthGeom(hpsDraw, radius*.07);
 GpiSetLineEnd(hpsDraw, LINEEND_ROUND);
 // Draw Smiley Face Outline
 GpiBeginPath(hpsDraw, 1);
 GpiFullArc(hpsDraw , DRO_OUTLINE, MAKEFIXED (radius , 0 ) ) ;
 if (_smile) // Draw a Smiling Mouth
 { POINTL ptlSmile[] = {ptlCenter.x-(radius*.6), ptlCenter.y-(radius*.3),
 ptlCenter.x, ptlCenter.y-(radius*.7),
 ptlCenter.x+(radius*.6), ptlCenter.y-(radius*.3)};
 GpiSetCurrentPosition(hpsDraw , &ptlSmile[0]);
 GpiPointArc(hpsDraw, &ptlSmile[1]);
 }
 else // Draw a Sad Mouth
 { POINTL ptlSad[] = {ptlCenter.x-(radius*.6), ptlCenter.y-(radius*.6),
 ptlCenter.x, ptlCenter.y-(radius*.5),
 ptlCenter.x+(radius*.6), ptlCenter.y-(radius*.6)};
 GpiSetCurrentPosition(hpsDraw , &ptlSad[0]);
 GpiPointArc(hpsDraw, &ptlSad[1]);
 }
 GpiEndPath(hpsDraw);
 GpiStrokePath(hpsDraw, 1, 0);
 // Draw Eyes/Nose
 POINTL ptlLEye = {ptlCenter.x-radius*.3, ptlCenter.y+radius*.3};
 GpiSetCurrentPosition(hpsDraw , &ptlLEye);
 GpiFullArc(hpsDraw , DRO_OUTLINEFILL , MAKEFIXED (radius*.07, 0 ) ) ;
 POINTL ptlREye = {ptlCenter.x+radius*.3, ptlCenter.y+radius*.3};
 GpiSetCurrentPosition(hpsDraw , &ptlREye);
 GpiFullArc(hpsDraw , DRO_OUTLINEFILL , MAKEFIXED (radius*.07, 0 ) ) ;
 POINTL ptlNose = {ptlCenter.x, ptlCenter.y-radius*.2};
 GpiSetCurrentPosition(hpsDraw , &ptlNose);
 GpiFullArc(hpsDraw , DRO_OUTLINEFILL , MAKEFIXED (radius*.07, 0 ) ) ;
}
//---- SetOrigin (Private Method): Called by Draw to position Smiley Face ----
SOM_Scope void SOMLINK SmileyPartSetOrigin(SmileyPart *somSelf,
 Environment *ev,ODFacet* facet)
{ SmileyPartData *somThis = SmileyPartGetData(somSelf);
 SmileyPartMethodDebug("SmileyPart","SmileyPartSetOrigin");
 ODTransform* localToGlobal = facet->GetContentTransform(ev);

 HPS hps = facet->GetCanvas(ev)->GetPlatformCanvas(ev);
 MATRIXLF mtx;
 facet->GetContentTransform(ev)->GetMATRIXLF(ev, &mtx);
 GpiSetModelTransformMatrix(hps, 9, &mtx, TRANSFORM_REPLACE);
 facet->GetCanvas(ev)->ReleasePlatformCanvas(ev);
}
//--- HandleEvent (OD Method): Called when the part receives a UI event ----
SOM_Scope ODBoolean SOMLINK SmileyPartHandleEvent(SmileyPart *somSelf,
 Environment *ev,ODEventData* event,ODFrame* frame,ODFacet* facet)
{ SmileyPartData *somThis = SmileyPartGetData(somSelf);
 SmileyPartMethodDebug("SmileyPart","SmileyPartHandleEvent");
 ODDraft *draft;
 switch (event->msg)
 { case WM_BUTTON1DOWN: // Activate the Part
 if (_session->GetArbitrator(ev)->
 RequestFocusSet(ev, _focusSet,frame))
 somSelf->FocusAcquired(ev, _selectionFocus, frame);
 return kODTrue;
 case WM_BUTTON2DOWN: // Toggle Smile/Sad Face
 _smile = !_smile;
 draft = _storageUnit->GetDraft(ev);
 draft->SetChangedFromPrev(ev);
 frame->Invalidate(ev, kODNULL);
 return kODTrue;
 default: return kODFalse;
 }
}
SOM_Scope void SOMLINK SmileyPartExternalize(SmileyPart *somSelf,
 Environment *ev)
{ SmileyPartData *somThis = SmileyPartGetData(somSelf);
 SmileyPartMethodDebug("SmileyPart","SmileyPartExternalize");
 _storageUnit->Focus(ev, kODPropSmile,kODPosUndefined,kODBoolean,0,
 kODPosUndefined);
 _storageUnit->SetValue(ev, sizeof(_smile), &_smile);
}




























Simulation Compilation and Portability


Translating code from one machine to another for rapid development




Marc Hoffman


Marc is a senior software engineer for Analog Devices. He can be contacted at
marc.hoffman@analog.com.


Simulation compilation is a technique that lets you compile a simulation, then
run an executable that represents the original code instead of simulating the
code directly. By translating each instruction of one machine into a set of
instructions that executes equivalently on another processor, while preserving
the original machine's exact semantics, you can speed up simulation execution
by up to three orders of magnitude.
The major difference between a simulation compiler and other simulators is
that other simulators use interpretation at run time. A simulation compiler
compiles the set of instructions needed for the simulation into another
machine's machine code. 
In this article, I'll examine how machine code is converted from one target to
another. To illustrate, I'll create a simple machine and build a set of tools
that create running simulations of it. I'll then discuss how you can extend
the environment to provide debugging tools.


Building a Simulation Compiler


In April 1994, I had to create a set of development tools for a
next-generation digital-signal processor (DSP). Although I had already written
the assembler, linker, and disassembler, I needed to define the run-time model
before getting the compiler to work. Tired of waiting for a simulator, I asked
a colleague: "What if we converted the object modules from binary code into
low-level C code?" 
Ultimately, I wrote a minimal simulation compiler similar to Listing One.
Although crude, it did most of the job. It was extremely fast--approximately
300 times faster than our previous simulators. Additionally, I was able to
prototype my run-time environment, and I got a debugger for free.
A simulation compiler takes an executable image for one machine and translates
it such that it can be executed on another machine. In a sense, you are
compiling or translating the instructions of one machine into a set of
instructions for another, while maintaining the exact semantics of the
original machine. You achieve this by reading the contents of the executable
image and mapping each instruction of the original machine (the "source") to a
set of instructions that emulate the same functionality on the target machine.

Instead of writing machine code for a particular target machine, you can write
code for an abstract machine--say, "machine C"--suitable for running on any
workstation or PC. By targeting C, you can run the executable source on any
machine that has a C compiler. Figure 1 defines the abstract C machine.
The machine model creates an infinite loop that iterates over the machine
state of the source machine's program counter (PC). BEFORE_CYCLE is a function
or macro that loads the PC with the next valid address and sets up and checks
any pending conditions of the machine state that need to be serviced.
AFTER_CYCLE finishes any work left after the machine executes an instruction;
in some cases, it could be defined to "do nothing." INVALID_PC_HANDLER is
invoked whenever the PC is out of range: It could issue an error or perhaps
emulate a trap. Each case of the switch statement is used to implement the
TEXT address space of the source machine. (TEXT is the code space in the UNIX
world.) The switch makes it indexable, because the machine I'm interested in
(like most machines) has indirect jumping. This means you must be able to
directly address the entire TEXT address space at run time. 
BEFORE_CYCLE is responsible for generating the next PC. The PC is modified by
a primitive called TICK, which BEFORE_CYCLE invokes to step the PC to the next
address. You can't just add 1 to the PC--you need to be able to handle
branches. To implement this logic, you need something similar to a multiplexer
used in hardware, which multiplexes the next address that the PC gets. This
next address could be a branch or an interrupt-vector address, or the machine
might have variable-size instructions. To multiplex the next address, you'll
use a variable in C, called nextaddr. To implement a jump instruction, set
nextaddr to the target of the jump; to get the destination address, just
extract the correct bits from the source executable.
To build a simulation compiler, you need a loader--software that knows how to
read the contents of an executable. Because C code is generated as the target
machine, you simply disassemble (decode) the executable. In other words, you
use a loader to read the machine code and a disassembler to transform that
code into C. To build the simulation compiler, I integrate the two programs:
The loader reads each instruction and calls on the disassembler to produce the
C code that represents the semantics of the machine instruction. This is a
fairly straightforward algorithm; the difficult part is generating the correct
C code. To illustrate how this is done, I'll build a simple little machine
(SLM), then define a disassembler and a loader for the SLM. Keep in mind that
my motivation for creating the SLM is instructive; it keeps the details of a
big processor out of this article.
Figure 2 shows a simple machine with an 8-bit address space for data and a
separate address space for the program, which is characteristic of a Harvard
architecture (the more common von Neumann architecture of a conventional
processor uses a single address space for both data and program). To keep my
example clean, I've used this dual address space and kept all the instructions
the same size. Example 1(a) is a simple program using SLM.
First the C machine needs to define the registers as global variables so that
you can look at their contents with any ordinary source-level debugger; see
Example 1(b). Since the machine is an 8-bit machine, you can declare a
suitable vector to represent memory; see Example 1(c). (I won't worry about
size problems here, but you wouldn't want to do this for machines with
4-gigabyte address spaces!) Now look at each instruction and write a chunk of
C code that behaves the same way. For the ALU operations, you will need a
scratch register (see Example 2) that is wider than the word you're operating
on; for this machine, an ordinary int suffices. If you don't have a larger
data type, you'll have to do a little more work to get the status bits set
correctly.
The trick to building a disassembler that generates the appropriate C code for
the SLM is to reuse the disassembler. The disassembler works much like printf:
It takes a string and prints each character in turn until it sees a percent
sign (%), where its subsequent action will be determined by the next
character. Figure 3 defines the disassembler for the SLM.
Next, you need to define the function shown in Figure 4, disasm, which takes
as its argument the instruction right-justified as an int. (This wouldn't work
if the machine instruction word were variable or bigger than the largest
integral data type for the machine you are running on. This problem is easy to
solve, but is beyond the scope of this article.)
Now it's time to build a simulation compiler that generates C code for the
disassembler. First, simplify the C code a little by macroizing (creating a
preprocessor macro) or functionalizing the MOVES. This minimizes the number of
tools needed to rebuild when you change the simulator's semantics; that is,
you can just rebuild the simulation and not the compiler. Also, macroization
allows the writes to be hooked (which lets the macro do its work and then
some). Figure 5 shows a simulation, including a header file that describes the
machine; Figure 6 applies the simulation compiler to the program. The
simulation compiler compiles this code into a simulation and examines each
instruction in turn, calling the disassembler on the instruction code
representing each word. Here I've used an array of shorts to represent the
object-code file, making implementation simple. In a more realistic example,
you would need to read the information from the file.


A Commercial Implementation


At Analog Devices, we have developed a simulation-compiler tool that enables
any processor to simulate our DSP instruction set. The scheme provides a
flexible, rapid-prototyping environment intended to decrease a developer's
time to market. Typical hardware simulators are slow, running at about 1000
instructions per second, and have limited debugging capabilities. 
The simulation environment we are exploring is two to three orders of
magnitude faster than conventional simulators. Consider, for example, a
typical speech-compression algorithm requiring computing power of
approximately 12 million instructions/sec. A fully interpretive simulator
running at 1000 cycles/sec takes roughly 3.3 hours to process one second of
speech, while the compiled simulation with the speed of 240,000 cycles/sec
takes only 50 seconds for the same task (numbers based on a 486DX66 CPU).
The design of the DSP simulation compiler allows developers to write not only
DSP code, but also develop simulations of the hardware peripherals that make
up an entire system. Since the simulation is compiled into C code, you can mix
host C code into the simulation, resulting in a truly customizable development
environment. Furthermore, you can find bugs that occur after millions of
instructions have executed.
The debugging environment is constructed around the host C compiler's
debugging environment so the user needn't learn how to use a new debugger. The
simulation compiler generates special bookkeeping information that allows
symbolic debuggers to trace through the preconverted source code. Hence, your
favorite source-level debugger can be used to debug DSP applications. 


Conclusion


The simulation-generation techniques presented here don't address the problem
of self-modifying code. In fact, you can't handle it without the aid of an
interpreter and a scheme that flags modified instructions. However, once the
simulation compiler tables are created, you can generate an interpreter
directly from the tables. This technique of compiling simulations makes the
build or load time of a simulation quite expensive. For debugging
environments, the conversion is done only a few times, and the compiled
simulation is used more than once before it is regenerated. Thus, the speed
and extensibility of the simulation environment outweigh the technique's
drawbacks.
Figure 1: Defining the abstract C machine.
while (1) {
 BEFORE_CYCLE ();
 switch (PC) {
 case ADDR: instruction; break ;
 default:
 INVALID_PC_HANDLER ();
 }

 AFTER_CYCLE ();
}
Figure 2: Definition of SLM.
SLM:
registers: rX, rY, rI, rJ, rK, PC, ss{z,n,v,c}
 0, 1, 2, 3, 4, 5, 6
addr: is 8bit
000 nop
1sd mov rsrc, rdst
2id mov (ri), rdst
3si mov rsrc, (ri)
4kk mov k, rx
5kk mov k, ry
6xx add src, dst
7xx sub src, dst
8xx mul src, dst
9xx jmp addr
Axx jz addr
Bxx jc addr
Cxx halt
Example 1: (a) Simple SLM program; (b) defining registers as global variables;
(c) declaring a vector of memory.
(a)
 mov 5, rx
 mov 3, ry
 mul rx, ry
 mov 15, rx
 sub rx, ry
 jz ok
 halt
ok add rx, ry
 halt
(b)
 unsigned char rx,ry,ri,rj,rk,
 pc, ss;
(c)
 unsigned char mem[1<<8];
Example 2: A scratch register.
 unsigned int scratch;
000 -- nop
 /* nothing */
1sd -- mov rsrc, rdst
 rdst = rsrc;
 ss.z = (rdst == 0) ? 1:0;
 ss.n = ((signed)rdst < 0) ? 1:0;
2id -- mov (ri), rdst
 rdst = mem[ri];
 ss.z = (rdst == 0) ? 1:0;
 ss.n = ((signed)rdst < 0) ? 1:0;
3si -- mov rsrc, (ri)
 mem[ri] = rsrc;
4kk -- mov k, rx
 rx = k;
 ss.z = (k == 0) ? 1:0;
5kk -- mov k, ry
 ry = k;
 ss.z = (k == 0) ? 1:0;
6xx -- add src, dst
 rdst = scratch = rdst + rsrc;
 ss.v = (scratch > 0x1ff) ? 1 : 0;

 ss.c = (scratch & 0x1ff) ? 1 : 0;
 ss.n = (scratch & 0x80) ? 1 : 0;
7xx -- sub src, dst
 rdst = scratch = rdst - rsrc;
 ss.v = (scratch > 0x1ff) ? 1 : 0;
 ss.c = (scratch & 0x1ff) ? 1 : 0;
 ss.n = (scratch & 0x80) ? 1 : 0;
8xx -- mul src, dst
 rdst = scratch = rdst * rsrc;
 ss.v = (scratch > 0x1ff) ? 1 : 0;
 ss.c = (scratch & 0x1ff) ? 1 : 0;
 ss.n = (scratch & 0x80) ? 1 : 0;
9xx -- jmp addr
 nextaddr = addr;
Axx -- jz addr
 if (ss.z)
 nextaddr = addr;
Bxx -- jc addr
 if (ss.c)
 nextaddr = addr;
Cxx -- halt
 return;
Figure 3: The SLM disassembler.
char *regs[] = { "rx","ry","ri",
 "rj","rk", "pc", "ss" };
char *distemp[] = {
 "nop",
 "mov %1, %2",
 "mov (%1), %2",
 "mov %1, (%2)",
 "mov %k, rx",
 "mov %k, ry",
 "add %1, %2",
 "sub %1, %2",
 "mul %1, %2",
 "jmp %a",
 "jz %a",
 "jc %a",
 "halt",
 0,
 0,
 0,
};
Figure 4: Defining the disasm function.
disasm (int inst)
{
 char *template = distemp[(inst>>8)&0xF];
 char *p = template;
 if (template) {
 while (*p) {
 if (*p == '%') decode_operand (inst, ++p);
 else putchar (*p);
 p++;
 }
 }
}
decode_operand (int inst, char *p)
{
 switch (*p++) {

 case '1': printf ("%s", regs[(inst>>4)&0xF]); break;
 case '2': printf ("%s", regs[inst&0xF]); break;
 case 'k': printf ("%d", (signed char)(inst&0xFF)); break;
 case 'a': printf ("%u", inst&0xFF); break;
 default:
 abort ();
 }
}
Figure 5: Typical simulation, including a header file that describes the
machine.
#define SETR(x,y)\
 x = y;\
 ss.z = (x == 0) ? 1:0; ss.n = ((signed)x < 0) ? 1:0
#define SETMEM(x,y) x = y;
#define ALUOP(op, rdst,rsrc)\
 rdst = scratch = rdst op rsrc;\
 ss.v = (scratch > 0x1ff) ? 1 : 0;\
 ss.c = (scratch & 0x1ff) ? 1 : 0;\
 ss.n = (scratch & 0x80) ? 1 : 0
char *distemp[] = {
 "/* nothing */",
 "SETR (%2, %1);",
 "SETR (%1, mem [%2]);",
 "SETMEM (mem [%2], %1);",
 "SETR (rx, %k);",
 "SETR (ry, %k);",
 "SETR (%2, ALUOP (+, %2, %1));",
 "SETR (%2, ALUOP (-, %2, %1));",
 "SETR (%2, ALUOP (*, %2, %1));",
 "nextaddr = %a;",
 "if (ss.z) nextaddr = %a;",
 "if (ss.c) nextaddr = %a;",
 "return",
 0,
 0,
 0,
};
Figure 6: Applying the simulation compiler to a program.
0000 405 mov 5, rx
0001 503 mov 3, ry
0002 801 mul rx, ry
0003 40F mov 15, rx
0004 701 sub rx, ry
0005 A07 jz ok
0006 C00 halt
0007 601 ok add rx, ry
0008 C00 halt

Listing One 

::::::::::::::
simcc.c
::::::::::::::
/* Simulation compiler for SLM */

char *regs[] = { "rx","ry","ri","rj","rk", "pc", "ss" };
char *distemp[] = {
 "nop",
 "mov %1, %2",
 "mov (%1), %2",

 "mov %1, (%2)",
 "mov %k, rx",
 "mov %k, ry",
 "add %1, %2",
 "sub %1, %2",
 "mul %1, %2",
 "jmp %a",
 "jz %a",
 "jc %a",
 "halt",
 0,
 0,
 0, 
};
char *simtemp[] = {
 "/* nothing */",
 "SETR (%2, %1);",
 "SETR (%1, mem [%2]);",
 "SETMEM (mem [%2], %1);",
 "SETR (rx, %k);",
 "SETR (ry, %k);",
 "SETR (%2, ALUOP (+, %2, %1));",
 "SETR (%2, ALUOP (-, %2, %1));",

 "SETR (%2, ALUOP (*, %2, %1));",
 "nextaddr = %a;",
 "if (ss.z) nextaddr = %a;",
 "if (ss.c) nextaddr = %a;",
 "return;",
 0,
 0,
 0, 
};
disasm (int inst, char **templates) 
{
 char *template = templates[(inst>>8)&0xF];
 char *p = template;
 if (template) {
 while (*p) {
 if (*p == '%') decode_operand (inst, ++p);
 else putchar (*p);
 p++;
 }
 }
}
decode_operand (int inst, char *p)
{
 switch (*p++) {
 case '1': printf ("%s", regs[(inst>>4)&0xF]); break;
 case '2': printf ("%s", regs[inst&0xF]); break;
 case 'k': printf ("%d", (signed char)(inst&0xFF)); break;
 case 'a': printf ("%u", inst&0xFF); break;
 default:
 abort ();
 }
}
genc (int addr, int inst)
{
 printf (" case 0x%04x: T(\"", addr); disasm (inst, distemp); 

 printf ("\");\n");
 printf (" "); disasm (inst, simtemp); printf ("\n");
 printf (" break;\n");
}
int prog[] = {9, 0x405,0x503,0x801,0x40F,0x701,0xA07,0xC00,0x601,0xC00};
main ()
{

 int i;
 printf ("#include \"slm.h\"\n"
 "main ()\n"
 "{\n"
 " while (1) {\n"
 " BEFORE_CYCLE ();\n"
 " switch (pc) {\n");
 for (i=0; i<prog[0];i++)
 genc (i, prog[i+1]);

 printf (" default:\n"
 " INVALID_PC_HANDLER ();\n"
 " break;\n"
 " }\n"
 " AFTER_CYCLE ();\n"
 " }\n"
 "}\n");
}
::::::::::::::
slm.h
::::::::::::::
/* SLM definitions. simulator header file. */
 unsigned char rx,ry,ri,rj,rk, pc;
 struct { unsigned z:1, n:1, v:1, c:1; } ss;
 unsigned char mem[1<<8];
 unsigned int scratch;
 unsigned char nextaddr;

#define SETR(x,y) x = y; ss.z = (x == 0) ? 1:0; ss.n = ((signed)x < 0) ? 1:0;
#define SETMEM(x,y) x = y;
#define ALUOP(op, rdst,rsrc)\
 rdst = scratch = rdst op rsrc;\
 ss.v = (scratch > 0x1ff) ? 1 : 0;\
 ss.c = (scratch & 0x1ff) ? 1 : 0;\
 ss.n = (scratch & 0x80) ? 1 : 0;
#define BEFORE_CYCLE() pc = nextaddr; nextaddr++
#define AFTER_CYCLE()
#define INVALID_PC_HANDLER() abort ()
#define T(x) printf ("%04x: %s\n", pc, x)
















Congestion Control in Frame-Relay Networks


LAN-to-LAN data-transmission strategies




William Stallings


William is an independent consultant and president of Comp-Comm Consulting of
Brewster, MA. He is the author of over a dozen books on data communications
and computer networking, his most recent being ISDN and Broadband ISDN, with
Frame Relay and ATM, third edition (Prentice-Hall, 1994). William can be
reached at stallings@acm.org.


Frame relay is a standardized, public, packet-switched, data-network service
that functions as a public wide area network (WAN) backbone connecting
individual local area networks (LANs). Data is transmitted in packets from one
LAN to another through a frame-relay network over high-speed leased lines.
Because of its design, the Frame Relay ANSI T1.606 Standard allows for an easy
migration path from present to future network architectures, as existing
systems (including T1, ISDN, and others) can be upgraded, and hence preserved,
via software.
Additionally, frame relay can manage bursty, unpredictable data traffic and
provide single-line access to the network with logical connections to other
destinations. The end result is minimal hardware, a simple network design, and
reduced operating costs. 
With all this in mind, it's no surprise that frame-relay bearer services--that
is, those services designed to serve both LAN interconnections and existing
host-computer environments--are becoming increasingly popular. CompuServe's
Frame-Net frame-relay service, for example, enables dedicated Internet access
at up to T1 speeds (1.536 Mbits/sec), and 14.4 Kbits/sec dial-up access via
point-to-point protocol (PPP). 
Still, the frame-relay standard does not specify a mechanism for flow control
and error control between users and the network. The data-link control
protocol for frame relay (LAPF) uses a frame structure that does not contain a
control field and therefore has no sequence numbers to work with. While this
streamlined protocol provides for efficient data transfer, it lays the network
open to the possibility of congestion. To deal with this problem, standards
organizations have proposed a variety of congestion-control techniques.
Because of the large number of techniques that can be used alternatively or in
conjunction with one another, and because the specifications are scattered
through various documents in no particular order, this is the most confusing
aspect of frame relay.
In this article, I'll provide an overview of frame-relay congestion control,
looking first at the explicit congestion-control techniques proposed for
frame-relay bearer services, then presenting a set of algorithms that split
the responsibility for congestion control between the network and the
subscriber. 


Congestion in Frame-Relay Networks


A frame-relay network is a form of packet-switching network in which the
"packets" are layer-two frames. As in any packet-switching network, one of the
key areas in the design of a frame-relay network is congestion control. In
essence, a frame-relay network is a network of queues. At each frame handler,
there is a queue of frames for each outgoing link. If the rate at which frames
arrive and queue up exceeds the rate at which frames can be transmitted, the
queue size grows without bounds and the delay experienced by a frame goes to
infinity. Even if the frame-arrival rate is less than the frame-transmission
rate, queue length will grow dramatically as the arrival rate approaches the
transmission rate. As a rule of thumb, when the line for which frames are
queuing exceeds 80 percent utilization, the queue-length growth rate becomes a
problem.
Figure 1 shows the effect of congestion in general terms. Figure 1(a) plots
the throughput of a network (number of frames delivered to a destination
station per unit time) versus the offered load (number of frames transmitted
by all subscribers), while Figure 1(b) plots the average delay from entry to
exit across the network. At light loads, throughput and network utilization
increase as the offered load increases. As the load continues to increase, a
point is reached (point A in the plot) beyond which the throughput of the
network increases at a rate slower than that of the offered load. This is
because the network has entered a state of mild congestion. At this level, the
network continues to cope with the load, although with increased delays.
As the load on the network increases, a point is eventually reached (point B)
beyond which throughput drops with increased offered load. The reason for this
is that the buffers at each node are of finite size. When the buffers at a
frame handler become full, it must discard frames. Thus, the sources must
retransmit the discarded frames in addition to new frames. This only
exacerbates the situation: As more and more frames are retransmitted, the load
on the system grows, and more buffers become saturated. While the system is
trying desperately to clear the backlog, users are pumping old and new frames
into the system. Even successfully delivered frames may be retransmitted
because it took too long at a higher layer (for example, the transport layer)
to acknowledge them: The sender assumes the frame did not get through. Under
these circumstances, the effective capacity of the system is virtually zero.
Avoiding these catastrophic events is the task of congestion control. The
object of all congestion-control techniques is to limit queue lengths at the
nodes so as to avoid throughput collapse.


Congestion-Control Strategies


Congestion control is the joint responsibility of the network and end users.
The network (that is, the collection of frame handlers) is in the best
position to monitor the degree of congestion, while the end users are in the
best position to control congestion by limiting the flow of traffic.
Table 1 lists the congestion-control techniques defined in the various
standards documents. Discard strategy deals with the most fundamental response
to congestion: When congestion becomes severe enough, the network is forced to
discard frames. You would like to do this in a way that is fair to all users.
Congestion-avoidance procedures are used at the onset of congestion to
minimize the effect on the network; thus, these procedures would be initiated
at or prior to point A in Figure 1, to prevent congestion from progressing to
point B. Near point A, there would be little evidence available to end users
that congestion is increasing, so there must be some explicit signaling
mechanism from the network that will trigger the congestion avoidance.
Congestion-recovery procedures are used to prevent network collapse in the
face of severe congestion. These procedures are typically initiated when the
network has begun to drop frames due to congestion. Such dropped frames are
reported by some higher layer of software and serve as an implicit signaling
mechanism. Congestion-recovery procedures operate around point B and within
the region of severe congestion, as shown in Figure 1.
Congestion avoidance with explicit signaling and congestion recovery with
implicit signaling are complementary forms of congestion control in the
frame-relaying service.


Discard Strategy


As a last resort, a frame-relaying network must discard frames to cope with
congestion. There is no getting around this fact. Since each frame handler in
the network has finite memory available for queuing frames, it is possible for
a queue to overflow, necessitating the discard of either the most recently
arrived frame or some other frame.
The simplest way to cope with congestion is for the frame-relaying network to
discard frames arbitrarily, with no regard to their source. In that case,
since there is no reward for restraint, the best strategy for any individual
end system is to transmit frames as rapidly as possible. This, of course,
exacerbates the congestion problem.


Network Use of CIR


To provide for a fairer allocation of resources, the frame-relay bearer
service includes the concept of a committed information rate (CIR)--a rate, in
bits per second, that the network agrees to support for a particular
frame-mode connection. Any data transmitted in excess of the CIR is vulnerable
to discard in the event of congestion. Despite the use of the term
"committed," there is no guarantee that even the CIR will be met. In cases of
extreme congestion, the network may be forced to provide service at less than
the CIR for a given connection; however, the network will choose to discard
frames on connections that are exceeding their CIR before discarding frames on
those that are not.
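The CIR mechanism described above can be sketched in Python. This is an illustrative model, not any vendor's implementation: the class name, field names, and the per-interval accounting are mine. Frames whose bits fit within the interval's committed allotment are tagged committed; anything beyond is tagged discard-eligible (the DE bit of Table 1), to be dropped first under congestion.

```python
# Hypothetical sketch of CIR-based discard eligibility. All names here
# are illustrative; only the CIR concept comes from the standard.

class CirMeter:
    def __init__(self, cir_bps, interval_s):
        # Committed burst for one interval: CIR (bits/s) x interval (s)
        self.committed_bits = cir_bps * interval_s
        self.used_bits = 0

    def new_interval(self):
        self.used_bits = 0

    def admit(self, frame_bits):
        """Tag a frame 'committed' or 'discard-eligible' for this interval."""
        self.used_bits += frame_bits
        if self.used_bits <= self.committed_bits:
            return "committed"
        return "discard-eligible"

meter = CirMeter(cir_bps=64_000, interval_s=1.0)
tags = [meter.admit(16_000) for _ in range(5)]  # five 16-kbit frames
# the first four fill the 64-kbit commitment; the fifth exceeds it
```

Under congestion, a frame handler would drop discard-eligible frames before touching committed ones, which is what makes restraint pay off for well-behaved users.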


Explicit Congestion Avoidance


It is desirable to use as much of the available capacity in a frame-relay
network as possible but still react to congestion in a controlled and fair
manner. This is the purpose of explicit congestion-avoidance techniques
wherein the network alerts end systems to growing congestion within the
network and the end systems take steps to reduce the offered load to the
network.

As the standards for explicit congestion avoidance were being developed, two
general strategies were considered. One group believed that congestion always
occurred slowly and almost always in the network egress nodes. Another group
had seen cases in which congestion grew very quickly in the internal nodes and
required quick, decisive action to prevent network congestion. These two
approaches are reflected in the forward and backward explicit
congestion-avoidance techniques, respectively.
With congestion-avoidance techniques, the network signals congestion to those
end users with affected frame-relay connections. This explicit signaling may
make use of one of two bits in the LAPF address field of each frame, or a
special LAPF control message. Either bit may be set by any frame handler that
detects congestion. If a frame handler receives a frame in which one or both
of these bits are set, it must not clear the bits before forwarding the frame.
Thus, the bits constitute signals from the network to the end user. The two
bits are:
Backward Explicit Congestion Notification (BECN) bit. The user is notified
that congestion-avoidance procedures should be initiated where applicable for
traffic in the opposite direction of the received frame. BECN indicates that
the frames the user transmits on this logical connection may encounter
congested resources.
Forward Explicit Congestion Notification (FECN) bit. The user is notified that
congestion-avoidance procedures should be initiated where applicable for
traffic in the same direction as the received frame. FECN indicates that this
frame, on this logical connection, has encountered congested resources.
In addition, a frame handler may use a Consolidated Link Layer Management
(CLLM) message to notify the user that congestion-avoidance procedures should
be initiated where applicable for traffic in the opposite direction of the
received frame. It indicates that the frames the user transmits on a set of
logical connections may encounter congested resources.
In all of these cases, the network only supplies the notification. The actual
protocol for responding to the notification is supplied by layers above the
frame-relaying bearer service. The standards define some suggested protocols.


Network Notification of Congestion


For the network to be able to detect and signal congestion, each frame handler
must monitor its queuing behavior. If queue lengths begin to grow to a
dangerous level, then either forward- or backward-explicit notification, or a
combination, should be used to try to reduce the flow of frames through that
frame handler. The choice of forward or backward may be determined at
configuration time by whether the end users on a given logical connection are
prepared to respond to one or the other of these notifications. In any case,
the frame handler has some choice as to which logical connections should be
alerted to congestion. If congestion is becoming serious, all logical
connections through a frame handler might be notified. In the early stages of
congestion, the frame handler might notify just those users generating the
most traffic.
The following procedure for monitoring queue lengths is suggested in the
standards: A cycle begins when the outgoing circuit goes from idle (queue
empty) to busy (nonzero queue size, including the current frame). The average
queue size over the previous and current cycles is calculated. If the average
size exceeds a threshold value, then the circuit is in a state of incipient
congestion, and the congestion-avoidance bits should be set on some or all
logical connections using that circuit. By averaging over two cycles instead
of just monitoring current queue length, the system avoids reacting to
temporary surges that would not necessarily produce congestion.
The average queue length may be computed by determining the area (product of
queue size and time interval) over the two cycles and dividing by the time of
the two cycles. This algorithm is illustrated in Figure 2.
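The averaging procedure can be sketched directly from that description. This is a minimal illustration (function and variable names are mine): the queue size is a step function of time, so the average over the two cycles is the area under that step function divided by the combined duration.

```python
# Sketch of the suggested queue-length averaging. Events are
# (time, queue size after the event) pairs covering the previous
# and current cycles; names are illustrative.

def average_queue_size(events, t0, t_now):
    """events: list of (time, queue size after event), sorted by time.
    t0: start of the previous cycle; t_now: current time."""
    area = 0.0
    prev_t, prev_q = t0, 0
    for t, q in events:
        area += prev_q * (t - prev_t)   # rectangle for each step
        prev_t, prev_q = t, q
    area += prev_q * (t_now - prev_t)   # final partial step
    return area / (t_now - t0)

# Queue holds 2 frames for 1 s, then 0 frames for 1 s: average is 1.0
avg = average_queue_size([(0.0, 2), (1.0, 0)], t0=0.0, t_now=2.0)
```

Comparing this average against the threshold, rather than the instantaneous queue length, is what filters out the temporary surges the text mentions.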


Forward Explicit Congestion Notification


The FECN bit is set to notify the receiving-end system that the marked frame
has encountered congestion. In response, the receiving system should try to
reduce the flow of data from the sending system on this frame-relay
connection. The mechanism for doing so must be above the level of the
frame-relay bearer service, which provides no direct flow-control facilities.
The receiving-end system should use this strategy for each connection:
1. Compute the fraction of frames for which the FECN bit is set over some
measurement interval.
2. If more frames have the FECN bit set than have an FECN bit of zero, reduce
the flow of frames from the sending system.
3. If the congestion condition persists, institute additional reductions.
4. When the congestion condition ends, gradually increase the flow of frames.
This strategy reacts slowly to congestion notifications for two reasons:
First, the end system does not react immediately to a particular FECN bit, but
waits until the average behavior of the system over an interval indicates
congestion. Second, the end system does not immediately reduce its outgoing
flow, but rather signals its peers to reduce the incoming flow. All of this is
consistent with a belief that congestion occurs slowly.
The details of the algorithm depend on whether the end system has actual
control of the information rate from the source system or uses some sort of
sliding-window, flow-control scheme. A rate-based system can provide a more
precise control of information flow since it is based on the actual
information rate in bits per second. Since frame relay does not require the
use of fixed-size frames, a window-based system can provide only an
approximate control over information rate. Such control is reasonably precise
only if the statistical variance of the frame size is small.
Rate-based control. For rate-based control, it is assumed that a destination
system has a means of regulating the data rate at the source-end system. Let
us refer to current data rate as R. On each connection, the end system
maintains two counters: FECN0 is the number of LAPF frames with FECN=0, and
FECN1 is the number of LAPF frames with FECN=1. These counts are accumulated
over a measurement interval dij. The standards suggest a value of dij
approximately equal to four times the end-to-end transit delay. 
The algorithm is as follows: Initially, set R=CIR or less in the receive
direction. This "slow start" is intended to avoid an impulse load on the
network when the user begins transmitting. Then, at the beginning of each
measurement interval, set FECN0=FECN1=0. At the end of each measurement
interval, if FECN1>=FECN0, set R=0.875 x R; if FECN1<FECN0, set R=1.0625 x R.
If a connection has been idle for a long time, then R should be set to CIR for
that connection.
Note that when congestion is detected, the rate reduction is by a factor of
1/8, whereas the recovery is by a factor of 1/16. This slower recovery
strategy is intended to avoid oscillations between congested and noncongested
states.
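The end-of-interval adjustment above amounts to a few lines of code. The following Python sketch is only a restatement of the rule (the function name is mine); accumulating the FECN0/FECN1 counters over the measurement interval is assumed to happen elsewhere.

```python
# Sketch of the FECN rate-based adjustment applied at the end of each
# measurement interval. fecn0/fecn1 are the counts of frames received
# with FECN=0 and FECN=1 during the interval.

def adjust_rate(r, fecn0, fecn1):
    if fecn1 >= fecn0:
        return 0.875 * r     # congestion indicated: cut rate by 1/8
    return 1.0625 * r        # no congestion: recover slowly, by 1/16

r = 64_000.0                          # slow start at CIR
r = adjust_rate(r, fecn0=3, fecn1=7)  # congested interval: 64000 -> 56000
r = adjust_rate(r, fecn0=8, fecn1=2)  # clear interval: 56000 -> 59500
```

The asymmetry between the 1/8 cut and the 1/16 recovery is what damps the oscillation the text warns about.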
Window-based control. Assume that sliding-window flow control is used and that
the destination system can adjust the receive window size W between 1 and some
maximum value Wmax. Again the counters FECN0 and FECN1 are accumulated over a
measurement interval dij. If the current window size is W, then dij is defined
to be twice the interval during which W frames are transmitted and
acknowledged (two window turns).
The algorithm is: Initially, set W=1. Again, this provides a slow start. Then,
at the beginning of each measurement interval, set FECN0=FECN1=0. At the end
of each measurement interval, if FECN1>=FECN0, set W=MAX[0.875xW, 1]; if
FECN1<FECN0, set W=MIN[W+1,Wmax].
If a connection has been idle for a long time, W should be set to 1 for that
connection.
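The window-based variant differs only in what it adjusts. A minimal sketch (function name mine), using the same counters but manipulating the receive window W instead of a bit rate:

```python
# Sketch of the FECN window-based adjustment at the end of each
# measurement interval. W shrinks multiplicatively but never below 1,
# and opens additively by one frame up to Wmax.

def adjust_window(w, fecn0, fecn1, w_max):
    if fecn1 >= fecn0:
        return max(int(0.875 * w), 1)   # shrink, floor at one frame
    return min(w + 1, w_max)            # widen by one frame

w = 1                                    # slow start
w = adjust_window(w, fecn0=5, fecn1=1, w_max=8)   # clear interval: 1 -> 2
w = adjust_window(w, fecn0=0, fecn1=4, w_max=8)   # congested: int(1.75) -> 1
```

As the text notes, this controls the information rate only approximately, since the frames behind each window slot may vary in size.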


Backward Explicit Congestion Notification


Backward explicit congestion notification can be achieved with either the BECN
bit in the LAPF address field or a consolidated link layer management (CLLM)
message carried in a LAPF frame. 
The BECN bit is set to notify the receiving system that the frames it
transmits on this connection may encounter congestion. In response to this,
the receiving system should reduce the flow of data transmitted on that
connection.
The receiving-end system should use the following strategy for each
connection:
1. When the first frame with the BECN bit set is received, reduce the
information rate to CIR.
2. If additional consecutive frames with the BECN bit set are received, then
institute additional reductions.
3. If a consecutive sequence of frames with the BECN bit set to zero are
received, then gradually increase the flow of frames.
This strategy reacts rapidly to congestion notifications because the end
system immediately reacts to a single BECN bit and reduces its outgoing flow
rather than signaling its peers to reduce the incoming flow. This reflects a
belief that congestion occurs quickly.
As with the response to forward-explicit congestion notification, the details
of the algorithm depend on whether control is rate based or window based.
Rate-based control. The standards define a step count S that is used to
determine when the transmitter may increase or decrease its rate. The
algorithm is as follows: Initially, set R=CIR or less in the transmit
direction. Then:
1. If a frame with BECN set to 1 is received, and the user's offered rate, R,
is greater than CIR, then reduce the offered rate to CIR.
2. If S consecutive frames are subsequently received with the BECN bit set to
1, the user should reduce its rate to the next lower "step." Further rate
reductions should not occur until an additional S consecutive frames are
received with the BECN bit set. The step rates are R=0.675xCIR, R=0.5xCIR, and
R=0.25 xCIR.
3. After the user has reduced its rate due to receipt of BECN signals, it may
increase its rate by 12.5 percent after any S/2 consecutive frames are
received with the BECN bit clear; that is, R=1.125 x R.
If a connection has been idle for a long time, then R should be set to CIR for
that connection.
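The three-step BECN procedure can be sketched as a small state machine. The step rates and the step count S come from the text; the class structure, field names, and counters are mine, so treat this as an illustration rather than the standard's state machine.

```python
# Illustrative sketch of the BECN rate-based procedure. Each received
# frame's BECN bit drives the offered rate R down through the step
# rates, or back up by 1.125x after S/2 consecutive clear frames.

class BecnRate:
    STEPS = [1.0, 0.675, 0.5, 0.25]           # step rates as fractions of CIR

    def __init__(self, cir, s):
        self.cir, self.s = cir, s
        self.r = cir                           # start at CIR
        self.becn_run = 0                      # consecutive frames with BECN=1
        self.clear_run = 0                     # consecutive frames with BECN=0
        self.step = 0

    def on_frame(self, becn):
        if becn:
            self.clear_run = 0
            self.becn_run += 1
            if self.r > self.cir:              # step 1: first notification
                self.r, self.becn_run = self.cir, 0
            elif self.becn_run >= self.s and self.step < len(self.STEPS) - 1:
                self.step += 1                 # step 2: next lower step rate
                self.r = self.STEPS[self.step] * self.cir
                self.becn_run = 0
        else:
            self.becn_run = 0
            self.clear_run += 1
            if self.clear_run >= self.s // 2:  # step 3: cautious recovery
                self.r = 1.125 * self.r
                self.clear_run = 0

ctl = BecnRate(cir=64_000.0, s=4)
for _ in range(4):
    ctl.on_frame(becn=True)    # four consecutive BECNs: R drops to 0.675 x CIR
```

Note how reduction needs S consecutive notifications but recovery needs only S/2 clear frames per 12.5 percent increase; the large initial cut to CIR does the fast part.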
Window-based control. Here, the step count S is defined to be the interval
during which one frame is transmitted and acknowledged. The algorithm is as
follows: Initially, set the window size W to some small value such as 1 or
0.5xlast window size. Then:
1. If a frame with BECN set to 1 is received, then set W=MAX[0.625xW, 1].
2. If S consecutive frames are subsequently received with the BECN bit set to
1, the user should repeat the reduction.
3. After the user has reduced its window size due to receipt of BECN signals,
it may increase its window size by one after any S/2 consecutive frames are
received with the BECN bit clear; that is, W=MIN[W+1,Wmax].
If a connection has been idle for a long time, then W should be set to its
initial value for that connection.
Consolidated Link Layer Management (CLLM). CLLM is a variation of backward
explicit congestion notification that uses a message rather than the BECN bit
to signal congestion. The CLLM technique can be used when congestion occurs at
a network node, but no reverse traffic is available to carry the BECN
indication. CLLM messages carry a list of congested DLCIs to reduce the
traffic load on the network.
The CLLM message is carried in the information field of a LAPF XID frame. The body of
the XID frame lists the frame-relay connections that are congested and
identifies the cause. The following cause values are recognized:

Network congestion due to excessive traffic.
Facility or equipment failure.
Maintenance action.
Unknown.
Conditions designated short term are anticipated to last seconds or minutes;
otherwise, the designation is long term.


Implicit Congestion Control


Implicit signaling occurs when the network discards a frame, and this fact is
detected by the end user at a higher, end-to-end layer. When this occurs, the
end-user software may deduce that congestion exists. 
For example, in a data-link control protocol that uses a sliding window flow
and error-control technique, the protocol detects the loss of an information
frame in one of two ways. When a frame is dropped by the network, the
following frame will generate an REJ frame from the receiving end point. When
a frame is dropped by the network, no acknowledgment is returned from the
other end. Eventually, the source end will time out and transmit a command
with the P bit set to 1. The subsequent response with the F bit set to 1
should indicate that the receive sequence number N(R) from the other side is
less than the current send sequence number.
Once congestion is detected, the protocol uses flow control to recover from
the congestion. Assume that the layer-two window size, W, can vary between the
parameters Wmin and Wmax, and is initially set to Wmax. We would like to
reduce W as congestion increases to gradually throttle the transmission of
frames. Example 1 lists three classes of adaptive window schemes based on
response to one of the two aforementioned conditions.
Successful transmissions (measured by receipt of acknowledgments) may indicate
that the congestion has decreased and window size should be increased. Example
2 shows two possible approaches. Studies suggest that the use of the strategy
in Example 1(c) with a=0.5 plus the strategy in Example 2(b) provides good
performance over a wide range of network parameters and traffic patterns. This
is the strategy recommended in the standards.
Figure 1 Effects of congestion: 
(a) throughput; (b) delay.
Table 1: Frame-relay congestion-control techniques.
Type          Technique           Function                  Key Elements
Discard       Discard control     Provides guidance to      DE bit
strategy                          network concerning
                                  which frames to discard.

Congestion    Backward explicit   Provides guidance to      BECN bit or
avoidance     congestion          end systems about         CLLM message
              notification        congestion in network.

Congestion    Forward explicit    Provides guidance to      FECN bit
avoidance     congestion          end systems about
              notification       congestion in network.

Congestion    Implicit            End system infers         Sequence numbers
recovery      congestion          congestion from           in higher-layer
              notification        frame loss.               PDU
Figure 2 Queue-length averaging algorithm. t=current time; ti=time of ith
arrival or departure event; qi=number of frames in the system after the event;
T0=time at the beginning of the previous cycle; and T1=time at the beginning
of the current cycle.
Example 1: Three classes of adaptive window schemes.
(a) Set W=Max[W-1, Wmin]
(b) Set W=Wmin
(c) Set W=Max[aW, Wmin], where 0 < a < 1
Example 2: Successful transmissions may indicate that the congestion has
decreased and window size should be increased.
(a) Set W=Min[W+1, Wmax] after N consecutive successful transmissions.
(b) Set W=Min[W+1, Wmax] after W consecutive successful transmissions.
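The combination the standards recommend, halving on loss (Example 1(c) with a=0.5) and widening by one after W consecutive successes (Example 2(b)), can be sketched as follows. The function names are mine; detecting the loss itself (via REJ or timeout) is assumed to happen in the data-link protocol.

```python
# Sketch of the recommended implicit congestion-control strategy:
# multiplicative decrease on frame loss, additive increase after a
# full window of consecutive successful transmissions.

def on_loss(w, w_min):
    return max(int(0.5 * w), w_min)    # Example 1(c) with a = 0.5

def on_success_run(w, w_max):
    return min(w + 1, w_max)           # Example 2(b)

w = 8                                  # start at Wmax
w = on_loss(w, w_min=1)                # loss detected: 8 -> 4
w = on_success_run(w, w_max=8)         # W consecutive ACKs: 4 -> 5
```

The multiplicative-decrease/additive-increase shape is what keeps the window from oscillating back into congestion immediately after recovery.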













Examining the PowerBASIC Developer Kit


Vocabulary-frequency analysis moves from DOS to Windows




Raymond J. Schneider


Ray is the director of engineering at ComSonics and a doctoral candidate at
George Mason University. He can be reached at rschneider@global.net.


Language--the most significant of human inventions--is to a great extent the
medium of thought. Through the development of language we enlarge the scope of
that which is thinkable. The act of composing computer programs is not unlike
the use of natural language, though more formal and less flexible. The history
of software engineering is the history of computer languages, beginning with
Fortran, Cobol, and Lisp, and continuing to more recent languages such as
Prolog, Smalltalk, C++, and others.
With this in mind, while working on my doctoral research at George Mason
University in software-engineering methodologies, I explored the vocabulary
used by authors writing professional papers in the software-engineering field.
In the process, I developed some relatively simple tools to perform
vocabulary-frequency analysis. As originally written, these programs run in a
typical command-line environment under DOS. In a recent encounter with the
PowerBASIC Developer Kit (PBDK), I dusted off these routines, extending and
migrating them to Windows using the PBDK. The complete source code to both
versions is available electronically; see "Availability," page 3.


English Riches


English has the largest vocabulary of any human language. Among the reasons
for this are that it springs from a relatively rich core vocabulary drawn from
Anglo-Saxon, Latin, and Greek. Furthermore, English has adopted interesting
words from all other languages, so-called "loan" words. The simplest structure
of an English sentence is a noun phrase (NP), followed by a verb phrase (VP)
(also known as the "subject" followed by a "predicate"). Thus, this simple
structure could be expressed as NP+VP. 
In the sentence, "The programmer wrote the application," The programmer is the
NP (subject) and wrote the application is the VP (predicate). In her book,
Understanding English Grammar (Macmillan, 1982), Martha Kolln distinguishes
"...ten sentence patterns which account for the underlying skeletal structure of
almost all the possible grammatical sentences in English. All are elaborations
on this simple underlying pattern."
To understand English with a computer, it is necessary to incorporate both a
large dictionary of words and rules for word transformation, sentence
composition, and some linkage into machine-based semantics to allow some
degree of understanding. According to Bill Bryson in The Mother Tongue
(William Morrow, 1990), Webster's Third New International Dictionary lists
450,000 words, while the revised Oxford English Dictionary has 615,000. If we
include all the scientific and technical terms and all the transformational
words created by various formation rules, we easily get into the range of many
millions of words. This is well beyond any reasonable tabular scheme, so
obviously we must devise some rules.
Many Prolog implementations incorporate definite clause grammar (DCG), a
scheme for parsing sentences. The DCG is basically a set of rewrite rules
which use the infix operator -->. Thus our simple sentence definition, NP+VP
would be written sentence --> noun-phrase, verb-phrase.
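The same rewrite rule can be illustrated outside Prolog. This toy Python recognizer is not a DCG, just an analogue of the single rule above; the three-word lexicon is hypothetical and exists only to make the example run.

```python
# Toy analogue of the rewrite rule: sentence --> noun-phrase, verb-phrase.
# The lexicon is a hypothetical three-word fragment for illustration.

DETERMINERS = {"the"}
NOUNS = {"programmer", "application"}
VERBS = {"wrote"}

def noun_phrase(tokens, i):
    # NP --> determiner, noun; returns index past the NP, or None
    if i + 1 < len(tokens) and tokens[i] in DETERMINERS and tokens[i + 1] in NOUNS:
        return i + 2
    return None

def verb_phrase(tokens, i):
    # VP --> verb, NP
    if i < len(tokens) and tokens[i] in VERBS:
        return noun_phrase(tokens, i + 1)
    return None

def sentence(tokens):
    # sentence --> NP, VP, consuming every token
    j = noun_phrase(tokens, 0)
    return j is not None and verb_phrase(tokens, j) == len(tokens)

ok = sentence("the programmer wrote the application".split())
```

A real DCG gains much more than this, of course: Prolog's backtracking tries alternative rules automatically, where this sketch hard-codes one derivation.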


Vocabulary and Frequency


Words embody ideas. To study a subject, you need to learn the meaning of the
words which are used and the way they are combined. This is true not only of
natural language but of formal languages such as mathematics and computer
programming. The importance of a concept expressed in a particular fragment of
text may be measured by the frequency with which the words are used. Of
course, some words occur with great frequency because they form part of the
syntactic structure that holds the linguistic forms together. Reporting on a
1923 study by lexicographer G.H. McKnight, Cullen Murphy lists the infamous
nine which account for one-quarter of all spoken words: and, be, have, it, of,
the, to, will, and you ("The Big Nine," Atlantic, March 1988). Once we are
past these and two score or more glue and structure words, we encounter the
vocabulary that holds the ideas and relations which are the core of the
communication.
Figure 1 illustrates the system structure of a small set of programs I wrote
to collect frequency data on text fragments. The structure centers on text and
alpha files. The text file is assumed to be a set of lines encoded in ASCII.
An alpha file is a comma-delimited file whose first field is an ASCII string
(normalized to all capital letters) and whose second field is the integer
number of times that the word appears in the text fragment. The four functions
shown in Figure 2 make up the system. 
The system is implemented using PowerBASIC, the current embodiment of
Borland's Turbo BASIC, written by Bob Zale. Like most Basics, PowerBASIC has
powerful string-handling facilities. It is a fast native-code compiler, and
the resulting programs are also fast. Additionally, PowerBASIC includes a
variety of enhancements not found in many other Basics. I found the REPLACE
statement, VERIFY function, and ARRAY SORT statement particularly helpful. The
REPLACE statement replaces all occurrences of a given string with a new string
which is specified; the VERIFY function determines whether each character of a
string is present in another string; and ARRAY SORT allows you to sort all or
part of an array into ascending or descending order. You can also include a
TAGARRAY, an array associated and sorted with the main array. Thus, to sort
the alpha files into alphabetical order, sort on the words using the
frequencies as the TAGARRAY, and vice versa when sorting in frequency order.


DOS and the COUNT Algorithm


The COUNT algorithm (see Listing One) simply processes lines of ASCII text one
line at a time. COUNT isolates the words in the line by replacing all nonalpha
characters (numbers, punctuation, brackets, and so on), except hyphens and
apostrophes, with blanks using the VERIFY command. (Note that in Listing One,
I first used the REPLACE command to replace numbers, brackets, and
punctuation.) The VERIFY is open ended and cleans up anything that is not an
alpha character or a hyphen or an apostrophe. Then the words are isolated as
blank-separated character substrings, and any trailing or isolated hyphens or
apostrophes are eliminated. Thus, a word is defined as a character string with
embedded hyphens and apostrophes. This allows contractions such as "can't" and
hyphenated words such as "high-brow" to get through the system whole. 
As written, lines terminating in a hyphen are handled improperly. Also, some
word structures fairly common in computer-science literature--"dot" notations
referring to structures and the common use of underscore characters in
variable names, for example--will be split. While these limitations are
acceptable for my needs, this might be a good place to start when considering
extensions to the code.
After getting all the words into an array, the array is sorted and scanned.
Words that are spelled the same are adjacent in the sorted array, so the scan
is performed, the frequency is accumulated, and the word array is reduced to
an alpha array of words and their associated frequencies. This array is
written out as a file with the same DOS name as the original, but with the
.CNT extension. The rest of the programs--SORTA, MERGEA, and STATA--are
available electronically (see "Availability," page 3). The programs parse the
command line, read in the input file or files, perform the operation and write
an output file, or in the case of STATA, the word counts are simply printed to
standard output. 
The process is a bit trickier in the case of MERGEA. The command line can
include up to five files to be merged, as well as the name that the output
file is to take. The alpha lines are simply appended, and the file is sorted
with the frequency numbers as a TAGARRAY. Then the file is scanned in a
fashion similar to COUNT except that alpha lines are scanned and the frequency
counts of repeated words between files are accumulated.
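The original COUNT is PowerBASIC, but its word-isolation rules translate readily. This Python sketch mirrors the description above, not the listing itself: replace every character that is not a letter, hyphen, or apostrophe with a blank, split on blanks, strip stray leading and trailing hyphens and apostrophes, normalize to capitals, and tally.

```python
# Rough Python analogue of COUNT's word isolation and tallying.
# Contractions ("can't") and hyphenated words ("high-brow") survive
# whole, matching the rules described in the text.

from collections import Counter

def count_words(text):
    # Blank out everything except letters, hyphens, and apostrophes
    cleaned = "".join(c if c.isalpha() or c in "-'" else " " for c in text)
    words = []
    for w in cleaned.split():
        w = w.strip("-'").upper()   # drop trailing/isolated hyphens, apostrophes
        if w:
            words.append(w)
    return Counter(words)

freq = count_words("Can't stop the high-brow programmer; the programmer wrote.")
# e.g. freq["THE"] == 2, freq["CAN'T"] == 1, freq["HIGH-BROW"] == 1
```

Like the original, this splits dotted structure names and underscored identifiers, and it does nothing about lines ending in a hyphen; those remain the natural extension points.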


Inside the PBDK


PowerBASIC recently added the PowerBASIC Developer Kit (PBDK) to its family of
support tools. PBDK is a shell over the Windows API that simplifies Windows
programming dramatically. Simple but complete access to Windows, dialog boxes,
menus, DDE, clipboard, DLLs, and preemptive multitasking are all provided and
relatively easy to use. Since it is not intrinsically event driven like Visual
Basic, the PBDK provides a more graceful introduction to the Windows
environment without the "sink-or-swim" feeling. The PBDK also provides
PowerBASIC developers with a migration path to Windows that does not require a
major rewrite of DOS application code.
The PBDK run time consists of three files: the DVSERVER.EXE server, DVDLL.DLL,
and Windows virtual driver VDV.386. The DLL and driver are contained within
DVSERVER.EXE, which will extract them transparently at run time and do so only
once. The server shields you from much of the complexity of the Windows
environment. From PowerBASIC, the server simply looks like a standard library.
Although Windows is message based, the PBDK handles all the low-level
messages, leaving you to handle only those messages that are application
dependent. The components of a Windows application as implemented by PBDK
include the session, the working area of a PBDK application where it will open
other windows and where any menus will be installed, and then various
user-defined dialogs containing the controls the application will use. To make
the application operate you must use event loops. The event loop uses the
GetMessage() function to read the message queue maintained by the DV server.
Figure 3 shows the steps required to create an application. 


Migrating to Windows


Implementing the application required a bit of thought. The functions were
generally independent in the DOS implementation. Creating a Windows
implementation, however, implies a higher degree of integration. My first
thought was simply to create a menu structure whose elements (Count, Sort,
Statistics, and Merge) corresponded to one of the DOS commands. The
implementation would just get the command parameters, and the only change
necessary to the original DOS code would be to change from parsing a COMMAND$
line argument to passing the string to a subroutine for parsing. 
However, it quickly became obvious that this simple approach would not be
satisfactory by itself. The problem is that Windows is a very visual
environment, while the DOS command line is a very nonvisual environment.
Moreover, many capabilities available in the DOS environment--file management,
browsers, and editors, for example--are not as immediately available under
Windows. Thus, the application that I collectively call "Fun With Words" would
have to incorporate visual elements that are not part of the DOS
implementation.

Figure 4 shows the Windows version of the application. There is the Session
window containing the menu bar. Note that the additional menu item Get has
been added to allow access to a .CNT file directly. Within the Session window
is a dialog containing a variety of controls used to add visual elements to
Fun With Words. The controls are two edit boxes and two list boxes together
with three buttons labeled OK, GO, and End Merge. During the development I
found this arrangement fairly intuitive. However, I suspect that users will
find it a bit confusing since the dialog is used in more than one way.
Figure 5 shows the application with annotations showing how the menus and
controls operate. Both Count and Get open the OpenFileName common dialog. With
OpenFileName, users can select a filename which is then passed to the edit box
with the label Active Word File, which holds an alpha filename. The list box
is loaded with the alpha file lines, generally in alpha order. Selecting Sort
causes the alpha file to be put in frequency order in the list box. Statistics
opens a message box with the total word count and the distinct word count for
the file in the Active Word File edit box. Selecting Merge makes the dialog
box modal and puts it in merge mode; see Figure 4. In merge mode, the Active
Word File can be used as a source of alpha filenames by putting the
appropriate wild cards in the edit box. The list box is loaded when the OK
button is clicked. Double-clicking on an alpha filename causes it to appear in
the Merge File Selection list box. Adding an output filename in the associated
edit box and selecting Go causes the files to be merged and written to the
output file. Clicking on End Merge clears the edit and list boxes and
reactivates the main menu.


Coding with the PBDK 


The OpenSession() subroutine in Listing Two initializes the session, sets the
opening size of the window and the Txt$="Fun With Words" in the title bar, and
returns the session handle, hSession%. The menu is defined in a resource file
created with the Symantex Resource Toolkit included with the PBDK. You can
also create menus directly in the code, but it's more cumbersome.
The menu is loaded with the LoadResources() subroutine and set with SetMenu().
The main dialog illustrated in Figure 4 is loaded with LoadResources() and
then the various buttons and boxes are loaded with the GetDlgItem()
subroutine. Once all the menus, dialog, and controls are in place, the program
launches the main event loop. This is a straightforward loop which can be
implemented in a number of ways. In Listing Two, a WHILE loop is used.
The key to handling PBDK messages is to set up appropriate conditions in the
event loop. You can go a long way without getting too fancy. The parameters in
the GetMessage() subroutine call provide you with all the information
necessary to interpret the messages. hMsgWnd% is the handle of the object
sending the message. Msg% is the message number denoting the kind of message.
wP% is the main parameter of the message, which depends on the message number.
XCursor%, YCursor% is the position of the cursor. XParam%() is a parameter
array whose meaning is dependent on Msg%. Cmd$ is a string with additional
information.
All controls (buttons, list boxes, edit boxes, and so on) send the value
Msg%=WMCOMMAND% and the ID of the control in the variable wP%. Numerical
values in XParam%(2) give notification codes such as LBNDBLCLK% (List Box
Notification Double Click), which is used to transfer filenames from the left
list box to the right list box. 
In Fun With Words, the global event loop looks for menu selections. With each
selection, it does what is indicated (Count, Get, and so on). With Merge, it
sets the main dialog box modal and sets off a local event loop to handle the
Merge functionality. Clicking on the button End Merge causes the program to
clear the boxes and return to the menu loop.
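The shape of these nested loops is easy to sketch outside of BASIC. The
following Python rendition is purely illustrative -- the message constants,
IDs, and handler names are made up, standing in for WMCOMMAND%, SCCLOSE%, and
the menu IDs the article's loop tests:

```python
# Minimal sketch of a polled message loop with command dispatch,
# loosely mirroring the PBDK pattern. All constants and IDs here
# are illustrative, not the PBDK's actual values.
WM_COMMAND = 273          # "a control or menu item was activated"
WM_SYSCOMMAND = 274       # "a system-menu item was activated"
SC_CLOSE = 61536

def run_event_loop(get_message, handlers):
    """Poll get_message() until a close request arrives.

    get_message() returns (msg, wparam); handlers maps a menu/control
    ID to a callable, the way the article's loop tests wP% values.
    """
    log = []
    running = True
    while running:
        msg, wp = get_message()
        if msg == WM_COMMAND and wp in handlers:
            log.append(handlers[wp]())
        elif msg == WM_SYSCOMMAND and wp == SC_CLOSE:
            running = False
    return log

# A canned message source standing in for GetMessage().
messages = iter([(WM_COMMAND, 100), (WM_COMMAND, 300),
                 (WM_SYSCOMMAND, SC_CLOSE)])
result = run_event_loop(lambda: next(messages),
                        {100: lambda: "count", 300: lambda: "stats"})
```

A local event loop, such as the one Merge starts, is simply the same pattern
run again with a different handler table until its own exit condition fires.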


Pitfalls


Most of the problems in coding Fun With Words resulted from weak PBDK
documentation in certain areas. The examples tend to be somewhat superficial,
something not particularly surprising in a new toolkit. The two-volume manual
covers a lot of territory, but Windows is a big topic. The PBDK is now being
shipped with improved documentation.
I found that the GetOpenFileName() subroutine, which invokes the OpenFileName
common dialog, was bombing the application when the Cancel button was
clicked--my window would just collapse. After struggling and failing to find
out why, I called PowerBASIC technical support. It turns out that you must
execute a ReadErrorNumber() after calling GetOpenFileName() and test for error
conditions. In my case, ReadErrorNumber() returned a value of 3. This is not
exactly intuitive, and I'm sure many more little tidbits of information like
this are not clearly documented in the manuals.
Another problem I encountered was with the file buffers in many of the DV
subroutines. Buffers had to be initialized with a command like
FileName$=SPACE$(32), providing a 32-character buffer. When a string was
returned, it was left-justified and space filled to the extent of the buffer.
This caused a few problems when I simplistically concatenated filenames for
the merge command. The parse instructions just weren't expecting really big
strings. To find the problem, I scattered line numbers through the merge
routine and created a MessageBox to tell me where I was blowing up. I left
these in the code as warnings to stray passers-by. I also put a MessageBox out
for every filename to establish that it was parsed correctly.
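The padding behavior behind this bug is easy to reproduce. Here is a Python
sketch (a hypothetical 32-character buffer, not PBDK code) of why untrimmed,
space-filled buffers wreck naive filename concatenation, and what trimming
fixes:

```python
# Fixed-size, space-filled buffers (the FileName$=SPACE$(32) idiom)
# keep their padding when a shorter string is copied in; concatenating
# them without trimming embeds runs of blanks in the result.
def fill_buffer(value, size=32):
    """Mimic a DV-style call writing into a pre-sized buffer:
    left-justified, space-filled to the buffer length."""
    return value.ljust(size)[:size]

a = fill_buffer("ONE.CNT")
b = fill_buffer("TWO.CNT")

naive = a + " " + b                      # 65 chars, mostly blanks
trimmed = a.rstrip() + " " + b.rstrip()  # what a merge parser expects
```

The "really big strings" I fed the parse instructions were exactly the naive
form above.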


Conclusion


My initial experience using the PowerBASIC Developer Kit was positive, and the
effort very productive. Although there were some difficulties and false
starts, the product shields the programmer from many of the details of Windows
programming, while allowing very functional use of the Windows environment. A
strong point was the ability to easily port code created in the DOS
environment to Windows with minimal change. I found the use of event loops
somewhat more forgiving for a programmer unfamiliar with the Windows
environment than a full-blown, event-driven interface would be. I was slightly
disappointed that PBDK does not currently support the use of Visual Basic
VBX-custom controls, nor does it support OLE 2. Bob Zale, PowerBASIC's author,
assured me that VBX support is coming in the next release, with OLE 2 support
to follow.


For More Information


PowerBASIC Developer Kit
PowerBASIC Inc.
316 Mid Valley Center
Carmel, CA 93923
800-780-7707
$299.00
Figure 1: Structure of the command-line word-frequency system.
Figure 2: Functions that make up the command-line word-frequency system.
COUNT   count <filename.ext> -> <filename.cnt>, where <filename.ext> is an
        ASCII text file in free format and <filename.cnt> is an alpha file,
        comma delimited in the form <word>,<frequency>. The count function
        preserves hyphens and apostrophes in words and produces an alpha
        file in alphabetical order.
MERGEA  mergea <file1> <file2> ... /<fileout>. Input files are alpha files;
        the output is a single, merged alpha file that adds together the
        frequencies of the same words. The alpha input files may be ordered
        in any fashion; the output file is in alphabetical order.
SORTA   sorta <filename> [/A, /F]. Accepts an alpha file of unknown order
        and puts it in either alphabetical or frequency order. If no flag
        is provided, /F is assumed, and the file is put into frequency
        order.
STATA   stata <filename>. Prints the total number of words and the number
        of distinct words on the screen.
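The COUNT step above -- parse free-format text into words, keep internal
hyphens and apostrophes, then emit <word>,<frequency> lines in alphabetical
order -- can be sketched compactly in Python. This is an illustration of the
algorithm only; the article's actual implementation is the PowerBASIC code in
Listing One:

```python
# Sketch of the COUNT function from Figure 2: free-format text in,
# alpha-file lines out. Mirrors Listing One's rules: words may contain
# hyphens and apostrophes internally, but leading/trailing ones are
# stripped, and output is uppercased and alphabetically ordered.
import re
from collections import Counter

def count_words(text):
    """Return alpha-file lines <WORD>,<freq> in alphabetical order."""
    raw = re.findall(r"[A-Za-z'-]+", text)
    words = [w.strip("'-").upper() for w in raw]
    freq = Counter(w for w in words if w)   # drop bare hyphen "words"
    return ["%s,%s" % (w, n) for w, n in sorted(freq.items())]

lines = count_words("Don't stop -- re-read, don't skip.")
```

Note that, as in Listing One, a run of bare hyphens strips down to an empty
string and is discarded rather than counted.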
Figure 3: Steps to creating a Windows application using the PBDK.
1. Initialize PowerBASIC to link to the DVSERVER.
2. Initialize the DVSERVER.
3. Open the session.
4. Attach menus.
5. Add dialogs and controls.
6. Create global event loop.
7. Incorporate message handling and minor event loops as
 needed to implement application.
8. Disconnect server when finished: END.
Figure 4: Windows version of the vocabulary-frequency-analysis application. 
Figure 5: Annotated version of the vocabulary-frequency-analysis application.

Listing One 

$ERROR ALL
InputFileName$=COMMAND$
CALL Mergea(InputFileName$)
END
'Program to read files and isolate and count occurrences of words
' file should have the extension .TXT; however, the program will
' attempt to read any file so long as the user insists.

'OUTLINE
' 1. READ A LINE -- Drop if it is a comment line (i.e., preceded with ')
' 2. PARSE INTO WORDS
' 3. ADD WORDS TO AN ARRAY
' 4. WHEN LAST LINE, GO BACK AND...
' 5. ANALYZE ARRAY FOR REPEATED OCCURRENCES OF WORDS
' 6. INITIALLY BY SORTING THE CONTENTS OF THE ARRAY AND
' 7. THEN GOING THROUGH THE ARRAY AND ACCUMULATING REPEATED WORDS.
' 8. WHEN FINISHED PRINT OUT TOTAL WORDS AND DISTINCT WORDS TO SCREEN.
' 9. WRITE THE ALPHA FILE <WORD>,<# OCCURRENCES> IN SORTED ALPHA ORDER
'10. BUT THIS IS NOT AN ASSUMPTION ABOUT ALPHA FILES, THEY CAN BE IN ANY
' ORDER Alpha File line::= <word>,<frequency>
'******************************************
'6/5/94 Added LinkBack: to drop comment lines so that input files could
'contain embedded comments without affecting the word counts
'6/7/94 Dropping word print outs and adding a stats line at end of run
'******************************************
 SUB Mergea (InputFileName$)
 'OPEN FILE
 $DYNAMIC
 MaxWords%=20000
 MaxWordsOnLine%=500
 DIM Word$(MaxWordsOnLine%) 'Assumes no more than MaxWordsOnLine% words per line
 DIM AllWord$(MaxWords%), Vocab$(MaxWords%), VFreq%(MaxWords%)
 'String Constants Required
 UpperCase$="ABCDEFGHIJKLMNOPQRSTUVWXYZ"
 LowerCase$="abcdefghijklmnopqrstuvwxyz"
 Punctuation$=",.:;?"
 Brackets$="<>{}[]()"
 Digits$="0123456789"

 'Now create OutputFileName$ by parsing COMMAND$

 FileRootName$=EXTRACT$(InputFileName$,".")
 '********* Note that this Output FileName creation may be Error Prone
 OutputFileName$=FileRootName$+".CNT"
 'Ready to Open input file
 ON ERROR GOTO ErrorHandler
 OPEN InputFileName$ for input as #1
 WordCount=0 'This is the overall wordcount for the Input File
 WHILE NOT EOF(1)
 '*******************Get a Line$ and Parse it into words ************
 'Read Line In
LinkBack:
 LINE INPUT #1, Lin$
 'Parse Line Into Words
'First check line for apostrophe ... and skip lines with apostrophes
IF LEFT$(Lin$,1)="'" THEN Goto LinkBack 'this drops comment lines
'Then Eliminate non-alpha characters replacing them with spaces
 REPLACE ANY ",.:;?<>{}[]()0123456789" WITH " " IN Lin$
Again:
 L=VERIFY(Lin$,UpperCase$+LowerCase$+" "+"'"+"-") 'note apostrophe so as to 
 ' capture contractions
 'Note this process will accept hyphens as a word and we want only
 'internal hyphens -- see below at EXTRACT$
 IF L=0 THEN Goto Success
 'Otherwise L points to a non alpha non space character
 'replace it with a space and continue
 MID$(Lin$,L)=" "
 Goto Again
Success:
 Lin$=LTRIM$(Lin$)
 Lin$=RTRIM$(Lin$)
 'Now spaces are stripped from front and back of string
 i=0 'Word Counter for # for Word(s) in Line$
KeepGoing:
 IF Len(Lin$)=0 THEN Goto Done
 Word$(i)=EXTRACT$(Lin$," ") 'Finds ith word but it may just be hyphens
 Lin$=LTRIM$(Lin$,Word$(i)) 'Strips found word from string
 Lin$=LTRIM$(Lin$) 'Strips leading blanks
 Word$(i)=LTRIM$(Word$(i),ANY "-'")
 Word$(i)=RTRIM$(Word$(i),ANY "-'")
 IF Len(Word$(i))=0 THEN Goto KeepGoing
 i=i+1 'Increment counter
 Goto KeepGoing 'this will loop collecting the words in Lin$
Done:
'****************** End of Line$ Parse *************************
 'i enters Done as the # of words collected indexed from 0 to n-1
 'Note that this is done with a line not the whole file so now we have
 'to add the words to the AllWord$() array
 'Now we must put the indicated # of words in an array which is
 ' large enough to hold all the words in the file.
'************** Assemble the Line$ words into the AllWord$ Array *******
 For j=0+WordCount to i-1+WordCount
 AllWord$(j)= Word$(j-WordCount)
 Next j
 WordCount=WordCount + i 'Increments after assigning all Word$(s) from line
WEND
CLOSE #1
'*********************************Word Collection Complete ************
' **** SORT the AllWord$(0 to WordCount-1) Array **********************
ARRAY SORT AllWord$(0) FOR WordCount,COLLATE UCASE

IF WordCount>0 THEN GOTO Proceed
Print "No Words Found"
END
Proceed:
'Variables are NumWords which counts the Vocabulary Additions AllWordPointer 
' which indexes through the AllWord$() array WordCount which holds the number 
' of words in the AllWord$() array indexed from 0 to WordCount-1
'Vocab$() the Vocabulary Array and VFreq%() the frequency array
NumWords=0
AllWordPointer=0 'Initialize NumWords and AllWordPointer
Vocab$(NumWords)=AllWord$(AllWordPointer) 'Initializes 1st word
VFreq%(NumWords)=1 'One occurrence of the 1st word
VocabAgain:
 AllWordPointer=AllWordPointer+1 'Now has # of words scanned
 IF UCASE$(AllWord$(AllWordPointer))=UCASE$(AllWord$(AllWordPointer-1)) THEN
 'Word is repeated so
 VFreq%(NumWords)=VFreq%(NumWords)+1
 IF AllWordPointer=WordCount-1 THEN Goto VDone
 Goto VocabAgain
 ELSE
 NumWords=NumWords+1
 Vocab$(NumWords)=AllWord$(AllWordPointer)
 VFreq%(NumWords)=1
 IF AllWordPointer=WordCount-1 THEN Goto VDone
 Goto VocabAgain
 END IF
VDone:
OPEN OutputFileName$ for output as #1
For j=0 to NumWords
PRINT #1,UCASE$(Vocab$(j));",";STR$(VFreq%(j))
next j
CLOSE #1
Print "Total Number of Words= ";WordCount
Print "Number of Different Words is= ";NumWords+1
END SUB
'********************************************************
' SUBROUTINES
'********************************************************
 ErrorHandler:
 E=ERRTEST
 IF E=53 then Print "Input File Not Found":END
 Print "Error";E;" Occurred"



Listing Two

$ERROR ALL
'Fun With Words -- an example of the conversion of DOS BASIC programs to 
' Windows using the PowerBASIC Development Kit -- Copyright R.Schneider 1994

$INCLUDE "PB3DV.PB3"
CALL InitExecution( "FWW", RESETAPP% )
CALL ReadErrorNumber( DVError% )
IF DVError% > 0 THEN END

'*****
'Creation of a Session, required to use Menus.
'*****

Txt$ = "Fun With Words"
CALL OpenSession( Txt$, 0, 0, 639, 479, hSession% )

'Open Menu Resource
MenuResName$="WFMENU.RC"
CALL LoadResources(MenuResName$,hMenuRes%)
MenuName$="MainMenu"
CALL OpenMenuRes(hMenuRes%,0,MenuName$,hMenu%)
' Setting the new Menu in the current Session. The Menu is now
' displayed, and it starts emitting messages.
CALL SetMenu(hMenu%)

'Load MainDialog Box
ResName$="wfmndlg.rc"
CALL LoadResources(ResName$,hDlg%)
DlgName$="MAINDLG"
CALL OpenDlgRes(hDlg%,DlgName$,hWnd%)
CALL SetWindowModal(hWnd%,NOTMODAL%)

'Open Resources in Main Dialog Box
hEditBox%=101
CALL GetDlgItem(hWnd%,hEditBox%,hEdit%) 'hEdit% set by Windows
hListBox%=103
CALL GetDlgItem(hWnd%,hListBox%,hList%)
hGOButton%=107
CALL GetDlgItem(hWnd%,hGOButton%,hGO%)
hEndMergeButton%=110
CALL GetDlgItem(hWnd%,hEndMergeButton%,hEndMerge%)
hMergeFileEdit%=108
CALL GetDlgItem(hWnd%,hMergeFileEdit%,hMergeEdit%)
hMergeListBox%=109
CALL GetDlgItem(hWnd%,hMergeListBox%,hMergeList%)
hOKButton%=111
CALL GetDlgItem(hWnd%,hOKButton%,hOK%)
' Event Loop
DIM XParam%(4)
GetMsg% = 1
WHILE GetMsg% <> 0
CALL GetMessage( hMsgWnd%, Msg%, wP%, XCursor%, YCursor%, XParam%(1), Cmd$)
' When no message is available, GetMessage returns a Msg=0 Message.
 IF Msg% <> 0 THEN
 ' The Msg%=WMCOMMAND% messages come from the Menu.
 ' wP% contains the ID of the clicked Item.

 IF Msg% = WMCOMMAND% THEN
 IF wP%= 100 THEN 'Menu Count was selected
 Flags&=0
 Filter$="Filter 1 $.TXT $.BAK *.TXT *.BAK " + _
         "Filter 2 $.C $.DOC $.BAK *.C *.DOC *.BAK"
 CustomFilter$="CFilter 1 $.TXT $.BAK *.TXT *.BAK"
 FileTitle$=""
 Title$="Get ASCII Test File to Count"
 FileName$=SPACE$(32)
 Directory$="C:\"
 DExt$=""
 CALL GetOpenFileName(0,Filter$,CustomFilter$,0,_
 FileName$,64,_
 FileTitle$,Directory$,Title$,Flags&,_
 FileOffset%,FileExt%,DExt$,xError&)

 CALL ReadErrorNumber(ErrNum%)
 IF ErrNum%=0 THEN
 'Load Edit Box and Read .CNT file into ListBox
 DistinctWords%=0
 CALL Count(FileName$,hEdit%,OutFileName$,DistinctWords%)
 'Now load ListBox
 CALL ResetListBox(hList%)
 Open OutFileName$ for input as #1
 lnum%=0
 for num%=1 to DistinctWords%
 Line Input #1, WordF$
 CALL AddStringListBox(hList%,WordF$,lnum%)
 next num%
 close #1
 END IF
 IF ErrNum%=3 THEN
 END IF 'do nothing
 END IF 'menu 100 COUNT
 IF wP%=200 THEN
 ' Sort File Program Calls etc. Need to check for sort 
 ' variable /F or /A and check that edit box has a file in it.
 IF OutFileName$<>"" THEN
 Path$=OutFileName$
 CALL SetSelEdit(hEdit%,0,32)'Don't like absolute #s
 CALL ReplaceSelEdit(hEdit%,Path$)
 CALL SortIt(OutFileName$,DWords%)
 CALL ResetListBox(hList%)
 Open OutFileName$ for input as #1
 lnum%=0
 FOR num%=1 to DWords%
 Line Input #1, WordF$
 CALL AddStringListBox(hList%,WordF$,lnum%)
 NEXT num%
 CLOSE #1
 END IF 'OutFileName
 END IF 'menu 200 SORT
 IF wP%=300 THEN
 'Statistics Program
 Print OutFileName$
 'Note should check EditBox for contents and use that!
 IF OutFileName$="" THEN
 ELSE
 CALL Statistics(OutFileName$)
 END IF 'OutFileName
 END IF 'menu 300 STATISTICS
 IF wP%=400 THEN
 'Merge Code
 CALL SetWindowModal(hWnd%,TASKMODAL%)
 CALL ResetListBox(hList%)
 CALL ResetListBox(hMergeList%)
 CALL SetSelEdit(hMergeEdit%,0,32)
 'Set Edit Box to Default Value, *.CNT
 Path$="*.CNT"
 CALL SetSelEdit(hEdit%,0,32)
 CALL ReplaceSelEdit(hEdit%,Path$)
 'Get directory based on DefaultValue
 CALL AddDirListBox(hList%,Path$,DDLREADWRITE%)
 'Setup Local Event Loop
 Cmd$=SPACE$(32)

 DIM Xp%(4)
 GetLocalMsg%=1
 WHILE GetLocalMsg%<>0
 Call GetMessage(hLMsgWnd%, LMsg%, LwP%, Xc%, Yc%, Xp%(1), Cmd$)
 IF LMsg%=0 THEN
 CALL ReleaseTimeSlice
 ELSE 'LMsg% is not equal to zero
 IF LMsg%=WMCOMMAND% THEN
 IF LwP%=hEndMergeButton% THEN
 GetLocalMsg%=0
 END IF 'hEndMergeButton
 IF LwP%=hListBox% THEN
 Notification%=Xp%(2)
 IF Notification%=LBNDBLCLK% THEN
 'Get the selected item from ListBox
 'first get # lines in list box
 CALL StatusListBox(hList%,l%,top%,w%,height%,_
  MaxWidth%,Index0%,NbLines%,_
  SelectedLine%,selcnt%)
 'SelectedLine% is the index of the selected 
 ' line or it is negative
 ThisLine$=SPACE$(32)
 CALL GetTextListBox(hList%,SelectedLine%,ThisLine$,32)
 Print ThisLine$
 'then look for selected line. put selected line
 'in merge listbox. get # of items in listbox
 CALL StatusListBox(hMergeList%,l%,top%,w%,height%,_
  MaxWidth%,Index0%,NbLines%,_
  SelectedLine%,selcnt%)
 IF NbLines%<5 THEN
 CALL AddStringListBox(hMergeList%,ThisLine$,Rank%)
 Print "Got to NbLine%<5 IF Statement"
 ELSE
 'List Box has 5 items in it now! No more 
 'will be accepted. Put up a message box.
 Txt$ = "Only 5 files can be merged at a time!"
 Caption$ = "Informative Message"
 CALL MessageBox( 0, Txt$, Caption$,_
  MBOK% OR MBTASKMODAL%, Code%)
 END IF 'NbLines<5
 END IF 'Notification
 END IF 'hListBox
 IF LwP%=hGOButton% THEN
 'Enter a Merge File Name
 Size%=MAXSTR%
 MergeFileName$=SPACE$(MAXSTR%)
 CALL GetCTLText(hMergeEdit%,MergeFileName$,Size%)
 MergeFileName$=RTRIM$(MergeFileName$)
 MergeFileName$=LTRIM$(MergeFileName$)
 IF MergeFileName$="" THEN GOTO NoGood
 CALL StatusListBox(hMergeList%,l%,top%,w%,height%,_
  MaxWidth%,Index0%,NbLines%,_
  SelectedLine%,selcnt%)
 CtlString$=""
 for i%=Index0% to Index0%+NbLines%-1
 Size%=MAXSTR%
 Text$=SPACE$(MAXSTR%)
 CALL GetTextListBox(hMergeList%,i%,Text$,Size%)
 Text$=RTRIM$(Text$)

 CtlString$=CtlString$+" "+Text$
 next i%
 CtlString$=CtlString$+" /"+MergeFileName$
 CALL Merge(CtlString$)
 Print CtlString$
 NoGood:
 END IF 'hGOButton%
 IF LwP%=hMergeListBox% THEN
 'if doubleclick on filename. then delete filename
 'if no filename, then do nothing
 Notification%=Xp%(2)
 IF Notification%=LBNDBLCLK% THEN
 'Get the selected item from ListBox
 'first get # lines in list box
 CALL StatusListBox(hMergeList%,l%,top%,w%,height%,_
  MaxWidth%,Index0%,NbLines%,_
  SelectedLine%,selcnt%)
 'SelectedLine% is the index of the selected 
 'line or it is negative
 ThisLine$=SPACE$(32)
 CALL GetTextListBox(hMergeList%,SelectedLine%,ThisLine$,32)
 Print ThisLine$
 'then look for selected line
 'delete the selected line in merge list box
 IF ThisLine$ <> "" THEN
 CALL DeleteStringListBox(hMergeList%,SelectedLine%)
 END IF 'ThisLine
 END IF 'Notification
 END IF 'hMergeListBox -- to delete entries

 IF LwP%=hOKButton% THEN
 'a file name has been entered
 ' in the edit box with wildcards
 Size%=MAXSTR%
 EditFileName$=SPACE$(MAXSTR%)
 CALL GetCTLText(hEdit%,EditFileName$,Size%)
 PRINT EditFileName$
 'reset the edit box and then
 CALL ResetListBox(hList%)
 'add the directory or file name
 CALL AddDirListBox(hList%,EditFileName$,DDLREADWRITE%)
 END IF 'hOKButton%
 END IF 'WMCOMMAND
 END IF 'LMsg%=0
 WEND 'Local Event Loop
 'Clear Everything
 CALL SetWindowModal(hWnd%,NOTMODAL%)
 CALL SetFocus(hSession%)
 CALL ResetListBox(hList%)
 CALL ResetListBox(hMergeList%)
 CALL SetCTLText(hEdit%,SPACE$(32))
 CALL SetCTLText(hMergeEdit%,SPACE$(32))
 END IF 'Merge Menu Item #400
 IF wP%=500 THEN Msg%=WMSYSCOMMAND%:wP%=SCCLOSE%
 IF wP%=600 THEN
 'Get an alphafile name and display the file in listbox--
 ' use same strategy as with the Count menu item without 
 ' calling the count function
 Flags&=0

 Filter$="Filter 1 $.CNT $.BNT *.CNT *.BNT"
 CustomFilter$="CFilter 1 $.CNT $.BNT *.CNT *.BNT"
 FileTitle$=""
 Title$="Get Alpha File"
 FileName$=SPACE$(32)
 Directory$="C:\"
 DExt$=""
 CALL GetOpenFileName(0,Filter$,CustomFilter$,0,_
 FileName$,64,_
 FileTitle$,Directory$,Title$,Flags&,_
 FileOffset%,FileExt%,DExt$,xError&)
 CALL ReadErrorNumber(ErrNum%)
 IF ErrNum%=0 THEN
 'Load Edit Box and Read .CNT file into ListBox
 Path$=FileName$
 CALL SetSelEdit(hEdit%,0,32)'Don't like absolute #s
 CALL ReplaceSelEdit(hEdit%,Path$)
 OPEN FileName$ for INPUT as #1
 LinCount%=0
 CALL ResetListBox(hList%)
 WHILE NOT EOF(1)
 LINE INPUT #1, Lin$
 CALL AddStringListBox(hList%,Lin$,lnum%)
 LinCount%=LinCount% + 1
 WEND
 CLOSE #1
 OutFileName$=FileName$
 CALL Statistics(OutFileName$)
 END IF
 IF ErrNum%=3 THEN
 END IF 'do nothing
 END IF 'Menu 600 GET
 END IF 'Msg WMCOMMAND
 ' The Msg=WMSYSCOMMAND messages come from the System Menu of 
 ' the current Session. wP contains the ID of the clicked Item.
 ' Event Loop ends when System Menu CLOSE Item is activated.
 IF Msg% = WMSYSCOMMAND% THEN
 IF wP% = SCCLOSE% THEN
 ' Opening a Message Box to ask the user if he really 
 ' wants to end the program.
 Txt$ = "Do you really want to end Fun With Words?"
 Caption$ = "Goodbye Message"
 CALL MessageBox( 0, Txt$, Caption$, MBYESNO% OR MBTASKMODAL%, Code%)
 ' Code receives a value related to the user answer:
 ' IDYES%, or IDNO% in this case.
 IF Code% = IDYES% THEN GetMsg% = 0
 END IF 'SCCLOSE
 END IF 'WMSYSCOMMAND
 ELSE
 ' If no message is available, the application releases its Time
 ' Slice to let other applications run.
 CALL ReleaseTimeSlice
 END IF
WEND 'Main Event Loop
CALL EndExecution
END
'******************* EXTERNAL SUBROUTINES ***************
'***** Slightly Modified Original DOS Code **************
$INCLUDE "merge.bas"

$INCLUDE "cnt.bas"
$INCLUDE "sortwin.bas"
$INCLUDE "statist.bas"
'********************************************************
' SUBROUTINES
'********************************************************
 ErrorHandler:
 E=ERRTEST
 'IF E=53 then Print "Input File Not Found":END
 Print "Error";E;" Occurred"
 Txt$ = "Error="+STR$(E)+" Occurred at Line "+STR$(ERL)
 Caption$ = "Error Box"
 CALL MessageBox( 0, Txt$, Caption$,MBYESNO% OR MBTASKMODAL%, Code%)
 ' Code receives a value related to the user answer:
 ' IDYES%, or IDNO% in this case.
 IF Code% = IDYES% THEN END
 END














































Building Distributed Applications with Galaxy


Consumers and providers interact via sessions and statements




Stan Dolberg


Stan is a vice president at Visix Software in Reston, VA. He can be reached at
stan@visix.com.


Today's application developers face a major challenge when scaling up from
proof-of-concept client/server applications to large, multiplatform network
applications. All too often, an application originally written as a
single-platform 4GL program and then scaled up to a system spanning many
departments, platforms, languages, and networks is about as robust as a
skyscraper built from two-by-fours.
Galaxy 2.0 is a cross-platform toolset for building complex distributed
applications. Galaxy consists of class libraries with about 3500 API entry
points, plus GUI design tools and a distributed-services infrastructure. The
environment, which is available for C or C++, runs on Windows, Windows NT,
OS/2 2.1, SunOS, Solaris, HP-UX, Ultrix, AIX, OSF/1, IRIX, Macintosh System
7, and OpenVMS.
The Galaxy framework components, known as "managers," can be broadly grouped
into four categories: 
Structural foundation.
Window-system/GUI abstractions.
Operating-system abstractions.
Distributed-computing abstractions. 
Galaxy abstractions are built on the lowest-available level of native services
and provide a single, high-level API across platforms.
Distributed Application Services (DAS) is the part of the Galaxy system that
provides communication between any Galaxy applications running on any
platform. 
DAS is made up of class libraries, run-time services, and tools. In comparing
DAS to the well-known OSI protocol model (see Figure 1), DAS encompasses
services analogous to the top three layers of the OSI stack. DAS relies on the
native-system platforms to provide the services that correspond to the lower
four layers of the OSI model. DAS uses an asynchronous, symmetrical,
peer-to-peer communications model for efficient, message-based
interapplication communication. DAS can be used to implement distributed
services in peer-to-peer, client/server, or even master/slave mode. On top of
several native protocols, DAS offers higher-level services that facilitate
development of distributed applications. DAS also allows Galaxy applications
to communicate with non-Galaxy applications, via interapplication
communication mechanisms such as the Distributed Computing Environment (DCE).
(See "Distributed Computing and the OSF/DCE," by John Bloomer, DDJ, February
1995.)
The DAS model consists of service providers and service consumers. A provider
exports a service for use by other applications; a consumer uses such a
service. This terminology is preferred over "client/server" because the
conventional terms no longer apply to modern distributed-application
environments: Increasingly, distributed applications function as both service
providers and consumers. For example, every DAS application is a consumer of
the services of the DAS Service Broker, the registry service that dynamically
locates providers by matching their attributes with those requested by a
consumer.
The DAS architecture consists of multiple levels of abstraction to facilitate
extensibility and portability; see Figure 2. The design of DAS tries to
balance the ease of handling large-grained objects with the flexibility of a
large number of API entry points. Objects can be subclassed, modified, and
enhanced, and entry points can be added to the API set for specific project
needs. The layered abstraction model limits platform dependencies and
eliminates interplatform differences in functionality.
DAS managers are in some cases derived from more general-purpose managers. For
example, the Datatag Manager is based on the Representation Manager, which, in
turn, provides an encapsulation object that transforms data formats between
applications on the fly (for example, dealing with floating-point formats on
dissimilar platforms) based on the relationships between Galaxy abstract data
types and native data types. 


DAS Major Components


The object-oriented components of DAS include six class managers, a
standard dialog called the "Service Chooser," and a run-time service provider
called the "Service Broker." The DAS managers consist of the Communication,
Datatag, Session, Signature, Service Broker, and Service Managers. 
The Communication Manager provides a low-level API to the network. Some
applications may use the Communication Manager directly, but most applications
will use the higher-level Session Manager instead. The Communication Manager
has a transport-oriented view of the network, although it is independent of
particular network transports. Applications must implement their own protocols
on top of this transport layer, in order to share data across different
machine architectures.
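The core problem such a protocol must solve is that the same value has
different byte-level representations on different machines. A minimal
illustration (in Python's struct notation, purely for exposition -- this is
not Galaxy code) shows the bookkeeping DAS's higher layers take off the
application's hands:

```python
# Two machines can disagree on integer byte order; a wire protocol
# over a raw transport must pin one representation and convert at
# each end. This is the kind of conversion DAS automates.
import struct

value = 0x01020304
big = struct.pack(">I", value)     # big-endian ("network order") bytes
little = struct.pack("<I", value)  # little-endian bytes

assert big != little               # same value, different wire forms
decoded = struct.unpack(">I", big)[0]
```

Sending the raw in-memory bytes and decoding them naively on a machine of the
opposite byte order would silently deliver the wrong number.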
The Datatag Manager lets you create and manipulate application data
descriptions called "datatags"--a portable description of the structure and
contents of a piece of data in your application. Datatags allow application
data to be converted to different machine representations and enable
transparent sharing of data among applications. The Datatag Manager serves as
a bridge between the Representation Manager and the native language (C or
C++). 
The Session Manager implements high-level abstractions that allow service
consumers and providers to communicate with each other without explicit
knowledge of the underlying network. A "session" involves two applications
communicating with each other, via the vsession abstraction. A session
encapsulates the dialog between consumer and provider; see Figure 3. Unlike
users of the Communication Manager, users of the Session Manager do not
implement protocols. Derived from the Session Manager, the Session Statement
Manager
abstracts the requests that one application can make of another. Using the
Session Statement Manager, an application forms a "statement" (an encapsulated
request) and sends it to a service provider.
The Service Manager is used to define service providers. A "service" is
defined as a collection of primitives and attributes. A primitive is a network
entry point into the service, abstracted via the Service Primitive Manager (a
submodule of the Service Manager). Service consumers use these entry points to
make requests of the service. The Service Manager uses the Session Manager for
communication between the consumer and the provider. 
The Service Broker Manager is used as a high-level interface to the Service
Broker from within an application. It provides a set of entry points that can
be used by service providers to register their services with the Service
Broker run-time service and by service consumers to query the Service Broker
for which services are available. The Service Broker Manager uses the Session
Manager to communicate between the application and the Service Broker.
The Signature Manager is used as a bridge between statements and primitives.
Signatures are primarily used to specify the prototypes for primitives. The
statements use these prototypes to help form the actual parameter list with
which to invoke the primitives, and to identify the return result (if any).
The primitives use these prototypes to identify their arguments and help form
the return result (if any). Signatures provide a means for type-checking data
sent between service consumers and service providers. 
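The role a signature plays -- checking a statement's arguments against a
primitive's declared prototype before dispatch, and the result on the way
back -- can be sketched generically. The names and mechanism below are
illustrative only, not the Galaxy API:

```python
# A toy "signature": a declared prototype that type-checks a
# statement's arguments before the primitive runs, and its return
# value afterward. Purely illustrative of the idea, not Galaxy code.
def make_signature(arg_types, return_type):
    def check(primitive):
        def invoke(*args):
            if len(args) != len(arg_types):
                raise TypeError("wrong argument count")
            for a, t in zip(args, arg_types):
                if not isinstance(a, t):
                    raise TypeError("argument type mismatch")
            result = primitive(*args)
            if not isinstance(result, return_type):
                raise TypeError("return type mismatch")
            return result
        return invoke
    return check

@make_signature((int, int), int)
def add(a, b):            # stands in for a service "primitive"
    return a + b

total = add(2, 3)
```

A consumer forming a statement against this prototype gets a type error at
the boundary rather than garbage inside the provider.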
The Service Chooser is a standard Galaxy dialog providing a graphical display
of those services that have been registered with a Service Broker. The user
can select a service from the Chooser and establish a session with the service
without requiring knowledge of the individual attributes of the service. The
Service Chooser builds the list of registered service providers by querying
the service broker. 
A Service Broker is a DAS service that knows about other available DAS
services--a networked registry of all registered services. The Service Broker
allows providers to register themselves and then maintains lists of their
attributes that consumers use in finding a desired service provider.
Typically, a network contains many Service Brokers working together, although
there is usually only one per machine. This distributed implementation is
usually not visible to applications that use the Service Broker. 
Unlike a naming service, the Service Broker can match an arbitrary set of
attributes between providers and consumers. And, unlike an object request
broker (ORB), the Service Broker does not participate in the conversation
between the applications. It makes the match and then "gets out of the way,"
while the applications talk to each other. This model provides greater
performance than an ORB implementation, and scales better than a mechanism
that is always in the middle of the provider/consumer conversation.
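The "match, then get out of the way" behavior amounts to attribute-set
matching over a registry. A toy sketch (all names and attributes here are
invented for illustration, not drawn from the Service Broker's API):

```python
# Toy registry matching consumers to providers by arbitrary attribute
# sets. The broker returns an address and takes no further part in
# the conversation, mirroring the Service Broker's role.
class Broker:
    def __init__(self):
        self.providers = []

    def register(self, address, **attrs):
        self.providers.append((address, attrs))

    def lookup(self, **wanted):
        """Addresses of providers whose attributes include every
        requested key/value pair."""
        return [addr for addr, attrs in self.providers
                if all(attrs.get(k) == v for k, v in wanted.items())]

broker = Broker()
broker.register("host-a:9000", service="print", color=True)
broker.register("host-b:9000", service="print", color=False)
matches = broker.lookup(service="print", color=True)
```

Once the consumer has an address, it opens a session directly with the
provider; the broker is not on the data path.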


The Datatag Manager


Datatags and signatures are the essential building blocks of interapplication
conversations. The Datatag Manager and Signature Manager provide an API that
implements object-oriented abstractions for data typing, uniform data
representation, and service descriptions. These facilities, which enable
transparent interactions between applications across platforms and networks,
deserve a closer look, beginning with the Datatag Manager. The Datatag
Manager, which is based on the Representation Manager, is used to create and
manipulate datatags. Datatags provide a uniform means for specifying argument
and return-value types for DAS messages, and they allow efficient conversion
of data between its native representation and its protocol representation.
Datatags are roughly analogous to type definitions in
programming languages. They allow applications to share conceptually similar
data, even when the physical data representation may differ across a network
or a machine architecture. Data descriptions are managed independently of
specific uses, except when used in a datatag group.
The Datatag Manager provides functions which return primitive datatags that
correspond to common primitive C and C++ types, such as 8-bit signed scalar,
double-precision floating point, or strings. The Datatag Manager also provides
functions which return datatags that correspond to Galaxy-specific types, such
as vbool, vlonglong, or vscrap. The datatag facility can also express how data
should be stored, via entities such as the Galaxy dictionary type, pointers,
variable-length opaque, and so on. As with type definitions in programming
languages, datatags are meant to be shared by all applications using the
Datatag Manager.
Also supported are compound datatags, which describe structures that consist
of multiple datatypes. Much like the struct construct in C, compound datatags
allow you to define a structure that includes diverse types of data, such as
pointers, native Galaxy types, linked lists, arrays, dictionaries, trees, and
resources. Compound datatags allow complex structures to be handled across the
network as efficiently as basic data structures. You can also collect related
datatags into a container called a "datatag group." Datatags in a datatag
group can be stored, named, and manipulated as resources, and managed as a
group.
By using the Representation Manager together with the Datatag Manager, you can
encapsulate application data and create a datatag-scrap object. The Datatag
Manager defines a base datatag-scrap class and several primitive datatag-scrap
classes that provide default conversion behavior for each of the primitive
datatags. Datatag-scrap methods automatically perform the transformations
necessary to move data from the application to the network and back. A datatag
scrap is typically used as a source or a destination for a call to a function
that converts to/from the application data representation. Datatag scraps can
be used to convert data stored in foreign representations into the data
structures defined by an application. 


Using the Datatag Manager


As mentioned earlier, the Datatag Manager can describe and convert data
structures defined by an application. Example 1(a) is a structure declared in
C. Example 1(b) shows how to convert that structure into a different
representation via a scrap. 

A linked list is a data structure that can become fairly complex when it
contains pointers to other nested structures. Many existing development tools
require you to write code that laboriously traverses linked lists, dispatching
to the networking software upon each list element. In DAS, datatags have been
implemented with pointers that allow linked lists (or other data structures
incorporating noncyclic references) to be encapsulated and passed across the
network without the explicit involvement of the programmer.
Example 2(a) declares a linked list using a C struct. Example 2(b) shows the
equivalent datatag description. Example 3(a) and 3(b) show how the datatag can
be used by a provider or consumer to send/receive the linked list across
machine architectures without laborious conversion code.
Datatag scraps can be used in concert with the Resource Manager to share
complex items between applications running on the network. The Resource
Manager is "scrap aware," so anything that can be stored as a resource can be
accessed as a scrap. This includes dialog windows, user-interface objects,
text, error messages, color images, internationalization and
geometry-management information, or other structured data. The steps in
transmitting resources over the network are:
1. Store an item as a resource.
2. Use the Resource Manager API to make a scrap from the resource.
3. Send the scrap, using the predefined scrap datatag (vdatatagGetScrap()).
4. Receive the scrap.
5. Convert the scrap back into a resource.
6. Load the item from the resource.
As an example, say that several applications on different platforms need to
share a background-color specification. On each platform, burnt sienna may be
defined differently. How do you define a color specification that can be used
across these platforms? In Galaxy, there is a platform-independent
color-specification abstraction. This abstraction can be made into a resource,
which in turn can be made into a scrap, which in turn can be used with DAS to
provide a color specification as a run-time service. Example 4(a) shows the
service-consumer side of requesting a color spec, while Example 4(b) shows the
service-provider side.


Examining the Signature Manager


The Signature Manager is used to create and manipulate descriptions of
primitives known as "signatures," roughly analogous to function prototypes in
C and C++. A DAS primitive is a service entry point that is callable by
consumers. The primitive's arguments and return type are described with a
signature. Example 5 shows how to assemble datatag objects into a signature
when the primitive has a large number of arguments.
Signatures provide a bridge between DAS statements and DAS primitives.
Statements use the signature as a prototype in forming the parameter list when
invoking the primitive, and in identifying the return result. Primitives use
the signature in identifying their arguments and in forming the return result.
Arguments and results are typed via datatags created by the Datatag Manager.
Signatures are typically stored in Galaxy resource files, so they can be
shared by developers and, at run time, by service consumers and providers. In
addition to being stored as resources, signatures can be created at run time
through API calls. This allows an application to deal with data whose
structure is not known at compile time.


Services and Sessions


The Service Manager provides abstractions for creating and managing service
providers. As mentioned earlier, a service consists of primitives and
attributes. Primitives are entry points into a service, and are invoked when a
consumer sends a statement to the service. Attributes describe the service and
how it can be located and used by a consumer. Attributes take the form: {name,
type, value}. Each attribute is uniquely named and is typed via a datatag. A
service can have any number of attributes, and you can add attributes to a
service so that its functionality becomes easier to identify by potential
consumers. 
For a provider to participate in sessions with new consumers, it must be
explicitly exported and enabled. Only enabled services can register themselves
with the Service Broker. A nonregistered service can only be accessed by
consumers that have been hard-wired to directly talk to the specific service. 
Service consumers initiate conversations, or "sessions," with service
providers. The Service Manager creates a new session each time a service is
invoked, one session for each connection with a consumer. A session is
represented by the datatype vsession. Every consumer/provider connection
consists of two vsession instances, one of which is private to the consumer
and one of which is private to the provider. The consumer uses its vsession to
send statements to the provider; the provider, likewise, sends return results
and status information via its vsession. In true peer-to-peer DAS sessions,
both vsessions are capable of querying and providing return results. 
DAS supports fully asynchronous communications between service providers and
service consumers. An application can be a consumer of multiple providers
concurrently, and can have multiple statements (encapsulated requests) pending
with one or more providers, while continuing to process other DAS, GUI, or OS
events from the Galaxy event loop. Alternatively, DAS supports synchronous
interactions--for situations where the application must wait for results from
one operation before proceeding with other threads of execution. 
Many existing development tools only allow network applications to interact
based on hard-coded locations or hard-coded server IDs. By contrast, the DAS
Service Broker allows applications to interact at run time based on the
attributes they have registered with the Service Broker. For example, if the
user needs a text editor, the application queries the Service Broker for
available text editors and can, based on the attributes of the request, match
anything from a WYSIWYG word processor to a programmer's editor. You don't
have to specify the editor by specific name or network ID. It is possible for
a service to be selected by a consumer based on only one or two of the many
primitive functions it might have to offer.
Applications establish sessions with the Service Broker by creating a vsession
and setting the appropriate attributes. Any vsession can be used to establish
a session with the Service Broker. The appropriate attributes would be
contained in the scrap returned by vdasservCreateAttributeScrap. Example 6
shows how an application establishes a session with a Service Broker by hand.
If you use the vdasservRegisterService and vdasservUnregisterService
functions, you don't have to establish a session explicitly.
An application can find available service providers by sending the Service
Broker the statement returned by vdasservMakeMatchStatement. The statement
takes two arguments: a vsession (with a broker), and a dictionary scrap that
specifies some of the required attributes and their values. An array of
providers that match the attributes is returned. Any provider matches the
request if it has all the attributes in the specification scrap with
appropriate values. The resulting array is not ordered and may contain
providers that were registered on other service brokers known to the service
broker that handled the request. Example 7 shows how to locate all of the
services known to a Service Broker.


Conclusion


The DAS facility in Galaxy provides an infrastructure of abstractions for
building platform-independent distributed applications. This facility has been
field proven through extensive use in large-scale, complex applications across
a broad range of industries. 


For More Information


Galaxy 2.0
Visix Software 
11440 Commerce Park Drive
Reston, VA 22091
703-758-8230 
Figure 1 DAS facilities are analogous to the top three layers of the OSI
protocol stack.
Figure 2 The architecture of DAS allows access at various levels of
abstraction.
Figure 3 The service-provider/service-consumer model.
Example 1: (a) Application data structure to be converted; (b) converting an
application data structure into a different data representation.
(a) typedef struct myStruct myStruct;
 struct myStruct
 {
 short first;
 vlonglong second;
 double third;
 vfixed fourth;
 };

(b) vscrap *myCreateStructScrap(myStruct *ptr)
 {
 vdatatag *dtag;
 vscrap *scrap;
 dtag = vdatatagCreateCompound();
 vdatatagConstructCompound(dtag,
  vdatatagGetShort(),
  vdatatagGetLonglong(),
  vdatatagGetDoubleFloat(),
  vdatatagGetFixed(),
  NULL);
 scrap = vdatatagScrapFromValue(dtag, ptr);
 return (scrap);
 }
Example 2: (a) A linked list expressed as a C struct; (b) a linked list
described using datatags.
(a)

typedef struct barney barney;
struct barney
{
 unsigned int purple;
 float dinosaur;
 vfixed beast;
 barney *next;
};
barney *listTop;

(b)

ptr = vdatatagCreatePointer();
dtag = vdatatagCreateCompound();
vdatatagConstructCompound(dtag,
 vdatatagGetUnsignedInteger(),
 vdatatagGetSingleFloat(),
 vdatatagGetFixed(),
 ptr,
 NULL);
vdatatagSetPointerDatatag(ptr, dtag);
Example 3: Using the datatag to describe the linked list: (a) as a service
consumer; (b) as a service provider.
(a)

vsessionSetStatementArgs(session,
 listTop);

(b)

vserviceGetPrimitiveArgs(prim,
 scrap, &listTop);
Example 4: Requested as a network service, a color specification has been
converted to a resource and then encapsulated as a scrap; (a) the
service-consumer side; (b) the service-provider side.
(a)
 vcolorspec *spec;
vresource res;
vscrap *scrap;
res = vresourceCreateMem();
vcolorStoreSpec(spec, res);
scrap = vresourceGetScrap(res, NULL);
/* signature for statement has vdatatagGetScrap()
 * as argument datatag */

vsessionSetStatementArgs(session, scrap);
vscrapDestroy(scrap);
vresourceDestroyMem(res);

(b)
 vresource res, one;
vcolorspec *spec;
vscrap *scrap;
/* signature for primitive has vdatatagGetScrap()
 * as argument datatag */
vserviceGetPrimitiveArgs(prim, args, &scrap);
res = vresourceCreateMem();
one = vresourceCreate(res, vname_Foo, vresourceUNTYPED);
vresourceSetScrap(one, NULL, scrap);
spec = vcolorLoadSpec(one);
vscrapDestroy(scrap);
vresourceDestroyMem(res);
Example 5: Datatag objects are used to construct a signature for a primitive.
{
 vsignature sig;
 sig = vsignatureCreate();
 vsignatureSetTag(sig, vnameInterGlobalLiteral("my primitive"));
 vsignatureSetReturnDatatag(sig, vdatatagGetInteger());
 vsignatureConstructArgs(sig, vdatatagGetShort(),
 vdatatagGetFixed(),
 vdatatagGetLongLong(),
 vdatatagGetString(),
 vdatatagGetInteger(),
 NULL);
}
Example 6: Establishing a session with a Service Broker.
{
 vsession *sessionBroker;
 sessionBroker = vsessionCreate();
 vsessionSetAttributesFromScrap(sessionBroker,
 vdasservCreateAttributeScrap());
 vsessionBegin(sessionBroker);
 ...
 vsessionEnd(sessionBroker);
}
Example 7: Locating services known to a Service Broker.
{
 vsessionStatement *statement;
 vsession *sessionBroker;
 vscrap *scrapSpec;
 /* initialize sessionBroker */
 scrapSpec = vscrapCreateDictionary();
 statement = vdasservMakeMatchStatement(sessionBroker, scrapSpec);
 vsessionSetStatementNotify(statement, exampleMatchNotify);
 vsessionSendStatement(statement);
 ...
}











PROGRAMMING PARADIGMS


The Shooting Spree at the Soiree and other Light Subjects




Michael Swaine


According to one theory, the Industrial Revolution created the phenomenon of
the mass murderer. A mind meant to be free but shackled to a relentlessly
repetitive task, so the theory goes, will eventually snap under the strain.
Mayhem at McDonald's. Panic at the Post Office.
According to another theory, endless repetition just makes people boring and
self-absorbed. Think of all those senile old geezers at family gatherings when
you were a kid, reminiscing about their days on the line at Ford.
I've written this column 81 times over the past seven years. I wonder which
way I'll go when I finally crack. Maybe that won't happen, because the way
I've approached writing the column makes it considerably less repetitive than
that assembly line at Ford.


On Ambiguity


When I started writing "Programming Paradigms" back in 1988, I chose the title
partly because the word "paradigm" was so ambiguous. I didn't want to be
shackled to some implicit or explicit "contract with DDJ readers" in which I
promised to publish clever TSRs and all the breaking news on Forth development
environments while simultaneously balancing the Federal budget. I'd fallen
into that trap in my previous job, promising that I would never abandon CP/M.
"Just keep the column unfettered," I thought, "a kite to the winds of changing
times."
I didn't know back then that I was being Clintonesque.
Not everybody embraces ambiguity. One reader regularly clips and sends me the
most egregious uses of the word "paradigm" in the press, apparently on the
theory that I am responsible for all uses of the word. This may be the same
reader who once pointed out to me that encouraging semantic ambiguity in the
use of a word devalues it like a Cold-Peace ruble or a Depression dollar, till
a pair o' dimes ain't worth two bits. Me, I like ambiguity.
Philosopher Thomas Kuhn, whose book, The Structure of Scientific Revolutions,
popularized the word "paradigm," at least among people among whom books like
his are popular, gloated about the flamboyant flexibility with which he used
the word. "One sympathetic reader," he said in a postscript to a late edition
of the book, "who shares my conviction that 'paradigm' names the central
philosophical elements of the book, prepared a partial analytical index and
concluded that the term is used in at least twenty-two different ways."
Twenty-two different ways! Kuhn is my hero.
And sure enough, the title has allowed me room to range widely. So widely that
a casual reader, picking up the magazine occasionally, might be confused about
exactly what the focus of this column is.
Well, with over 80 episodes in the can, the question can be analyzed
historically. Statistically, even. I did that. Ran the whole series of columns
through a battery of tests. Probed and poked. Squeezed out their essence.
Found out what I've been writing about all these years.
Turns out I have been writing four distinct columns under the title
"Programming Paradigms." 


On Paradigms Past


This seems fine to me; anyhow, it's a whole lot less than 22. But I can see
how it could be confusing to that casual reader, so I have mapped out the
varieties of meaning behind the title "Programming Paradigms." The four
columns you have been reading under the banner of "Programming Paradigms" are:
Paradigms past. Last month, I dug into programming history, searching for the
origins of the compiler. The previous month, I wrote about Ada Lovelace and
Steve Dompier, asking what it really meant to be a programmer.
I got some feedback on that "Nature of the Programmer" column from Bill Walker
of San Diego, who challenged my claim that programming is somehow unusual in
that the users of the tools are also the tool makers. "That's true," Bill
says, "if you look only at contemporary professions, but not if you look with
an eye to history. In their early days, many professions created their own
tools. I'd be surprised if, as you say, software development doesn't become
more and more specialized, just as other professions have. We've already
reached the point where many programmers have never worked in assembly
language. How long can it be before a significant number of programmers use
only visual programming tools and never write a line of code in a high-level
language?"
Well said, Bill. My own attitude here is, um, ambiguous.
On the one hand, it's been a long time since I wrote anything in assembly
language, and, as a weekend programmer with modest skills and a short
attention span, I need very high-level tools, like visual programming
languages (VPLs), if I'm going to get anything done. More generally, some
people are probably going to have to specialize in very high-level
programming, using tools developed by others, if the debacle of the Denver
airport baggage-handling system is not to continue to be the all-too-common
example of Big Software Development.
On the other hand, software development may be crucially different from any
earlier profession. The ability to create the tools of our own trade may be
much more important in software development, because software itself is
different from any previous artifact of human labor. It probably didn't matter
much when printers stopped building printing presses, if they ever did that.
It could matter a lot if software development becomes so specialized that
programmers no longer, even in principle, control the tools they use. I
continue to worry about this.
Running lite. The term "Running Light" has a long and honored history in this
magazine. It means programming efficiently, saving bytes and/or cycles.
"Running Lite" is about something less honored.
Running Lite is VPLs, scripting systems, prototyping tools. Code that gets
today's job done. The quick and dirty, the slow and easy, the pretty and
superficial. Kitchen-table programming, programming for that weekend
programmer with modest skills and a short attention span. It's also peripheral
concerns, like "Can I make any money writing Newton applications in my spare
time?" It's anything but the Real Thing.
"Running Lite" is to "Running Light" as "lite" is to "light."
If Bill Walker is right, it won't be long before there are hordes of lite
programmers using only VPLs and never writing a line of code in a high-level
language. Of course, as Mike Floyd reminds me, this presupposes the existence
of actual visual-programming languages. Today, visual programming is strong on
interface creation and marketing hype but doesn't live up to its promises when
you start writing down to the bones of your application.
Interviews. Granted, I don't actually write interviews, except for the
questions. But if you've ever had your unrehearsed spoken words pressed
verbatim into print and published indiscriminately, you may agree that
interview editing is an art. You may also share my skepticism about the
desirability, if not the possibility, of automatic speech recognition.
The bleeding edge. Artificial intelligence. Neural nets. Parallel algorithms.
Next year's operating systems. (If the brave and bloodied purchasers of
version 1.0 of a piece of software are on the bleeding edge, are pioneering
Pentium purchasers on the bleeding etch?)
This month's column seems to be about itself, but that's a sham. It's now
going to take a sharp turn in the direction of Running Lite, with a lite look
at making video and at the lite, third-party software market for that lite
computer, the Newton MessagePad.


On QuickCam: Video Lite


Connectix of San Mateo, California, is selling a digital camera for the Mac
and, probably by the time you read this, for Windows. Street price for the Mac
version is about $100. It requires no additional hardware, sending live
digital video to the computer via the serial port. It's called "QuickCam," and
it's the size and shape of a tennis ball. There's even a microphone built into
the ball. It comes with software that lets you shoot continuous video, with
sound, and save it as QuickTime files, or shoot still pictures and save them
as TIFF files. Other apps let you create a digital photo album or turn your
video into After Dark screen saver modules.
You're not going to produce Hollywood movies with this thing. You'll get small
movies at maybe 15 frames per second, and they'll be in 16 levels of gray.
This is definitely video lite.
All it has going for it are a ridiculously low price, effortless setup and
use, and extreme portability. Connectix suggests uses such as
videoconferencing, laser-printer-quality publishing (like resumes and
snail-mail letters), video-enhanced e-mail, verifying inventory, monitoring
the baby, or making training materials. Because it outputs QuickTime, any
application that deals with video on the Mac can use its movies; for example,
QuickCam movies can be placed in HyperCard stacks and controlled by the Movie
XCMD.
I've been testing QuickCam for a few weeks, around the house and yard, in the
office, and at a writing soiree where I shot all the participants. I have
learned that it is hard to aim a tennis ball.
Other than that, I have no complaints. QuickCam performs as advertised and is
a great bargain. Now I want color.



On Newton: Market Lite


It's not enough to have a perspective on the PDA or the pen-based market. To
make any kind of sense of it, you need several perspectives. You need to look
at what it is today, what it could be tomorrow, and the odds against that
tomorrow. It's appropriate to be a critic, savvy to be a believer, rash to be
an investor.
Of course, it doesn't make sense, ultimately, to lump together the PDA and the
pen-based computing markets as I just did. But, while the two markets are
potentially divergent, at the moment each of them consists of: 1. Newton and
2. background noise. Newton MessagePad sales, though unimpressive, far
outdistance the competition. The dominance of a small market is a fragile
thing, though, and Newton could become relatively less important as the
markets grow. But for now, Newton is where the sales and development work are
happening.
What development work am I talking about?
As of last September, when a compendium was prepared, there were over 100
commercial products that Apple knew about. This breaks down into some 50
companies each fielding a product or two and distributing independently, plus
30-some products distributed through Apple's StarCore.
What companies? There are a few established software companies, mostly ones
that have experience writing "lite" software, like utilities, and mostly from
the Apple universe: CE Software, Dubl-Click Software, Great Plains Software.
There are the companies that were founded on a belief in the pen-based market:
FingerTip Technologies, Palm Computing, Slate. And there are the vertical
intrusions, companies that saw potential applications of Newton technology in
their specialized markets: Educational Research Laboratories, HealthCare
Communications, KPMG Peat Marwick.
That's the commercial picture. The shareware market is very active, with
literally hundreds of products out there already, showing the predictable
range in quality from Doesn't Actually Work/Crashes System Routinely to As
Good As Mother Makes. 


Developments in Newton Development


Among the better shareware products are some good developer utilities, ranging
from Jason Harper's BeamToSelf, which lets you develop infrared beaming apps
without buying two Newtons, to Chris Christensen's memory-manipulation
utilities. There are also a lot of debugging aids, communications tools, and
code examples in the free/shareware directories on GEnie, AOL, eWorld,
CompuServe, and on the net at comp.sys.newton.
The mandatory development environment for Newton development, though, is still
Apple's Newton Toolkit (NTK), which is still priced for committed commercial
and corporate developers at about $700. NewtonScript, a
Newton-platform-specific object-oriented language, is the language of NTK.
But alternatives are emerging. As I write this, it's still not possible to
write Newton applications in C or C++ (or even Pascal), but you can program
the device in LISP, Basic, or a LOGO-like language called "Newt." The initial
versions of these third-party language implementations are all pretty
rudimentary, but it appears that it's about as easy to put a new language on a
Newton as it was to put Tiny Basic on homebrew computers back in 1976.
Note that I said on a Newton.
NTK is a Macintosh application, with a Windows version hinted at Real Soon
Now, but these languages run on the Newton. It might seem to make no sense to
try to write programs on a Newton, even though handwriting recognition has
matured to the point that Palm Computing is claiming 100 percent recognition
at 30 words per minute with its recognition system for the Newton and other
pen-based systems. But there is a nice shareware product that makes using a
Newton for development at least a little less crazy: Typo turns any connected
computer into a keyboard for the Newton. Doesn't have to be a Mac: Any machine
running any communications software and connected via a serial cable can
become the Newton's keyboard.
These developments are maybe interesting, maybe suggestive of further
possibilities, but they are not C. I was trading phone messages with a
developer last fall who had a massive project that he had ported to various
UNIX systems, the Mac, and the PC, and now wanted to move to the Newton. It
took several messages before I got him to realize that this was simply not
possible. He could rewrite it from scratch in NewtonScript, or I could, but
there was no way to port it, because Newton does not support C.
That may be changing, in a small way. Apple is apparently accelerating and
extending its plan to open up Newton programming on a module-by-module basis.
NewtonScript, an interpreted language, will remain the language in which apps
are written, but it will be possible to compile selected chunks of code, and
there is talk of providing hooks for modules written in other languages.
In defense of the current approach, it's worth noting that: 1. NewtonScript
isn't that hard to learn; 2. the platform is so different that porting whole
apps doesn't make much sense anyway; 3. NewtonScript does a lot of the
interface work for the developer; and 4. Newton apps are necessarily small and
simple and can be developed in far less time than commercial computer apps.
But NewtonScript is interpreted and can be criminally slow. Providing the
ability to compile selected code is something Apple really needs to do.


Apple Gives Newton Swimming Lessons


There are a lot of things Apple needs to do. I think Apple has a mixed record
on developer support for Newton.
Admittedly, these are unfamiliar waters.
What's interesting about the products available for the Newton today is that
these are all small products, developed over a short time, with small
investments. It's a different financial model, with a different profitability
picture. Old models don't work here, and Apple is learning along with the
developers, I suppose.
But should Apple teach this baby to swim by throwing it in unfamiliar waters
and letting it learn by floundering?
As one developer whose company isn't yet supporting Newton pointed out to me,
Apple isn't talking to Newton users. It seems to do more promotion of
HyperCard add-ons than of Newton third-party products.
Ah, well. It's a new market, and it's still small. At best, a few horizontal
vendors are making money.
But the vertical strategy looks very promising, at least for those who already
own their target customer lists. Publishers of professional magazines and
newsletters and organizations and so forth would seem to have an edge, which
could pay off for those who are persistent. Apple is doing the right thing in
picking vertical markets to push, the medical market in particular.
Some have criticized Apple for the high price of the NTK, saying that it costs
four times the street price of the machine itself. I don't agree that it makes
sense to compare the price of the MessagePad to that of the NTK, but I do think
Apple made a mistake in its recent revamping of its developer-support program.
Restricting the developer CD, formerly free to all NTK purchasers, was a
questionable decision. It sends a bad message when you discontinue any
developer-support program. Sure, big developers who are in it for the long run
will buy into a subscription program; but not all Newton developers are big
guys.
Some of these developers are not rich enough to be an income source for you,
Apple. That doesn't mean they aren't important in building markets and
credibility for Newton. Don't overlook developers just because they're running
lite.


























C PROGRAMMING


Alexander Stepanov and STL




Al Stevens


I am taking a break from the current "C Programming" project to present an
interview with Alexander Stepanov, the creator of the Standard Template
Library (STL), which ANSI/ISO has approved as a part of Standard C++. Alex
heads the Generic Programming Project at Hewlett-Packard Research Laboratories
in Palo Alto, California, and STL is the product of the research being
conducted by his team. 
STL is an important addition to the evolution of Standard C++, not only for
the rich library of container classes that it provides, but also for its
insight into "generic programming," a different approach to designing and
developing code. You C++ programmers should pay close attention to these
developments, which will surely influence your methods and techniques. I will
discuss STL and generic programming in more detail next month.
We met in a conference room in the main building of the HP Labs complex. I
wanted to learn more about generic programming and about how STL, a
significant body of code and a radical departure from conventional
class-library design, had won the overwhelming approval of the usually
conservative and cautious committee.
DDJ: Tell us something about your long-term interest in generic programming.
AS: I started thinking about generic programming in the late '70s when I
observed that some algorithms depended, not on some particular implementation
of a data structure, but only on a few fundamental semantic properties of the
structure. I started going through many different algorithms, and I found that
most algorithms can be abstracted away from a particular implementation in
such a way that efficiency is not lost. Efficiency is a fundamental concern of
mine. It is silly to abstract an algorithm in such a way that when you
instantiate it back, it becomes inefficient.
At that time, I thought that the right way of doing this kind of research was
to develop a programming language, which is what I started doing with two of
my friends: Deepak Kapur, who at present is a professor at State University of
New York, Albany; and David Musser, professor at Rensselaer Polytechnic
Institute. At that time the three of us worked at the General Electric
Research Center at Schenectady, New York. We started working on a language
called "Tecton," which would allow people to describe algorithms associated
with what we called "generic structures," each of which is just a collection
of formal types and properties of these types. Sort of mathematical stuff. We
realized
that one can define an algebra of operations on these structures, refine them,
enrich them, and do all sorts of things.
There were some interesting ideas, but the research didn't lead to practical
results because Tecton was functional. We believed Backus's idea that we
should liberate programming from the von Neumann style, and we didn't want to
have side effects. That limited our ability to handle the many algorithms
that require the notions of "state" and "side effects."
The interesting thing about Tecton, which I realized sometime in the late
'70s, was that there was a fundamental limitation in the accepted notion of an
abstract data type. People usually viewed abstract data types as something
that tells you only about the behavior of an object, with the implementation
totally hidden. It was commonly assumed that the complexity of an operation is
part of the implementation and that abstraction ignores complexity. One of the
things that is central to generic programming as I understand it now, is that
complexity, or at least some general notion of complexity, has to be
associated with an operation. 
Let's take an example. Consider an abstract data-type stack. It's not enough
to have Push and Pop connected with the axiom wherein you push something onto
the stack and, after you pop the stack, you get the same thing back. It is of
paramount importance that pushing the stack is a constant-time operation,
regardless of the size of the stack. If I implement the stack so that every
time I push it becomes slower and slower, no one will want to use this stack. 
We need to separate the implementation from the interface but not at the cost
of totally ignoring complexity. Complexity has to be and is a part of the
unwritten contract between the module and its user. The reason for introducing
the notion of abstract data types was to allow interchangeable software
modules. You cannot have interchangeable modules unless these modules share
similar complexity behavior. If I replace one module with another module with
the same functional behavior but with different complexity trade-offs, the
user of this code will be unpleasantly surprised. I could tell him anything I
like about data abstraction, and he still would not want to use the code.
Complexity assertions have to be part of the interface.
Around 1983, I moved from GE Research to the faculty of the Polytechnic
University, formerly known as Brooklyn Polytechnic, in New York. I started
working on graph algorithms. My principal collaborator was Aaron Kershenbaum,
now at IBM Yorktown Heights. He was an expert in graph and network algorithms,
and I convinced him that some of the ideas of high order and generic
programming were applicable to graph algorithms. He had some grants and
provided me with support to start working with him to apply these ideas to
real network algorithms. He was interested in building a toolbox of high-order
generic components so that some of these algorithms could be implemented,
because some of the network algorithms are so complex that they are
theoretically analyzed, but never implemented. I decided to use a dialect of
Lisp called "Scheme" to build such a toolbox. Aaron and I developed a large
library of components in Scheme demonstrating all kinds of programming
techniques. Network algorithms were the primary target. Later, Dave Musser,
who was still at GE Research, joined us, and we developed even more
components, a fairly large library. The library was used at the university by
graduate students, but was never used commercially. I realized during this
activity that side effects are important, because you cannot really do graph
operations without side effects. You cannot replicate a graph every time you
want to modify a vertex. Therefore, the insight at that time was that you can
combine high-order techniques when building generic algorithms with
disciplined use of side effects. Side effects are not necessarily bad; they
are bad only when they are misused.
In the summer of 1985, I was invited back to GE Research to teach a course on
high-order programming. I demonstrated how you can construct complex
algorithms using this technique. One of the people who attended was Art Chen,
then the manager of the Information Systems Laboratory. He was sufficiently
impressed to ask me if I could produce an industrial-strength library using
these techniques in Ada, provided that I would get support. Being a poor
assistant professor, I said yes, even though I didn't know any Ada at the
time. I collaborated with Dave Musser in building this Ada library. It was an
important undertaking, because switching from a dynamically typed language
such as Scheme, to a strongly typed language such as Ada allowed me to realize
the importance of strong typing. Everybody realizes that strong typing helps
in catching errors. I discovered that strong typing, in the context of Ada
generics, was also an instrument of capturing designs. It was not just a tool
to catch bugs; it was also a tool to think. That work led to the idea of
orthogonal decomposition of a component space. I realized that software
components belong to different categories. Object-oriented programming
aficionados think that everything is an object. When I was working on the Ada
generic library, I realized that this wasn't so. There are things that are
objects: Things that have state and change their state are objects. And then
there are things that are not objects. A binary search is not an object. It is
an algorithm. Moreover, I realized that by decomposing the component space
into several orthogonal dimensions, we can reduce the number of components,
and, more importantly, we can provide a conceptual framework of how to design
things.
Then I was offered a job at Bell Laboratories working in the C++ group on C++
libraries. They asked me whether I could do it in C++. Of course, I didn't
know C++ and, of course, I said I could. But I couldn't do it in C++, because
in 1987 C++ didn't have templates, which are essential for enabling this style
of programming. Inheritance was the only mechanism to obtain genericity, and
it was not sufficient.
Even now, C++ inheritance is not of much use for generic programming. Let's
discuss why. Many people have attempted to use inheritance to implement data
structures and container classes. As we know now, there were few, if any,
successful attempts. C++ inheritance and the programming style associated with
it are dramatically limited. Using them, it is impossible to implement a
design that includes as trivial a thing as equality. If you start with a base
class X at the root of your hierarchy and define a virtual equality operator
on this class, which takes an argument of type X, and then derive class Y
from class X, what is the interface of the equality? It has an equality which
compares a Y with an X. Using animals as an example (object-oriented people
love animals), define animal and derive giraffe from animal. Then define a
member function mate, where animal mates with animal and returns an animal.
Then you derive giraffe from animal and, of course, it has a function mate,
where a giraffe mates with an animal and returns an animal. It's definitely
not what you want.
While mating may not be very important for C++ programmers, equality is. I do
not know a single algorithm where equality of some kind is not used.
You need templates to deal with such problems. You can have template class
animal, which has member function mate, which takes animal and returns animal.
When you instantiate giraffe, mate will do the right thing. The template is a
more powerful mechanism in that respect.
However, I was able to build a rather large library of algorithms, which later
became part of the Unix System Laboratory Standard Component Library. I
learned a lot at Bell Labs by talking to people like Andy Koenig and Bjarne
Stroustrup about programming. I realized that C/C++ is an important
programming language with some fundamental paradigms that cannot be ignored.
In particular, I learned that pointers are very good. I don't mean dangling
pointers. I don't mean pointers to the stack. But I mean that the general
notion of pointer is a powerful tool. The notion of address is universally
used. It is incorrectly believed that pointers make our thinking sequential.
That is not so. Without some kind of address, we cannot describe any parallel
algorithm. If you attempt to describe an addition of n numbers in parallel,
you cannot do it unless you can talk about the first number being added to the
second number, while the third number is added to the fourth number. You need
some kind of indexing. You need some kind of address to describe any kind of
algorithm, sequential or parallel. The notion of an "address" or a "location"
is fundamental in our conceptualizing computational processes--algorithms.
Let's consider now why C is a great language. It is commonly believed that C
is a hack which was successful because UNIX was written in it. I disagree.
Over a long period of time computer architectures evolved, not because of some
clever people figuring out how to evolve architectures--as a matter of fact,
clever people were pushing tagged architectures during that period of
time--but because of the demands of different programmers to solve real
problems. Computers that were able to deal just with numbers evolved into
computers with byte-addressable memory, flat address spaces, and pointers.
This was a natural evolution, reflecting the growing set of problems that
people were solving. C, reflecting the genius of Dennis Ritchie, provided a
minimal model of the computer that had evolved over 30 years. C was not a
quick hack. As computers evolved to handle all kinds of problems, C, being the
minimal model of such a computer, became a very powerful language to solve all
kinds of problems in different domains very effectively. This is the secret of
C's portability: It is the best representation of an abstract computer that we
have. Of course, the abstraction is done over the set of real computers, not
some imaginary computational devices. Moreover, people could understand the
machine model behind C. It is much easier for an average engineer to
understand the machine model behind C than the machine model behind Ada or
even Scheme. C succeeded because it was doing the right thing, not because of
AT&T promoting it or UNIX being written with it.
C++ is successful because instead of trying to come up with some machine model
invented by just contemplating one's navel, Bjarne started with C and tried to
evolve C further, allowing more general programming techniques, but within the
framework of this machine model. The machine model of C is very simple. You
have the memory where things reside. You have pointers to the consecutive
elements of the memory. It's very easy to understand. C++ keeps this model,
but makes things that reside in the memory more extensive than in the C
machine, because C has a limited set of data types. It has structures that
allow a sort of extensible type system, but it does not allow you to define
operations on structures. This limits the extensibility of the type system.
C++ moved C's machine model much further toward a truly extensible type
system.
In 1988, I moved to HP Labs, where I was hired to work on generic libraries.
For several years, instead of doing that, I worked on disk drives, which was
exciting but was totally orthogonal to this area of research. I returned to
generic-library development in 1992, when Bill Worley, who was my lab
director, established an algorithms project with me as its manager. C++ had
templates by then. I discovered that Bjarne had done a marvelous job at
designing templates. I had participated in several discussions early on at
Bell Labs about designing templates and argued rather violently with Bjarne
that he should make C++ templates as close to Ada generics as possible. I
think that I argued so violently that he decided against that. I realized the
importance of having template functions in C++ and not just template classes,
as some people believed. I thought, however, that template functions should
work like Ada generics, that is, that they should be explicitly instantiated.
Bjarne did not listen to me, and he designed a template-function mechanism
where templates are instantiated implicitly using an overloading mechanism.
This particular technique became crucial for my work because I discovered that
it allowed me to do many things that were not possible in Ada. I view this
particular design by Bjarne as a marvelous piece of work, and I'm very happy
that he didn't follow my advice.
DDJ: When did you first conceive of the STL, and what was its original
purpose?
AS: In 1992, when the project was formed, there were eight people in it.
Gradually the group diminished, eventually becoming two people: me and Meng
Lee. While Meng was new to the area--she was doing compilers for most of her
professional life--she accepted the overall vision of generic-programming
research and believed that it could lead to changing software development at
the point when very few people shared this belief. I do not think that I would
have been able to build STL without her help (after all, STL stands for
STepanov and Lee). We wrote a huge library--a lot of code with a lot of data
structures and algorithms, function objects, adaptors, and so on. There was a
lot of code, but no documentation. Our work was viewed as a research project
with the goal of demonstrating that you can have algorithms defined as
generically as possible and still be extremely efficient. We spent a lot of
time taking measurements, and we found that we can make these algorithms as
generic as they can be and still be as efficient as handwritten code. There is
no performance penalty for this style of programming! The library was growing,
but it wasn't clear where it was heading as a project. It took several
fortunate events to lead it toward STL.
DDJ: When and why did you decide to propose STL as part of the ANSI/ISO
Standard C++ definition?
AS: During the summer of 1993, Andy Koenig came to teach a C++ course at
Stanford. I showed him some of our stuff, and I think he was genuinely excited
about it. He arranged an invitation for me to give a talk at the November
meeting of the ANSI/ISO C++ Standards Committee in San Jose. I gave a talk
entitled, "The Science of C++ Programming." The talk was rather theoretical.
The main point was that there are fundamental laws connecting basic operations
on elements of C++, which have to be obeyed. I showed a set of laws that
connect very primitive operations such as constructors, assignment, and
equality. C++ as a language does not impose any constraints. You can define
your equality operator to do multiplication. But equality should be equality,
and it should be a reflexive operation. A should be equal to A. It should be
symmetric. If A is equal to B, then B should be equal to A. And it should be
transitive. Standard mathematical axioms. Equality is essential for other
operations. There are axioms that connect constructors and equality. If you
construct an object with a copy constructor out of another object, the two
objects should be equal. C++ does not mandate this, but this is one of the
fundamental laws that we must obey. Assignment has to create equal objects. So
I presented a bunch of axioms that connected these basic operations. I talked
a little bit about axioms of iterators and showed some of the generic
algorithms working on iterators. It was a two-hour talk and, I thought, rather
dry. However, it was very well received. I didn't think at that time about
using this thing as a part of the standard because it was commonly perceived
that this was some kind of advanced programming technique which would not be
used in the "real world." I thought there was no interest at all in any of
this work by practical people.
I gave this talk in November, and I didn't think about ANSI at all until
January. On January 6, I got a mail message from Andy Koenig, who is the
project editor of the standard document, saying that if I wanted to make my
library a part of the standard, I should submit a proposal by January 25. My
answer was, "Andy, are you crazy?" to which he answered, "Well, yes I am
crazy, but why not try it?"
At that point there was a lot of code but there was no documentation, much
less a formal proposal. Meng and I spent 80-hour weeks to come up with a
proposal in time for the mailing deadline. During that time the only person
who knew it was coming was Andy. He was the only supporter, and he did help a
lot during this period. We sent the proposal out and waited. While doing the
proposal we defined a lot of things. When you write things down, especially
when you propose them as a standard, you discover all kinds of flaws with your
design. We had to reimplement every single piece of code in the
library--several hundred components--between the January mailing and the next
meeting in March in San Diego. Then we had to revise the proposal, because
while writing the code, we discovered many flaws. 
DDJ: Can you characterize the discussions and debate in the committee
following the proposal? Was there immediate support? Opposition?
AS: We did not believe that anything would come out of it. I gave a talk,
which was very well received. There were a lot of objections, most of which
took this form: This is a huge proposal, it's way too late, a resolution had
been passed at the previous meeting not to accept any major proposals, and
here is this enormous thing, the largest proposal ever, with a lot of totally
new things. The vote was taken, and, interestingly enough, an overwhelming
majority voted to review the proposal at the next meeting and put it to a vote
at the next meeting in Waterloo, Ontario.
Bjarne Stroustrup became a strong supporter of STL. A lot of people helped
with suggestions, modifications, and revisions. Bjarne came here for a week to
work with us. Andy helped constantly. C++ is a complex language, so it is not
always clear what a given construct means. Almost daily I called Andy or
Bjarne to ask whether such-and-such was doable in C++. I should give Andy
special credit. He conceived of STL as part of the standard library. Bjarne
became the main pusher of STL on the committee. There were other people who
were helpful: Mike Vilot, the head of the library group, Nathan Myers of Rogue
Wave, Larry Podmolik of Andersen Consulting. There were many others.
The STL as we proposed it in San Diego was written in then-current C++. We were
asked to rewrite it using the new ANSI/ISO language features, some of which
are not implemented. There was an enormous demand on Bjarne and Andy's time
trying to verify that we were using these nonimplemented features correctly.
People wanted containers independent of the memory model, which was somewhat
excessive because the language doesn't include memory models. People wanted
the library to provide some mechanism for abstracting memory models. Earlier
versions of STL assumed that the size of the container is expressible as an
integer of type size_t and that the distance between two iterators is of type
ptrdiff_t. And now we were told, why don't you abstract from that? It's a tall
order because the language does not abstract from that; C and C++ arrays are
not parameterized by these types. We invented a mechanism called "allocator,"
which encapsulates information about the memory model. That caused grave
consequences for every component in the library. You might wonder what memory
models have to do with algorithms or the container interfaces. If you cannot
use things like size_t, you also cannot use things like T* because of
different pointer types (T*, T huge *, and so on). Then you cannot use
references because with different memory models you have different reference
types. There were tremendous ramifications on the library.
The second major thing was to extend our original set of data structures with
associative data structures. That was easier, but coming up with a standard is
always hard because we needed something which people would use for years to
come for their containers. STL has, from the point of view of containers, a
very clean dichotomy. It provides two fundamental kinds of container classes:
sequences and associative containers. They are like regular memory and
content-addressable memory. It has a clean semantics explaining what these
containers do. We needed to implement associative data structures in a short
period of time, and Dave Musser came up with a robust implementation. Dave
contributed in diverse and sundry ways to all aspects of STL.
When I arrived at Waterloo, Bjarne spent a lot of time explaining to me that I
shouldn't be concerned, that most likely it was going to fail, but that we did
our best, we tried, and we should be brave. The level of expectation was low.
We expected major opposition. There was some opposition but it was minor. When
the vote was taken in Waterloo, it was totally surprising because it was maybe
80 percent in favor and 20 percent against. Everybody expected a battle,
everybody expected controversy. There was a battle, but the vote was
overwhelming.
DDJ: What effect does STL have on the class libraries published in the
ANSI/ISO February 1994 working paper?
AS: STL was incorporated into the working paper in Waterloo. The STL document
is split apart, and put in different places of the library parts of the
working paper. Mike Vilot is responsible for doing that. I do not take an
active part in the editorial activities. I am not a member of the committee
but every time an STL-related change is proposed, it is run by me. The
committee is very considerate.
DDJ: Several template changes have been accepted by the committee. Which ones
have impact on STL?
AS: Prior to the acceptance of STL, there were two changes that were used by
the revised STL. One is the ability to have template member functions. STL
uses them extensively to allow you to construct any kind of a container from
any other kind of a container. There is a single constructor that allows you
to construct vectors out of lists or out of other containers. There is a
templatized constructor which is templatized on the iterator, so if you give a
pair of iterators to a container constructor, the container is constructed out
of the elements which are specified by this range. A range is a set of
elements specified by a pair of iterators, generalized pointers, or addresses.
The second significant new feature used in STL was template arguments, which
are templates themselves, and that's how allocators, as originally proposed,
were done.
DDJ: Did the requirements of STL influence any of the proposed template
changes?
AS: In Valley Forge, Bjarne proposed a significant addition to templates
called "partial specialization," which would allow many of the algorithms and
classes to be much more efficient and which would address a problem of code
size. I worked with Bjarne on the proposal, and it was driven by the need of
making STL even more efficient. Let me explain what partial specialization is.
At present, you can have a template function parameterized by class T called
swap(T&, T&). This is the most generic possible swap. If you want to
specialize swap and do something different for a particular type, you can have
a function swap(int&, int&), which does integer swapping in some different
way. However, it was not possible to have an intermediate partial
specialization, that is, to provide a template function of the form template
<class T> void swap(vector<T>&, vector<T>&);. This form provides a special way
to swap vectors. This is an important problem from an efficiency point of
view. If you swap vectors with the most generic swap, which uses three
assignments, vectors are copied three times, which takes linear time. However,
if we have this partial specialization of swap for vectors that swap two
vectors, then you can have a fast, constant time operation that moves a couple
of pointers in the vector headers. That would allow sort, for example, to work
on vectors of vectors much faster. With the present STL, without partial
specialization, the only way to make it work faster is for any particular kind
of vector, such as vector<int>, to define its own swap, which can be done, but
which puts a burden on the programmer. In very many cases, partial
specialization would allow algorithms to be more effective on some generic
classes. You can have the most generic swap, a less generic swap, an even less
generic swap, and a totally specific swap. You can do partial specialization,
and the compiler will find the closest match. Another example is copy. At
present the copy algorithm just goes through a sequence of elements defined by
iterators and copies them one by one. However, with partial specialization we
can define a template function template <class T> T** copy(T**, T**, T**);.
This will efficiently copy a range of pointers by using memcpy, because when
we're copying pointers, we don't have to worry about construction and
destruction and we can just move bits with memcpy. That can be done once and
for all in the library, and the user doesn't need to be concerned. We can have
particular specializations of algorithms for some of the types. That was a
very important change, and as far as I know it was favorably received in
Valley Forge and will be part of the Standard.
DDJ: What kinds of applications beyond the standard class libraries are best
served by STL?
AS: I have hopes that STL will introduce a style of programming called
"generic programming." I believe that this style is applicable to any kind of
application, that is, trying to write algorithms and data structures in the
most generic way. Specifying families or categories of such structures
satisfying common semantic requirements is a universal paradigm applicable to
anything. It will take a long time before the technology is understood and
developed. STL is a starting point for this type of programming.
Eventually, we will have standard catalogs of generic components with
well-defined interfaces, with well-defined complexities. Programmers will stop
programming at the micro level. You will never need to write a binary-search
routine again. Even now, STL provides several binary-search algorithms written
in the most generic way. Anything that is binary-searchable can be searched by
those algorithms. The minimum requirements that the algorithm assumes are the
only requirements that the code uses. I hope that the same thing will happen
for all software components. We will have standard catalogs, and people will
stop writing these things.
That was Doug McIlroy's dream when he published a famous paper talking about
component factories in 1969. STL is an example of the programming technology
which will enable such component factories. Of course, a major effort is
needed, not just research effort, but industrial effort to provide programmers
with such catalogs, to have tools which will allow people to find the
components they need, to glue the components together, and to verify that
their complexity assumptions are met.
DDJ: STL does not implement a persistent-object-container model. The map and
multimap containers are particularly good candidates for persistent-storage
containers as inverted indexes into persistent-object databases. Have you done
any work in that direction or can you comment on such implementations?
AS: This point was noticed by many people. STL does not implement persistence
for a good reason. STL is as large as was conceivable at that time. I don't
think that any larger set of components would have passed through the
standards committee. But persistence is something that several people thought
about. During the design of STL, and especially during the design of the
allocator component, Bjarne observed that allocators, which encapsulate memory
models, could be used to encapsulate a persistent memory model. The insight
was Bjarne's, and it is an important and interesting insight. Several
object-database companies are looking at that. In October 1994 I attended a
meeting of the Object Database Management Group. I gave a talk on STL, and
there was strong interest in making the containers within their emerging
interface conform to STL. They were not looking at the allocators as such.
Some of the members of the Group are, however, investigating whether
allocators can be used to implement persistency. I expect that there will be
persistent object stores with STL-conforming interfaces fitting into the STL
framework within the next year.
DDJ: Set, multiset, map, and multimap are implemented with a red-black-tree
data structure. Have you experimented with other structures such as B*-trees?
AS: I don't think that would be quite right for in-memory data structures, but
this is something that needs to be done. The same interfaces defined by STL
need to be implemented with other data structures--skip lists, splay trees,
half-balanced trees, and so on. It's a major research activity that needs to
be done because STL provides us with a framework where we can compare the
performance of these structures. The interface is fixed. The basic complexity
requirements are fixed. Now we can have meaningful experiments comparing
different data structures to each other. There were a lot of people from the
data-structure community coming up with all kinds of data structures for that
kind of interface. I hope that they would implement them as generic data
structures within the STL framework. 

DDJ: Are compiler vendors working with you to implement STL into their
products?
AS: Yes. I get a lot of calls from compiler vendors. Pete Becker of Borland
was extremely helpful. He helped by writing some code so that we could
implement allocators for all the memory models of Borland compilers. Symantec
is going to release an STL implementation for its Macintosh compiler. Edison
Design Group has been very helpful. We have had a lot of support from most
compiler vendors.
DDJ: STL includes templates that support memory models of 16-bit MS-DOS
compilers. With the current emphasis on 32-bit, flat-model compilers and
operating systems, do you think that the memory-model orientation will
continue to be valid?
AS: Irrespective of the Intel architecture, a memory model is an object, which
encapsulates the information about what a pointer is, what integer sizes and
difference types are associated with this pointer, what reference type is
associated with this pointer, and so on. Abstracting that is important if we
introduce other kinds of memory such as persistent memory, shared memory, and
so on. A nice feature of STL is that the only place that mentions the
machine-related types in STL--something that refers to real pointer, real
reference--is encapsulated within roughly 16 lines of code. Everything
else--all the containers, all the algorithms--is built abstractly without
mentioning anything which relates to the machine. From the point of view of
portability, all the machine-specific things which relate to the notion of
address, pointer, and so on, are encapsulated within a tiny, well-understood
mechanism. Allocators, however, are not essential to STL, not as essential as
the decomposition of fundamental data structures and algorithms.
DDJ: The ANSI/ISO C Standards committee treated platform-specific issues such
as memory models as implementation details and did not attempt to codify them.
Will the C++ committee be taking a different view of these issues? If so, why?
AS: I think that STL is ahead of the C++ standard from the point of view of
memory models. But there is a significant difference between C and C++. C++
has constructors and operator new, which deal with memory model and which are
part of the language. It might be important now to look at generalizing things
like operator new to be able to take allocators the way STL containers take
allocators. It is not as important now as it was before STL was accepted,
because STL data structures will eliminate the majority of the needs for using
new. Most people should not allocate arrays because STL does an effective job
in doing so. I never need to use new in my code, and I pay great attention to
efficiency. The code tends to be more efficient than if I were to use new.
With the acceptance of STL, new will sort of fade away. STL also solves the
problem of deleting because, for example, in the case of a vector, the
destructor will destroy it on the exit from the block. You don't need to worry
about releasing the storage as you do when you use new. STL can dramatically
minimize the demand for garbage collection. Disciplined use of containers
allows you to do whatever you need to do without automatic memory management.
The STL constructors and destructors do allocation properly.
DDJ: The C++ Standard Library subcommittee is defining standard namespaces and
conventions for exception handling. Will STL classes have namespaces and throw
exceptions?
AS: Yes, they will. Members of the committee are dealing with that, and they
are doing a great job.
DDJ: How different from the current STL definition will the eventual standard
definition be? Will the committee influence changes, or is the design under
tighter control?
AS: It seems to be a consensus that there should not be any major changes to
STL.
DDJ: How can programmers gain an early experience with STL in anticipation of
it becoming a standard?
AS: They can download the STL header files from butler.hpl.hp.com under /stl
and use them with Borland or IBM compilers, or with any other compiler powerful
enough to handle STL. The only way to learn a style of programming is by
programming. They need to look at examples and write programs in this style.
DDJ: You are collaborating with P.J. (Bill) Plauger to write a book about STL.
What will be the emphasis of the book, and when is it scheduled to be
published?
AS: It is scheduled to be published in the summer of 1995 and is going to be
an annotated STL implementation. It will be similar to Bill's books on the
Standard C Library and the Draft Standard C++ Library. He is taking the lead
on the book, which will serve as a standard reference document on the use of
the STL. I hope to write a paper with Bjarne that will address
language/library interactions in the context of C++/STL. It might lead to
another book. 
A lot more work needs to be done. For STL to become a success, people should
do research and experiment with this style of programming. More books and
articles need to be written explaining how to program in this style. Courses
need to be developed. Tutorials need to be written. Tools need to be built
which help people navigate through libraries. STL is a framework, and it would
be nice to have a tool with which to browse through this framework.
DDJ: What is the relationship between generic programming and object-oriented
programming?
AS: In one sense, generic programming is a natural continuation of the
fundamental ideas of object-oriented programming--separating the interface and
implementation and polymorphic behavior of the components. However, there is a
radical difference. Object-oriented programming emphasizes the syntax of
linguistic elements of the program construction. You have to use inheritance,
you have to use classes, you have to use objects, objects send messages.
Generic programming does not start with the notion of whether you use
inheritance or you don't use inheritance. It starts with an attempt to
classify or produce a taxonomy of what kinds of things there are and how they
behave. That is, what does it mean that two things are equal? What is the
right way to define equality? Not just actions of equality. You can analyze
equality deeper and discover that there is a generic notion of equality,
wherein two objects are equal if their parts--or at least their essential
parts--are equal. We can have a generic recipe for an equality operation. We
can discuss what kinds of objects there are. There are sequences. There are
operations on sequences. What are the semantics of these operations? What
types of sequences from the point of view of complexity trade-offs should we
offer the user? What kinds of algorithms are there on sequences? What kinds of
different sorting functions do we need? And only after we develop that, after
we have the conceptual taxonomy of the components, do we address the issue of
how to implement them. Do we use templates? Do we use inheritance? Do we use
macros? What kind of language technology do we use?
The fundamental idea of generic programming is to classify abstract software
components and their behavior and come up with a standard taxonomy. The
starting point is with real, efficient algorithms and data structures and not
with the language. Of course, it is always embodied in the language. You
cannot have generic programming outside of a language. STL is done in C++. You
could implement it in Ada. You could implement it in other languages. They
would be slightly different, but there are some fundamental things that would
be there. Binary search has to be everywhere. Sort has to be everywhere.
That's what people do. There will be some modifications on the semantics of
the containers, slight modifications imposed by the language. In some
languages you can use inheritance more, in some languages you have to use
templates. But the fundamental difference is precisely that generic
programming starts with semantics and semantic decomposition. For example, we
decide that we need a component called "swap." Then we figure out how this
particular component will work in different languages.
The emphasis is on the semantics and semantic classification, while
object-orientedness, especially as it has evolved, places a much stronger
emphasis, and, I think, too much of an emphasis, on precisely how to develop
things, that is, using class hierarchies. OOP tells you how to build class
hierarchies, but it doesn't tell you what should be inside those class
hierarchies.
DDJ: What do you see as the future of STL and generic programming?
AS: I mentioned before, the dream of programmers having standard repositories
of abstract components with interfaces that are well understood and that
conform to common paradigms. To do that, there needs to be a lot more effort
to develop the scientific underpinnings of this style of programming. STL
starts it to some degree by classifying the semantics of some fundamental
components. We need to work more on that. The goal is to transform software
engineering from a craft to an engineering discipline. It needs a taxonomy of
fundamental concepts and some laws that govern those concepts, which are well
understood, which can be taught, which every programmer knows even if he
cannot state them correctly. Many people know arithmetic even if they never
heard of commutativity. Everybody who graduated from high school knows that
2+5 is equal to 5+2. Not all of them know that it is a commutative property of
addition. I hope that most programmers will learn the fundamental semantic
properties of fundamental operations. What does "assignment" mean? What does
"equality" mean? How to construct data structures.
At present, C++ is the best vehicle for this style of programming. I have
tried different languages and I think that C++ allows this marvelous
combination of abstractness and efficiency. However, I think that it is
possible to design a language based on C and on many of the insights that C++
brought into the world, a language which is more suitable to this style of
programming, which lacks some of the deficiencies of C++, in particular its
enormous size. STL deals with things called "concepts." What is an iterator?
Not a class. Not a type. It is a concept. (Or, if we want to be more formal,
it is what Bourbaki calls a structure "type," what logicians call a "theory,"
or what type-theory people call a "sort.") It is something which doesn't have
a linguistic incarnation in C++. But it could. You could have a language where
you could talk about concepts, refine them, and then finally form them in a
very programmatic kind of way into classes. (There are, of course, languages
that deal with sorts, but they are not of much use if you want to sort.) We
could have a language where we could define something called "forward
iterator," which is just defined as a concept in STL--it doesn't have a C++
incarnation. Then we can refine forward iterator into bidirectional iterator.
Then random iterator can be refined from that. It is possible to design a
language which would enable even far greater ease for this style of
programming. I am fully convinced that it has to be as efficient and as close
to the machine as are C and C++. And I do believe that it is possible to
construct a language that allows close approximation to the machine on the one
hand, and has the ability to deal with very abstract entities on the other
hand. I think that abstractness can be even greater than it is in C++ without
creating a gap between underlying machines. I think that generic programming
can influence language research and that we will have practical languages,
which are easy to use and are well suited for that style of programming. From
that you can deduce what I am planning to work on next. 
Figure: Meng Lee and Alexander Stepanov








































ALGORITHM ALLEY


Algorithm Analysis




Micha Hofri 


Micha, who is the author of Probabilistic Analysis of Algorithms
(Springer-Verlag, 1987), can be contacted at hofri@cs.rice.edu.


Introduction 
by Bruce Schneier
Anyone can open a book and find an algorithm that sorts an array of numbers.
It's much harder to choose--or invent--the sorting algorithm that works best
for a particular job. Doing so means noticing the strengths and weaknesses of
different algorithms that do the same thing, and figuring out which set of
characteristics most closely matches the needs of the situation at hand.
You can see this distinction in many off-the-shelf software products. The
behemoths that most large application-software products have become take up
tens of megabytes of memory and are sluggish on all but the fastest of
computers. Like sledgehammers, they do the job--effectively, but without
flourish.
More interesting are applications on the cutting edge of computing.
Missile-guidance systems, where shaving a few clock cycles off the running
time of an algorithm can mean the difference between success and failure.
Smart cards, where complex applications are crammed into a credit-card device
with processors eight bits wide and total RAM measuring in hundreds--not
millions--of bytes.
In this month's column, Micha Hofri looks at the analysis of
algorithms--specifically, at time/memory trade-offs. With this technique, a
programmer can throw extra memory at a problem in order to improve
performance, or sacrifice performance in order to decrease memory
requirements.
Algorithms can be analyzed with qualitative and quantitative objectives.
"Qualitative analysis" answers questions about correctness, termination,
freedom from deadlock, and predictability. These questions occasionally need
quantitative calculations as well, but we reserve the latter adjective for
analysis that concerns performance, which is what this article is all about.
Since the algorithms we look at are performed in digital computers, the two
resources of importance are time and space. "Time" is the run time to
completion, and "space," the storage area required to hold code and data. Both
are always needed, but in most cases, the analysis of only one type of
resource is of practical interest. Sometimes the designer can trade off the
two to some extent. 
Analysts do not use seconds to express results about run time; it is too
implementation and platform dependent. Sometimes we count machine
instructions, but this, too, is rarely a good idea. It is better to count
"roughly similar" source instructions or just the main, dominating,
characterizing operations. For example, when analyzing sorting algorithms
performed completely in main memory, we could concentrate on counting
comparisons that are needed; we could then claim that some part of the rest is
proportional to this count, and that whatever remains is negligible. For
higher precision, we could count exchanges as well. However, if the sort
required frequent disk accesses, we would count only those operations that
take much longer to complete. Except in particularly simple situations, there
is art to this science.
Space requirements are normally simpler. The area needed for code is rarely
interesting for two reasons: It is usually not controllable and, when space is
an issue, it is far smaller than the demands of temporary storage for data. An
algorithm may have some natural space unit to use: a record, word, node, or
fixed-size vector. Otherwise, we simply count bytes.


Techniques of Analysis


The analysis we consider skirts issues of platform, language, and compiler
dependence. The results have wider validity and do not date as fast. In other
words, we analyze algorithms, not programs. An important benefit is that this
approach simplifies the needed mathematical models of the algorithms, and
eventually, the analysis of those models. The analysis itself is mathematical,
and the kinds of mathematics are determined by the model--typically, they
include some algebra and combinatorics. Generating functions are often helpful
as well. If we want asymptotic results, this immediately raises the ante: We
need to use functions of complex variables. Due to space limitations, in this
article I'll stick to "back-of-envelope" calculations.
When you need to select a machine or a compiler, the dependencies we shunned
earlier are the center of interest. This is a different ball game: Except for
statistics, mathematics is rarely any help; the key words here are
"measurement" and "benchmarking." 
Somewhere in between is simulation. In the present context, it is used as a
substitute for mathematical analysis, when the going gets too rough, when our
model gives rise to an equation we cannot solve, for example, or when we
cannot write meaningful equations. (The latter might be an indication that we
do not yet have a good grip on the underlying problem.)


Modes of Analysis


"Mode" refers to the kind of answer the analysis provides. Let us restrict
attention for the moment to running time, an issue that arises when an
algorithm needs different times to process different inputs (of similar size).
This is not always the case: Most algorithms to invert a matrix run for
essentially the same time for all matrices of a given size. This is typical
when the algorithm only computes. Algorithms that search for a solution,
rather than calculating it, show huge variations.
If the answer of the analysis has to be a bound, then the analyst must find a
function f(n) with the following property: To process n items, algorithm A
will never use more than f(n) instructions, iterations, or other relevant
unit. This is sometimes a desirable assurance, and this is what worst-case
analysis provides. The designer of a real-time system may need it in order to
prove that his design is within specs.
Such guarantees are reassuring, but not always practical. Consider an example
where an algorithm has 4^n possible inputs of size n. A bound of 3·4^n
operations is not only frightening, it is also nearly meaningless when it is
reached in only a few--or even n--problem instances, or even in 10n of the
4^n possible inputs. One reason is that even with these large run times, the
average time on all 4^n possible instances could be far more modest; 2n+7, for
example.
Second, if n exceeds 30 or so, the probability of hitting one of the unlucky
instances is infinitely smaller than that of the machine being shredded by a
meteorite. The immensely popular simplex algorithm for linear optimization has
exactly such a skewed profile.
Averages are especially important when we consider algorithms that are used
repeatedly, such as those that compute trigonometric functions or perform
joins over a relational database. These algorithms require an estimate of both
the average resource requirements and the likely deviations. This calls for
probabilistic analysis that provides the expected run time and its variance
(because of the need for the variance to estimate the probable deviations, I
shy away from the term "average-case analysis"). 
The issue just raised is important in the following scenario: You need to
choose between two algorithms. Should you choose on the basis of worst-case or
average behavior? Often, if one algorithm is better in one mode, it is also
better in the other. But in any number of situations, the ordering according
to the two modes is reversed. So the answer depends on the more pressing
criteria of performance: guarantee versus overall efficiency.
Naturally, there is an in-between mode. It is meaningful for algorithms that
live a long time, such as those that maintain data structures, route messages
in a network, or supervise a disk channel. Such algorithms may use foresight
to advantage and periodically restructure the data or collect statistics to
recompute optimal parameters. Then it may be more meaningful to consider the
average cost of an operation in a sequence of operations, where such
computational "investments" are prorated on the entire sequence, than to look
at an isolated operation. The term "amortized analysis" describes such
analysis done in the worst-case behavior mode. Because those "investments" can
be controlled by user-supplied parameters, this approach is equally meaningful
when considering average behavior.


Example: Space/Time Trade-Off


A trade-off is possible if you can save on one resource by spending more on a
less critical one. Consider computing a trigonometric function. This is
normally done by evaluating the ratio of two polynomials of moderate degree
(depending on the desired precision). We could drastically reduce the required
time by keeping in storage a table with a few hundred values (or
thousands--again, this is determined by the required precision) and using
low-level interpolation to derive the desired result. With a very large table
we might even skip the interpolation, saving 90 percent or so of the needed
time.
A related but different situation arises when computing one value of the
function is relatively slow, compared with constructing a complete table.
Computing the binomial coefficient BC(n,k) for 0<k≤n/2 needs 3k
additions/multiplications. Constructing a table of all such coefficients for
0≤k≤n≤N requires approximately N^2+N additions using the standard recursion. If
we expect an algorithm to use at least some portion of the table (a common
situation), we can save a lot of time by preparing it during the
initialization phase.
In contrast to these examples, which choose between two methods, there are
situations where the trade-off can be parameterized. Here is a simple example:
We need to sort a large file on disk. The file is too large to be accommodated
in the main memory available for this application. (For now, forget about
virtual storage, even if it's technically possible; using it here effectively
calls for virtuoso programming and a good deal of a priori information about
the original order of the file.) A naive mergesort algorithm reads chunks of
the file into storage, sorts each chunk internally, and writes it as a
separate file on disk (which we assume has enough free space). Then the chunks
are read in parallel and merged into one fully sorted file. Word is the
sorting unit, and the file has N words. Let the size of a chunk be n, so that
k=N/n chunks are created. Strictly speaking, we would have ⌈N/n⌉ chunks, but
to keep it simple we'll assume that N/n is an integer. Let's also assume that
with buffers and caches, other tasks running in parallel and sufficient disk
bandwidth, the bottleneck is the CPU. We want to optimize the use of this
resource.
The "time" needed to sort n words is A·n·ln(n) comparisons. A is a constant
that we assume known and that depends on the selected algorithm. Several
algorithms for internal sort reach this optimal goal (optimal for
comparison-based sort, on information-theoretic grounds) on the average, and
some variants approach it even in the worst case--with larger constants, but
still quite reasonably. The time to merge k streams of size n each is
B·k·(k−1)·n, where B is another constant we assume known (because when we sort
in increasing order, k−1 comparisons are needed to determine the smallest word
out of the k currently at the heads of the k chunks, and this must be done for
all the N=nk words).
The total time is T=A·k·n·ln(n)+B·k·(k−1)·n=N(A·ln(n)+B(k−1)). The storage used
is
essentially the one buffer of size n. How large should it be? T is minimized
when k=1, and as we increase k, the needed storage goes down and the run
time goes up. Where we stop depends on the available physical storage and the
demands of concurrent operations. If this sort is needed often enough and the
sorting time we get with the current storage limitation is unacceptable, the
designer may consider purchasing main memory, and the saved time will be
expressed in dollars and cents.
A still-different space/time trade-off appears in the suggestion made by Tom
Swan ("Algorithm Alley," DDJ, April 1994) to replace binary search (trees) by
search tries with higher fan-out degree. In fact, several variations on the
theme expose a variety of trade-offs. The simplest one arises from using more
space for internal nodes to represent successive characters in the key,
yielding a large reduction in search time.


Further Reading



By far, the richest source for a variety of algorithms and their analyses is
still Donald Knuth's The Art of Computer Programming (Addison-Wesley, 1981). 
Two additional good sources for algorithms are Algorithmics: Theory and
Practice, by Gilles Brassard and Paul Bratley (Prentice-Hall, 1988), and
Introduction to Algorithms, by Thomas H. Cormen, Charles E. Leiserson, and
Ronald L. Rivest (MIT Press and McGraw-Hill, 1990). The analyses in these
books are less ambitious, and usually bounds are considered. The second book
gives examples of amortized analysis, as well.
Finally, my book, Probabilistic Analysis of Algorithms (Springer-Verlag,
1987), is mathematical and intended to supply the tools for analysis. However,
it is not an "algorithms book." A new, more comprehensive version is due later
this year from Oxford University Press.


























































PROGRAMMER'S BOOKSHELF


Inside the Windows 95 Book-Publishing Phenomenon




Al Stevens


Al is a contributing editor to DDJ. He can be reached at astevens@ddj.com.


Adrian King's Inside Windows 95 is the first of what will be many books on the
subject of Microsoft's next operating system, previously known by the code
name "Chicago." This book is more notable than most first books because it was
published in October 1994, which means that the majority of the work that went
into the book occurred almost a year before Windows 95 is due to be released.
King is a former Microsoft employee who was involved in the development of
earlier Windows versions. Apparently Microsoft granted him an inside track to
write the Inside book for the new release, guaranteed to be a best seller
based solely on the subject: Interest in Windows 95 runs high. Microsoft has
been open about allowing journalists to publish some aspects of the new
features, teasing a user community thirsting for more information. Screen
shots of the new shell and dialog boxes are common in press preview articles;
as a result, everyone is familiar with the new look well before release.
A Microsoft employee appeared as a guest on Public Television's "Computer
Chronicles" and demonstrated Windows 95, prefixing the answer to each of the
host's questions with an enthusiastic, "Exactly!" The same show hosted
representatives from several applications vendors demonstrating their products
running on Windows 95. Effective marketing, all this, but it has a certain
tabloid appeal, as well. Just as books about O.J. hit the stand before the ink
was dry on his arrest sheet, books about hot, new software products appear as
soon as there is a market for information, whether or not there is a product
to use or anything to report. Magazine articles and TV shows are one thing;
they address current events and are soon discarded and forgotten. Books are
another thing; they persist long after their information perishes.
It is a disturbing trend. How do you write a comprehensive book about a beta
operating system and publish it almost a year before the product's scheduled
delivery? (Ironically, a footnote on page 2 of the book dresses down PC Week
for publishing a review of an illegal copy of the beta "almost a year in
advance of the product's planned release date.") There are two problems.
First, everyone with sufficient inside information is either under the
nondisclosure agreement (NDA) or is a Microsoft employee. Second, the product
is bound to undergo significant changes between the publication of the book
and the product's release, which has slipped several times since the book was
written.
The first problem vanishes when the product's vendor is also the publisher of
the book. Since Microsoft owns Microsoft Press, it is in a position to give
dispensation to an author and release him from any vows associated with his
former employment and exposure to the beta software. As a side benefit, they
control the book's content. Now that their book is on the shelves and has
captured the first-look market, Microsoft has relaxed the NDA, and you can
expect a plethora of Windows 95 books, several by the time this review is
published.
The second problem, that the book is obsolete before it hits the shelves, is
addressed this way: Who cares? No one involved in the book project does,
that's for sure. Otherwise they wouldn't publish it. Only some other authors
who do not have the Redmond blessing, who have dutifully honored their
nondisclosure agreements, and who were scooped by the authorized book, would
care. Nobody else. The author gets paid, the publisher makes a pile, and users
and programmers get some information, no matter how perishable it might be.
Timeliness notwithstanding, how good is the book? Well, if the content is
accurate, and if the details are not overtaken by events, it's pretty good, at
least for the programming reader. Windows 95's goals and underlying
architecture should have been firmed up in time for this book's release, and
those are the details that programmers want to know. The user interface is
bound to change, and several parts already have, but those changes will have
little effect on your plans to write software. Even if you have betas of the
OS and its development tools, this book is still the best source for an
overview of Windows 95's operating and development environment. Developers who
are planning to port their applications to Windows 95 or to develop new
applications should read this book first.
I'm writing this review in late 1994, and Beta-2 has just been released. The
author used Beta-1. With one significant exception, the reports of the author
coincide with my observations of Beta-1. On page 17 the author discusses
performance. He says, "By the time of the Beta-1 release, Windows 95
performance was already as good or better than Windows 3.1 performance in
almost every respect." Balderdash. Perhaps the operative word is "almost."
Perhaps he figured it was safe to make the claim since only someone under
nondisclosure could reliably contradict it. Nonetheless, I had to replace a
venerable old ST-251 40-Mbyte hard drive with something considerably faster to
achieve even moderately acceptable Beta-1 performance. It took forever to load
applications. The same system ran fine under Windows 3.1 with the old disk
drive. One user reported that his 486DX2/66 with 8 Mbytes of RAM was running
Beta-2 about 20 percent slower than Windows 3.1. Subsequent betas are supposed
to do a better job of hitting the Windows 3.1 benchmark, but this book serves
no good purpose by perpetuating such misleading, Microsoft-serving claims.
Chapter 1 introduces Windows 95, giving the history of the Windows product
line and the motivation for the new version. Among the reasons for a totally
new operating system rather than another upgrade to DOS and Windows are: 
The user interface needs an overhaul. The Windows 3.1 desktop metaphor has
often been criticized because Program and File Managers force users to orient
themselves to the programs they run rather than to their work and their
documents. Windows 95 uses a so-called "document-centric" operating model to
correct that flaw. 
Installing hardware in contemporary PC systems is a nightmare; users must deal
with IRQs, port addresses, DMA channels, and such. The Plug and Play
specification is supposed to eliminate those problems. 
Setting up MS-DOS options with respect to memory, device drivers, and the like
is known to be a daunting task for the nontechnical user. Windows 95 strives
to improve that situation by automating most of it.
Not surprisingly, Windows 95, a full 32-bit, preemptive-multitasking operating
system, will be able to run the old 16-bit so-called "legacy" DOS and Windows
applications in a 16-bit cooperative multitasking environment. A new buzzword,
"legacy." Anything old is a euphemistic legacy--venerated, but destined to be
replaced by the new. Since some of Microsoft's applications are not yet ready
for 32-bitness, they need a dignified classification so that users will
continue to buy and use the old chestnuts. Legacy connotes value. We are
subliminally conditioned to respect our legacies. I bet that when the only
remaining legacies are from other vendors, we'll get new expressions--like
"old-timer," "antiquity," and "remnant."
If you already understand Intel processor architecture or don't really care
to, you can skip Chapter 2. There is one important point, though. Windows 95
is not destined to be a portable operating system like NT. Windows 95 was
developed for the Intel platform, and that is where it stays, according to the
book.
Chapter 3 is entitled "A Tour of Chicago." (Throughout the book the code name
"Chicago" keeps popping up, revealing how close publication was to the name
decision.) The tour includes an overview that explains how Windows 95 manages
multitasking, scheduling, and memory; describes virtual machines and devices;
and addresses Windows programming at a high level. This chapter is where
programmers get their first look at the Windows 95 operating-platform
architecture. You learn that the API is a subset of Win32 but that Windows 95
adds features not available to the Win32s API or NT. There is a brief
description of event-driven programming and message handling, the underlying
programming model of most GUI operating environments. Chapter 4 is more of the
same, but with more information about the operating-system internals. It
explains 386 protection rings and privilege levels and how the operating
system uses them. A 4-gigabyte virtual-memory map shows how Windows 95
organizes the DLLs and applications. The chapter introduces Windows 95
threads, which are new to Windows and similar to the scheduling mechanism used
by NT.
Chapter 5 will get the most attention from users. It describes the new,
improved user interface and shell. Windows 95 adopts a user interface that
looks something like OS/2's, but Windows 95 adds some features of its own.
According to the book, these changes are the product of an exhaustive study by
Microsoft of the way that users work and the attendant problems with the
Program, File, and Task Managers used in Windows 3.1. The book does a thorough
job of describing the Windows 95 solutions to the inherent Windows 3.1
user-interface problems, addressing how the new features work and what
motivated their creation. Not only does the shell change its face, but
application programs change, too. Because legacy applications display their
screens with calls to the Windows API, even those old applications take on the
new Windows 95 look when you run them under the new operating system. In
addition to a different, sculptured look to frames, menus, dialog-box
controls, and so on, the drop-down menus behave differently--more like the
Macintosh--and there are several new user-interface controls. Programmers
should pay attention to this chapter. New and ported applications should
strive to use the best of the new user-interface features.
Chapter 5 also discusses several user-interface features that were considered
and discarded. There are even some screen shots of abandoned features. The
chapter explains the rationale behind some of the decisions.
Programmers will be happy to learn that online-help files are going to be less
complex than those of Windows 3.1, which reduces the work involved in building
online-help databases. The move is toward less information in online-help
files, with, I suppose, the complete user's guides on CD-ROM. You don't have
to comply with that model, but Windows 95 does, and applications are expected
to fall in line. 
The author is critical of one change to the user interface, and I agree with
him. Microsoft added a button to the upper right of every window. It is the
Close button, and it has an X to identify it. Somebody decided to put the X
button where the maximize button used to be. Since it occupies the traditional
(legacy?) maximize-button position, and since that position is traditionally
identified with the maximize command, the result is predictable: every time
you want to maximize a window with the mouse, you close the window instead.
You can't help yourself. It takes a while to get used to the new feature, and
then it causes greater confusion when you return to a Windows 3.1 or NT
installation. It was a dumb move on Microsoft's part, but I am told that they
intend to stick with it.
Chapter 6 discusses the Windows 95 APIs: Win32, User, and the GDI. Chapter 7
describes the file system and explains the clever hack Microsoft applied to
come up with a FAT that supports long filenames alongside the legacy DOS file
system, on media that both environments can use.
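The book's account of the hack isn't reproduced here, but the widely documented mechanism is the VFAT long-filename scheme: each long name is spread across extra 32-byte directory entries whose attribute byte (0x0F) marks a combination that legacy DOS skips over, so old tools see only the short 8.3 alias. A rough sketch of one such entry's layout follows; the field names are illustrative, not taken from the book.

```c
/* Sketch of a VFAT long-filename directory entry, the widely documented
   mechanism behind the "clever hack" described above. Field names are
   illustrative. All fields are byte arrays, so the struct is exactly one
   32-byte directory slot on any compiler. */

#define LFN_ATTR 0x0F   /* read-only | hidden | system | volume label --
                           a combination legacy DOS directory code skips */

struct lfn_entry {
    unsigned char ord;          /* sequence number of this name fragment */
    unsigned char name1[10];    /* first 5 characters (UTF-16) */
    unsigned char attr;         /* always 0x0F for an LFN entry */
    unsigned char type;         /* reserved, zero */
    unsigned char checksum;     /* checksum of the matching 8.3 name */
    unsigned char name2[12];    /* next 6 characters */
    unsigned char clus_lo[2];   /* must be zero for LFN entries */
    unsigned char name3[4];     /* last 2 characters */
};
```

Because each entry occupies an ordinary directory slot, old and new systems can share the same media: DOS ignores the 0x0F entries, while Windows 95 stitches the fragments back into the long name.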
Chapter 8 is about Plug-and-Play. If all the hardware designers cooperate and
design to that specification, you will not have to understand the entrails of
your adapter cards to install a new one. Chapter 9 is about networking, and
Chapter 10 discusses the features of Windows 95 that support mobile computing,
most notably an accessory called the "Briefcase" that keeps your documents up
to date between the office machine and the laptop.
Inside Windows 95 is well written and comprehensive, though it could say more
about multimedia and some of the accessories. There are some neat features in the
new user interface that the book does not address, such as Quick View, which
lets the user view the contents of various applications' documents without
launching the applications. I hope that as the details of Windows 95 firm up,
the author updates and expands the book and puts out at least one more
edition, eliminating the speculation in the text and describing the complete
package, whatever it turns out to be.
Inside Windows 95
Adrian King
Microsoft Press, 1994, 476 pp. $24.95
ISBN 1-55615-626-X





















SWAINE'S FLAMES


At Ease


Letters. I get letters. Well, not real letters, of course. The last real
letter I got had a 29-cent stamp on it. What I get these days are eletters.
Enotes, ecards, ebills, ejunk mail, eChristmas letters. Email. Lots of email.
The writing of letters, presumed by many to be a lost art, done in by
Alexander Graham Bell, turns out not to have been dead but merely sleeping.
Today, it's alive, healthy, and immediate and totally e.
You get a lot of email, too, I suppose. In some cases, I don't have to
suppose, because I have ereceived emessages from you, to which I have esent
eresponses.
The hundreds of email messages that epoured in, in response to my recent
emoticontest, were appropriate and welcome. They overwhelmed the eWorld
message system, or more likely, my clumsy use of the eWorld message system,
halfway through my attempt to ereply to all of them. So I'm sorry to report
that some of you won't ereceive a personal ereply. The correct answer, again,
was that the smileys represented movie critics Gene Siskel and Roger Ebert,
although "Steve Ballmer and Bill Gates" was an unanticipated and morally
superior answer that many of you eentered.
"What is a cool guy like me doing eliving at an eWorld eaddress anyway," one
email-related emessage einquired. I didn't have a satisfactory eanswer until
December 10, when a lovely, color ebirthday card earrived from the folks who
run eWorld, or maybe from the efolks who run eWorld. (Of course, my eWorld
eaddress is not my only ehome. I ereside at several other elocations.)
The health and vigor of email hasn't made any of it great eart yet. The
econtent sometimes still leaves something to be desired. I notice that I eget
a lot of email about email. Being an old LISP programmer and a fan of the
writings of John Barth, Jorge Luis Borges, and Raymond Smullyan, I appreciate
the self-referentiality of this, but I'm not sure whether I'm ereceiving such
eposts because the fin-de-siècle Zeitgeist is media-obsessed or because DDJ
readers, dealing daily with programs that write programs, slide easily into
self-reference, or because I so often write about email.
One ecorrespondent quite correctly pointed out that broadcasting my email
address is really asking for it. "It" being, as I have learned, mostly
complaints about subscriptions. I don't know anything about subscriptions. For
subscription problems, you have to talk to Big Ed, the warehouse manager in
Eldon, Iowa. Ed handles subscription queries for all magazines in the world.
There's a theory that Ed doesn't actually exist, but he does have a very nice
answering-machine-message menu system, whose topology is non-Euclidean. You
could be the first person to emerge from it with your sanity.
Please note that I'm not the only editor whose email address appears in the
magazine. You can, in fact, send email to any of us care of editors@ddj.com.
And keep those ecards and eletters ecoming.
Michael Swaine, editor-at-large
MichaelSwaine@eworld.com












































OF INTEREST
DataFocus has announced the availability of the X/Software SDK (X/SDK), the
second product in its Nutcracker line of porting tools. While the original
Nutcracker SDK supports the porting of character-mode UNIX applications
written in C or C++ to Win32, the X/SDK allows you to port X and Motif
applications to Windows NT. The X/SDK packages an X Server, UNIX tools, and a
set of APIs within Windows DLLs. Once system dependencies such as pathnames,
user IDs, or illegal C++ keywords have been resolved, you can recompile the
source code and link in the X/SDK DLLs. Because a Nutcracker application is
native to Windows, it can be debugged with tools such as WINDBG. The DLLs
include the ANSI C run time and provide support for interprocess
communication, file I/O, curses, and memory management, as well as X and Motif
APIs, including Xlib and the Xt toolkit.
Nutcracker bundles in the MKS Toolkit to provide common UNIX tools including
awk, vi, touch, and so on. The X/SDK retails for $2995.00. Reader service no.
20.
DataFocus
12450 Fair Lakes Circle, Suite 400
Fairfax, VA 22033-3831
703-631-6770
MainSoft has announced an upgrade to MainWin, its UNIX library that makes the
Windows API available to applications running under Motif. With this tool,
Windows applications can run natively under UNIX. The new release includes
support for 32-bit Windows NT; both 16- and 32-bit code is supported from a
single shared library, thus allowing Windows 3.1 and Windows NT code to
execute simultaneously; and MainWin now supports Microsoft's Dynamic Data
Exchange Management Library (DDEML). 
Developers wishing to write MainWin applications will need to purchase the
MainWin Cross-Development Kit (CDK). The CDK allows developers to target both
Windows and X Windows from a single source-code base using standard Windows
API calls. The CDK includes a workstation copy of MainWin; a Windows-compliant
API; support for the Microsoft Foundation Class library; the ability for end
users to select either a Windows or Motif look-and-feel; and a common file
format that allows data to be shared between UNIX and Windows applications.
MainWin 1.1 retails for $199.00. The CDK sells for $5000.00 for a single copy
with price reductions for subsequent copies. Reader service no. 21.
MainSoft
1270 Oakmead Parkway, Suite 310
Sunnyvale, CA 94086
408-774-3400
Franz is billing its Allegro CL 2.0 for Windows as the first dynamic
object-oriented programming system for the Windows platform. The developer's
edition of Allegro CL is a 32-bit compiler based on the Common Lisp Object
System (CLOS), which features single and multiple inheritance, method
combination, multiple-argument discrimination and meta-object protocol.
Allegro CL adds a Windows interface builder that lets you create dialog boxes
and widgets visually, as well as browsers, inspectors, and a Windows-hosted
debugger. A professional edition is available for those
requiring royalty-free distribution of their applications. The professional
version includes a run-time generator, source code for Common Graphics and
Interface Builder, and a free copy of Win-Emacs from Pearl Software. Allegro CL
2.0 for Windows Developer's Edition retails for $595.00; the Professional
Edition sells for $995.00. Reader service no. 22.
Franz Inc.
1995 University Avenue
Berkeley, CA 94704
510-548-3600
ViewSoft is shipping Version 1.1 of its C/C++ GUI development environment,
Utah for Windows. Like many such products, Utah begins with an interface
builder that generates C++ code. Unique to the product, however, is its
Editable Object System (EOS), a C++ class library with platform-independent
extensions that provide reflection (self-describing objects) and dynamic
binding. Programmer-defined objects can inherit the same functionality by
deriving from base EOS objects. The only platforms currently supported are
Windows and Windows NT. However, the company indicates that OS/2, Macintosh,
and Motif versions are under development. In fact, Utah already provides
mappings between EOS's self-describing object structure and IBM's System
Object Model (SOM).
Other features of the product include ViewSoft's Semantic Interface Technology
which uses "smart components," developed to eliminate GUI dependencies;
support for OLE; and the ability to both create and use VBXs. Utah retails for
$1495.00. Reader service no. 23.
ViewSoft
60 North 300 West
Provo, UT 84601
810-377-0787
WinRez is a library of C functions that simulates Macintosh resource-handling
and memory-management routines under Windows. To port, you rename existing Mac
API calls to WinRez function calls and run the supplied resource-fork
conversion utility; the converted resources can then be accessed by any
Windows application. The product is supplied in both .DLL and
.LIB file formats. C libraries support Windows 3.1, Win32s, Windows NT, and
OS/2. The company claims that WinRez has minimal overhead requirements, adding
less than 64 Kbytes and using minimal stack space when linking in the .LIB
version. A "no-nonsense" license agreement lets you distribute the .DLL with
applications, provided that the app doesn't allow end users to directly modify
or edit resources. WinRez retails for $349.00. Reader service no. 24.
Partium Inc.
86 Gerrard Street East, Suite 14D
Toronto, ON
Canada M5B 2J1
416-598-9717
Creative Digital Research has launched CDR Publisher, a CD-ROM publishing
software package which supports cross-platform recording between PC- and
UNIX-based discs. By the end of the first quarter of this year, the company
claims, the software will support ISO 9660, ISO 9660 with Rock Ridge extensions,
and Macintosh HFS format, all integrated on a single CD-ROM. The software
supports image-on-the-fly, optimized CD layout, automatic filename conversion,
and outputting to multiple media.
CDR Publisher currently runs on Windows 3.1, Solaris, SunOS, and Silicon
Graphics systems. The package sells for $495.00. Reader service no. 25.
Creative Digital Research
7291 Coronado Drive
San Jose, CA 95129
408-255-0999
Tools and Techniques has released Data Junction v5 for Windows, which provides
import/export capabilities for any structured data--databases, spreadsheets,
flat files, SQL, binary/EBCDIC, legacy Cobol, ASCII, reports, and other data
formats. Although Version 5 is a stand-alone tool, the underlying technology
will shortly be released as DJLIB, a programmer's library for universal data
import/export services. The DJLIB API will be packaged as DLLs, VBXs, and
OCXs, as well as OLE applets. The "advanced" version of Data Junction sells
for $249.00, while a "professional" version, which supports fewer formats,
sells for $149.00. Reader service no. 26.
Tools and Techniques
2201 Northland Drive
Austin, TX 78756
512-459-1308
ImageSoft has announced Subtleware for C++/SQL, a persistence interface for
mapping C++ objects to and from SQL tables. Used with the C++/SQL API, the
SQLExec facility maps existing database schemata and data to C++ application
objects. C++/SQL defines C++ classes from existing database schemata using a
class generator called CGEN. The system also includes a code generator called
SGEN. The software, which sells for $899.00, is claimed to support all popular
SQL databases, operating systems, and hardware platforms. Reader service no. 27.
ImageSoft
2 Haven Avenue
Port Washington, NY 11050
516-767-2233
Prentice Hall has published Protect Your Privacy: The PGP User's Guide,
written by frequent DDJ contributor William Stallings. In his book, Stallings
explains what PGP ("Pretty Good Privacy") encryption software is, how you can
get it, and what you can do with it. The 300-page book sells for $19.95. ISBN
0-13-185596-4. Reader service no. 28.
Prentice Hall
P.O. Box 11073
Des Moines, IA 50381-1073
800-947-7700
Performix has released Version 2.0 of Drag-it, its drag-and-drop tool for
Visual C++, along with a new version for Visual Basic. Drag-it 2.0 includes a
class library, starter files, a builder, and sample apps. The tool also
supports OLE. Drag-it 2.0 for VC++ sells for $495.00, while Drag-it/VBX
retails for $295.00. Reader service no. 29.
Performix
6618 Daryn Drive
Westhills, CA 91307
708-291-8421
Micro Tempus has announced Tempus Connectivity Solution (TCS), a distributed
architecture platform for connecting disparate PC local networks, mainframes,
and minis into a wide-area network. TCS provides for file and resource sharing
between users, as well as high-level data security. TCS is a peer-to-peer
technology that is based on Advanced Program-to-Program Communications (APPC)
and Advanced Peer-to-Peer Networking (APPN). 

TCS currently supports DOS/Windows, OS/2, NetWare, and MVS/VTAM. A future
release will support Windows NT and UNIX.
The TCS SDK lets you create custom applications using languages such as Visual
Basic and C++. The SDK also comes with a variety of sample apps, including a
file back-up utility and a conference-call program that lets multiple users
hold party-line conversations over the TCS network. Source code for the sample
applications is provided in both Visual Basic and C so it can be used as a
basis for building other TCS applications. Reader service no. 30.
Micro Tempus
999 de Maisonneuve Blvd., W., Suite 1100
Montreal, PQ
Canada H3A 3L4 
514-848-0803 
In a move to gain support for its new object-oriented application environment,
Taligent has announced a new certification and branding program. Along with
the announcement came news of the renaming of its TalAE to "CommonPoint," as
well as a new logo for the product line. Taligent, a company backed by
heavyweights Apple Computer, Hewlett-Packard, and IBM, states that the
certification and branding program is intended to ensure the consistency and
compatibility of its frameworks across HP-UX, OS/2, and AIX. With
implementations expected sometime in 1995, Taligent will license the
CommonPoint name to implementations that have passed a set of "depth and
breadth" tests that, among other things, check the functionality and quality
of a core set of frameworks and verify the existence of public objects and
member functions. Reader service no. 31.
Taligent 
1021 N. De Anza Boulevard 
Cupertino, CA 95014 
408-255-2525
FlashPort from AT&T Software Solutions Group makes it possible to translate
software in object form from one computer system to another. Once
translated, the software is semantically and functionally identical to the
original. The translation software is compatible with any language (or
combination of languages), except for those which extensively compose code
dynamically (such as Forth). FlashPort generates globally optimized object
code in the native instruction set of the target platform. Source platforms
include 680x0-based UNIX, 680x0-based Macintosh System 7, and IBM
System/360/370/390. Target platforms include IBM RS/6000 and PowerPC, Apple Power
Macintosh, Sun SPARCStation, HP Precision Architecture, MIPS, and Intel
Pentium processors. Reader service no. 32.
AT&T Software Solutions Group
10 Independence Blvd.
Warren, NJ 07059-6799
800-462-8146
The Robotics Practitioner is a new quarterly magazine published by Footfalls
Ltd. that covers the design, development, and performance of robots. In
general, articles will focus on the practicalities of building robots from
both the hardware and software perspective. Regular coverage also includes
clubs, events, and online discussion groups. Subscriptions to the magazine are
$38.00 annually. Reader service no. 33.
Footfalls Ltd.
483 S. Kirkwood Road, Suite 130
Kirkwood, MO 63122
314-822-4263
trp@footfalls.com
Coming out of Intel's Embedded Processor Division is the PCI I/O Software
Development Kit (PCI-SDK), which is based on the i960 embedded RISC processor.
The kit bundles a Peripheral Component Interconnect (PCI) bus I/O add-in card
with interchangeable i960 processor modules and PLX Technology's PCI 9060, an
80960-to-PCI bus-bridge chip that eliminates the need to design proprietary
ASIC, FPGA, or PAL glue for the i960. Also available are user-selectable I/O
modules from Cyclone Microsystems, including SCSI-2, SCSI-3, Ethernet, and a
high-speed serial I/O module.
The SDK also provides programmer tools, including a C compiler, profiler, and
debugger. Intel hopes that developers creating software for intelligent I/O
cards will take advantage of the increased bandwidth, throughput, and
concurrency that the PCI bus offers. For more information and pricing on the
SDK, contact Intel at 800-628-8686, or request packet #272612 from the Intel
Literature Center at 800-548-4725; for information regarding the bridge chips,
contact PLX Technology at 415-960-0448; for information on the customized
I/O modules, contact Cyclone Microsystems at 203-786-5536. Reader service no. 34.
Intel Literature Center
P.O. Box 7641
Mt. Prospect, IL 60056-7641
800-548-4725


































EDITORIAL


The Green, Green Cash of Gnomes


It comes as no surprise that greed can get in the way of good judgment. Nor is
it surprising that the computer industry has its fair share of nincompoops who
put spare change before common sense. In a speech before a group of
influential Texas business leaders and educators, for instance, John Roach,
chairman and CEO of Tandy Corp., scolded educators for continuing to teach
"obsolete" skills such as cursive writing and long division. What schools
should be doing, he said, is buying computers and calculators--presumably
those made by Tandy. "We don't have anyone in my company doing long division,"
said Roach. 
I don't know about you, but I'm certainly relieved that Tandy executives don't
have to count on their fingers and toes anymore. Unlimbering a Radio Shack
calculator is a lot easier than pulling off socks and boots, at least when you
need to figure out how to split a $40.00 lunch tab. On the other hand, if
Tandy bothered with long division anymore, it might have run across those
floating-point division problems in its Pentium-based PCs a little sooner.
Then there's Martha Siegel, author of How to Make a Fortune on the Information
Superhighway and CEO of Cybersell, an Arizona company that provides Internet
marketing services. Citing rampant fraud, sexual harassment, defamation,
forgery, and profanity on the Internet, Siegel, in a San Francisco Chronicle
op/ed piece, is screaming for the federal government, Supreme Court, and
international diplomats to step in and regulate the Net. In particular, Siegel
is incensed that programs called "cancellor robots" (or "cancelbots") can erase
messages from specified sources. I'm assuming such messages include
unsolicited electronic junk mail from Internet marketing services.
It is noteworthy that the same openness that leads to the abuses Siegel
decries is what enabled her to carve out a comfortable business niche in
the first place. Now that Siegel has settled in, it is apparently time for
the government to keep out the competition. 
And when it comes to greed, it's hard to forget CompuServe. After defining the
Graphics Interchange Format (GIF) as a means of transferring graphics data,
CompuServe actively encouraged developers to support the specification with a
no-cost policy--all developers had to do was acknowledge CompuServe's
copyright. 
What CompuServe forgot to say (or didn't realize in 1987) was that LZW--the
heart and soul of GIF--had been patented by Unisys (née Sperry) in 1985. Even
though it was public knowledge that GIF was LZW based and that LZW was
patented (DDJ reported on this in 1989), it wasn't until 1994 that the two
companies agreed on a licensing agreement. CompuServe has to pay Unisys a
royalty of 1 percent (or about $.11/copy) for each copy of CompuServe
Information Manager connection software it sells, along with a one-time fee of
$125,000 for past use, and a $5000 monthly fee. In addition, CompuServe got
the right to relicense LZW to developers using GIF in software that connects
to CompuServe. With this agreement in pocket, CompuServe decided to cover its
losses by demanding a royalty of 1.5 percent, or $.15/copy (whichever is
greater) from developers who have supported GIF--the same developers who have
helped CompuServe grow to be the biggest of the commercial online-information
services. 
In doing so, CompuServe chose not to alienate its customers by passing on the
costs. Instead, the company decided to soak third-party developers. While in
all likelihood CompuServe started out by breaking intellectual-property laws,
it ended up breaking something even more significant--the trust of its
longtime partners. 
Finally, what would a discussion of greed be without mention of the
government? You may recall that in August 1994, I described the troubles Ruth
Koolish is having with the California Board of Equalization--the tax board.
California tax collectors see the information highway as a new source of
revenue, and they are apparently using Koolish as the test case for levying
sales taxes on electronic data transfers. (Her case still hasn't been
resolved, by the way.) Other states--Illinois, Florida, Maryland,
Massachusetts, Ohio, and Rhode Island, to name a few--are trying to cash in on
the information highway by extending sales and use taxes for
telecommunications and information services. Florida, for instance, hopes to
raise $120 million by including information services under its 6 percent tax
umbrella.
For its part, New York last year tried to increase its standard 8 percent tax
rate for information-service-related revenues to 13 percent. After protests,
the New York Department of Taxation and Finance ruled that the full 13 percent
tax did not apply to services that have a written component. Therefore, since
you can generate a hard copy of the data you download to your PC, only the 8
percent sales tax applies. (The full 13 percent tax applies to those services
which are exclusively aural; 900-number phone services, for instance.) On one
hand, government at all levels is calling for the "paperless office." New
York, on the other hand, is forcing businesses and individuals to maintain
ties to paper documents. Of course, you could counter that when it comes to
government, one hand doesn't always know what the other is doing. The only
thing you can be sure of is that both hands will be reaching into taxpayers'
pockets.
Jonathan Erickson, editor-in-chief













































LETTERS


Woof, Woof (As in Dogma)


Dear DDJ,
In his "Swaine's Flames," February 1995, Michael is correct. The Supreme Court
must absolutely avoid any deviation from the Constitution, legislated or not.
Further, it would be criminal if even the Supreme Court ruled in a way that
violated the Constitution. Thus even the Supreme Court must always cite legal
or historical precedent whenever it rules. If not, it would be the duty of
every American to oppose any ruling that unilaterally diverged from our
Constitution. And thus has it always been--except once. 
In Engel v. Vitale when the (infamous) Warren court ruled to remove prayer
from the public schools, it cited zero historical precedents and zero legal
precedents. No court has done that before or since. Historical note: That was
just a few months before the court obstructed justice in the Kennedy
assassination.
Michael would do well to remember Plato: "The rejection of a dogma itself
implies a dogma." Next thing to happen is that dogmatists like Michael will
remove our freedom of expression.
Stephen Lindner
lindner@m-net.arbornet.org


Serial Port IRQs


Dear DDJ,
I liked John Ridley's article, "Identifying Serial Port IRQs" (DDJ, February
1995), but couldn't get the listing to work until changing a port write in
WhichIRQ(). Setting bit 1 needs to leave all the other bits unchanged. For example,
Example 1(a) should be changed to Example 1(b).
Ken Lagace
Baltimore, Maryland
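Examples 1(a) and 1(b) are magazine figures not reproduced here, but the fix Ken describes is the classic read-modify-write pattern: read the register, OR in bit 1, and write the whole value back, rather than writing the bit by itself. A minimal sketch follows, with the port register simulated by a variable so it can run without hardware; the function names are hypothetical, and real code would use inportb()/outportb().

```c
/* The port-write fix in miniature: setting bit 1 must leave the other
   bits of the register untouched. The register is simulated here by a
   plain byte; function names are hypothetical. */

/* Wrong: writes bit 1 but zeroes every other bit in the register. */
unsigned char set_bit1_clobber(unsigned char reg)
{
    (void)reg;              /* current contents ignored -- the bug */
    return 0x02;
}

/* Right: read-modify-write preserves the other bits. */
unsigned char set_bit1_preserve(unsigned char reg)
{
    return (unsigned char)(reg | 0x02);
}
```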
John responds: Thanks for your note, Ken. I have received a lot of e-mail
regarding PORTINFO. As a result of some research into some problems a few
people have been having with it, I have come up with the following changes. As
a result, it is much more reliable; I haven't been able to break it so far
with these changes in place. In addition, operation under Windows seems to be
much improved (though still not perfect). In particular, IsUART has problems
with some serial ports. Specifically, some internal modems use a
microcontroller to emulate a UART, and it does not respond quickly enough in
loopback mode for IsUART to identify it correctly. To remedy this situation,
I've added the timing loop in Example 2(a). 
Apparently, an enable/disable pair executed back to back doesn't always allow
all pending interrupts to be processed. To remedy this, the relatively slow
output statements that normally happen just before and after the pair are
moved inside it; see Example 2(b).
As a result of this slowdown, however, quite often two IRQs are generated.
Since multiple interrupts are going to happen anyway, you can omit the first
enable/disable pair, if you want. Their only purpose was to try to eliminate
duplicate IRQs. 
To avoid printing of duplicate IRQs, the calling program should count bits in
the return value, and loop until only one bit comes back. You may wish to
limit the looping to 10 or 20 iterations, since a defective COM port may
actually generate multiple interrupts; see Example 2(c). Also, since the
IsUART routine is now immensely slow, you may wish to remove the IsUART call
from the beginning of the WhichIRQ routine. Just be sure not to call it unless
you know there is a UART on the port.
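Example 2(c) isn't reproduced here, but the calling-program loop John describes can be sketched as: count the set bits in each returned IRQ mask and keep trying until exactly one bit remains, capped at a fixed number of iterations. In this hedged sketch the successive WhichIRQ() results are supplied as an array so the logic runs without hardware; the names are hypothetical.

```c
/* Sketch of the duplicate-IRQ filter described above: retry until the
   returned mask has exactly one bit set, giving up after max_tries
   (a defective COM port may keep generating multiple interrupts).
   samples[] stands in for successive WhichIRQ() return values. */

static int count_bits(unsigned mask)
{
    int n = 0;
    while (mask) {
        n += mask & 1u;
        mask >>= 1;
    }
    return n;
}

/* Returns the single-bit IRQ mask, or 0 if no sample ever settled. */
unsigned settle_on_irq(const unsigned *samples, int max_tries)
{
    for (int i = 0; i < max_tries; i++)
        if (count_bits(samples[i]) == 1)
            return samples[i];
    return 0;
}
```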


Dave from Nebraska, You're on the Air


Dear DDJ,
Part of Michael Swaine's "Programming Paradigms" (DDJ, January 1995)
concerning the use of a portable radio to give voice to the loops in an
executing program on an early Altair machine brought back memories.
About a decade before the time being referred to in Michael's column, I was
completing a Fortran program as part of a master's thesis in mathematics. This
was at a small midwestern university at which the available computing facility
was an IBM 1620; this was a hands-on situation in which you signed up for time
on the machine. My original plan was to have the program produce a fairly
large number of orbital coordinates, the final step for each point being the
evaluation of a series to a sufficient level of convergence. 
When my results began to appear at a dismayingly slow rate, I stopped the
program, and on a reexecution, listened to the portable radio sitting on the
computer, with the frequency selector at the low end of the AM range. The
sounds produced by the radio helped me understand where the real crunching was
taking place. I wound up changing the program to output a pair of parameters
which could be plugged into a table to obtain the final results I needed.
David Moxness
Fremont, Nebraska


Big Mac Attack


Dear DDJ,
In his February 1995 "Editorial," Jonathan Erickson made reference to
ronald@macdonalds.com. An expanded version of the story was reported in Wired
magazine (October, 1994). In fact, McDonalds has had a registered domain name
for several years, mcd.com.
That McDonalds is not particularly interested in trying to soak up all
possible variations of its name deserves praise, not the implied criticism
that it and other corporations are not paying attention to the Internet.
According to the Wired story, there was no extortion attempt by the
macdonalds.com registrant toward McDonalds. 
Also, regarding that joke about Intel Pentium processors a couple of pages
later--that is quite out of place as well. Instead of the joke, a little
research into the problem and a paragraph or two regarding the types of
software affected or an analysis of Intel's 27,000-year claim might be in
order. Just how did Intel arrive at that number anyway?
Bill England
Redmond, Washington
DDJ responds: Sorry you didn't like the Intel joke, Bill. How about this one:
Q: How many Pentium engineers does it take to screw in a light bulb? A:
2.9999999. In all seriousness, we did take the Pentium's fdiv problems to
heart, witness this month's "Undocumented Corner" article by Tim Coe on page
129. Likewise, you might want to refer to the February 1995 issue of our Dr.
Dobb's Developer Update, which had two articles on the Pentium problems, one
by senior editor Ray Valdés and another by Bill Jolitz. And finally, McDonalds
has indeed been very aggressive in soaking up variations of its trademark.


Cybertorts


Dear DDJ,
The bit in Jonathan Erickson's February 1995 "Editorial'' about Microsoft
going after somebody who posted a beta version of Windows 95 on the Internet
was rich. Has there ever been a more widely distributed "secret" in history?
Microsoft has sent me no less than three copies (to H. Helms, Harry Helms, and
Harry L. Helms) of the latest beta version. It seems like Microsoft would
thank the guy for saving a lot of time and distribution costs.
Harry Helms
Solana Beach, California


386BSD Serial Drivers


Dear DDJ,
Regarding the discussion and code on ring buffers in Bill Wells's article,
"Writing Serial Drivers for UNIX" (DDJ, December 1994), I am pleased to see
further work with 386BSD ring buffers. After reading the article, however, I
think Bill missed some of the reasons why I added ring buffers to 386BSD in
the first place (in Release 0.0 in March of 1992). Since Bill's ring buffers
appear to be an extension of this work, knowing some of the background may
allow even more enhanced operation of the concept.
Prior to 386BSD Release 0.0, UNIX character lists (or clists) were used in BSD
systems to buffer terminal I/O. This abstraction dated back to the earliest
Bell Labs UNIX system and was perfectly adapted for its original use in a
timesharing system with limited memory resource (for example, a PDP 11/45 with
a maximum of 252 Kbytes of memory running 15--30 users each on a terminal).
Blocks of 16 characters (32 on VAX systems) were maintained on a single
shared-free list, totaling 5--32 Kbytes in size. Primitive functions
allocated/freed blocks in the course of implementing FIFO queues of single
characters. Memory utilization was quite high, since all of the terminals
could use the shared-buffer resource.
However, in 386BSD, we determined that memory utilization of terminal buffers
was not interesting. The average PC used with 386BSD (a console, two serial
ports, and more than 4 Mbytes of memory) supported only a handful of users
(typically, one). We could thus afford to allocate resources permanently to
terminals, and interrupts did not have to be masked across the set of devices
sharing the buffer pool. (Altering the interrupt mask on x86 PCs is
considerably more expensive than on PDP-11s or VAXen, where it can be made
into a single inline instruction. This is true across many other architectures
besides the x86.) While in hindsight this may appear obvious, at the time we
replaced the old encumbered clists code this was considered controversial. In
fact, some derivatives of 386BSD Release 0.0 and 0.1 went well out of their
way to reimplement clists (even though ring buffers were in other real-time
operating systems) merely because they were being used in a BSD system in a
"new" way. Although current versions of 386BSD have yet to take advantage of
this, we intend to comprehensively replace shared interrupt masks and other
exclusion mechanisms (for example, program-visible spin locks) with a new
mechanism for exclusion that works well in multiprocessing environments. 
Bill also states that interrupt blocking between the top and bottom halves of
the driver is no longer necessary, given that characters will just fall into
additional consecutive storage. This is again a consequence of isolated buffer
pools and can be used to great advantage in the terminal-driver
implementation, which currently uses interrupt masking in sections no longer
critical. But when are the terminal input queues really empty? Since the
length of the queues can always increase, we may need to recheck the queue for
characters to resolve the race multiple times (for instance, ring-buffer
length is now "volatile"). Continuing on, what if we have the pathologic case
of a character arriving exactly every loop interval? Then, the top half of the
discipline waits indefinitely!
While this may seem trivial, it has important implications for POSIX/UNIX
terminal-driver implementations dealing with pending I/O during mode shifts
that require the queues to be emptied prior to transition between modes (for
instance, certain ioctl requests). The subtlety of this problem is that the
interpretation of pending characters is now at risk--they may be incorrectly
interpreted if a race occurs between the top and bottom halves of the terminal
driver. In these cases, you still must mask interrupts.
In a sense, you might even say that one set of critical sections has been
exchanged for another set in the terminal driver. This and other subtle
dilemmas in preserving the correct POSIX semantics caused us to adopt a
conservative track in 386BSD versions.
Another major consideration was the need to migrate the kernel to use dense
storage for buffers instead of the sparse storage used in lists. With sparse
storage, overhead in parceling up storage is endured as another price of
increased memory utilization. Worse, sparse buffer storage has no locality of
reference (for example, buckets of characters are not guaranteed to be
consecutive)--thus, a memory-storage hierarchy may have unpredictable access
timing. As a consequence, 386BSD Release 0.1 took advantage of dense storage
to implement costly operations on a list as inline operations on a single
segment in the ring buffer (two segments in the case of overlapping the end of
the ring). Yet another consideration was to reduce unnecessary copy operations
by redirecting the pointers of the buffers at will. As an example of this,
Serial Line IP (SLIP) could transfer directly to a packet memory buffer in
certain common cases, avoiding a copy entirely (this makes use of other
mechanisms in BSD message buffers, of course).
Another concern was the need for "wide" characters of 16 or 32 bits. In 386BSD
Release 1.0, the structure of the line disciplines was rewritten to allow
transparent use of the ring-buffer headers with internal implementations
(within, say, a UNICODE terminal driver "wtermios" and corresponding
serial/console drivers capable of implementing wide characters) different than
the default 8-bit character size. We felt that the terminal driver should bear
the burden of work, and that above the level of the terminal driver, the de
facto view of buffer contents should only be a queue-item count. This avoids
duplicating knowledge of queue structure for 8-, 16-, and 32-bit character
implementations.
One limitation of ring buffers occurs when stackable protocol modules replace
the current line disciplines. In this case, one needs to "switch" ring-buffer
contents further up the level-of-abstraction stack of the kernel. This then
reintroduces many of the problems incurred with clists. As such, we decided
that a single, comprehensive mechanism used for high-speed networking was
better than ring buffers, and have been working in that direction with our
internal 386BSD development. Ring buffers have been left in until SIGNA (the
Simple Internet Gigabit Network Architecture) can encompass terminal drivers
and disciplines.
Another interesting aspect of ring buffers which is ripe for exploitation is
that of "chunking" I/O to FIFO serial cards. As this article correctly points
out, ring buffers cannot be completely filled--a single element must be free.
However, if you change the quantization from byte to the size of the FIFO (in
determining rollover and "full" state), the FIFO can conceivably be unloaded
using a single I/O-port string instruction, since one would always be
guaranteed contiguous buffer space. On VAXen in the "old" days of 4BSD, it was
found that stuffing FIFOs in this manner was the most efficient way of
handling serial I/O, beating out both pseudo-DMA and real DMA.
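The chunk-quantized "full" test described here can be sketched in C. This is a minimal illustration only, assuming a 16-byte UART FIFO and a power-of-two ring whose size is a multiple of the FIFO size so that chunk-aligned writes are always contiguous; the names, sizes, and free-running-counter scheme are illustrative assumptions, not 386BSD code.

```c
#include <string.h>

#define RING_SIZE 256   /* power of two and a multiple of FIFO_SIZE */
#define FIFO_SIZE 16    /* one UART FIFO load */

struct ring {
    unsigned char buf[RING_SIZE];
    unsigned head;      /* free-running write counter (interrupt side) */
    unsigned tail;      /* free-running read counter (top half) */
};

/* Quantize "full" to whole FIFO loads: refuse input unless a complete
 * contiguous FIFO_SIZE chunk is free. Because head only ever advances by
 * FIFO_SIZE and RING_SIZE is a multiple of it, a chunk never straddles
 * the wrap point, so the FIFO could be unloaded with one string I/O
 * instruction (e.g. rep insb) into &buf[head & (RING_SIZE-1)]. */
static int ring_put_fifo(struct ring *r, const unsigned char *fifo)
{
    if (RING_SIZE - (r->head - r->tail) < FIFO_SIZE)
        return 0;                               /* "full" in chunk units */
    memcpy(&r->buf[r->head & (RING_SIZE - 1)], fifo, FIFO_SIZE);
    r->head += FIFO_SIZE;
    return 1;
}

/* The consumer still drains one byte at a time. */
static int ring_get(struct ring *r, unsigned char *out)
{
    if (r->head == r->tail)
        return 0;                               /* empty */
    *out = r->buf[r->tail & (RING_SIZE - 1)];
    r->tail++;
    return 1;
}
```

With free-running counters, empty is head == tail and full is head - tail == RING_SIZE, so in this variant no slot needs to be left unused.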
Bill Jolitz
Oakland, California
Example 1: Serial-port IRQs.
(a)
_disable(); /* Ready for the real thing now */
IRQ_Happened = 0; /* Clear bitmap */
outp(CurPortBase+IER, 0x02); /* enable xmt empty int. */
(b)
_disable(); /* Ready for the real thing now */
IRQ_Happened = 0; /* Clear bitmap */
outp(CurPortBase+IER, (HoldIER | 0x02)); /* enable xmt empty int. */
Example 2: Serial-port IRQ update.
(a)
/* routine IsUART(): code added between outp and if statements */
_outp(PortAddr+MCR, 0x0a | LOOPBIT); /* Turn on RTS */
{
long oldtick, far *ticks = (long far *)0x0040006c; /* points to timer ticks area */
oldtick = *ticks + 2; /* wait for ~1 or 2 ticks */
while (oldtick != *ticks) ;
}
if ((_inp(PortAddr+MSR) & 0xf0) == 0x90) /* If CTS is on, there's a UART */
(b)
/* routine WhichIRQ(): move outp statements to between second enable/disable pair */
enable(); /* BANG! */
_outp(CurPortBase+IER, 0x02); /* enable xmt empty int. */
_outp(OCW1, (_inp(OCW1) & (~IRQbit)) /* Restore 8259 */
    | (HoldOCW1 & IRQbit));
disable(); /* OK, we're done. */
(c)
(In module PORTINFO.C, added function:)
--begin--
short NumBits(unsigned short IRQ_bitmap)
{
short x,bits;
for (x=8,bits=0; x; x--,IRQ_bitmap >>=1)
if (IRQ_bitmap & 1)
bits++;
return bits;
}
/* in function main(), put the call to WhichIRQ() in a loop */
do
IRQ_bitmap = WhichIRQ(PortAddr);
while (NumBits(IRQ_bitmap) > 1);
































































Image Authentication for a Slippery New Age


Knowing when images have been changed




Steve Walton


Steve is currently developing intelligent pattern-recognition systems for
manufacturing as a senior principal engineer for the Boeing Commercial
Airplane Group. He can be contacted at stevew@eskimo.com.


One of the major plot elements of the film Rising Sun hinged on sophisticated
digital imagery of a murder, recorded by security cameras. To cover up the
guilt of a political figure, the face of the perpetrator was digitally
replaced with that of another person. Sean Connery and friends eventually
triumph by doing a little bit-twiddling to prove that the image had been
manipulated, thus leading back toward the truth. 
In reality, a similar digital rework would be incredibly clumsy. A modern
digital warrior can make such modifications nearly undetectable, as we saw
with the disabled Vietnam vet in the movie Forrest Gump.
Films are (mostly) intentionally fictitious. Everybody applauds the genius of
the folks at Industrial Light and Magic and other such special-effects teams
because we can finally hold our imaginations right up there beside reality and
ask, "Why not?" 
But what if those images contained key legal evidence in a murder trial or
journalist's photographs of foreign atrocities? Conceivably, we could cheer on
a war based on evidence hacked up in a computer system.
This is a serious problem.
Ironically, even as our prehistoric contract with these visual channels of
truth is being rewritten, an old, old method may come to our rescue. 
In the middle ages, kings, dukes, barons, and anyone else who styled
themselves important would carve a design onto a stamp or a ring. This was
used to impress a wax or lead closure sealing the wrappings of items sent by
courier, ostensibly proving that the document or package did indeed come from
them and hence could be considered authentic.
In time, the stamp took the name of its function and became known as a "seal."
Figure 1 shows King John's seal, which he affixed to the Magna Carta to
represent his word of honor. By this act, a government acknowledged for the
first time its relationship with the rights and responsibilities of the
people.
When literacy became more widespread and it became common for people to have
more than one name, the written signature replaced the seal on most legal
documents and other important works.
If we are going to continue to trust images as evidence of true events, I
propose that we use that old digital magic to revive the original concept of
the royal seal, and apply it to the data streams that feed us truths. The
method should be easy, ubiquitous, difficult or impossible to defeat, and
fast. 
In this article, I outline a class of simple algorithms that satisfy most, if
not all of those requirements, and which can be implemented by anyone with
access to a computer language. I believe it has advantages in storage, speed,
and stealthiness over systems like those found in RSA Data Security's RSAREF
cryptography toolkit (specifically, MD2 and MD5). In addition to the C source
code for the algorithms presented here, executables and two test Targa images
are available electronically; see "Availability," page 3. Note that the code
is slower than it could be because it was written to illustrate algorithms,
and not built for speed. 


The Mechanization of Imagery


In the simplest and most common format, digital images are represented by
rectangular arrays of picture elements (pixels), each of which may or may not
be physically square. The common VGA display has 480 rows of 640 pixels each,
for instance. 
The numbers representing a single pixel, used to reconstruct its color and
intensity, come in assorted flavors. In direct representational models, they
are just brightness, either of a gray-level or a tri-stimulus component of a
color model.
The simplest model for direct image representation is 8-bit monochrome. Each
pixel is one byte deep and can thus display 256 levels of brightness. A
properly scaled image will use this fairly narrow dynamic range to represent
levels from black to white. A poorly scaled image will be illegibly washed out
in shades of gray. Figure 2 illustrates the logical structure of an 8-bit
monochrome image.
It's more interesting when you attempt to represent color. The most
straightforward approach is to use an 8-bit dynamic range for each of the
three phosphors in a modern CRT--red, green, and blue. Within limits, these
2^24 possible colors cover most of those discernible to the human eye. Figure 3
shows the use of 24-bit color.
The hue-lightness-saturation (HLS) and hue-saturation-brightness (HSB) models
both attempt to map the dynamic-response characteristic of the human retina so
that unavoidable quantizations will nevertheless cover color nuances that RGB
misses. And, in a pragmatic sense, choosing a color by varying HLS-style
parameters is much more natural than using sliders on red, green, and blue.
Color representations can be made with almost any set of tri-stimulus values.
Both Digital Image Processing, by W.K. Pratt (John Wiley & Sons, 1978), and
The Image Processing Handbook, by J.C. Russ (CRC Press, 1992) provide good
discussions of this topic.
The most common approach to representing color images has been to build an
8-bit palette, or indirection table, containing 24-bit color entries that
ultimately form the pixels displayed on a screen. The convoluted evolution of
display hardware in the computer era is the primary culprit behind the many
complexities of these approaches (preferred color model, bit planes,
range-matched bit depths, and the like). Palettization and compression cause
particular problems for the algorithm described here, so I'll set them aside
for now.
Normally, images only exist in space-wasting direct forms while they are being
edited and all their information must be quickly at hand. When stored or
transmitted, images are usually reduced in size by conversion to a palettized
form, by some sort of compression scheme, or both. Some compression algorithms
are so effective that they can lower storage requirements to less than one bit
per pixel. 
Unless these compression schemes are "lossless," they will destroy any
authentication information you might embed into an image. Different forms of
the sealing algorithm will have to be developed for these cases.


Where Have All My Pixels Gone?


Something is needed to verify that the received image is exactly the same as
the one sent, and as a side benefit to prove who sent it. No effort needs to
be made to hide or encrypt the image, as you are simply trying to ensure that
the viewer sees what he or she is meant to. (There is a subtle philosophy
operating here. When something has been encrypted, it begs to be exposed. But
a simple signature can, and most probably will, go completely unnoticed.)
How can you ensure that an image remains unchanged? The easiest way is to use
a checksum scheme. Regardless of the given pixel bit depth, you just add all
the pixels up using an unsigned summing variable whose bit width equals or
exceeds that of the image pixels. The overflow bit is ignored. The result is
a single integer that changes if any single pixel is changed. See Algebraic
Methods for Signal Processing and Communications Coding, by R.E. Blahut
(Springer-Verlag, 1992), and Digital Signal Transmission, by C.C. Bissell and
D.A. Chapman (Cambridge University Press, 1992), for discussions of data
integrity. 
The probability that any two images will have the same checksum is related to
the bit width of that sum. If you only use an 8-bit integer, there is a 1:256
possibility that any two images share the same one. Clearly, you don't want
such a high probability that your test will come out positive even if an image
has been changed. 
If you increase the checksum width to 24 bits, the possibility recedes to a
more secure 1:16,777,216; and for that really warm feeling, you can use
checksums of bit width n times the pixel depth simply by concatenating n
adjacent pixels to make larger integers.
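Both widths can be sketched in a few lines of C; this is a minimal illustration, and the function names are mine, not from the article's listings.

```c
#include <stdint.h>
#include <stddef.h>

/* 8-bit-wide checksum: any two images collide with odds of 1 in 256.
 * The sum simply wraps, which is the "overflow bit is ignored" rule. */
uint8_t checksum8(const uint8_t *pix, size_t n)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += pix[i];
    return sum;
}

/* 24-bit checksum built by concatenating n=3 adjacent 8-bit pixels into
 * one wider integer per term: collision odds fall to 1 in 16,777,216. */
uint32_t checksum24(const uint8_t *pix, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i + 2 < n; i += 3)
        sum += ((uint32_t)pix[i] << 16) | ((uint32_t)pix[i + 1] << 8)
             | pix[i + 2];
    return sum & 0xFFFFFFu;   /* keep 24 bits; overflow is discarded */
}
```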
Once you have computed a checksum, it would be nice to embed it into the image
itself, so that no separate piece of information exists to aid and abet the
potential brigand. How is this done?


Bits of Strings and Sealing Wax...


Historically, 8-bits-per-pixel for monochrome images was driven primarily by
the desire to map system memory to display hardware one byte at a time,
speeding memory access. Even 24-bit images are often stored as 32-bit
quantities to allow the use of longword transfers when editing or painting.
Many of the less-significant bits are, however, purely noise caused by the
imaging device. From a visual standpoint, in live "natural" images, the noise
at this bit level is entirely masked by the complexity of the scene. Most
visual information is carried in the top nybble, with the bottom nybble going
along for the ride. Consequently, you can disguise your checksum as noise.
Figure 4 illustrates the basic method of doing this using a monochrome image.
You can view the checksum as an array of bits Ncs in length. A uniformly
distributed pseudorandom number generator (see Numerical Recipes in C, by W.H.
Press et al., Cambridge University Press, 1988, and Seminumerical Algorithms,
Second Edition, Vol. 2 of The Art of Computer Programming, by Donald E. Knuth,
Addison-Wesley, 1981) is used to map these bits onto a path of randomly
selected pixel locations within the limits of the image (you can call this a
"random walk"). 
At each location, the least-significant bit (LSB) of the pixel value is forced
to match the value of the corresponding nth checksum bit as n goes from 0 to
Ncs-1. The human eye cannot distinguish an LSB shift in intensity on most
commercial display systems that have eight or more bits of dynamic range.

Note that, on the average, 50 percent of the pixels will not be changed. This
is crucial to the security of the algorithm.
For simplicity, the example algorithm just uses the ASCII values of a seal
string truncated to the length of the random-number-generator seed. To ensure
that the addresses do not overlap, you should check the random walk
immediately after entry and ask the user to enter another one if it contains
any overlapping or intersecting pixel addresses.
Since the LSBs at random-walk locations clearly cannot be allowed to
contribute to the checksum calculation, you will have to use a three-stage
process. 
1. Obtain an acceptable sealing string from the user and build a random walk
from it. 
2. Go over the entire image to construct a checksum out of the seven
most-significant bits of each pixel. 
3. Embed the checksum bits, one by one, into the pixels at the random-walk
addresses.
To check the authenticity of an image, you essentially reverse the process on
the receiving end: Generate the random walk based on a user-entered seal, and
extract the embedded checksum based on the bits hidden in the LSBs of pixels
along the walk. Then measure the checksum using the upper seven bits of each
pixel. If the two numbers match, the image has not been tampered with. 
Hey! You're done! 
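The three stages and the verification pass can be sketched end to end. This is a hedged, self-contained miniature: a 16x16 image, a 16-bit checksum, an FNV-style seed fold, and a Numerical Recipes-style LCG are my assumptions for illustration, not the article's exact code (which appears in Listing One).

```c
#include <stdint.h>

#define W 16
#define H 16
#define NPIX (W * H)
#define CS_BITS 16   /* checksum width in bits (illustrative) */

/* Fold the ASCII seal string into a 32-bit generator seed (an FNV-1a
 * style fold; the article derives its seed differently). */
static uint32_t seal_seed(const char *seal)
{
    uint32_t s = 2166136261u;
    while (*seal) { s ^= (uint8_t)*seal++; s *= 16777619u; }
    return s;
}

/* Stage 1: build a non-self-intersecting random walk of CS_BITS pixel
 * addresses, driven by a simple LCG. */
static void build_walk(const char *seal, int walk[CS_BITS])
{
    uint32_t x = seal_seed(seal);
    uint8_t used[NPIX] = {0};
    int i = 0;
    while (i < CS_BITS) {
        x = x * 1664525u + 1013904223u;
        int p = (int)((x >> 16) % NPIX);
        if (!used[p]) { used[p] = 1; walk[i++] = p; }
    }
}

/* Stage 2: checksum over the top 7 bits of every pixel; the LSBs carry
 * the seal, so they must not contribute. */
static uint32_t checksum7(const uint8_t img[NPIX])
{
    uint32_t sum = 0;
    for (int i = 0; i < NPIX; i++)
        sum += img[i] >> 1;
    return sum & ((1u << CS_BITS) - 1);
}

/* Stage 3: force the pixel LSBs along the walk to the checksum bits.
 * The checksum itself is unaffected, since it ignores LSBs. */
static void seal_image(uint8_t img[NPIX], const char *seal)
{
    int walk[CS_BITS];
    build_walk(seal, walk);
    uint32_t cs = checksum7(img);
    for (int n = 0; n < CS_BITS; n++)
        img[walk[n]] = (uint8_t)((img[walk[n]] & ~1u) | ((cs >> n) & 1u));
}

/* Verification: regenerate the walk, pull the embedded bits out of the
 * LSBs, and compare with a freshly measured 7-bit checksum. */
static int check_image(const uint8_t img[NPIX], const char *seal)
{
    int walk[CS_BITS];
    build_walk(seal, walk);
    uint32_t embedded = 0;
    for (int n = 0; n < CS_BITS; n++)
        embedded |= (uint32_t)(img[walk[n]] & 1u) << n;
    return embedded == checksum7(img);
}
```

Tampering with any pixel's upper seven bits changes the measured checksum but not the embedded one, so the comparison fails.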
Figure 5 shows the gray-level interpretation of the example just discussed.
Any differences are purely printing effects.


...and Other Fancy Stuff


Anything which mixes up and shuffles the process I've just described will
decrease the probability that an unauthorized person can determine what the
seal was. Within probabilistic limits implied by the checksum width, images
cannot be modified without destroying an embedded seal, unless that seal is
known.
A straightforward checksum algorithm is not completely resistant to purposeful
attack. The summing order doesn't affect the outcome, so you can design a
"paintbrush" tool that operates by swapping pixels from other locations within
the image. 
Alternatively, blocks of pixels with a sub-checksum can be replaced by other
blocks having the same sub-checksum. For instance, it wouldn't be too tough to
replace one face with another.
The checksum process can be made sensitive to pixel placement in any number of
ways. Perhaps the simplest here would be to use the random sequence that
generated the random walk (it's already running). I'll call the subset of a
seal-based number sequence contained within an image a "seal space."
Instead of just adding pixel values as you scan the image, you can add or
subtract them based on the bits encountered while traveling along seal space.
Alternatively, you could multiply each pixel by the respective member of the
seal space, modulo checksum bit width. Either approach works well. The second
way is shown in Listings One and Two.
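The multiplicative variant can be sketched as follows; this is an illustrative miniature (the LCG, the 24-bit width, and the function name are my assumptions), not the code from the listings.

```c
#include <stdint.h>
#include <stddef.h>

/* Placement-sensitive checksum: multiply each pixel's upper 7 bits by
 * the next member of the seal-space sequence (here a simple LCG) and
 * accumulate, keeping a 24-bit checksum width. Because every position
 * gets its own multiplier, swapping two unequal pixels changes the sum,
 * which defeats the pixel-swapping "paintbrush" attack. */
uint32_t keyed_checksum(const uint8_t *pix, size_t n, uint32_t seed)
{
    uint32_t x = seed, sum = 0;
    for (size_t i = 0; i < n; i++) {
        x = x * 1664525u + 1013904223u;   /* LCG step */
        sum += (uint32_t)(pix[i] >> 1) * (x >> 16);
    }
    return sum & 0xFFFFFFu;
}
```

The same seed must of course drive both the walk generation and the checksum at seal time and at check time.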
To make it even harder to determine the seal, you could view the set of pixel
bytes as a set of bits instead and add up integers of varying bit widths
determined by the random sequence. For example, add two bits to the next seven
bits to the next three bits, and so forth. This can be clumsy to mechanize,
but it hopelessly loses the sense of the random sequence for anyone who hopes
to reconstruct it. 
Color images are even better for hiding checksum bits. You can use the same
random space to select among the color-vector (for example, red, green, and
blue bytes) LSBs. Or, if you're really feeling mean, you could transform each
pixel triplet to another color space (RGB to HLS, for instance) and use those
dimensions. Nor do you have to use a classic transform: You can pick your own,
as long as you remember that the bits ultimately chosen for modification must
not impact image quality.
Finally, you can also embed a large number of independent seals into an image,
each of which can have an independent set of checksum measurements. This opens
up all kinds of possibilities, including the ability to pick a scoundrel from
among a trusted group of seal owners.


Palettes


When a palette is calculated for a 16- or 24-bit image, the resulting colors
will not exactly match the originals--they are a compromise. Therefore, if a
sealed image is "palettized," it will entirely lose the sealing information
coded into the LSBs. So how can you seal a palette image? 
If you were to apply the exact monochrome sealing algorithm to an 8-bit
indirect color image, the result would look similar to "spike" noise.
Modifying the LSB of an arbitrary pixel will shift the color anywhere from a
tiny amount to an extreme amount depending on the color that is "next to" it.
You can just live with that, or you can rearrange the palette to suit the
algorithm. With any "real" image, a palette will contain groups of similar
colors, such as those providing shades of green for vegetation, or just the
normal shading of brightness that occurs when light strikes any
three-dimensional object. The trick is to locate these groups and rebuild the
palette such that very similar colors are adjacent in the table. 
Once a new palette is built and the image is remapped to it, the 8-bit
monochrome sealing algorithm can be used at leisure. 
One method is to extract the 128 most-different colors from the palette and
assign them to even-numbered indexes. Then search the palette again to find
the most-similar color to each of these, placing those in the adjacent
odd-numbered indexes. This will work for all but the most pathological of
palettes. Finally, this remapping is used to change the content of all the
image pixel data so that each points to its original reference color. 
A program that illustrates a minor variation of this technique is provided
electronically. (This program doesn't work very hard to find the 128
most-separate colors, but it still works well and is a bit faster.)
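One crude way to get similar colors at adjacent indexes is to sort the palette by a luminance proxy and then remap the pixel indexes. This is a deliberate simplification of the most-different/most-similar pairing search described above, and all names here are mine.

```c
#include <stdint.h>
#include <stddef.h>

typedef struct { uint8_t r, g, b; } Rgb;

/* Integer luminance proxy (BT.601 weights, scaled by 1000). */
static int luma(Rgb c) { return 299 * c.r + 587 * c.g + 114 * c.b; }

/* Reorder the palette so perceptually close colors sit at adjacent
 * indexes (by sorting on luminance), then remap every pixel index so it
 * still points at its original reference color. */
void repack_palette(Rgb *pal, int ncolors, uint8_t *pixels, size_t npix)
{
    Rgb sorted[256];
    uint8_t remap[256];
    int order[256];
    for (int i = 0; i < ncolors; i++) order[i] = i;
    /* insertion sort of indexes by the luminance of their colors */
    for (int i = 1; i < ncolors; i++)
        for (int j = i; j > 0 &&
             luma(pal[order[j - 1]]) > luma(pal[order[j]]); j--) {
            int t = order[j]; order[j] = order[j - 1]; order[j - 1] = t;
        }
    for (int i = 0; i < ncolors; i++) {
        sorted[i] = pal[order[i]];
        remap[order[i]] = (uint8_t)i;   /* old index -> new index */
    }
    for (int i = 0; i < ncolors; i++) pal[i] = sorted[i];
    for (size_t p = 0; p < npix; p++) pixels[p] = remap[pixels[p]];
}
```

After this pass, flipping a pixel's LSB moves it to a neighboring (and now similar) palette entry, so the monochrome sealing algorithm can be applied as-is.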
Figures 6(a) and 6(b) are examples of an 8-bit color-palette image containing
500 embedded seals. They illustrate the results with and without palette
reorganization, respectively.
This technique is even more effective when used with longer palettes, since
more colors to choose from means a higher probability of finding extremely
close color pairs. 


Compression


Unless it's lossless, don't do it.


The Hardware Seal Machine


The most important place to implement an image-sealing algorithm is at the
moment of image creation: within a scanner, still camera, or video system. The
video stream is the most demanding application. The design suggested here is
one way that a hardware answer can be constructed; see Figure 7.
Remember that the algorithm is at least two stages long. A double-buffered
image memory handles this by holding a frame for one cycle while the checksum
bits are extracted; these are then added to the video stream on the next cycle
when that frame is read out. The penalty is a 1-frame delay in the video
stream.
The pixel-address generator counts up pixels from the beginning of a frame.
The address is used to write pixel bytes into the input frame buffer, read
output pixel bytes from the output buffer, and to tell the walkspace
controller when to write out another checksum bit. It also passes a signal
that switches the frame-buffer definitions and clears the checksum register in
preparation for the next frame.
The walkspace controller contains logic that passes the appropriate bit
address (0 to checksum length-1) to the checksum control logic when a walkspace
address has been reached. The simplest way to implement this is just another
frame buffer filled with bit addresses where appropriate, and a NULL flag such
as $FF, where not (this approach mirrors the software implementation in
Listing One).
The checksum control logic has two functions. It contains two checksum
registers that are swapped with the frame buffer I/O definitions. One of these
performs the checksum addition function (or its analog), and the other outputs
the appropriate bits of the previous checksum to the LSB pixel switch in the
outgoing data stream. 
The output pixel switch sets the LSB of the outgoing data byte to match the
appropriate checksum bit, or passes the data through unchanged if the address
is not on a walkspace pixel.
The inverse of this design, a seal-checking machine, would not require the
double frame buffer but would otherwise look very similar. Checksums built
from the walkspace path would be collected when the checksum was constructed.
The two would be compared at the end of the frame.
This algorithm could also be used with ordinary analog VCRs and camcorders. By
raising the walk-bit level of the checksum bits to bit 1 or 2 instead of 0,
the seal signal can be raised above the noise of the recording system. When
seal checking, frame averaging could be used to improve the signal-to-noise
ratio. Sensitivity to absolute video level can be reduced by using relative
measures of local pixel neighborhoods to set the checksum bits.


How Secure is it Really?


Several features of this approach lead me to believe that it is very secure
indeed. In order to modify an image, yet maintain the seals intact, an
interloper would have to determine the location and order of all the hidden
checksum bits, the sequence with which the checksum was constructed, and
exactly how many seals are in the image. He would then be able to add the
incriminating evidence with his favorite paint program, and reseal the image.
Hmmm...

By the very nature of "natural" images, it is completely impossible to tell
the difference between a raw picture and one with a large number of
predetermined seals implanted. The embedded checksums cannot be distinguished
from the natural pixel noise. Even in "unnatural" images that do not contain
pixel noise, such as those produced by ray-tracing or CAD programs,
approximately 50 percent of the checksum bit locations can only be guessed at,
since they haven't changed the image at all!
Let's assume for a moment that somehow an intruder has figured out which
pixels contain the checksum. He may know the overall length of the checksum
because he has access to the source code used to seal the image. Now, in what
order are they? For n bits, there are n! possible combinations. For a 64-bit
checksum, this is a very large number. And each and every one of these is just
one possible seal, so...
...there's no way of determining how many seals have been applied. Even with
magic knowledge of both the (coded) checksum value and the exact random
sequence of a seal, a very large number of seals could still be embedded into
the image. An appropriate scenario might have more than one person in
authority independently contribute a seal to a sensitive image. Which ones
were used? Our man in the black hat only has to miss one for us to discover
his meddling. 
The number of seals that can be embedded at one time in an image is
infinitesimally small compared to the set of all possible seals. Remember that
the pixel-address space used for embedding checksum bits must not intersect
itself. 
You are therefore limited by the relationship n <= (r x Nr x Nc)/m,
where n is the maximum number of image seals; m is the pseudorandom modulus
and maximum bit width of a seal; Nr and Nc are the height and width of the
image, respectively; and r is the percentage of the image you use up for your
chunks of seal space.
If m=128 bits, Nr=Nc=128, and you require at least 50 percent of the image to
be unmodified, then a maximum of only 64 seals can be embedded.
If the industrious interloper has managed to locate the 50 percent of the seal
space he has visible to him (25 percent of the image), perhaps by using his
stolen copy of the unsealed image and subtracting to find changed pixels,
there are still 4096 bits floating around in 8192 possible locations. Pure
permutation tells you that you have 8192!/4096! potential random sequences--an
impossibly huge number. 
In truth, it must be admitted that the actual number is quite a bit less,
limited by how good your pseudo-random-number generator is and how many bits
you used in the modulus. 
For a perfect generator that does not repeat, the example has only 2^128
possible random walks (this number is reduced a bit because we have to choose
non-self-intersecting sequences for the first m addresses). Out of this number
you have used 64. This leaves the interloper with a much improved 1 in >10^30
chance of getting it right. If this isn't good enough for you, nothing
prevents you from using a longer modulus.
For larger images, the job is just harder. 
Clearly, it is not possible for you to be fooled by having the universe of all
possible seals embedded, so as to cover the ones you might have used.
To summarize the security of the techniques presented here, an interloper
cannot tell if an image has been sealed, has no way of finding the unchanged
LSBs, and cannot blanket your image with all possible seals.
That's safe enough for me. 
Figure 1 King John's seal.
Figure 2 Logical structure of an 8-bit monochrome image (pixel bit depth=3).
Figure 3 Logical structure of 24-bit color (pixel bit depth=24).
Figure 4 Disguising your checksum as noise using a monochrome image. (a)
Random walk sequence for two nonintersecting seals; (b) fixed modifications
due to seal embedding (checksum=0011 binary).
Figure 5 Gray-level interpretation.
Figure 6 8-bit color-palette images containing 500 embedded seals (a) with
and (b) without palette reorganization.
Figure 7 Real-time seal machine for 8-bit monochrome digital video.

Listing One 

// <sealimg.c>
// Copyright 1994 by Steve Walton
//
// This implementation measures checksums using all of the upper 7 bits of
// each pixel, without varying the number of bits by random sequence. It does,
// however, ensure order dependence by multiplying sum elements pairwise with
// a pseudo-random sequence and accumulating a sum.
// The overall structure of this program is intended to roughly mirror or
// simulate what could be used as a hardware design.
// A temporary file is used to store "images" of checksum bits and walk
// locations. Obviously, this could be done in extended memory, but this
// method will work on 286-class machines with little or no RAM running
// without a swap manager, and is independent of image size. Up to 255 seals
// using checksum lengths up to 256 bits long are possible within the limits
// of this structure.
// The temporary disk file is roughly twice the size of image being tested 
// for seals. Named <temp> and placed in the launch path, it is deleted after 
// use. When filling the image with seal checksum bits, the sealspace map is 
// used to embed them in one pass through the image; the map tells us which 
// bit for which seal is to be forced into the pixel LSB.
// To reduce (apparent) code complexity and improve readability, this program
// is written using static global variables for file pointers, control 
// variables, and so forth. Code-crafting could improve the structure a great 
// deal, but this form is probably better for illustrating the underlying 
// algorithms. All user-written functions used in this program are contained 
// in this listing.
// Targa image format is used for file I/O, primarily because it is as 
// universal as TIFF and MUCH easier to use on a casual basis. Hopefully, the 
// structure is semi-self-evident from the code.
// Written with and compiled by Microsoft QuickC for Windows v1.00
// TARGA is a registered trademark of Truevision, Inc.
// TGA is a registered trademark of Truevision, Inc.

#include <stdio.h>
#include <stdlib.h>

#include <conio.h>
#include <process.h>
#include <malloc.h>
#include <math.h>

// Structures must be stored byte-aligned! Compile with switch /Zp1
// (Microsoft) or the equivalent.

typedef unsigned long ULONG;
typedef unsigned short USHORT;
typedef unsigned short BOOL;
typedef unsigned char BYTE;

// Truevision TARGA file header format
typedef struct {
 BYTE idLength; // Identifies number of bytes in optional Targa Field 6
 BYTE colorMapType; // Type of color map (0=no color map, 1=color map used)
 BYTE imageType; // Type of image (1=uncompressed color-mapped, 
 // 2=24-bit direct, 3= monochrome)
 USHORT firstEntry; // Index of first color map entry (usually 0)
 USHORT mapLength; // Length of color map (number of colors)
 BYTE entrySize; // Size of color map entry (bits)
 USHORT xOrigin; // Horiz coord of L-L image corner on display 
 USHORT yOrigin; // Vert coord of L-L image corner on display 
 USHORT width; // Width of the image in pixels
 USHORT height; // Height of the image in pixels
 BYTE pixDepth; // Number of bits in each pixel location 
 BYTE imageDesc; // Alpha Channel bits or overlay bits 
} tgaHeader_t;
// Truevision TARGA RGB triplet format
typedef struct {
 BYTE blu; // 8-bit Blue component
 BYTE grn; // 8-bit Green component
 BYTE red; // 8-bit Red component
} rgbTriplet_t;

typedef struct { // The per-pixel-address descriptor 
 BYTE sealNum; // ... Index number of seal 
 BYTE bitNum; // ... bit number (0=LSB) of the checksum 
} sealSpace_t;

typedef union { // Mechanizes our VERY SIMPLE pseudo-random-number 
 ULONG bits32; // generator seeds
 char asciiChar[4]; // The characters are kept for reference.
} seal_t;

#define TRUE 1
#define FALSE 0

#define MONO 3
#define PALETTE 1
#define NOSEAL 0xFF

// -------------- Global Variable List -------------------------------
ULONG
 checksum[256]
;
seal_t
 sealArray[255]; // Contains the seeds for each seal

FILE
 *fpIn, // Input file pointer
 *fpOut // Output file pointer
;
char
 inFileName[80], // Input file name string (allow 80 bytes)
 outFileName[80] // Output file name string (allow 80 bytes)
;
 //////////////////////////// 
 // F u n c t i o n s //
 //////////////////////////// 
//---------------------- u r a n d D o u b l e () ----------------------------
// A pseudo-random number generator designed to return floating-point
// quantities as a percentage of a passed range argument. This is the standard
// linear congruential generator which uses the recurrence relation i(j+1) =
// mod(m)[a * i(j) + c], where m is the modulus, a is the multiplier,
// and c is the offset. The randomness of these sequences is very dependent
// upon a, m, and c. The value of i(0) is called the seed.
#define URANDDOUBLE_MULTIPLIER 2416L // Multiplier
#define URANDDOUBLE_OFFSET 374441L // Offset
#define URANDDOUBLE_MODULUS 1771875L

static unsigned long urandDoubleSeed = 3456L; // global static unsigned long seed
double urandDouble( double range ){
 urandDoubleSeed = (urandDoubleSeed * URANDDOUBLE_MULTIPLIER + 
 URANDDOUBLE_OFFSET) % URANDDOUBLE_MODULUS;
 return( ((double)urandDoubleSeed * (range)) / (double)URANDDOUBLE_MODULUS );
}
//-------------------------- u r a n d W A () --------------------------------
// A pseudo-random number generator designed to return 16-bit quantities. This
// is an arrayed function of up to 256 outputs. This is the standard linear
// congruential generator which uses the recurrence relation i(j+1) = mod(m)[a
// * i(j) + c], where m is the modulus, a is the multiplier, and c is the
// offset. The randomness of these sequences is very dependent upon a, m, and
// c. The value of i(0) is called the seed.
#define URANDWA_MULTIPLIER 2416L // Multiplier
#define URANDWA_OFFSET 374441L // Offset
#define URANDWA_MODULUS 1771875L

static unsigned long
 urandWASeed[256]; // global static unsigned long seed
USHORT urandWA( short index ){
 urandWASeed[index] = (urandWASeed[index] * URANDWA_MULTIPLIER + 
 URANDWA_OFFSET) % URANDWA_MODULUS;
 return( (USHORT)( (double)urandWASeed[index] * 65536.0 / 
 (double)URANDWA_MODULUS ) );
}
//==================== s o l i c i t S e a l s () =============================
// Gets up to 255 4-character sealing strings from the user and fills the
// file pointed to by <sealSpace> with seal# and bit# information. We are
// using 32-bit checksums, and the walk space is driven by urandDouble()
short solicitSeals( FILE *sealSpace, long sealSpaceLength ){
 BOOL
 sealGood = TRUE,
 done = FALSE
 ;
 short
 i,
 numSeal=0

 ;
 long walkAddress;
 char
 temp
 ;
 sealSpace_t
 sealPix
 ;
 seal_t
 inSeal
 ;
 printf( "\n----Seal entry----\n" );
 printf( "Each four characters will be taken as a seal. Each seal's "
 "validity is\n" );
 printf( "checked as it is entered. Invalid (intersecting) seals are "
 "not accepted.\n" );
 printf( "You may enter up to 255 seals (indices are 0..254)\n" );
 while( !done ){
 printf( "Enter seal %d (ESC to end): ", numSeal );
 i=0;
 while( i < 4 ){
 temp = (char)getch();
 if( temp == 27 ){
 done = TRUE;
 break;
 } else {
 inSeal.asciiChar[i] = temp;
 printf( "-" );
 i++;
 }
 }
 if( !done ){
 printf( "%c Checking %c%c%c%c...", 7, inSeal.asciiChar[0], 
 inSeal.asciiChar[1], inSeal.asciiChar[2], inSeal.asciiChar[3] );
 // First, check the sealspace for any of these locations...
 sealGood = TRUE;
 urandDoubleSeed = inSeal.bits32 % URANDDOUBLE_MODULUS;
 for( i=0; i<32; i++ ){
 walkAddress = (long)urandDouble( sealSpaceLength );
 if( walkAddress == sealSpaceLength ) walkAddress = sealSpaceLength-1;
 fseek( sealSpace, walkAddress*sizeof(sealSpace_t), SEEK_SET );
 fread( &sealPix, sizeof( sealSpace_t ), 1, sealSpace );
 if( sealPix.sealNum != NOSEAL ){
 sealGood = FALSE;
 break;
 }
 }
 if( sealGood ){
 printf( "ok -- embedding...\n" );
 sealArray[numSeal].bits32 = inSeal.bits32; // Copy into seal array 
 // embed this seal into the sealspace map
 urandDoubleSeed = inSeal.bits32 % URANDDOUBLE_MODULUS; 
 for( i=0; i<32; i++ ){
 walkAddress = (long)urandDouble( sealSpaceLength );
 if( walkAddress == sealSpaceLength ) walkAddress=sealSpaceLength-1;
 sealPix.sealNum = (BYTE)numSeal; // Keep sealArray index in the map
 sealPix.bitNum = (BYTE)i; // Keep bit number in the map
 fseek( sealSpace, walkAddress*sizeof(sealSpace_t), SEEK_SET );
 fwrite( &sealPix, sizeof( sealSpace_t ), 1, sealSpace );

 }
 numSeal++;
 if( numSeal == 255 ) done = TRUE; // Limit number of seals
 } else {
 printf( "no. Try another.\n" );
 }
 } // End checking and embedding section
 }
 printf( "\nYou have embedded %d seals.\n", numSeal );
 return( numSeal );
}
//======================= o p e n F i l e s () ================================
// Open image files. Test each file for validity and return FALSE if something
// is wrong. There are all sorts of clever ways to avoid using a "goto" 
// statement, and I support most of them. This is one of the places where its 
// use is justified to generate short, quick code.
BOOL openFiles( void ){
 tgaHeader_t *hdr;
 // Go get temporary storage for the header so we can test it for validity
 hdr = (tgaHeader_t *)malloc( sizeof(tgaHeader_t) );
 if( hdr == NULL ){
 printf( "Your system's seriously ill! Can't allocate 18 bytes "
 "of dynamic memory!" );
 goto bugout;
 }
 // OPEN THE INPUT FILE
 printf( "Enter complete input image file name: " );
 scanf( "%s", &inFileName[0] );
 fpIn = fopen( &inFileName[0], "rb" ); // Attempt to open the input file 
 if( fpIn == NULL ){
 printf( "Problem opening file %s\n", inFileName );
 goto errorExitCloseFile;
 }
 fread( hdr, sizeof(tgaHeader_t), (size_t)1, fpIn ); // Read the input block
 rewind( fpIn ); // Reset file pointer to zero
 if( hdr->imageType != MONO && hdr->imageType != PALETTE ){ 
 printf( "Input image is not Targa monochrome or palette." );
 goto errorExitCleanAll;
 }
 if( hdr->imageType == MONO &&
 (hdr->mapLength != 0 || hdr->colorMapType != 0 || hdr->pixDepth != 8 ||
 hdr->entrySize != 0 )){
 printf( "Corrupted monochrome image file." );
 goto errorExitCleanAll;
 }
 if( hdr->imageType == PALETTE &&
 (hdr->mapLength != 256 || hdr->colorMapType == 0 ||
 hdr->pixDepth != 8 || hdr->entrySize != 24 ) ){
 printf( "Not an appropriate 8-bit palette image file." );
 goto errorExitCleanAll;
 }
 // OPEN THE OUTPUT FILE
 printf( "Enter output image name: " );
 scanf( "%s", &outFileName[0] );
 fpOut = fopen( &outFileName[0], "wb" ); // Attempt to open the output file 
 if( fpOut == NULL ){
 printf( "Problem opening file %s\n", outFileName );
 goto errorExitCloseFile;
 }

 // If we've gotten this far, it's likely that everything is OK and we can 
 // get back to business...
 return( TRUE );
errorExitCleanAll: // Error exit point
 free( hdr );
errorExitCloseFile: // Another error exit point
 fcloseall();
bugout: // Get outta here!
 return( FALSE );
}
//=========================== m a i n () =====================================
// This program opens an 8-bit grey or 8-bit color mapped Targa file, reads it
// in row-by-row, and applies a list of seals to it.
void main( void ){
 BYTE
 *rowIn, // Pointer to row of input image pixels
 *rowOut // Pointer to row of output image pixels
 ;
 short
 i,j,k, // Temporary index variables
 numSeals // Total number of seals to check for in the image
 ;
 long
 sealSpaceLength // Total number of pixels in the input image.
 ;
 FILE
 *sealSpace // File pointer to temporary "seal space frame" image file
 ;
 tgaHeader_t
 inFileHeader // Will contain input file Targa header
 ;
 sealSpace_t
 *sealRow // Pointer to row of seal space elements
 ;
 rgbTriplet_t
 *palette // Pointer to a palette full of rgb triplets
 ;
 // Print out the program identification and default values
 printf( "<sealimg.c>\nCopyright 9/20/94 by Steve Walton\n" );
 if( !openFiles() ) exit(0); // Get the files from the user and open them
 fread( &inFileHeader, sizeof(tgaHeader_t), (size_t)1, fpIn ); 
 sealSpaceLength = (long)inFileHeader.width * (long)inFileHeader.height;
 sealSpace = fopen( "temp", "wb+" ); // Open sealspace file for random R/W
 sealRow = (sealSpace_t *)malloc( sizeof( sealSpace_t ) * 
 inFileHeader.width );
 printf( "Clearing sealspace...\n" );
 // Clear temporary file with seal walkspace data, since it will be filled 
 for( i=0; i<(short)inFileHeader.width; i++ ){ 
 sealRow[i].sealNum = NOSEAL;
 sealRow[i].bitNum = 0;
 }
 for( j=0; j<(short)inFileHeader.height; j++ ){ // Clear the sealspace file
 fwrite( sealRow,sizeof(sealSpace_t),(size_t)inFileHeader.width,sealSpace );
 }
 // Now that we have a place to put them, fill sealSpace with valid seals 
 // gotten from the user. Remember that the array containing the actual seal 
 // strings is the global static array <sealArray[255]>
 numSeals = solicitSeals( sealSpace, sealSpaceLength );
 // Allocate some memory for all of our image manipulation to come...

 rowIn = (BYTE *)malloc( inFileHeader.width ); 
 rowOut = (BYTE *)malloc( inFileHeader.width ); 
 // We will now go through the image and calculate the seal-warped checksums.
 // Checksums are modulo-2^32, based on an iterative multiply-accumulate 
 // operation of the form cs32 <- cs32 + (ran16 * (pixel>>1)) mod 2^32,
 // where cs32 is the checksum "summing" variable, pixel is the 8-bit image 
 // pixel data at address N, and ran16 is the upper 16 bits of the Nth 
 // iteration of a 32-bit linear congruential generator. The modulo 2^32
 // is obtained simply by using unsigned long integer arithmetic. 
 
 printf( "Measuring commuted checksums...\n" );
 for( j=0; j<numSeals; j++ ){
 checksum[j] = 0L; // Clear all of the checksums
 urandWASeed[j] = sealArray[j].bits32 % URANDWA_MODULUS; 
 // Set seeds to values implied by seals
 }
 fseek( fpIn, sizeof(tgaHeader_t) + (inFileHeader.mapLength * 
 sizeof( rgbTriplet_t ) ), SEEK_SET );
 for( i=0; i<(short)inFileHeader.height; i++ ){
 fread( rowIn, (size_t)1, (size_t)inFileHeader.width, fpIn );
 for( j=0; j<(short)inFileHeader.width; j++ ){
 for( k=0; k<numSeals; k++ ){
 checksum[k] += (ULONG)urandWA(k) * (ULONG)(rowIn[j]>>1); 
 }
 }
 }
 // Get the output image ready to go. Put a copy of the input file header at 
 // the beginning of the output file, and copy the color palette over if it 
 // is a palette-type image.
 fseek( fpOut, 0, SEEK_SET ); // Reset output file pointer to zero
 fwrite( &inFileHeader, sizeof( tgaHeader_t ), (size_t)1, fpOut ); 
 if( inFileHeader.imageType == PALETTE ){
 palette = (rgbTriplet_t *)malloc( (inFileHeader.mapLength * 
 sizeof( rgbTriplet_t ) ) );
 fseek( fpIn, sizeof(tgaHeader_t), SEEK_SET ); // Set input pointer 
 fread( palette, sizeof( rgbTriplet_t ), 
 (size_t)inFileHeader.mapLength, fpIn ); // Read palette
 fseek( fpOut, sizeof(tgaHeader_t), SEEK_SET ); // Set output pointer 
 fwrite( palette, sizeof( rgbTriplet_t ), (size_t)inFileHeader.mapLength, 
 fpOut ); // Write palette
 free( palette ); // Free this -- we don't use the palette
 }
 // Set the file pointers of all files to the beginning of the image data
 fseek( fpIn, sizeof(tgaHeader_t) + (inFileHeader.mapLength * 
 sizeof( rgbTriplet_t ) ), SEEK_SET );
 fseek( fpOut, sizeof(tgaHeader_t) + (inFileHeader.mapLength * 
 sizeof( rgbTriplet_t ) ), SEEK_SET );
 fseek( sealSpace, 0, SEEK_SET );

 // Go through the image and embed checksums. Use the sealspace map to tell 
 // which bit of which checksum to place with each pixel...
 printf( "Embedding checksums into image data...\n" );
 for( i=0; i<(short)inFileHeader.height; i++ ){
 fread( rowIn, (size_t)1, (size_t)inFileHeader.width, fpIn );
 fread( sealRow,sizeof(sealSpace_t), (size_t)inFileHeader.width,sealSpace );
 for( j=0; j<(short)inFileHeader.width; j++ ){
 if( sealRow[j].sealNum == NOSEAL ){
 rowOut[j] = rowIn[j];
 } else {

 rowOut[j] = (BYTE)(
 (ULONG)(rowIn[j] & 0xFE) |
 (0x01L & (checksum[ sealRow[j].sealNum] >> sealRow[j].bitNum ))
 );
 }
 }
 fwrite( rowOut, (size_t)1, (size_t)inFileHeader.width, fpOut );
 }
 free( rowIn );
 free( rowOut );
 free( sealRow );
 fcloseall();

 system( "del temp" );
 printf( "Execution complete.\n" );
}



Listing Two 

// <testseal.c> Copyright 1994 by Steve Walton
// This implementation measures checksums using all of the upper 7 bits of
// each pixel, without varying the number of bits by random sequence. It does,
// however, ensure order dependence by multiplying sum elements pairwise with a
// pseudo-random sequence and accumulating a sum. The overall structure of this
// program is intended to roughly mirror or simulate what could be used as
// a hardware design. A temporary file is used to store "images" of checksum
// bits and walk locations. Obviously, this could be done in extended memory,
// but this method will work on 286-class machines with little or no RAM
// running without a swap manager, and is independent of image size. Up to 255
// seals using checksum lengths up to 256 bits are possible within the limits
// of this structure. The temporary disk file is roughly twice the size of the
// image being tested for seals. Named <temp> and placed in the launch path,
// it is deleted after use. When checking the image for seal checksum bits, the
// sealspace map is used to find them in one pass through the image; the map
// tells us which bit for which seal is to be forced into the pixel LSB. To
// reduce (apparent) code complexity and improve readability, this program is
// written using static global variables for file pointers, control variables,
// and so forth. Code-crafting could improve the structure a great deal, but
// this form is probably better for illustrating the underlying algorithms.
// All user-written functions used in this program are contained here.
// Targa image format is used for file I/O, primarily because it is as
// universal as TIFF and MUCH easier to use on a casual basis. Hopefully, the
// structure is semi-self-evident from the code. Written with and compiled by
// Microsoft QuickC for Windows v1.00. TARGA is a registered trademark of
// Truevision, Inc. TGA is a registered trademark of Truevision, Inc.
//
//
#include <stdio.h>
#include <stdlib.h>
#include <conio.h> 
#include <process.h>
#include <malloc.h>
#include <math.h>

// Structures must be stored byte-aligned! Compile with switch /Zp1
// (Microsoft) or equivalent.

typedef unsigned long ULONG;

typedef unsigned short USHORT;
typedef unsigned short BOOL;
typedef unsigned char BYTE;

// Truevision TARGA file header format
typedef struct {
 BYTE idLength; // Identifies number of bytes in optional Targa Field 6
 BYTE colorMapType; // Type of color map (0=no color map, 1=color map used)
 BYTE imageType; // Type of image (1=uncompressed color-mapped, 
 // 2=24-bit direct, 3=monochrome)
 USHORT firstEntry; // Index of first color map entry (usually 0)
 USHORT mapLength; // Length of color map (number of colors)
 BYTE entrySize; // Size of color map entry (bits)
 USHORT xOrigin; // Horiz coord of lower-left image corner on a display
 USHORT yOrigin; // Vert coord of lower-left image corner on a display 
 USHORT width; // Width of the image in pixels
 USHORT height; // Height of the image in pixels
 BYTE pixDepth; // Number of bits in each pixel location 
 BYTE imageDesc; // Alpha Channel bits or overlay bits 
} tgaHeader_t;
// Truevision TARGA RGB triplet format
typedef struct {
 BYTE blu; // 8-bit Blue component
 BYTE grn; // 8-bit Green component
 BYTE red; // 8-bit Red component
} rgbTriplet_t;

typedef struct { // per-pixel-address descriptor for keeping embedded
 // checksum bit locations
 BYTE sealNum; // Index number of seal associated with the checksum partially
 // embedded at this location
 BYTE bitNum; // Bit number (0=LSB) of the checksum associated with this seal
} sealSpace_t;
typedef union { // Mechanizes pseudo-random-number generator seeds
ULONG bits32; 
 char asciiChar[4]; // The characters are kept for reference.
} seal_t;
#define TRUE 1
#define FALSE 0
#define MONO 3
#define PALETTE 1
#define NOSEAL 0xFF
// -------------- Global Variable List -------------------------------
ULONG
 checksum[256], // array of 32-bit image checksums, implied by seals
 checksumEmbedded[256] // array of 32-bit image checksums stripped from 
 // embedded walk sequences
;
seal_t
 sealArray[255]; // Contains the seeds for each seal

FILE
 *fpIn, // Input file pointer
 *fpOut // Output file pointer
;
char
 inFileName[80], // Input file name string (allow 80 bytes)
 outFileName[80] // Output file name string (allow 80 bytes)
;

 //////////////////////////// 
 // F u n c t i o n s //
 //////////////////////////// 
//======================== u r a n d D o u b l e () ==========================
// Pseudo-random number generator designed to return floating-point quantities
// as a percentage of a passed range argument. This is the standard linear
// congruential generator which uses the recurrence relation i(j+1) = mod(m)[a
// * i(j) + c], where m is the modulus, a is the multiplier, and c is the
// offset. The randomness of these sequences is very dependent upon a, m,
// and c. The value of i(0) is called the seed.
//
#define URANDDOUBLE_MULTIPLIER 2416L // Multiplier
#define URANDDOUBLE_OFFSET 374441L // Offset
#define URANDDOUBLE_MODULUS 1771875L
static unsigned long urandDoubleSeed = 3456L;
double urandDouble( double range ){
 urandDoubleSeed = (urandDoubleSeed * URANDDOUBLE_MULTIPLIER + 
 URANDDOUBLE_OFFSET) % URANDDOUBLE_MODULUS;
 return( ((double)urandDoubleSeed * (range)) / (double)URANDDOUBLE_MODULUS );
}
//========================= u r a n d W A () =================================
// A pseudo-random number generator designed to return 16-bit quantities. This
// is an arrayed function of up to 256 outputs. This is the standard linear 
// congruential generator which uses the recurrence relation i(j+1) = mod(m)[a
// * i(j) + c], where m is the modulus, a is the multiplier, and c is offset.
// The randomness of these sequences is very dependent upon a, m, and c. 
// The value of i(0) is called the seed.
//
#define URANDWA_MULTIPLIER 2416L // Multiplier
#define URANDWA_OFFSET 374441L // Offset
#define URANDWA_MODULUS 1771875L

static unsigned long
 urandWASeed[256]; // global static unsigned long seed

USHORT urandWA( short index ){
 urandWASeed[index] = (urandWASeed[index] * URANDWA_MULTIPLIER + 
 URANDWA_OFFSET) % URANDWA_MODULUS;
 return( (USHORT)( (double)urandWASeed[index] * 65536.0 / 
 (double)URANDWA_MODULUS ) );
}
//========================= s o l i c i t S e a l s () =====================
// Gets up to 255 4-character sealing strings from the user and fills the file
// pointed to by <sealSpace> with seal# and bit# information. We are using 
// 32-bit checksums, and the walk space is driven by urandDouble()
short solicitSeals( FILE *sealSpace, long sealSpaceLength ){
 BOOL
 sealGood = TRUE,
 done = FALSE
 ;
 short
 i, // Temporary index variable
 numSeal=0
 ;
 long 
 walkAddress;
 char
 temp
 ;

 sealSpace_t
 sealPix
 ;
 seal_t
 inSeal
 ;
 printf( "\n---- Seal Test List Entry ----\n" );
 printf( "Each four characters entered will be taken as a seal. \n" );
 printf( "You may submit up to 255 seals for testing (indices are 0..254)\n" );
 printf( "If you get self-intersections, at least two seals in your list\n" );
 printf( "are mutually exclusive, one of which cannot be present.\n" );

 while( !done ){
 printf( "Enter seal %d (ESC to end): ", numSeal );
 i=0;
 while( i < 4 ){
 temp = (char)getch();
 if( temp == 27 ){
 done = TRUE;
 break;
 } else {
 inSeal.asciiChar[i] = temp;
 printf( "-" );
 i++;
 }
 }
 if( !done ){
 printf( "%c (checking for intersections...", 7 );
 // First, check the sealspace for any of these locations...
 sealGood = TRUE;
 urandDoubleSeed = inSeal.bits32 % URANDDOUBLE_MODULUS;
 for( i=0; i<32; i++ ){
 walkAddress = (long)urandDouble( sealSpaceLength );
 if( walkAddress == sealSpaceLength ) walkAddress = sealSpaceLength-1;
 fseek( sealSpace, walkAddress*sizeof(sealSpace_t), SEEK_SET );
 fread( &sealPix, sizeof( sealSpace_t ), 1, sealSpace );
 if( sealPix.sealNum != NOSEAL ){
 sealGood = FALSE;
 break;
 }
 }
 if( sealGood ){
 printf( "ok.)\n" );
 sealArray[numSeal].bits32 = inSeal.bits32; 
 // embed this seal into the sealspace map
 urandDoubleSeed = inSeal.bits32 % URANDDOUBLE_MODULUS;
 for( i=0; i<32; i++ ){
 walkAddress = (long)urandDouble( sealSpaceLength );
 if( walkAddress == sealSpaceLength ) walkAddress=sealSpaceLength-1;
 sealPix.sealNum = (BYTE)numSeal; // Keep sealArray index in the map
 sealPix.bitNum = (BYTE)i; // Keep bit number in the map
 fseek( sealSpace, walkAddress*sizeof(sealSpace_t), SEEK_SET );
 fwrite( &sealPix, sizeof( sealSpace_t ), 1, sealSpace );
 }
 numSeal++;
 } else {
 printf( "no. Try another.)\n" );
 }
 } // End checking and embedding section

 }
 printf( "\nYou have submitted %d seals for testing.\n", numSeal );
 return( numSeal );
}
//============================ o p e n F i l e s () ==========================
// Open image files. Test each file for validity and return FALSE if something
// is wrong. There are all sorts of clever ways to avoid using a "goto" 
// statement, and I support most of them. This is one of the places where 
// its use is justified to generate short, quick code.
BOOL openFiles( void ){
 tgaHeader_t *hdr;
 // Get temporary storage for the header so we can test it for validity
 hdr = (tgaHeader_t *)malloc( sizeof(tgaHeader_t) );
 if( hdr == NULL ){
 printf( "Your system's seriously ill! Can't allocate 18 bytes "
 "of dynamic memory!" );
 goto bugout;
 }
 // OPEN THE INPUT FILE
 printf( "Enter complete input image file name: " );
 scanf( "%s", &inFileName[0] );
 fpIn = fopen( &inFileName[0], "rb" ); 
 if( fpIn == NULL ){
 printf( "Problem opening file %s\n", inFileName );
 goto errorExitCloseFile;
 }
 fread( hdr, sizeof(tgaHeader_t), (size_t)1, fpIn );
 rewind( fpIn );
 if( hdr->imageType != MONO && hdr->imageType != PALETTE ){ 
 printf( "Input image is not Targa monochrome or palette." );
 goto errorExitCleanAll;
 }
 if( hdr->imageType == MONO &&
 (hdr->mapLength != 0 || hdr->colorMapType != 0 || hdr->pixDepth != 8 ||
 hdr->entrySize != 0 )){
 printf( "Corrupted monochrome image file." );
 goto errorExitCleanAll;
 }
 if( hdr->imageType == PALETTE &&
 (hdr->mapLength != 256 || hdr->colorMapType == 0 ||
 hdr->pixDepth != 8 || hdr->entrySize != 24 ) ){
 printf( "Not an appropriate 8-bit palette image file." );
 goto errorExitCleanAll;
 }
 // If we've gotten this far, it's likely that everything is OK and we can 
 // get back to business...
 return( TRUE );
errorExitCleanAll: // Error exit point
 free( hdr );
errorExitCloseFile: // Another error exit point
 fcloseall();
bugout: // Get outta here!
 return( FALSE );
}
 //////////////////////////// 
 // m a i n () //
 //////////////////////////// 

//==========================================================================

// This program opens either an 8-bit grey or 8-bit color-mapped Targa image,
// reads it in row-by-row, and checks for the presence of a list of embedded
// seals.

void main( void ){
 BYTE
 *rowIn // Pointer to row of input image pixels
 ;
 short
 i,j,k, // Temporary index variables
 numSeals // Total number of seals to check for in the image
 ;
 long
 sealSpaceLength // Total number of pixels in the input image.
 ;
 FILE
 *sealSpace // File pointer to temporary "seal space frame" image file
 ;
 tgaHeader_t
 inFileHeader // Will contain input file Targa header
 ;
 sealSpace_t
 *sealRow // Pointer to row of seal space elements
 ;
 // Print out the program identification and default values
 printf( "<testseal.c>\nCopyright 9/20/94 by Steve Walton\n" );

 if( !openFiles() ) exit(0); // Get the files from the user and open them

 fread( &inFileHeader, sizeof(tgaHeader_t), (size_t)1, fpIn );
 sealSpaceLength = (long)inFileHeader.width * (long)inFileHeader.height;

 sealSpace = fopen( "temp", "wb+" ); // Open sealspace file for random R/W
 sealRow = (sealSpace_t *)malloc( sizeof( sealSpace_t )*inFileHeader.width);

 printf( "Clearing sealspace...\n" );
 // Clear the temporary file with the seal walkspace data
 for( i=0; i<(short)inFileHeader.width; i++ ){ 
 sealRow[i].sealNum = NOSEAL;
 sealRow[i].bitNum = 0;
 }
 for( j=0; j<(short)inFileHeader.height; j++ ){ 
 fwrite( sealRow, sizeof(sealSpace_t),(size_t)inFileHeader.width,sealSpace);
 }
 // Now that we have a place to put them, fill sealSpace with valid seals 
 // gotten from the user. Remember that the array containing the actual seal 
 // strings is the global static array <sealArray[255]>
 numSeals = solicitSeals( sealSpace, sealSpaceLength );

 // Allocate some memory for all of our image manipulation to come...
 rowIn = (BYTE *)malloc( inFileHeader.width ); 
 //rowOut = (BYTE *)malloc( inFileHeader.width ); 

 // We will now go through the image and calculate seal-warped checksums.
 // Checksums are modulo-2^32, based on an iterative multiply-accumulate 
 // operation of the form cs32 <- cs32 + (ran16 * (pixel>>1)) mod 2^32,
 // where cs32 is the checksum "summing" variable, pixel is the 8-bit image 
 // pixel data at address N, and ran16 is the upper 16 bits of the Nth 
 // iteration of a 32-bit linear congruential generator. The modulo 2^32
 // is obtained simply by using unsigned long integer arithmetic.

 printf( "Measuring checksums...\n" );
 for( j=0; j<numSeals; j++ ){
 checksum[j] = 0L; // Clear all of the checksums
 urandWASeed[j] = sealArray[j].bits32 % URANDWA_MODULUS; 
 }
 fseek( fpIn, sizeof(tgaHeader_t) + (inFileHeader.mapLength * 
 sizeof( rgbTriplet_t ) ), SEEK_SET );
 for( i=0; i<(short)inFileHeader.height; i++ ){
 fread( rowIn, (size_t)1, (size_t)inFileHeader.width, fpIn );
 for( j=0; j<(short)inFileHeader.width; j++ ){
 for( k=0; k<numSeals; k++ ){
 checksum[k] += (ULONG)urandWA(k) * (ULONG)(rowIn[j]>>1); 
 }
 }
 }
 // Set the file pointers of all files to the beginning of the image data
 fseek( fpIn, sizeof(tgaHeader_t) + (inFileHeader.mapLength * 
 sizeof( rgbTriplet_t ) ), SEEK_SET );
 fseek( sealSpace, 0, SEEK_SET );
 // go through the image and embed checksums. Use the sealspace map to tell 
 // which bit of which checksum to place with each pixel...
 printf( "Checking for embedded checksums in image data...\n" );
 for( i=0; i<numSeals; i++ ) checksumEmbedded[i] = 0L; 
 for( i=0; i<(short)inFileHeader.height; i++ ){
 fread( rowIn, (size_t)1, (size_t)inFileHeader.width, fpIn );
 fread( sealRow, sizeof(sealSpace_t),(size_t)inFileHeader.width,sealSpace );
 for( j=0; j<(short)inFileHeader.width; j++ ){
 if( sealRow[j].sealNum != NOSEAL ){
 checksumEmbedded[sealRow[j].sealNum] |= (ULONG)( rowIn[j] & 0x01 )
 << sealRow[j].bitNum;
 }
 }
 }
 for( i=0; i<numSeals; i++ ){
 if( checksum[i] == checksumEmbedded[i] ){
 printf( "#%d, %c%c%c%c, is present\n", i,
 sealArray[i].asciiChar[0], sealArray[i].asciiChar[1], 
 sealArray[i].asciiChar[2], sealArray[i].asciiChar[3] );
 } else {
 printf( "#%d, %c%c%c%c, is not present\n", i,
 sealArray[i].asciiChar[0], sealArray[i].asciiChar[1], 
 sealArray[i].asciiChar[2], sealArray[i].asciiChar[3] );
 }
 }
 free( rowIn );
 free( sealRow );
 fcloseall();

 system( "del temp" );
 printf( "Execution complete.\n" );
}












The Detrimental Wire Exclusion Heuristic


A new approach to combinatorial optimization




Paul J. Martino


Paul is a computer-science major at Lehigh University and president of Ahpah
Software, a company working on the upcoming graphics adventure game,
"Vengeance." Paul can be contacted on CompuServe at 73340,3456.


The problem of computer wiring belongs to the field of combinatorial
optimization. It is analogous to the shortest-Hamiltonian-path problem except
that a source/terminator rule dictates where the tour must begin and end. The
computer-wiring problem is classified as NP-hard (Lawler et al.). To find
the optimal solution with 100 percent assurance, branch-and-bound techniques
must typically be used. These processes usually require inordinate
program run times and are prohibitive for large numbers of pins. Therefore,
heuristics that create good tours are desirable, even if the tours are
sometimes suboptimal.
The two main classes of heuristics for combinatorial-optimization problems are
tour creation and tour improvement. The first produces a tour without having a
tour as a starting point (nearest neighbor, for example); the second improves
upon an initial tour (like Lin-Kernighan). Some heuristics, such as Krolak,
Felts, and Marble's man-machine approach, incorporate both classes of
heuristics. This is the type of heuristic I present in this article. This new
heuristic can be applied to combinatorial-optimization problems, including the
traveling-salesman and vehicle-routing problems. 


Computer-Wiring Theory


Underlying the computer-wiring problem is the assumption that, given a set of
n pins, there are n(n-1)/2 pairwise distances between pins in the set. These
individual distances are referred to as "wires" or "edges." The n x n matrix
of wires is called the "cost matrix" and represents any notion of distance
such as time, length, or cost. The value of the wire between locations i and
j is noted as Cij. The objective of the computer-wiring problem is to find
the set of n-1 wires in C that creates the minimum-valued serial tour through
all pins. Symmetric problems have Cij=Cji for all i and j and Cii=0 for all i. 
The insight behind this heuristic is that a short wire is not necessarily an
effective wire to use in the tour. This notion was derived through
numerous runs of the nearest-neighbor heuristic, which works as follows:
1. Choose a pin, i, from the set of all pins to connect.
2. Select the smallest Cij in which pin j has not been visited. Add this wire
to the end of the connection order. 
3. Set i=j and repeat step 2 until all pins are connected. 
The nearest-neighbor heuristic is classified as "greedy" because it always
attempts to use the shortest remaining wire. This process typically results in
the use of extremely long wires toward the end of the tour. Figure 1(a)
illustrates how the early use of the short Wire A leads to the use of the long
Wire E. Figure 1(b) illustrates a much better tour that initially uses Wire F
instead of Wire A. The new heuristic isolates and eliminates these short wires
(like Wire A) that are detrimental to the overall tour. This isolation and
elimination of specific wires is called "detrimental-wire exclusion." 
Many of the short but detrimental wires are part of a minimum spanning tree
(MST) for the set of pins. The MST is the set of the shortest n-1 wires that
form a connected net including all pins; see Figure 2. The lengths of the
edges in this figure are a measure of the Cij values.
Kruskal gives a polynomial-time algorithm for the MST problem. This and other
MST algorithms do not enforce the restriction that each pin can touch a
maximum of two others. To enforce this restriction, Held and Karp and
Christofides use Lagrangian multipliers; later work with Lagrangian
multipliers was done by Volgenant and Jonker. Note that an MST is not always
unique--problems with several wires of the same length typically have a few MSTs. 
To find a pattern in the short but detrimental wires, we examined both classic
and new, randomly generated problems. We found that many of the detrimental
wires are MST wires that include multiply connected pins (those which have
more than one wire attached to them). Wires A, B, E, G, H, J, and K in Figure
2 are examples of these. 


The Detrimental-Wire-Exclusion Heuristic


The concepts I've discussed lead to the following heuristic for the
computer-wiring problem:
1. If not given the cost matrix C, create it based on an appropriate cost
function. 
2. Sort the wires of the matrix from shortest to longest and mark all wires as
eligible for use in the tour.
3. Using the n--1 shortest wires that include all n pins, create an MST for
the set of pins. Any optimal MST algorithm can be used. This tree will serve
as a model for wire exclusion in later steps. 
4. Create a tour using this subheuristic: 
 (a) Pick the shortest eligible wire from the wire list that does not cause
any pin to be connected to more than two wires and does not create a closed
circuit within the current subtour. Add the selected wire to the list of wires
to be used in the tour. 
 (b) Repeat step 4(a) until a tour of n--1 wires is created. Disjoint wires
can be selected during all but the final repetition of step 4(a). 
5. Starting at any pin i, mark a possibly detrimental wire that connects to
pin i as ineligible for use in the tour. Then create a new tour as in step 4.
Recall that a possibly detrimental wire is a member of the MST found in step
3, and it connects to a pin that has at least one other minimum-tree wire
connected to it. Note that pin i does not always have a possibly detrimental
wire connected to it. 
6. If the new tour is shorter than any previously seen tour, permanently mark
that wire as ineligible and store this tour as the best one seen.
7. Repeat steps 5 and 6 for each pin in the tour. Some possibly detrimental
wires may be excluded twice because both pins the wire connects may be
multiply connected.
8. The shortest tour found by steps 4--7 is the one the heuristic produces.


An Example


Figure 3 illustrates the use of the detrimental-wire-exclusion heuristic on a
set of six pins labeled p1 through p6. The 15 wires are noted as w1 through
w15, where w1 is the shortest wire and w15 is the longest. Tables 1 and 2 list
the pins and wires used in this example. Distances in this example are
computed by the Manhattan metric (discussed later, in "Building Cost
Matrixes"). Figure 3(a) shows a minimum spanning tree that uses wires w1, w2,
w3, w4, and w7 to include all six pins. The number in brackets at the bottom left of the
diagrams is the total length of all the wires used in the diagram. After a
tree is found, all wires are marked as eligible and the subheuristic is
called. The resulting tour is shown in Figure 3(b). 
Since p1 is not multiply connected, move to p2. Exclude w1, w4, and w7 one at
a time because they all multiply connect to p2. Figure 3(c) is the tour
created without w1. Since this tour is shorter than any seen previously, w1 is
permanently marked as ineligible. Figure 3(d) is the tour created without the
use of w4 or w1. Since it is longer than the shortest tour, w4 is again made
eligible. Figure 3(e) is the tour created without w7 or w1. Since it is the
same length as the shortest seen tour, the conditions prior to the exclusion
of w7 are restored. All the wires that multiply connect p2 have been tested;
since p3 is not multiply connected, move to p4. Now exclude w1, w2, and w3 one
at a time because they all multiply connect to p4. Since w1 is already
permanently marked as ineligible, there is no need to create a new tour. Move
to the next wire. Figure 3(f) is the tour created without w2 or w1. It is
longer than the shortest tour, so w2 is again made eligible. Figure 3(g) is
the tour created without w3 or w1. Since this tour is longer than the best
tour seen, w3 is again made eligible. Since p5 and p6 are not multiply
connected and p6 is the last pin, end the heuristic and return the best tour. 
This example is typical of the behavior of the heuristic. A fairly good
starting point, Figure 3(b), is improved upon by deleting wire w1. The more
pins, the greater the improvement that can be expected, because the accuracy
of the starting tour decreases as the number of pins increases. 


Coding Considerations



When implementing this new heuristic, you must consider several points. For
instance, when given a set of pins to connect, the pin locations should be
stored in the pin-data structure (when minimizing a given cost matrix, this is
unnecessary). The first component of the pin-data structure is a
three-dimensional coordinate, which supports routing multilayer circuit
boards. The second component should be an array of the indexes of the
minimum-tree wires to which each pin is connected. Storing this information
directly in the pin structure avoids look-up time and decreases program run
time. The third component of the structure should be a ring index. 
To prevent closed subcircuits (short circuits) in the connection order, assign
each pin its own ring index at the beginning of the heuristic. Each wire
connects two pins, and each pin has a ring index. When a wire is selected,
every pin whose ring index equals the lower of the two endpoints' ring indexes
is set to the higher ring index. Before any wire is selected, enforce the
rule that pins with the same ring index cannot be connected. 
The wire structure should contain the indexes of the two pins that the wire
connects, the cost of the wire, a permanent-ineligibility flag, and a
temporary-ineligibility flag. Placing these flags here will avoid look-up
time. 
Listings One and Two provide a C implementation of this algorithm. The program
compiles with Borland C without any modifications using bcc -ml -etspheur *.c.
Changing <alloc.h> to <malloc.h> allows you to compile with Microsoft C.
Provided electronically are the files PROB56.CTY, problem #56, and TSPHEUR.C
and TSPHEUR.H, the command-line interface to the algorithm; see
"Availability," page 3.


Building Cost Matrixes


In many combinatorial-optimization problems, the cost matrix is known. In
others, only a set of locations is given. The simplest way to build the cost
matrix for a set of points is to use the traditional distance formula on all
point pairs. In many cases, this is not an accurate indicator of the cost of
traveling between the two points. Therefore, other techniques for building
cost matrices are needed. 
One technique is the Manhattan distance. Given any two points p1=(x1, y1, z1)
and p2=(x2, y2, z2) in three-dimensional space, the Manhattan distance is
defined as M(p1, p2)=|x1-x2|+|y1-y2|+|z1-z2|.
This calculation is frequently used in the routing of multilayer circuit
boards. In order to restrict the usage of vias (movement from one layer to
another), a sufficiently large coefficient must be used on the |z1-z2| term.
Another technique involves weighted bitmaps. Given two locations on a bitmap
in which each cell of the map indicates the cost and/or plausibility of using
that cell, the distance between those points can be found by using a
minimum-cost routing algorithm. A routing algorithm is needed when the
locations to be connected have blockages between them or if certain places on
the bitmap are inherently more expensive than others. For example, certain
parts of a circuit board are made of more expensive material than others.
Therefore, placing an etch on the less expensive space lowers the total cost
of the etch.


Computational Results


Lower- and upper-bounding techniques are used to quantify the usefulness of
the heuristic. The minimum spanning distance is the absolute lower bound; that
is, a tour can never be shorter than the minimum spanning distance. The
nearest-neighbor heuristic is used as an upper bound for two reasons: First, a
heuristic that creates tours longer than those of the nearest-neighbor
heuristic is usually not effective. Second, many circuit boards are designed
so that pins that need to be connected are placed sufficiently close to one
another; hence, tours created by the nearest neighbor are normally adequate
solutions. 
In addition to the problems listed in Table 3, a series of 945 trials was
performed with varying densities of cities. Half of the trials enforced a
source/terminator rule. Half of the trials used the Manhattan-distance
calculation, and the other half used the traditional distance formula. In
addition, several trials used air mileage between cities in the United
States. On average, the tours generated by the
detrimental-wire-exclusion heuristic were 19.1 percent longer than the minimum
spanning distance; the tours generated by the nearest neighbor were 16.4
percent longer than those created by the new heuristic. 
The pin locations for the listed problems have been randomly generated with an
X and Y range of 30 to 830 in multiples of 10. The pin locations used in
problem 56 are listed in Table 4. The tour generated for problem 56 has a
length of 6161. In this problem, distances between cities are computed by the
traditional distance formula. 
The problems listed in Table 3 were computed on an Intel 386/25 CPU. The
complete list of data indicates run-time growth of about n^2.6. This can be
explained by the fact that the creation of a tour by the subheuristic usually
does not require O(n^2) time. Our research indicates that, typically, only the
first n^1.6 wires of an n^2-entry wire list need to be examined to create a
connection order for Euclidean and real-world problems. 


Multiple-Wire Exclusion


The heuristic as described thus far excludes one wire at a time. In some
cases, however, multiple-wire exclusion may lead to a slight improvement in
tour length, and it can be incorporated easily. Once all wires have been
excluded one at a time, they can be excluded two at a time and so forth. A
loop around the calling of the subheuristic makes for a simple implementation.
The main problem with multiple-wire exclusion is that run time increases
dramatically. Excluding wires one at a time has a worst-case run time of
O(n^3). If the maximum number of wires to be excluded at one time is k, then
there will be at least C(n,k) calls to the subheuristic. For a fixed k less
than half of n, the worst-case run time will be O(n^(k+2)). For instance,
excluding two wires at a time leads to the number of calls to the subheuristic
shown in Example 1. Since the worst-case run time of the subheuristic is
O(n^2), the total run time will be O(n^4). 


Conclusions


Although only problems with 175 or fewer pins were tackled, the heuristic
presented here is portable and can be implemented on more powerful machines.
Therefore, problems with thousands of pins can be solved quickly. Since the
computer-wiring problem is similar to other combinatorial optimization
problems, the heuristic can be applied to many other optimization
applications.
Acknowledgments
I'd like to thank John and Charles Martino for their assistance in writing
this paper. 


References


Christofides, N. "The Shortest Hamiltonian Chain of a Graph." SIAM Journal of
Applied Mathematics. Vol. 19, 1970.
Felts, W., P. Krolak, and G. Marble. "A Man-Machine Approach Toward Solving
the Traveling Salesman Problem." Communications of the ACM. Vol. 14, 1971.
Held, M. and R.M. Karp. "The Traveling-Salesman Problem and Minimum Spanning
Trees." Operations Research. Vol. 18, 1970.
------. "The Traveling-Salesman Problem and Minimum Spanning Trees: Part II."
Operations Research. Vol. 19, 1971.
Kruskal, J.B., Jr. "On the Shortest Spanning Subtree of a Graph and the
Traveling Salesman Problem." Proceedings of the American Mathematical Society.
Vol. 7, 1956.
Lawler, E.L., J.K. Lenstra, A.H.G. Rinnooy Kan, and D.B. Shmoys (eds.). The
Traveling Salesman Problem: A Guided Tour of Combinatorial Optimization. New
York, NY: John Wiley, 1990. 
Lin, S. and B.W. Kernighan. "An Effective Heuristic Algorithm for the
Traveling Salesman Problem." Operations Research. Vol. 21, 1973.
Volgenant, T. and R. Jonker. "A Branch and Bound Algorithm for the Symmetric
Traveling Salesman Problem Based on the 1-Tree Relaxation." European Journal
of Operational Research. Vol. 9, 1982.
Figure 1 (a) A nearest-neighbor tour; (b) a much more effective tour.
Figure 2 A minimum-spanning tree.
Figure 3 (a) A minimum-spanning tree; (b) all wires are eligible; (c) exclude
w1; (d) exclude w4 and w1; (e) exclude w7 and w1; (f) exclude w2 and w1; (g)
exclude w3 and w1.
Table 1: Pin locations.
Num X Y 
1 46 28
2 33 16

3 7 9
4 31 29
5 45 4
6 18 34
Table 2: Sorted wire list.
Num Pins Len 
1 2--4 15
2 1--4 16
3 4--6 18
4 2--5 24
5 1--2 25
6 1--5 25
7 2--3 33
8 2--6 33
9 1--6 34
10 3--6 36
11 4--5 39
12 3--5 43
13 3--4 44
14 5--6 57
15 1--3 58
Table 3: Computational results and bounds (time in seconds).
Problem Pin Min-Tree DWE Heuristic Nearest Neighbor
Number Count Time Length Time Length Time Length
1 50 0.39 4103 2.97 4717 0.55 5293
2 50 0.33 4131 2.80 4698 0.49 5682
3 50 0.33 3618 3.24 4328 0.50 4583
4 50 0.33 3661 2.86 3948 0.49 4585
55 100 1.48 5202 23.56 5941 2.20 7091
56 100 1.54 5571 20.43 6161 2.04 6846
57 100 1.54 5459 19.33 6175 2.14 6871
58 100 1.54 5662 20.66 6428 2.14 7531
117 150 3.74 6709 64.65 7728 5.17 8605
118 150 3.74 6885 61.19 7774 5.39 9188
119 150 3.68 6550 60.15 7452 5.66 9086
120 150 3.68 6756 57.95 7702 5.49 9375
Table 4: Problem 56 pin locations.
Num X Y 
1 370 820
2 610 360
3 260 220
4 120 70
5 490 30
6 770 280
7 540 70
8 690 560
9 260 470
10 300 240
11 570 250
12 490 510
13 610 530
14 230 470
15 810 170
16 600 700
17 70 690
18 50 740
19 620 280
20 680 390
21 790 370

22 570 400
23 70 80
24 270 770
25 730 450
26 70 110
27 370 410
28 80 360
29 680 260
30 400 250
31 210 220
32 540 620
33 770 470
34 410 770
35 580 570
36 520 290
37 410 310
38 420 190
39 410 30
40 190 90
41 320 530
42 640 790
43 280 690
44 280 800
45 280 280
46 670 230
47 800 240
48 390 650
49 140 70
50 170 210
51 70 230
52 370 170
53 410 740
54 570 540
55 220 310
56 550 550
57 530 630
58 800 770
59 70 220
60 200 310
61 290 350
62 800 320
63 90 810
64 70 100
65 660 220
66 200 420
67 270 600
68 300 270
69 380 140
70 290 80
71 370 380
72 760 440
73 810 690
74 160 120
75 630 660
76 80 560
77 320 220
78 380 130
79 710 70
80 770 140

81 160 680
82 260 50
83 320 320
84 400 40
85 560 190
86 150 490
87 770 460
88 130 340
89 130 450
90 70 680
91 740 810
92 330 30
93 750 630
94 130 530
95 200 30
96 810 710
97 820 90
98 120 770
99 600 290
100 580 190
Example 1 Excluding two wires at a time.

Listing One 
/***** CONORDER.H (TSPHEUR.EXE) Connection order routines in this unit. *****/
/* connection order pin record type */
typedef struct {
 INT x, y, z;
 INT conns[9];
 INT ring, level;
 INT srctrm;
} CON_CITYTYPE;
/* connection order wire record type */
typedef struct {
 INT cfrom, cto;
 INT distance;
 CHAR eligible, exclude;
} CON_WIRETYPE;
/* length macros */
#define CON_CITYMAX 150
#define CON_WIREMAX 12000
#define CON_MAXLEN 9999999L
/* prototypes */
VOID con_alloc( VOID );
VOID con_dealloc( VOID );
LONG con_heuristic( INT );
LONG con_hookup( INT );
VOID con_initcities( INT );
INT con_levelok( INT, INT );
VOID con_makewires( VOID );
LONG con_mintree( VOID );
LONG con_neighbor( VOID );
VOID con_randcities( VOID );
VOID con_reset( VOID );
VOID con_setring( INT, INT );
VOID con_sortwires( VOID );
VOID con_sortshift( INT, INT );
INT con_wireok( INT, INT, INT );




Listing Two
/***** CONORDER.C (TSPHEUR.EXE) Connection order routines in this unit. *****/
#include <stdio.h>
#include <math.h>
#include "defs.h"
#include "conorder.h"
#include "tspheur.h"
/* variables */
CON_CITYTYPE *con_cities[CON_CITYMAX + 1]; /* city records */
CON_WIRETYPE *con_wires[CON_WIREMAX + 1]; /* wire records */
INT con_citycount, con_wirecount; /* number of cities and wires */
INT con_numexclude; /* number of excluded wires */
INT con_wireorder[CON_CITYMAX + 1]; /* global order of wires */
INT con_order[CON_CITYMAX + 1]; /* order of wires */
INT con_sterm[3]; /* source term vars */
FLOAT con_lwtot = 0; /* total of wires used in hookup */
INT con_hookcount = 0; /* number of calls to hookup */
/* imports */
EXTERN INT tsp_stflag;
EXTERN INT tsp_disttype;
EXTERN INT tsp_3d;
EXTERN INT tsp_randx;
EXTERN INT tsp_randy;
EXTERN INT tsp_randz;
/* con_alloc() allocates memory for the connection order algorithm */
VOID con_alloc( VOID )
 {
 INT ws, ps;
 LONG i;
 ps = sizeof( CON_CITYTYPE );
 for( i = 0; i < CON_CITYMAX + 1; i++ )
 con_cities[i] = (CON_CITYTYPE *) mem_alloc( ps );
 ws = sizeof( CON_WIRETYPE );
 for( i = 0; i < CON_WIREMAX + 1; i++ )
 con_wires[i] = (CON_WIRETYPE *) mem_alloc( ws );
 }
/* con_dealloc() deallocates memory from the connection order algorithm */
VOID con_dealloc()
 {
 LONG i;
 /* free everything con_alloc() allocated, not just the records in use */
 for( i = 0; i < CON_CITYMAX + 1; i++ )
 mem_free( (CHAR *) con_cities[i] );
 for( i = 0; i < CON_WIREMAX + 1; i++ )
 mem_free( (CHAR *) con_wires[i] );
 }
/* con_heuristic() creates a net with only clim connects to each city */
LONG con_heuristic( INT clim )
 {
 LONG templen, newlen;
 INT i, k, j;
 tsp_printf( "Running heuristic." );
 fprintf( stdout, " " );
 con_initcities( 0 );
 con_numexclude = 0;
 templen = 0;
 newlen = con_hookup( con_citycount - 1 );
 for( i = 1; i <= con_citycount - 1; i++ )

 con_wireorder[i] = con_order[i];
 for( i = 1; i <= con_citycount; i++ )
 {
 if( i % 7 == 6 )
 fprintf( stdout, "\b\b\b%2d%%", (i * 100) / con_citycount );
 if( con_cities[i]->conns[0] >= clim )
 {
 k = 1;
 while( k <= con_cities[i]->conns[0] )
 {
 con_initcities( 0 );
 if( !con_wires[con_cities[i]->conns[k]]->exclude )
 {
 con_wires[con_cities[i]->conns[k]]->eligible = 0;
 templen = con_hookup( con_citycount - 1 );
 if( templen < newlen )
 {
 con_numexclude++;
 con_wires[con_cities[i]->conns[k]]->exclude = 1;
 newlen = templen;
 for( j = 1; j <= con_citycount - 1; j++ )
 con_wireorder[j] = con_order[j];
 }
 }
 k++;
 }
 }
 }
 fprintf( stdout, "\b\b\b\b" );
 return( newlen );
 }
/* con_hookup() creates a serial route order and places it in con_order[] */
LONG con_hookup( INT wcount )
 {
 LONG netlen = 0;
 INT i = 1, ti, cf, ct;
 con_hookcount++;
 while( i <= con_wirecount )
 {
 if( con_wires[i]->eligible && con_wires[i]->distance != 0 )
 {
 cf = con_wires[i]->cfrom;
 ct = con_wires[i]->cto;
 ti = 0;
 if( tsp_stflag )
 ti = (wcount > 1);
 if( con_wireok( cf, ct, ti ) )
 {
 if( con_levelok( cf, ct ) )
 {
 con_setring( con_cities[cf]->ring, con_cities[ct]->ring );
 con_order[wcount--] = i;
 con_wires[i]->eligible = 0;
 con_cities[cf]->level++;
 con_cities[ct]->level++;
 netlen += con_wires[i]->distance;
 if( wcount < 1 )
 {
 con_lwtot += i;

 i = con_wirecount;
 }
 }
 }
 }
 i++;
 }
 if( wcount > 0 )
 {
 con_lwtot += con_wirecount;
 return( CON_MAXLEN );
 }
 else
 return( netlen );
 }
/* con_initcities() reinitializes all the cities in the net */
VOID con_initcities( INT unc )
 {
 INT j;
 for( j = 1; j <= con_wirecount; j++ )
 {
 if( unc )
 con_wires[j]->exclude = 0;
 if( !con_wires[j]->exclude )
 con_wires[j]->eligible = 1;
 }
 for( j = 1; j <= con_citycount; j++ )
 {
 con_cities[j]->ring = j;
 con_cities[j]->level = 0;
 }
 }
/* con_levelok() ensures that no city has more than n connections on it */
INT con_levelok( INT cf, INT ct )
 {
 INT cfl, ctl, ret;
 cfl = con_cities[cf]->level;
 ctl = con_cities[ct]->level;
 if( con_cities[cf]->srctrm == 0 && con_cities[ct]->srctrm == 0 )
 {
 if( cfl < 2 && ctl < 2 )
 return( 1 );
 }
 else
 {
 if( con_cities[cf]->srctrm )
 if( con_cities[ct]->srctrm )
 ret = ((cfl < 1) && (ctl < 1));
 else
 ret = ((cfl < 1) && (ctl < 2));
 else
 ret = ((cfl < 2) && (ctl < 1));
 return( ret );
 }
 return( 0 );
 }
/* con_makewires() creates the list of interconnection wire lengths */
VOID con_makewires()
 {

 FLOAT tf;
 LONG i, j;
 INT tlen;
 LONG xd, yd, zd;
 tsp_printf( "Building cost matrix." );
 con_wirecount = 0;
 for( i = 1; i <= con_citycount; i++ )
 {
 for( j = i + 1; j <= con_citycount; j++ )
 {
 switch( tsp_disttype )
 {
 case 2: xd = (SIGNED) con_cities[i]->x - (SIGNED) con_cities[j]->x;
 yd = (SIGNED) con_cities[i]->y - (SIGNED) con_cities[j]->y;
 zd = (SIGNED) con_cities[i]->z - (SIGNED) con_cities[j]->z;
 tf = sqrt( xd * xd + yd * yd + zd * zd ) + 0.5;
 tlen = (INT) tf;
 break;
 default: tlen = abs( con_cities[i]->x - con_cities[j]->x ) +
 abs( con_cities[i]->y - con_cities[j]->y ) +
 abs( con_cities[i]->z - con_cities[j]->z );
 break;
 }
 con_wirecount++;
 con_wires[con_wirecount]->cfrom = i;
 con_wires[con_wirecount]->cto = j;
 con_wires[con_wirecount]->distance = tlen;
 con_wires[con_wirecount]->eligible = 1;
 con_wires[con_wirecount]->exclude = 0;
 }
 }
 }
/* con_mintree() finds the minimum tree of the sorted list of wires */
LONG con_mintree()
 {
 LONG len = 0;
 INT i = 1, k, w = 0;
 tsp_printf( "Creating minimum tree." );
 con_initcities( con_citycount );
 while( i <= con_wirecount )
 {
 if( con_wireok( con_wires[i]->cfrom, con_wires[i]->cto, 0 ) &&
 con_wires[i]->distance != 0 )
 {
 con_setring( con_wires[i]->cfrom, con_wires[i]->cto );
 len += con_wires[i]->distance;
 con_wires[i]->eligible = 0;
 k = ++(con_cities[con_wires[i]->cfrom]->conns[0]);
 con_cities[con_wires[i]->cfrom]->conns[k] = i;
 k = ++(con_cities[con_wires[i]->cto]->conns[0]);
 con_cities[con_wires[i]->cto]->conns[k] = i;
 w++;
 }
 if( w == con_citycount - 1 )
 i = con_wirecount;
 i++;
 }
 return( len );
 }

/* con_neighbor() creates a connect order by nearest city algorithm */
LONG con_neighbor()
 {
 INT cf = 1, ct = 1, cn = 1;
 INT wlim, wc = 0, p, ti;
 LONG nlen = 0L;
 tsp_printf( "Running neighbor." );
 if( tsp_stflag )
 cn = con_sterm[1];
 con_initcities( 1 );
 wlim = con_citycount - 1;
 while( wc < wlim )
 {
 p = 1;
 while( p <= con_wirecount )
 {
 if( con_wires[p]->eligible )
 {
 cf = con_wires[p]->cfrom;
 ct = con_wires[p]->cto;
 ti = 1;
 if( wc == con_citycount - 2 )
 ti = 0;
 if( con_wireok( cf, ct, ti ) )
 if( con_levelok( cf, ct ) )
 {
 if( cf == cn || ct == cn )
 {
 wc++;
 con_setring( con_cities[cf]->ring, con_cities[ct]->ring );
 con_wireorder[wc] = p;
 con_cities[cf]->level++;
 con_cities[ct]->level++;
 nlen += con_wires[p]->distance;
 if( cn == cf )
 cn = ct;
 else
 cn = cf;
 p = con_wirecount;
 }
 }
 }
 p++;
 }
 }
 return( nlen );
 }
/* con_randcities() creates n random cities */
VOID con_randcities()
 {
 INT i, j, x1, y1, z1, ok, done;
 for( i = 1; i <= con_citycount; i++ )
 {
 done = 0;
 while( !done )
 {
 ok = 1;
 x1 = ((INT) rand() % tsp_randx) * 10 + 30;
 y1 = ((INT) rand() % tsp_randy) * 10 + 30;

 z1 = 0;
 if( tsp_3d )
 z1 = ((INT) rand() % tsp_randz) * 10;
 for( j = 1; j < i; j++ )
 if( con_cities[j]->x == x1 && con_cities[j]->y == y1 &&
 con_cities[j]->z == z1 )
 ok = 0;
 if( ok )
 {
 con_cities[i]->x = x1;
 con_cities[i]->y = y1;
 con_cities[i]->z = z1;
 done = 1;
 }
 }
 }
 }
/* con_reset() resets the traveling salesman heuristic */
VOID con_reset()
 {
 INT i;
 for( i = 1; i < CON_CITYMAX + 1; i++ )
 {
 con_cities[i]->x = 0;
 con_cities[i]->y = 0;
 con_cities[i]->z = 0;
 con_cities[i]->conns[0] = 0;
 con_cities[i]->ring = 0;
 con_cities[i]->level = 0;
 con_cities[i]->srctrm = 0;
 }
 }
/* con_setring() sets the members of dest ring to src ring */
VOID con_setring( INT src, INT dest )
 {
 INT sw, i;
 sw = con_cities[dest]->ring;
 for( i = 1; i <= con_citycount; i++ )
 if( con_cities[i]->ring == sw )
 con_cities[i]->ring = con_cities[src]->ring;
 }
/* con_sortwires() sorts the list of wires via a heap sort */
VOID con_sortwires()
 {
 CON_WIRETYPE temp;
 INT root, last;
 tsp_printf( "Sorting wires." );
 root = con_wirecount / 2;
 last = con_wirecount;
 while( root > 0 )
 {
 con_sortshift( root, last );
 root--;
 }
 root = 1;
 while( last > 1 )
 {
 temp = *con_wires[1];
 *con_wires[1] = *con_wires[last];
 *con_wires[last] = temp;
 last--;
 con_sortshift( root, last );
 }
 }

/* con_sortshift() is needed for the heap sort of the wires */
VOID con_sortshift( INT root, INT last )
 {
 CON_WIRETYPE temp;
 INT ptr, succ, key;
 ptr = root;
 succ = 2 * root;
 if( succ < last )
 if( con_wires[succ + 1]->distance > con_wires[succ]->distance )
 succ++;
 key = con_wires[root]->distance;
 temp = *con_wires[root];
 while( succ <= last && con_wires[succ]->distance > key )
 {
 *con_wires[ptr] = *con_wires[succ];
 ptr = succ;
 succ = 2 * ptr;
 if( succ < last )
 if( con_wires[succ + 1]->distance > con_wires[succ]->distance )
 succ++;
 }
 *con_wires[ptr] = temp;
 }
/* con_wireok() reports if it is ok to use a specific wire */
INT con_wireok( INT cf, INT ct, INT doit )
 {
 INT fr, tr;
 fr = con_cities[cf]->ring;
 tr = con_cities[ct]->ring;
 if( tr != fr )
 {
 if( tsp_stflag )
 if( doit )
 if( con_cities[con_sterm[1]]->ring == fr )
 if( con_cities[con_sterm[2]]->ring == tr )
 return( 0 );
 else ;
 else
 if( con_cities[con_sterm[1]]->ring == tr )
 if( con_cities[con_sterm[2]]->ring == fr )
 return( 0 );
 return( 1 );
 }
 return( 0 );
 }


















The BMP File Format, Part 2


Reading and interpreting the bits




David Charlap


David is a software engineer with Visix Software, makers of the Galaxy
Application Framework, a toolkit for cross-platform, object-oriented,
distributed software development. He can be contacted via the Internet at
david@visix.com.


The bitmap (BMP) file format is the standard way of storing bitmap images in
Windows applications. In last month's installment of this two-part article, I
presented four portable data structures that describe all versions of bitmap
files on all platforms, including non-Windows environments. This month, I'll
explain how to use those structures to read and interpret bits, then examine
how the structures fit together.


Reading the Bits


Once the headers BITMAPFILEHEADER, BITMAPARRAYHEADER, BITMAPHEADER, and RGB
are read, the only thing left to do is read the pixel data. The offsetToBits
field in the BITMAPFILEHEADER structure points to the start of this data. The
interpretation of the bits is a function of the image's bit depth and the
compressionScheme field of the BITMAPHEADER structure.
For all images with bit depths less than 24 that do not use BITFIELDS
encoding, the pixel values are array offsets into the image's color table. For
images with a bit depth of 24, or those that do use BITFIELDS encoding, the
pixel values decode directly into the color's red, green, and blue values.


Interpreting the Bits: No Compression


Uncompressed bit data is easy to interpret: It is an array of rows of pixels.
Each row ends with 0--3 extra bytes, so that each row contains a multiple of
four bytes. Each row is a packed array of pixel values. Each pixel's bit width
is the image's bit depth.
For a bit depth of 1, each byte represents eight pixels; the most significant
bits within a byte are the left-most pixels. For a bit depth of 4, each byte
represents two pixels, the most significant four bits representing the
left-side pixel, and the least significant four bits, the right-side pixel.
For a bit depth of 8, each byte represents one pixel.
For a bit depth of 16, each pixel is represented by two bytes. The bytes must
be read as a Big-endian word; that is, the first byte read is the most
significant. While this seems backwards (all other structures in BMP files are
Little-endian), it makes sense in this context as a logical side effect of a
packed array with 16-bit elements. It should be noted that a 16-bit
color-table-based bitmap is nonstandard--Windows 16-bit bitmaps must use the
BITFIELDS encoding, and OS/2 does not have a standard 16-bit-deep bitmap
format. 
For a bit depth of 24, each pixel is represented by three bytes, which are
interpreted as an RGB value. The first byte is the blue component; the second,
green; and the third, red.


Interpreting the Bits: RLE Compression


There are two main forms of compression that bitmaps may use: run-length
encoding (RLE) and modified Huffman encoding. Unfortunately, IBM chose not to
document the modified Huffman encoding scheme in the OS/2 2.1 toolkit.
However, compressed bitmaps are very rare, and most are RLE encoded, since RLE
is well documented and Windows does not support Huffman-encoded bitmaps.
I have yet to come across a compressed bitmap in all the time I've been
writing Windows and OS/2 programs. This leads me to believe that finding valid
test cases for bitmap-decompressing code may be very difficult; therefore, the
sample programs will not read compressed bitmaps. The description of RLE that
follows is based entirely on the Windows SDK and OS/2 toolkit documentation.
RLE compression comes in three varieties: 4-, 8-, and 24-bit. Windows supports
the 4- and 8-bit varieties. OS/2 supports all three. The three varieties are
encoded with only slight differences.
RLE-encoded bitmaps are stored as a series of records, of which there are five
kinds: RLE data, unencoded data, delta records, end-of-line records, and
end-of-RLE records.
An RLE data record consists of a count of pixels followed by a pixel value.
The value is repeated for the entire count of pixels. The first byte of an RLE
data record is the number of pixels into which the record decodes. This value
must be greater than 0. The next byte (or next three bytes for 24-bit RLE) is
the value to be repeated; see Table 6. [Editor's Note: Tables 1 through 5 were
presented last month.]
An unencoded data record consists of a flag value, a count, and then a string
of data. The first byte is always 0, which differentiates it from an RLE data
record. The second byte is a count (of bytes for 24-bit RLE, of pixels for
8-bit and 4-bit RLE). The rest of the record is pixel data to be inserted into
the bitmap without processing; see Table 7.
A delta record consists of two flag values and two delta values. The first
byte is always 0, and the second is always 2. The third and fourth bytes are
unsigned byte values representing horizontal and vertical deltas,
respectively. A delta record indicates that the current pixel offset should be
moved by the indicated deltas before the next record is decoded.
An end-of-line record consists of two flag values. The first and second bytes
are both 0. The end-of-line record indicates that the current line is complete
and the next record will begin at the start of the next row.
An end-of-RLE record also consists of two flag values, the first of which is
always 0 and the second, always 1. This record indicates the end of all RLE
data.
Table 8 shows the decoding of the 8-bit RLE data stream 03 04 05 06 00 03 45
56 67 00 02 78 00 02 05 01 02 78 00 00 09 1E 00 01 (hexadecimal values). Table
9 shows the decoding of the 4-bit RLE data stream 03 04 05 06 00 06 45 56 67
00 04 78 00 02 05 01 04 78 00 00 09 1E 00 01 (hexadecimal values).
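To make the record descriptions concrete, here is a minimal 8-bit RLE decoder of my own (a sketch, separate from the readbmp.c code available electronically). It writes pixels into a flat width-by-height buffer; because bitmap rows are stored bottom-up, the vertical delta simply advances the row index:

```cpp
#include <cstdint>
#include <cstddef>
#include <vector>

// Minimal 8-bit RLE decoder. Records: [n>0, v] repeats pixel v n times;
// [0,0] ends the line; [0,1] ends the RLE data; [0,2,dx,dy] moves the
// cursor; [0,n>2] is followed by n literal pixels (padded to even length).
std::vector<uint8_t> decodeRLE8(const std::vector<uint8_t>& src,
                                size_t width, size_t height)
{
    std::vector<uint8_t> out(width * height, 0);
    size_t x = 0, y = 0, i = 0;
    while (i + 1 < src.size() && y < height) {
        uint8_t first = src[i++], second = src[i++];
        if (first > 0) {                            // RLE data record
            for (uint8_t n = 0; n < first && x < width; n++)
                out[y * width + x++] = second;
        } else if (second == 0) {                   // end-of-line record
            x = 0; y++;
        } else if (second == 1) {                   // end-of-RLE record
            break;
        } else if (second == 2) {                   // delta record
            x += src[i]; y += src[i + 1]; i += 2;
        } else {                                    // unencoded data record
            for (uint8_t n = 0; n < second; n++) {
                uint8_t v = src[i++];
                if (x < width) out[y * width + x++] = v;
            }
            if (second & 1) i++;                    // skip pad byte
        }
    }
    return out;
}
```

Fed the Table 8 data stream and a 20-pixel-wide, three-row buffer, this reproduces the decoded rows shown in that table.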


Interpreting the Bits: BITFIELDS Encoding


In addition to RLE and modified Huffman encoding, there is one more possible
value for the compressionScheme field of a BITMAPHEADER:
COMPRESSION_BITFIELDS. In reality, BITFIELDS is not a compression scheme, but
an encoding scheme, since data is not compressed. BITFIELDS encoding is only
used for images whose bit depth is 16 or 32. Only Windows NT supports
BITFIELDS encoding--Windows 3.x and OS/2 do not.
In BITFIELDS encoding, each pixel is represented by 16 or 32 bits, depending
on the image's bit depth. Each pixel represents a specific RGB value. The
three 32-bit integers immediately following the BITMAPHEADER structure (where
a color table normally resides) are used to describe how a pixel's value
decodes into red, green, and blue values. The integers are masks applied to a
pixel value to isolate its components.
The best way to explain this is with an example. A 32-bit image might be
encoded using ten bits of red, ten bits of green, and ten bits of blue per
pixel, with two unused bits; see Figure 5. [Editor's Note: Figures 1 through 4
appeared in last month's installment.] Table 10 shows the three integers in
the color table for bits encoded this way. An example of a 16-bit BITFIELDS
encoding might be a 5-6-5 encoding (commonly used on 16-bit display adapters)
like that in Figure 6. Table 11 shows the three integers in the color table
for bits encoded this way.
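The mask-to-component step can be sketched in a few lines (my own illustration, not code from the listings): AND the pixel with the mask, then shift right by the number of trailing zero bits in the mask.

```cpp
#include <cstdint>

// Isolate one color component of a BITFIELDS pixel: apply the mask, then
// shift right by the count of trailing zero bits in that mask.
static uint32_t extractComponent(uint32_t pixel, uint32_t mask)
{
    if (mask == 0) return 0;
    uint32_t shift = 0;
    while (((mask >> shift) & 1) == 0) shift++;
    return (pixel & mask) >> shift;
}
```

With the 5-6-5 masks of Table 11, a pixel of 0xFFFF decodes to red 31, green 63, and blue 31.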


How All These Structures Fit Together



Now I'll examine how the structures combine to form a bitmap file.
The functions readSingleImageBMP, readSingleImageICOPTR,
readSingleImageColorICOPTR, and readMultipleImage in readbmp.h and readbmp.c
(available electronically; see "Availability," page 3) demonstrate how to use
the structure-reading functions to decode a bitmap file.
The simplest structure is the single-image BMP file. All BMP files created on
Windows systems are of this kind; likewise, many OS/2 BMP files. The structure
of a single-image BMP file is:
BITMAPFILEHEADER
BITMAPHEADER
color table
 ...
bits
The file begins with a BITMAPFILEHEADER structure whose type field contains
TYPE_BMP. This is immediately followed by a BITMAPHEADER structure and then a
color table. Elsewhere in the file is a block of data that contains the
image's bits. A pointer to this location is stored in the BITMAPFILEHEADER
structure.
Slightly more complex is the OS/2 monochrome icon/pointer format. Here, too, a
single BITMAPFILEHEADER is followed by a BITMAPHEADER, a color table, and a
block of bit data. The type field of the BITMAPFILEHEADER contains either
TYPE_ICO or TYPE_PTR, indicating either an icon or pointer type image. The
structure of the file is the same, but the bits are interpreted differently.
In such an image, the bit depth is always 1 (monochrome) and the color table
is ignored. Additionally, the height of the image is double the height that
will be displayed. Decoding the bits will result in an image whose top half is
an XOR mask and whose bottom half is an AND mask. To display the icon/pointer,
the bottom half is combined with screen pixels using the AND operator. After
applying the AND mask, the top half of the image is combined with screen
pixels using the XOR operator. These two masks allow four "colors" to be
applied--background, foreground, transparent, and inverse; see Table 12.
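For a single 1-bit pixel, all of Table 12 collapses to one expression (a sketch of my own, not code from the listings): AND with the bottom-half mask bit, then XOR with the top-half mask bit.

```cpp
#include <cstdint>

// Composite one monochrome icon pixel onto a 1-bit screen pixel.
// AND=0/XOR=0 -> 0 (background); AND=0/XOR=1 -> 1 (foreground);
// AND=1/XOR=0 -> screen (transparent); AND=1/XOR=1 -> ~screen (inverse).
static uint8_t compositeIconPixel(uint8_t screen, uint8_t andBit, uint8_t xorBit)
{
    return (uint8_t)((screen & andBit) ^ xorBit);
}
```

Checking the four mask combinations against Table 12 confirms the background, foreground, transparent, and inverse cases.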
The OS/2 color icon/pointer format is a bit more complicated. A color icon is
actually the composite of two images: a monochrome icon (containing the mask
data) and a bitmap (containing the color data). Color icons and color pointers
have the structure shown in Figure 7. For a color icon, both BITMAPFILEHEADER
structures have TYPE_ICO_COLOR in the type field. For a color pointer, both
BITMAPFILEHEADER structures have TYPE_PTR_COLOR in the type field.
Reading a color icon is rather simple once code for reading bitmaps and
monochrome icons/pointers is in place. The first image is read using the same
code used for a monochrome icon/pointer. The second image is read using the
same code used for a bitmap. This results in two masks and a color bitmap,
which are combined with screen pixels; see Table 13.
Once code is in place to read all three types of images, reading an array of
images is simple. When the file begins with a BITMAPARRAYHEADER (indicated by
a type field of TYPE_ARRAY), you simply traverse the linked list, reading each
image using the existing code on each one.
The structure of a multiple-image bitmap file is shown in Figure 8. Each
BITMAPARRAYHEADER in the list contains a pointer to the next
BITMAPARRAYHEADER, establishing the list. Immediately following each
BITMAPARRAYHEADER is the start of a bitmap image (bitmap, icon, or pointer),
which is read exactly as it would be read in a single-image file.


Putting it All Together


Once routines are in place to read all four types of files (bitmap,
icon/pointer, color icon/pointer, and array), writing a program to read any
arbitrary bitmap file is easy. Test.c (Listing Three) shows one way to do
this. [Editor's Note: Listings One and Two appeared in last month's
installment.]
Test.c opens a file and reads the first two bytes to determine the type of
bitmap file being read. The appropriate image-reading function is then called,
and the image(s) in the file are read in. The test program then dumps
information about the image to an output file; an actual application would
probably do something else.


Conclusion


The bitmap file format is large and complex, with many variations. OS/2 and
Windows document only their own variants, and almost no documentation is
available for other systems (such as workstations). With the information
presented here, however, you should have no problem reading any bitmap file
you come across--no matter what platform you're developing for.
The sample code provided returns data in a format that is easy for this
particular test program to use, but modifying the sample code to produce a
different format would not be difficult.


References


Microsoft Corp. Microsoft Win32 Programmer's Reference, vol. 5. Redmond, WA:
Microsoft Press, 1993. 
Microsoft Corp. Microsoft Windows Programmer's Reference, version 3, vol. 2.
Redmond, WA: Microsoft Press, 1990. 
IBM Corp. OS/2 2.0 Programmer's Toolkit: Presentation Manager Reference
(online manual). 
Table 6: RLE data-record encoding.
 1st byte Count of pixels.
 2nd--4th bytes(24-bit RLE only) RGB value to be repeated.
 2nd byte(8-bit RLE only) Color index to be repeated.
 2nd byte(4-bit RLE only) Two color indexes to be
 alternated for entire count
 of pixels.
Table 7: Unencoded data record.
 1st byte Must be zero.
 2nd byte (24-bit) Indicates length (in bytes) of the
 rest of the record; must be
 a multiple of 3.
 2nd byte (8-bit) Indicates length (in bytes) of the rest
 of the record; must be greater than 2.
 2nd byte (4-bit) Indicates length (in pixels) of the rest
 of the record; must be greater than 2.
 Remaining bytes (24-bit) Every three bytes is another pixel's RGB
 value; if the total count of pixels
 is odd, an extra byte is appended
 for an even length overall.
 Remaining bytes (8-bit) Every byte is another pixel's color index;
 if the total count of pixels is odd,

 an extra byte is appended for an even
 length overall.
 Remaining bytes (4-bit) Every byte represents two pixels' color indexes,
 the most-significant four bits being
 the first pixel's value and the
 least significant four bits, the
 next pixel's value. If the
 count of pixels is odd, the last four
 bits will contain 0s; if the count of bytes
 is odd, an extra byte is appended for
 an even length overall.
Table 8: Example of 8-bit RLE encoding.
 Bytes Meaning Bytes output 
 03 04 Three bytes of value 04. 04 04 04
 05 06 Five bytes of value 06. 06 06 06 06 06
 00 03 45 56 67 00 Three bytes of unencoded data. 45 56 67
 02 78 Two bytes of value 78. 78 78
 00 02 05 01 Delta record; move cursor forward none
 five pixels and up one row.
 02 78 Two bytes of value 78. 78 78
 00 00 End of row; next record begins on none
 the next line, at column 0.
 09 1E Nine bytes of value 1E. 1E 1E 1E 1E 1E 1E 1E 1E 1E
 00 01 End of RLE data. none
Table 9: Four-bit RLE encoding.
 Bytes Meaning Pixel-value Output 
 03 04 Three pixels alternating 0 and 4. 0 4 0
 05 06 Five pixels alternating 0 and 6. 0 6 0 6 0
 00 06 45 56 67 00 Six pixels of unencoded data. 4 5 5 6 6 7
 04 78 Four pixels alternating 7 and 8. 7 8 7 8
 00 02 05 01 Delta record; move cursor forward five none
 pixels and up one row.
 04 78 Four pixels alternating 7 and 8. 7 8 7 8
 00 00 End of row; next record begins on the none
 next line, at column 0.
 09 1E Nine pixels alternating 1 and E. 1 E 1 E 1 E 1 E 1
 00 01 End of RLE data. none
Table 10: BITFIELDS description of a 10-10-10 encoding.
 Position Value (binary) Value (hex) 
 1 (Red) 11111111110000000000000000000000 FFC00000
 2 (Green) 00000000001111111111000000000000 003FF000
 3 (Blue) 00000000000000000000111111111100 00000FFC
Table 11: BITFIELDS description of a 5-6-5 encoding.
 Position Value Value 
 (binary) (hex) 
 1 (Red) 1111100000000000 F800
 2 (Green) 0000011111100000 07E0
 3 (Blue) 0000000000011111 001F
Table 12: Mask operations.
 Screen AND XOR Result 
 pixel mask mask 
 x 0 0 0 (background)
 x 0 1 1 (foreground)
 x 1 0 x (transparent)
 x 1 1 ~x (inverse)
Table 13: Color/mask operations.
 Screen pixel AND mask XOR mask Color pixel Result 
 x 0 0 c c (color)
 x 0 1 c c (color)

 x 1 0 c x (transparent)
 x 1 1 c ~x (inverse)
Figure 5: Layout of a 32-bit image encoded using ten bits each of red, green,
and blue per pixel, with two unused bits.
Figure 6: A 16-bit 5-6-5 BITFIELDS encoding.
Figure 7: Structure of color icons and color pointers.
BITMAPFILEHEADER (for the monochrome icon part)
BITMAPHEADER (for the monochrome icon part)
color table (for the monochrome icon part)
BITMAPFILEHEADER (for the bitmap part)
BITMAPHEADER (for the bitmap part)
color table (for the bitmap part)
bits (for the monochrome icon part)
...
bits (for the bitmap part)
...
Figure 8: Structure of a multiple-image bitmap file.
BITMAPARRAYHEADER (for first image)
BITMAPFILEHEADER
BITMAPHEADER
color table
BITMAPFILEHEADER (if this image is a color icon or a color pointer)
BITMAPHEADER (if this image is a color icon or a color pointer)
color table (if this image is a color icon or a color pointer)
...
BITMAPARRAYHEADER (for the second image)
BITMAPFILEHEADER (for the second image)
BITMAPHEADER (for the second image)
color table (for the second image)
...
...
bits (for the first image)
...
bits (for the second image)
...
...

Listing Three (Listings One and Two appeared last month.)

/* Test program for reading bitmap files. It accepts an input file and an
 * output file on the command line. It will read and process the input file
 * and dump an ASCII representation of the contents to the output file. The
 * dump will consist of the color image and two masks. Missing parts will be
 * indicated as such (BMP files have no masks, and monochrome ICO/PTR files
 * have no color data). In the color image, the dump will be a series of RGB
 * values (in hexadecimal). In the masks, the dump will be represented by "."
 * symbols representing zeros and "@" symbols representing ones. */

#include <stdio.h>
#include <stdlib.h>
#include "bmptypes.h"
#include "endian.h"
#include "readbmp.h"

int main (int argc, char *argv[])
{
 FILE *fp;
 RGB **argbs;
 char **xorMasks, **andMasks;
 UINT32 *heights, *widths, row, col;

 UINT16 fileType;
 long filePos;
 int numImages, i;
 int rc;
 
 if (argc < 3)
 {
 printf ("usage: test <infile> <outfile>\n");
 return 1;
 }
 fp = fopen(argv[1], "rb");
 if (fp == NULL)
 {
 perror ("Error opening source file");
 return 2;
 }
 /* Read the first two bytes as little-endian to determine the file type.
 * Preserve the file position. */
 filePos = ftell(fp);
 rc = readUINT16little(fp, &fileType);
 if (rc != 0)
 {
 perror("Error getting file type");
 return 3;
 }
 fseek(fp, filePos, SEEK_SET);

 /* Read the images. */
 switch (fileType) {
 case TYPE_ARRAY:
 /* If this is an array of images, read them. All the arrays we need
 * will be allocated by the reader function. */
 rc = readMultipleImage(fp, &argbs, &xorMasks, &andMasks, &heights,
 &widths, &numImages);
 break;
 case TYPE_BMP:
 case TYPE_ICO:
 case TYPE_ICO_COLOR:
 case TYPE_PTR:
 case TYPE_PTR_COLOR:
 /* If this is a single-image file, we've a little more work. In order
 * to make the output part of this test program easy to write, we're
 * going to allocate dummy arrays that represent what readMultipleImage
 * would have allocated. We'll read the data into those arrays. */
 argbs = (RGB **)calloc(1, sizeof(RGB *));
 if (argbs == NULL)
 {
 rc = 1005;
 break;
 }
 xorMasks = (char **)calloc(1, sizeof(char *));
 if (xorMasks == NULL)
 {
 free(argbs);
 rc = 1005;
 break;
 }
 andMasks = (char **)calloc(1, sizeof(char *));
 if (andMasks == NULL)

 {
 free(argbs);
 free(xorMasks);
 rc = 1005;
 break;
 }
 heights = (UINT32 *)calloc(1, sizeof(UINT32));
 if (heights == NULL)
 {
 free(argbs);
 free(xorMasks);
 free(andMasks);
 rc = 1005;
 break;
 }
 widths = (UINT32 *)calloc(1, sizeof(UINT32));
 if (widths == NULL)
 {
 free(argbs);
 free(xorMasks);
 free(andMasks);
 free(heights);
 rc = 1005;
 break;
 }
 numImages = 1;
 /* Now that we have our arrays allocated, read the image into them. */
 switch (fileType) {
 case TYPE_BMP:
 rc = readSingleImageBMP(fp, argbs, widths, heights);
 break;
 case TYPE_ICO:
 case TYPE_PTR:
 rc = readSingleImageICOPTR(fp, xorMasks, andMasks, widths, heights);
 break;
 case TYPE_ICO_COLOR:
 case TYPE_PTR_COLOR:
 rc = readSingleImageColorICOPTR(fp, argbs, xorMasks, andMasks,
 widths, heights);
 break;
 }
 break;
 default:
 rc = 1000;
 }
 /* At this point, everything's been read. Display status messages based
 * on the return values. */
 switch (rc) {
 case 1000:
 case 1006:
 printf ("File is not a valid bitmap file\n");
 break;
 case 1001:
 printf ("Illegal information in an image\n");
 break;
 case 1002:
 printf ("Legal information that I can't handle yet in an image\n");
 break;
 case 1003:

 case 1004:
 case 1005:
 printf ("Ran out of memory\n");
 break;
 case 0:
 printf ("Got good data from file, writing results\n");
 break;
 default:
 printf ("Error reading file rc=%d\n", rc);
 perror ("Errno:");
 break;
 }
 /* If the return value wasn't 0, something went wrong. */
 if (rc != 0)
 {
 if (rc != 1000 && rc != 1005)
 {
 for (i=0; i<numImages; i++)
 {
 if (argbs[i] != NULL)
 free(argbs[i]);
 if (andMasks[i] != NULL)
 free(andMasks[i]);
 if (xorMasks[i] != NULL)
 free(xorMasks[i]);
 }
 free(argbs);
 free(andMasks);
 free(xorMasks);
 free(widths);
 free(heights);
 }
 return rc;
 }
 fclose(fp);
 fp = fopen(argv[2], "wt");
 if (fp == NULL)
 {
 perror ("Error opening target file");
 return 3;
 }
 /* Dump the images. */
 fprintf (fp, "There are %d images in the file\n", numImages);

 for (i=0; i<numImages; i++)
 {
 /* Loop through all the images that were returned. */
 fprintf (fp, "Doing image number %d\n\n", i+1);
 fprintf (fp, "Image dimensions: (%ld,%ld)\n", widths[i], heights[i]);
 
 if (argbs[i] != NULL)
 {
 /* If the image has colors, dump them (BMP, color ICO and color
 * PTR files). */
 fprintf(fp, "Colors");
 for (row = 0; row < heights[i]; row++)
 {
 fprintf (fp, "\n\nRow %ld pixels (R,G,B), hex values:\n", row);
 for (col = 0; col < widths[i]; col++)

 {
 fprintf (fp, "(%2.2x,%2.2x,%2.2x)",
 argbs[i][row * widths[i] + col].red,
 argbs[i][row * widths[i] + col].green,
 argbs[i][row * widths[i] + col].blue);
 }
 }
 }
 else
 {
 /* If the image has no colors, say so. (monochrome ICO and PTR files) */
 fprintf (fp, "No color image\n");
 }
 if (xorMasks[i] != NULL)
 {
 /* If the image has an xor mask, dump it. (ICO and PTR files) */
 fprintf (fp, "\nXOR mask\n");
 for (row = 0; row < heights[i]; row++)
 {
 for (col = 0; col < widths[i]; col++)
 {
 fprintf (fp, "%c", xorMasks[i][row * widths[i] + 
 col] ? '@' : '.');
 }
 fprintf (fp, "\n");
 }
 }
 else
 {
 /* If the image has no xor mask, say so. (BMP files). */
 fprintf (fp, "No xor mask\n");
 }

 if (andMasks[i] != NULL)
 {
 /* If the image has an and mask, dump it. (ICO and PTR files) */
 fprintf (fp, "\nAND mask\n");
 for (row = 0; row < heights[i]; row++)
 {
 for (col = 0; col < widths[i]; col++)
 {
 fprintf (fp, "%c",
 andMasks[i][row * widths[i] + col] ? '@' : '.');
 }
 fprintf (fp, "\n");
 }
 }
 else
 {
 /* If the image has no AND mask, say so. (BMP files) */
 fprintf (fp, "No and mask\n");
 }

 if (i != numImages-1)
 fprintf (fp, "\n------------------------------------------\n\n");
 
 }
 fclose(fp);
 /* Dumping is complete. Free all the arrays and quit */

 for (i=0; i<numImages; i++)
 {
 if (argbs[i] != NULL)
 free(argbs[i]);
 if (andMasks[i] != NULL)
 free(andMasks[i]);
 if (xorMasks[i] != NULL)
 free(xorMasks[i]);
 }
 free(argbs);
 free(andMasks);
 free(xorMasks);
 free(widths);
 free(heights);
 
 return 0;
}
/* Formatting information for emacs in c-mode
 * Local Variables:
 * c-indent-level:4
 * c-continued-statement-offset:4
 * c-brace-offset:-4
 * c-brace-imaginary-offset:0
 * c-argdecl-indent:4
 * c-label-offset:-4
 * End:
 */




































Directed Acyclic Graph Unification


An object-oriented approach to building constraint systems




David Perelman-Hall


David earned his PhD in computational linguistics at the University of Texas
at Austin. He can be contacted at phall@ccwf.cc.utexas.edu.


One approach to parsing natural language is to build a two-part parsing
system, where one part is the grammar and the second part is a set of rules
associated with the grammar. What I call the "Part One" grammar, commonly a
set of rewrite rules in Backus-Naur form (BNF) used to recognize the tokens of
the language, lays down the entire road map of the basic parsing process. The
rules in "Part Two" constrain the parser's actions and serve as navigators,
indicating which paths the Part One parser can investigate. Each constraint
represents an assertion which linguists have determined to be a requirement
for grammaticality.
The familiar BNF rules that usually form Part One contain a left-hand side
(LHS) and a right-hand side (RHS) separated by an arrow. The rules describe
how one side of the arrow may be written as the other side, as in Example 1.
These rules are order independent, so the arrows in Example 1 don't actually
imply that action must occur from left to right (top-down, from start node S
to the terminal symbols). 
Linguists frequently refer to the rewrite rules of Part One as
"phrase-structure rules" and use them to parse input; for example, the
sentence, "the cow eats the grass" would be accepted by Example 1. However,
using only Part One to parse natural language tends to fail because Part One
accepts ungrammatical input strings as though they were grammatical. Suppose,
for example, that you admit in Example 1 the perfectly legitimate verb "eat"
by introducing the rewrite rule: V-->"eat". The grammar then would accept the
ungrammatical input, "the cow eat the grass." With Part Two, you can at least
try to prevent acceptance of ungrammatical input such as this.
What linguists attempt to codify in Part Two is the inherent capacity of
humans to determine what is or isn't grammatical. This is not an easy thing to
do; most people can tell you only that a sentence is ungrammatical, but not
why. The rule set which holds this knowledge, and its application, is what
differentiates between a dumb token recognizer and an expert system.
A commonly accepted method of employing Part Two is to associate one or more
constraints (taken from the rule set of Part Two) with the terms of
phrase-structure rules in such a way that the phrase-structure rules can be
successfully applied only if all associated constraints are satisfied.
For example, you can associate a value for grammatical number with the input
terms forming verbs and nouns. Therefore, "eat" would have the value plural,
while "eats," "grass," and "cow" would have the value singular. Now the rule
S-->NP VP can be rewritten only if the values for grammatical number match. In
the association of the elaborated phrase-structure rule in Example 2, the
value for number has been percolated up through the rule system from its
original entry with the input terms. Use of this feature-value equivalence
metric prevents the parsing system from accepting the ill-formed sentence,
"the cow eat the grass," because the required match between VP number and NP
number would fail, and therefore the rewrite rule would not be applicable.
In this article, I'll describe "feature-value unification," a general,
object-oriented method of enacting such a constraint system adapted from
directed acyclic graph (DAG) unification. While feature-value unification can
be used in natural-language parsing, it can also be used in applications where
you want an object to enter into relations only with other objects that meet
the constraints specified in the feature-value DAGs. In an object-oriented
system, you would include DAG objects as data members in those objects whose
behaviors must be constrained and require their successful unification before
allowing the objects to interact. There are numerous issues surrounding
natural-language parsing that I won't cover, including building DAGs directly
from the elaborated phrase-structure rules and implementing a parsing engine.
Instead, I'll focus on feature-value (or DAG) unification through an example
drawn from natural-language processing.


Dispensing with the Formalities


A directed acyclic graph is rooted at a single node. Traversal through the
graph is unidirectional (from root to node) and noncyclic (no arc can provide
a path for returning to its originating node). DAG unification can net two
results: 
A Boolean value based on the success of the unification.
A new DAG, bearing the results of the unification. 
Figure 1 shows a DAG which illustrates the number agreement between subject
and predicate phrases of a sentence, and is the representation of the S node
of the elaborated grammar in Example 3. This rule applies only if the value of
case for the noun phrase (NP) is nominative, the subcat for the verb phrase
(VP) is intransitive, and the agreement for NP matches that of VP. The value
of agreement itself in this example is another DAG, which expands into
feature-value constructs such as number-->singular, gender-->masculine,
animacy-->animate, and so on.
Structures in the feature-value notation which are merged during unification
include:
Atomic values, as for features case and subcat in Figure 1.
Variable values, or equivalencies, as for feature agreement of Figure 1.
DAG values, which are themselves composed of atomic, variable, or DAG values,
and so on, recursively.
Successful unification produces a DAG that results from merging the
information from two or more DAGs. Unification proceeds by iterating through
the feature-value sets of both DAGs thus:
Where the features of both DAGs match, and the associated values match if they
are either atomic or variables instantiated with atomic values, add the
feature-value to the resulting DAG. 
Where the features of both DAGs match, but the value for the feature in one of
the DAGs is a variable, add the feature-value found in the other DAG to the
resulting DAG. 
Where the features of both DAGs match, but the value for the feature in both
of the DAGs is a variable, add a feature-value indicating the identity of the
variables to the resulting DAG. 
Where only one DAG contains the feature, add its feature-value pair to the
resulting DAG.
DAG-valued features obey this algorithm recursively. Unification is an
additive procedure. It can succeed or fail. Failed unification does not
produce a resulting DAG. Failure occurs when atomic values do not match for
like features. Possible results of unification are pictured in Figure 2. (In
principle, you can unify as many DAGs as you want, but in practice I unify
them two at a time.) 
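The four rules above can be sketched as follows. This is a deliberately simplified illustration, not the article's implementation: the DAGs are flat maps from features to atomic string values, an empty string stands in for an uninstantiated variable, and nested DAG values and variable identity are ignored.

```cpp
#include <map>
#include <string>
#include <optional>

// Flattened DAG: feature -> atomic value ("" marks an uninstantiated variable).
using Dag = std::map<std::string, std::string>;

// Unify two flat DAGs. Matching atomic values are kept; a variable takes the
// other DAG's value; a feature present in only one DAG is copied over; a
// clash of unlike atomic values fails (no resulting DAG is produced).
std::optional<Dag> unify(const Dag& a, const Dag& b)
{
    Dag result = a;                         // start with everything from a
    for (const auto& [feat, val] : b) {
        auto it = result.find(feat);
        if (it == result.end())             // feature only in b: add it
            result[feat] = val;
        else if (it->second.empty())        // variable in a: take b's value
            it->second = val;
        else if (!val.empty() && it->second != val)
            return std::nullopt;            // atomic clash: unification fails
    }
    return result;
}
```

Unifying {case:nominative, number:<var>} with {number:singular, gender:masc} succeeds and yields all three features instantiated; {number:singular} against {number:plural} fails.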


C++ Requirements


To use DAGs, you need a structure and mechanism to construct them and an
algorithm for performing unification. I'll build DAGs using inheritance and
C++ templates and unify them using virtual functions and double indirection. I
have typedefed sets, lists, and maps for use throughout the code, making
extensive use of templates.
In its abstract essence, feature-value unification relies on DAGs built from
objects of types Feature and Value. A Feature is typedefed as a String. A
Value is either an Atomic, Variable, or Dag value--all of which are formed by
inheritance from an abstract base Value class; see Figure 3. The physical
representation, and hence the behavior, differs among the derived concrete
atomic-, variable-, and DAG-valued classes; however, the types of activity in
which they participate--most importantly, unification--are identical. This
allows you to declare in an abstract base class a virtual unify() function
that anything considered a Value should uniquely define in its derived
concrete class. This lets you use polymorphism in the implementation of
unification among differently valued DAGs.
Listing One is the declaration of the Value and ValuePtr classes. (Listing Two
is the Value class header file.) Notice that the Value class declares no data
members; it merely proposes the behavior that a value should be able to
implement. To take advantage of polymorphism with this class, I've implemented
a smart-pointer wrapper class, ValuePtr, which takes as one of its
constructors a reference to a Value object, but maintains as a data member a
pointer to an object of type Value. The three Values--Atomic, Variable, and
Dag--are handled as ValuePtrs throughout the program. Because ValuePtrs
internally handle newing, copying, and deleting of pointed-to Values, you can
safely ignore issues of memory management with DAGs by using ValuePtrs.
ValuePtrs actually alias Values, so they can be used to implement
polymorphism.


The "Value" of Polymorphism


Every member function of the Value class is declared virtual, and many of them
are pure virtual functions. This means that every function in the Value class
can be redefined in classes derived from Value, and pure virtual functions
must be redefined in derived classes in order for an object of that class to
be constructed. We want the functionality of copying, writing to output,
equality testing, and unifying to differ depending on the type of value.
Notice in the base class that the operator==() and unify() functions each have
four prototypes, differing in the type of the value parameter, taking
references to a base Value class object and to all three derived classes. This
arrangement lets you employ polymorphism and double indirection to gain
default definitions for inequality and failed unification.
Let's use the equality function to examine how polymorphism achieves the
default behavior. The base class contains the only pure virtual version of the
equality test, taking a base Value class reference, as might be expected. The
base class also defines three (not pure) virtual equality tests. These tests
take a reference to each of the derived types and by default return False. If
an object of the base Value class (in practice, through a pointer or
reference) ever makes the call ValueObject->operator==(SomeDerivedValueType&
DV) for any of the three derived Value types, it returns False by default.
In Listings Three through Eight, each derived class defines two versions of
the equality test: one taking a reference to an object of its own type, the
other taking a reference to the base Value type. For example, the derived
concrete AtomicValue class (Listing Three) defines an operator==() taking an
AtomicValue reference, and one taking a Value reference. During program
execution, the only type of object calling operator==() can be one of the
concrete derived types. If one atomically valued DAG seeks to test equality
with another, AtomicValue::operator==(AtomicValue&) is called. This function
performs an actual equality test between AtomicValue objects. Every equality
test in a derived class defines the actual test for objects of its own class.

If, however, the type of the object being tested for equality differs from the
calling type, this scenario occurs: Because the type of the parameter differs
from the type of the calling object, it cannot test equality with an object
of its own type, so by default the operator==(Value&) of the derived class is
called. This works because in each derived class there is no operator==()
prototyped to take the other two derived classes, and each derived concrete
value is by inheritance an instance of the base Value class. In each derived
class, the equality test prototyped for a base Value reference is the same: It
turns the Value reference parameter into the calling object, passing it the
derived object as the parameter. This calls the specific
Value::operator==(someDerivedType&) that corresponds to the type of the
derived-object parameter.
In the example of the AtomicValue, the call would resolve to the base Value
part of an AtomicValue testing equality with an AtomicValue (the call would
be: Value::operator==(AtomicValue&)). Recall that--as defined in the base
class--any equality test made by a base Value object with a derived object
returns False. Hence, the test between unlike derived classes for operator==()
defaults to False in each derived class without actually testing the elements
of the value objects. Figure 4 shows the calling sequence for both
AtomicValue::operator==(AtomicValue& avr) and
AtomicValue::operator==(DagValue& dvr).
This handy method of relying on polymorphism to implement a default base
behavior has a compile-time drawback: It results in warnings letting you know
that the derived functions prototyped for the base class hide the base-class
functions prototyped for the derived class. These warnings are correct, but in
this case can be safely ignored because you make no calls to the base-class
functions prototyped for the hidden derived classes.
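Condensed to its essentials, the arrangement looks like this (a sketch of the pattern only, not the article's listings; the real classes carry data members and further overloads):

```cpp
struct AtomicValue; struct VariableValue;

// Base class: cross-type comparisons default to false; the comparison taking
// a base reference is pure virtual and must be defined by each derived class.
struct Value {
    virtual ~Value() {}
    virtual bool operator==(const Value&) const = 0;
    virtual bool operator==(const AtomicValue&) const { return false; }
    virtual bool operator==(const VariableValue&) const { return false; }
};

struct AtomicValue : Value {
    int atom;
    explicit AtomicValue(int a) : atom(a) {}
    // Double indirection: dispatch back through the parameter, so the
    // parameter's dynamic type selects the final overload.
    bool operator==(const Value& v) const override { return v == *this; }
    bool operator==(const AtomicValue& a) const override { return atom == a.atom; }
};

struct VariableValue : Value {
    bool operator==(const Value& v) const override { return v == *this; }
    bool operator==(const VariableValue&) const override { return true; }
};
```

Comparing two AtomicValues through base references reaches the real element test; comparing an AtomicValue with a VariableValue falls through to the base-class default and yields False without examining either object.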


Unifying DAGs


Unifying DAGs is more complicated than testing for equality of Values. The
same basic arrangement of double indirection exists (including that for each
derived Value type, there is a definition of what it means for that type to
unify with itself). DAG unification is also complicated by the fact that it
produces more than a True or False; it yields both a resulting DAG (if
successful) and a Boolean indication of the attempt to unify. Furthermore, as
the unification results in Figure 2 show, more than one possible unification
of types has a legitimate chance of success. This means that, because
VariableValues can unify with any type of Value, there must be a way of
tracking the Values that VariableValues acquire while unification takes place
so that the final resulting DAG contains the proper Values which instantiated
the VariableValues.
The simple part is implementing atomically valued unification. If the
atomically valued elements are identical, just add the identical
Feature-AtomicValue pair to the resulting DAG (see Listing Three).
Here the simplicity ends. The result of a call not prototyped for an
AtomicValue depends on the type of the non-AtomicValue. Possibilities are
either DagValue or VariableValue. The call prototyped for a DagValue,
AtomicValue::unify(DagValue&,_), has no definition in the AtomicValue class.
Via double indirection, it eventually resolves to a default defined in the
base Value class, returning FALSE without performing any unification.
The same call in the VariableValue class, however, is a different story. A
VariableValue has to attempt to unify with any type of Value. Consequently,
the VariableValue class can't merely pass unification on to a base class for
default behavior. Indeed, you can see in Listings Five and Six that the
VariableValue class defines a unify function for all derived Value class
parameters. In this case, AtomicValue::unify(VariableValue&,_) is turned into
the call VariableValue::unify(AtomicValue&,_), which has a legitimate chance
of succeeding and therefore must have a definition.
DagValues can unify with themselves; however, the DagValue class's
implementation of unify() uses double indirection (see Listings Seven and
Eight). DAGs can't unify with atoms, so double indirection directs the calling
sequence toward failure. DAGs might unify with Variables, so a call to
VariableValue::unify(DagValue&, _) might succeed. The qualification exists
because there is always a chance that the variable has already been
instantiated to some non-VariableValue which will not successfully unify with
the other Value.
DAG unification is implemented as a global function taking the two DAGs under
consideration. The complete set of features from both DAGs is collected and
iterated over. The Dag::value(Feature&) method acquires from each DAG the
value associated with the current feature. If the feature exists in both
DAGs, the acquired values are unified (if possible) and the resulting
Feature-Value pair is added to the resulting DAG; if it exists in one DAG
only, that Feature-Value pair is added directly. If both values are
DagValues, then DagValue::unify(DagValue&, _) calls the global unify function
recursively.
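The feature-merging loop can be illustrated apart from the article's classes. The sketch below substitutes std::map for FVList and plain ints for ValuePtrs, so "unification" of two values is simply an equality test; there are no variables and no recursion. FlatDag, unifyFlat, unifiable, and the dag helpers are names invented for this sketch.

```cpp
#include <map>
#include <string>

typedef std::map<std::string, int> FlatDag;

// MERGE TWO FLAT DAGS: FEATURES COMMON TO BOTH MUST CARRY EQUAL VALUES;
// FEATURES PRESENT IN ONLY ONE DAG ARE COPIED THROUGH TO THE RESULT
bool unifyFlat(const FlatDag& d1, const FlatDag& d2, FlatDag& result) {
    result = d1;                                  // everything from d1
    for (FlatDag::const_iterator it = d2.begin(); it != d2.end(); ++it) {
        FlatDag::iterator found = result.find(it->first);
        if (found == result.end())
            result[it->first] = it->second;       // unique to d2: copy through
        else if (found->second != it->second)
            return false;                         // common feature, value clash
    }
    return true;
}

// BOOLEAN-ONLY VARIANT, ANALOGOUS TO THE ARTICLE'S TWO-ARGUMENT unify()
bool unifiable(const FlatDag& d1, const FlatDag& d2) {
    FlatDag scratch;
    return unifyFlat(d1, d2, scratch);
}

// SMALL HELPERS SO A DAG CAN BE BUILT IN ONE EXPRESSION
FlatDag dag1f(const std::string& f, int v) {
    FlatDag d; d[f] = v; return d;
}
FlatDag dag2f(const std::string& f1, int v1,
              const std::string& f2, int v2) {
    FlatDag d; d[f1] = v1; d[f2] = v2; return d;
}
```

The real implementation differs in that value unification may itself recurse into sub-DAGs or bind variables, but the merge skeleton is the same.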



The Substitution List


DAG unification can be destructive or nondestructive. Destructive unification
manipulates one of the original DAGs, changing its data structure so that it
houses the unification of the two DAGs. The original DAG is permanently
changed, hence the term "destructive." I've opted to build the resulting
unified DAG from scratch, nondestructively incorporating the elements of the
participating DAGs into the newly created resulting DAG. To do this, there has
to be a mechanism for tracking the various instantiations which
Variable-valued DAGs take on. This tracking is done in the SubstitutionList
class. To test or perform the unification of two DAGs, you need to supply a
SubstitutionList object to track the variable instantiations.
The SubstitutionList inherits privately from a Map template instantiated with
Variables as keys and ValuePtrs as values. The source code for the
SubstitutionList class is provided electronically; see "Availability," page 3.
Private inheritance models the is-implemented-in-terms-of (has-a)
relationship: No portion of the base class is accessible outside the
inheriting class itself. The interface to the SubstitutionList class
therefore consists of the methods unique to the derived class that are
declared public. These are mostly methods for setting Values into the
SubstitutionList, marking variables identical in the SubstitutionList,
checking whether the SubstitutionList contains a variable, checking whether
it is empty, returning the ValuePtr associated with a variable, and returning
the sets of variables that have been set equal to each other.
The SubstitutionList is not only a map (thereby containing a private
data-storage facility; see Figure 5), but also contains as a protected member
a list of sets of variables, a nested template instantiated as
ListOf< SetOf<VariableValue> >. Each set holds variables that will all bear
the same value upon successful unification. If any variable within a set
instantiates to a nonvariable value (meaning that it is either Dag valued or
Atomic valued), the value for the entire set is obtained from the base
variable-to-value map: Exactly one variable from the set also appears in the
map, acting as a placeholder that indicates the value to which every member
of the set will instantiate. The code prevents more than one placeholder per
set from being built. A set that never instantiates to a nonvariable value
remains simply a set of variables, all of which bear the same identity as a
variable.
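The placeholder bookkeeping can be sketched with a small union-find-style structure. This is an illustration of the idea only, not the SubstitutionList source (which is available electronically); Subst, setIdentical, and the int-valued bindings are inventions of the sketch.

```cpp
#include <map>
#include <string>

// SKETCH OF THE SUBSTITUTIONLIST BOOKKEEPING: VARIABLES IN ONE SET SHARE A
// REPRESENTATIVE, AND AT MOST ONE BINDING (THE "PLACEHOLDER") EXISTS PER SET
struct Subst {
    std::map<std::string, std::string> rep;  // variable -> parent/representative
    std::map<std::string, int> binding;      // representative -> bound value

    // FIND THE REPRESENTATIVE OF v'S SET, CREATING A SINGLETON SET IF NEEDED
    std::string find(std::string v) {
        if (rep.find(v) == rep.end()) rep[v] = v;
        while (rep[v] != v) v = rep[v];
        return v;
    }
    // MERGE TWO SETS, KEEPING A SINGLE PLACEHOLDER BINDING
    bool setIdentical(const std::string& a, const std::string& b) {
        std::string ra = find(a), rb = find(b);
        if (ra == rb) return true;
        bool boundA = binding.count(ra) != 0, boundB = binding.count(rb) != 0;
        if (boundA && boundB && binding[ra] != binding[rb])
            return false;                    // sets bound to clashing values
        if (boundB && !boundA) binding[ra] = binding[rb];
        binding.erase(rb);                   // at most one placeholder per set
        rep[rb] = ra;
        return true;
    }
    // BIND EVERY VARIABLE IN v'S SET TO A NONVARIABLE VALUE
    bool set(const std::string& v, int value) {
        std::string r = find(v);
        if (binding.count(r) && binding[r] != value) return false;
        binding[r] = value;
        return true;
    }
    // REPORT THE VALUE BOUND TO v'S SET, IF ANY
    bool val(const std::string& v, int& out) {
        std::string r = find(v);
        if (binding.count(r) == 0) return false;
        out = binding[r];
        return true;
    }
};
```

Once any member of a set acquires a nonvariable value, every member reports that value through the shared representative, which is the placeholder behavior described above.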
Before two DAGs are unified, a SubstitutionList is created; during
unification, it is filled with any substitutions made for variables.
Immediately upon completion, any variables in the resulting DAG that have
substitutions recorded in the SubstitutionList are replaced by those
substitutions. To accommodate variables, then, DAG unification is a two-phase
process: The first phase unifies everything that isn't a variable, and the
second exchanges values for variables.
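The second phase can be sketched as a single sweep over the unified result. In this stand-in, a DAG is a flat map from features to string values, variables are (by this sketch's convention only) strings beginning with an underscore, and substitute replaces each bound variable with its recorded value.

```cpp
#include <map>
#include <string>

typedef std::map<std::string, std::string> VarDag;    // feature -> value
typedef std::map<std::string, std::string> Bindings;  // variable -> value

// SKETCH CONVENTION: NAMES BEGINNING WITH '_' ARE VARIABLES
bool isVar(const std::string& v) { return !v.empty() && v[0] == '_'; }

// PHASE TWO: REPLACE EVERY BOUND VARIABLE IN THE UNIFIED DAG WITH THE
// VALUE RECORDED FOR IT DURING PHASE ONE; UNBOUND VARIABLES SURVIVE AS-IS
VarDag substitute(const VarDag& dag, const Bindings& subst) {
    VarDag out;
    for (VarDag::const_iterator it = dag.begin(); it != dag.end(); ++it) {
        Bindings::const_iterator b =
            isVar(it->second) ? subst.find(it->second) : subst.end();
        out[it->first] = (b != subst.end()) ? b->second : it->second;
    }
    return out;
}

// CONVENIENCE WRAPPER FOR INSPECTING ONE FEATURE AFTER SUBSTITUTION
std::string featureAfter(const VarDag& dag, const Bindings& subst,
                         const std::string& f) {
    VarDag d = substitute(dag, subst);
    return d[f];
}
```

The article's version does the same walk recursively through nested DagValues via Value::substitute, but the one-pass exchange of values for variables is the essential step.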


How They All Operate Together


To illustrate the concepts presented here, I've written a demonstration
program (available electronically) that builds a DAG compliant with Grammar 2
from an NP of "the cow" and a VP of "eats grass". The construction of both
DAGs involves building an inner DAG for the set of Feature-Value arcs which
contain requirements under the agreement node. I have determined that these
should be animacy and number. As long as the atomic values for animacy and
number match, the agreement subDAGs will unify. The values for animacy and
number can be supplied by the lexical items themselves, so that the word
"cow", for instance, supplies "animate" and "singular" for the animacy and
number features, and "eats" supplies values for number and subcategorization.
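The agreement test the demonstration relies on can be mimicked with plain maps (a stand-in, not the supplied demo code): "cow" contributes animate and singular, "eats" contributes singular, and the two agreement sub-structures are compatible because every feature they share carries the same atomic value.

```cpp
#include <map>
#include <string>

typedef std::map<std::string, std::string> Agreement;

// TWO AGREEMENT SUB-DAGS ARE COMPATIBLE WHEN EVERY FEATURE THEY SHARE
// CARRIES THE SAME ATOMIC VALUE; FEATURES IN ONLY ONE SIDE DON'T CONSTRAIN
bool agrees(const Agreement& a, const Agreement& b) {
    for (Agreement::const_iterator it = a.begin(); it != a.end(); ++it) {
        Agreement::const_iterator other = b.find(it->first);
        if (other != b.end() && other->second != it->second)
            return false;
    }
    return true;
}

// HYPOTHETICAL LEXICAL ENTRIES SUPPLYING AGREEMENT FEATURES
Agreement nounCow() {
    Agreement a;
    a["animacy"] = "animate";
    a["number"] = "singular";
    return a;
}
Agreement verbEats() {
    Agreement a;
    a["number"] = "singular";
    return a;
}
Agreement verbEat() {
    Agreement a;
    a["number"] = "plural";
    return a;
}
```

"The cow eats" passes because the shared number feature matches; "the cow eat" fails on the number clash, which is how the agreement subDAGs gate unification.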
For a natural-language parser, once a sentence is presented for parsing, a
lexical look-up routine can build the DAGs for each lexical item as each item
is encountered (such a DAG factory is not supplied with this article). This
allows you to create phrase-structure rewrite rules for terminal elements
that bear fully informational DAGs--that is, augmenting Grammar 1 with
requirements for DAG unification to produce a fleshed-out Grammar 2. Once
parsing starts, if a rewrite rule successfully applies to a terminal element,
the result of the rewrite will contain a new DAG created by the unification of
the DAGs specified in the addenda to the rewrite rule. In this way, each
successfully applied rule results in the passing-on of DAGs so that the final
end-state node (S in this case, working bottom-up) contains a single DAG
bearing the complete history of its parse. This percolating activity is handy
for investigating grammatical reasons for parse failure because it builds up a
structure which may be examined for the semantics attached in the form of
Feature-Values.
In fact, for grammar sticklers, the sentence being parsed in main() in
accordance with Grammar 2 purposefully contains an additional inconsistency
which should be remedied in order to conform with good grammar. Perhaps
finding this breach of grammar is enough indication of the difficulty that
people have in pointing out why some input is ungrammatical.


References


Kay, M. "Functional Grammar." Proceedings of the Fifth Annual Meeting of the
Berkeley Linguistics Society, 1979. Berkeley: Berkeley Linguistics Society.
Knight, Kevin. "Unification: A Multidisciplinary Survey." ACM Computing
Surveys. Vol. 21, 1989.
Figure 1 DAG with equivalence for "agreement."
Figure 2 Possible results of unification.
Figure 3 Value inheritance hierarchy and ValuePtr class relationship.
Figure 4 Double-indirection function-call sequence.
Figure 5 SubstitutionList's private data storage.
Example 1: Part One grammar.
S --> NP VP (Sentence may be written as Noun Phrase then Verb Phrase)
VP --> V NP (Verb Phrase may be written as Verb then Noun Phrase)
NP --> DET N (Noun Phrase may be written as Determiner then Noun)
N --> "cow" (Noun may be written as "cow")
N --> "grass" (Noun may be written as "grass")
V --> "eats" (Verb may be written as "eats")
DET --> "the" (Determiner may be written as "the")
Example 2: The value for number has been percolated up through the rule
system.
S --> NP VP
<NP number>=<VP number>
Example 3: Part Two grammar.
S-->NP VP
<NP agreement>=<VP agreement>
<NP case>=nominative
<VP subcat>=transitive

VP-->V
<VP agreement>=<V agreement>
<VP subcat>=intransitive
<VP subcat>=<V subcat>
VP-->V NP
<VP agreement>=<V agreement>
<VP subcat>=transitive
<VP subcat>=<V subcat>
NP-->DET N
<NP agreement>=<N agreement>
<NP agreement number>=<DET number>
N-->"cow"
<N number>=singular
N-->"grass"
<N number>=singular
V-->"eat"
<V number>=plural
<V subcat>=transitive
V-->"eats"
<V number>=singular
<V subcat>=transitive
DET-->"the"
<N number>=<N agreement>

Listing One 

/*************************************************************************
 Base Value class and ValuePtr class 
 Copyright David Perelman-Hall & Jamshid Afshar 1994.
*************************************************************************/

#ifndef VALUE_H
#define VALUE_H

#include "misc.h" // GATHERED INCLUDES. HELPS WITH PRE-COMPILED HEADERS.

// Forward References
class SubstitutionList;
class Value;

// Pointer (or Reference) class to a Value
class ValuePtr {
friend ostream& operator << (ostream& os, const ValuePtr& vp);
private:
 Value* _ptr;
public:
 ValuePtr();
 ValuePtr(const ValuePtr& vp);
 ValuePtr(const Value& v);
 ~ValuePtr();
 void operator=(const ValuePtr& vp);
 bool operator==(const ValuePtr& vp) const;
 bool operator!=(const ValuePtr& vp) const;
 const Value* operator->() const;
 const Value& operator*() const;
 operator const void*() const { return _ptr; }
};
// FORWARD REFERENCES TO CLASSES DERIVING FROM VALUE
class AtomicValue;

class VariableValue;
class DagValue;
class Value {
 public:
 //void CTOR
 Value() {}
 virtual Value* copy() const = 0;
 virtual void write(ostream& os, int level=0) const = 0;
 virtual bool operator == (const Value& value) const = 0;
 virtual bool operator == (const AtomicValue&) const { return FALSE; }
 virtual bool operator == (const VariableValue&) const { return FALSE; }
 virtual bool operator == (const DagValue&) const { return FALSE; }
 virtual bool unify(const Value& value, SubstitutionList&, 
 ValuePtr&) const = 0;
 virtual bool unify(const AtomicValue&, SubstitutionList&, 
 ValuePtr&) const;
 virtual bool unify(const VariableValue&, SubstitutionList&, 
 ValuePtr&) const;
 virtual bool unify(const DagValue&, SubstitutionList&, ValuePtr&) const;
 virtual ValuePtr substitute(const SubstitutionList& substList) const = 0;
};
inline
ostream& operator << (ostream& os, const Value& v)
{
 v.write(os);
 return os;
}
inline ValuePtr::ValuePtr() : _ptr(0) { }
inline ValuePtr::ValuePtr(const ValuePtr& vp)
 : _ptr(vp._ptr != 0 ? vp._ptr->copy() : 0) { }
inline ValuePtr::ValuePtr(const Value& v) : _ptr(v.copy()) { }
inline ValuePtr::~ValuePtr() { delete _ptr; _ptr=0; }
inline void ValuePtr::operator=(const ValuePtr& vp){
 if(this == &vp) // GUARD AGAINST SELF-ASSIGNMENT
 return;
 delete _ptr;
 if(vp._ptr != 0)
 _ptr = vp._ptr->copy();
 else
 _ptr = 0;
}
inline bool ValuePtr::operator==(const ValuePtr& vp) const
 { return (*_ptr) == *(vp._ptr); }
inline bool ValuePtr::operator!=(const ValuePtr& vp) const
 { return !( *this == vp ); }
inline const Value* ValuePtr::operator->() const { assert(_ptr != 0); 
 return _ptr; }
inline const Value& ValuePtr::operator*() const { assert(_ptr != 0); 
 return *_ptr; }
inline ostream& operator << (ostream& os, const ValuePtr& vp)
{
 assert(vp._ptr != 0);
 os << *(vp._ptr);
 return os;
}
#endif



Listing Two


#include "misc.h" // GATHERED INCLUDES. HELPS WITH PRE-COMPILED HEADERS
#pragma hdrstop // END OF PRE-COMPILED HEADER INCLUDES

//PROTOTYPED FOR EACH DERIVED CLASS
bool Value::unify(const AtomicValue&, SubstitutionList&, ValuePtr&) const
 { return FALSE; }
bool Value::unify(const VariableValue&, SubstitutionList&, ValuePtr&) const
 { return FALSE; }
bool Value::unify(const DagValue&, SubstitutionList&, ValuePtr&) const
 { return FALSE; }



Listing Three

/*************************************************************************
 AtomicValue class -- Copyright David Perelman-Hall & Jamshid Afshar 1994.
*************************************************************************/

#ifndef ATOMIC_H
#define ATOMIC_H

#include <cstring.h> // BORLAND HEADER
#include "value.h" // BASE VALUE & VALUEPTR CLASSES

typedef string String;
#define Atomic String

class AtomicValue : public Value, public Atomic {
 public:
 //EMPTY CTOR
 AtomicValue(){}
 //COPY CTOR
 AtomicValue( const AtomicValue& value ) : Atomic(value) { }
 //CONSTRUCTOR
 AtomicValue( const String& str ) : Atomic(str) { }
 //CONSTRUCTOR
 AtomicValue( const char* s ) : Atomic(s) { }
 //RETURN POINTER TO NEW ATOMIC VALUE COPY CONSTRUCTED FROM THIS
 Value* copy() const { return new AtomicValue(*this); }
 // EXPLICIT STRING CONVERSION
 const String& str() const { return (const Atomic&)*this; }
 //ASSIGNMENT OPERATOR
 void operator = (const AtomicValue& value) { Atomic::operator=(value); }
 //EQUALITY
 virtual bool operator == (const Value& value) const;
 virtual bool operator == (const AtomicValue& value) const;
 // UNIFY
 virtual bool unify(const Value& value, SubstitutionList& subst, 
 ValuePtr& result) const;
 virtual bool unify(const AtomicValue& value, SubstitutionList& /*subst*/,
 ValuePtr& result) const;
 // OUTPUT
 virtual void write(ostream& os, int level) const 
 { os << (const Atomic&)*this; }
 // SUBSTITUTE
 virtual ValuePtr substitute(const SubstitutionList& /*substList*/) const
 { return *this; }
};

#endif



Listing Four

#include "misc.h"
#pragma hdrstop

//PROTOTYPED FOR BASE CLASS
bool AtomicValue::operator == (const Value& value) const
 { return value==*this; }
//PROTOTYPED FOR ATOMIC VALUE
bool AtomicValue::operator == (const AtomicValue& value) const
 { return (Atomic&)*this == (Atomic&)value; }
//PROTOTYPED FOR BASE CLASS
bool AtomicValue::unify(const Value& value, SubstitutionList& subst, 
 ValuePtr& result) const
 { return value.unify(*this, subst, result); }
//PROTOTYPED FOR ATOMIC VALUE
bool AtomicValue::unify(const AtomicValue& value, SubstitutionList& /*subst*/,
 ValuePtr& result) const
{ 
 if (value == *this) {
 result = *this;
 return TRUE; 
 }
 else {
 return FALSE; 
 }
}



Listing Five

/*************************************************************************
VariableValue class -- Copyright David Perelman-Hall & Jamshid Afshar 1994.
*************************************************************************/

#ifndef VARIABLE_H
#define VARIABLE_H

#include <cstring.h> // BORLAND HEADER
#include "value.h" // BASE VALUE CLASS

typedef string Variable; // BORLAND

Variable gensym();

class VariableValue : public Value {
private:
 Variable _v;
public:
 //empty constructor
 VariableValue(){}
 //copy constructor
 VariableValue( const VariableValue& value ) : _v(value._v) { }
 VariableValue( const Variable& var ) : _v(var) { }

 Value* copy() const { return new VariableValue(*this); }
 const Variable& str() const { return _v; }

 //assignment operator
 void operator = (const VariableValue& value) { _v = value._v; }
 virtual bool operator == (const Value& value) const;
 virtual bool operator == (const VariableValue& value) const;
 virtual bool unify(const Value& value, SubstitutionList& subst, 
 ValuePtr& result) const;
 virtual bool unify(const AtomicValue&, SubstitutionList&, 
 ValuePtr& result) const;

 virtual bool unify(const VariableValue& var, SubstitutionList& subst, 
 ValuePtr& result) const;
 virtual bool unify(const DagValue&, SubstitutionList&, 
 ValuePtr& result) const;
 virtual void write(ostream& os, int level) const { os << _v; }
 virtual void read(istream& is) { is >> _v; }
 virtual ValuePtr substitute(const SubstitutionList& substList) const;
};
#endif



Listing Six

#include "misc.h"
#pragma hdrstop
#include <stdio.h>

Variable gensym()
{
 static unsigned i = 0;
 char sym[20];
 sprintf(sym, "_X%u", i++); // "%u" -- "%ud" WOULD PRINT A STRAY 'd'
 return Variable(sym);
}
bool VariableValue::unify(const Value& value, SubstitutionList& subst, 
 ValuePtr& result) const
{ 
 if (!value.unify(*this, subst, result)) {
 if (!subst.set(_v, value)) 
 return FALSE;
 result = value;
 }
 return TRUE;
}
bool VariableValue::unify(const AtomicValue& value, SubstitutionList& subst, 
 ValuePtr& result) const
{ 
 if( !subst.set(_v, value) )
 return FALSE;
 result = value; 
 return TRUE;
}
bool VariableValue::unify(const VariableValue& var, SubstitutionList& subst, 
 ValuePtr& result) const
{ 
 if( !subst.set_identical(_v, var.str()) ) {

 //cerr << "unify vars (" << *this << "," << var << ")\n"; //##
 return FALSE;
 }
 result = var; 
 return TRUE;
}
bool VariableValue::unify(const DagValue& value, SubstitutionList& subst, 
 ValuePtr& result) const
{
 if( !subst.set(_v, value) )
 return FALSE;
 result = value; 
 return TRUE; 
}
ValuePtr VariableValue::substitute(const SubstitutionList& substList) const
{ 
 if( substList.contains(_v) )
 return substList.val(_v);
 else
 return VariableValue(_v);
}
bool VariableValue::operator == (const Value& value) const
{ return value==*this; }
bool VariableValue::operator == (const VariableValue& /*value*/) const
{ return TRUE;/*## _v == value._v;*/ }



Listing Seven

/*************************************************************************
 DagValue class -- Copyright David Perelman-Hall & Jamshid Afshar 1994.
*************************************************************************/

#ifndef DAG_H
#define DAG_H

#include "misc.h"
#include "value.h"
#include "atomic.h"
#include "variable.h"

typedef MapOf<Feature, ValuePtr> FVList;
typedef MapOfIter<Feature, ValuePtr> FVListIter;

class Dag {
public:
 // CONSTRUCTOR
 Dag():_datap(new Data){}
 // COPY CONSTRUCTOR 
 Dag(const Dag& dag): _datap(dag._datap){}
 // DESTRUCTOR
 ~Dag(){}
 // ASSIGNMENT OPERATOR
 void operator=(const Dag& dag){_datap=dag._datap;}
 // ADD A FEATURE-VALUE PAIR TO THIS DAG
 void add(const Feature& f, const ValuePtr& vp){own_fvlist().enter(f,vp);}
 // EQUALITY OPERATORS
 bool operator==(const Dag& dag) const;

 bool operator!=(const Dag& dag) const {return !operator==(dag);}
 // CLEAR DATA FROM THIS DAG
 void clear(void){own_fvlist().clear();}
 // TEST FOR EXISTENCE OF FEATURE IN THIS DAG
 bool contains(const Feature& feature) const {
 return fvlist().contains(feature);
 }
 // RETRIEVE VALUE ASSOCIATED WITH FEATURE 
 ValuePtr value(const Feature& feature) const {
 if(!fvlist().contains(feature)){
 cerr << "Dag " << *this << " does not have feature " << feature
 << endl;
 }
 assert(fvlist().contains(feature));
 return fvlist().valueOf(feature);
 }
 // RETRIEVE THE LIST OF FEATURES OF THIS DAG
 FeatureSet list_features() const;
 // SET SUBSTITUTION FOR VARIABLES 
 Dag substitute(const SubstitutionList& substList) const;
 // APPLY SUBSTITUTION
 static Dag apply_substitution(const Dag& dag,
 const SubstitutionList& substList){return dag.substitute(substList);}
 // OUTPUT OPERATOR
 void write(ostream& os, int level) const;
 // IOSTREAM OPERATORS
 friend ostream& operator << ( ostream& os, const Dag& dag );
 //friend istream& operator >> ( istream& is, Dag& dag );
private:
 struct Data {
 FVList fvs;
 Data() {}
 Data(const FVList& map) : fvs(map) {}
 };
 Data *_datap;
 const FVList& fvlist() const {return _datap->fvs;}
 FVList& own_fvlist() {return _datap->fvs;}
};
/*
inline
istream& operator >> ( istream& is, Dag& dag )
{ 
 is >> dag.own_fvlist(); 
 return is;
}
*/
// DECLARATION OF GLOBAL UNIFY WHICH ACTUALLY UNIFIES TWO DAGs
bool unify(const Dag& dag1, const Dag& dag2, SubstitutionList& subst,
 Dag& result);
// DECLARATION & DEFINITION OF GLOBAL UNIFY TAKING TWO DAG REFERENCES,
// DOES NOT ACTUALLY UNIFY, BUT RETURNS BOOLEAN TEST ON THEIR UNIFICATION
inline
bool unify(const Dag& dag1, const Dag& dag2){
 SubstitutionList subst;
 Dag result;
 return unify(dag1, dag2, subst, result);
}
// MAKE DAG_LIST CLASS
typedef ListOf<Dag> DagList;

typedef ListOfIter<Dag> DagIter;

ostream& operator<<(ostream& os, const ListOf<Dag>& list);

class DagValue : public Value {
private:
 Dag _d;
public:
 //empty constructor
 DagValue(){}
 //copy constructor
 DagValue( const DagValue& value ) : _d(value._d) { }
 DagValue( const Dag& dag ) : _d(dag) { }
 Value* copy() const { return new DagValue(*this); }
 //assignment operator
 void operator = (const DagValue& value) { _d = value._d; }

 virtual bool operator == (const Value& value) const
 { return value==*this; }
 virtual bool operator == (const DagValue& value) const
 { return _d==value._d; }
 virtual bool unify(const Value& value, SubstitutionList& subst, 
 ValuePtr& result) const
 { return value.unify(*this, subst, result); }
 virtual bool unify(const DagValue& value, SubstitutionList& subst, 
 ValuePtr& result) const;
 virtual void write(ostream& os, int level) const;
 //virtual void read(istream& is) { is >> _d; }
 virtual ValuePtr substitute(const SubstitutionList& substList) const;
};
#endif



Listing Eight

#include "misc.h"
#pragma hdrstop

// DAG SPECIFIC OSTREAM OPERATOR
ostream& operator << ( ostream& os, const Dag& dag )
{
 dag.write(os, 0);
 return os;
}
// DAG SPECIFIC OUTPUT OPERATOR
void Dag::write(ostream& os, int level) const
{
 os << "[";
 for (FVListIter i(&fvlist()); !i.offEnd(); i.next()) {
 os << i.key() << " : ";
 i.value()->write(os, level+1);
 if (!i.last()) {
 os << "\n";
 for (int j=0; j<level; j++) // j AVOIDS SHADOWING THE ITERATOR i
 os << " ";
 os << " ";
 os << " ";
 }

 }
 os << "]";
}
// RETRIEVE THE LIST OF FEATURES OF THIS DAG
FeatureSet Dag::list_features() const 
{
 // CREATE A SET OF KEYS TO RETURN
 FeatureSet tmp_features(fvlist().keys(),fvlist().size());
 // RETURN IT
 return tmp_features;
}
// EQUALITY OPERATOR
bool Dag::operator==(const Dag& d) const
{
 // TYPEDEF AN ITERATOR OVER A MAP FROM FEATURES TO VALUE POINTERS
 typedef MapOfIter<Feature, ValuePtr> FVIter;
 // STEP SIMULTANEOUSLY THROUGH BOTH LISTS OF FEATURES
 FVIter fv1(&this->fvlist()), fv2(&d.fvlist());
 for ( ; !fv1.offEnd() && !fv2.offEnd(); fv1.next(), fv2.next()){
 // IF EITHER CURRENT FEATURES OR CURRENT VALUES DON'T MATCH, RETURN FALSE
 if (fv1.key()!=fv2.key() || fv1.value()!=fv2.value())
 return FALSE;
 }
 // RETURN TRUE ONLY IF STEPPED ENTIRELY THROUGH BOTH LISTS
 return fv1.offEnd() && fv2.offEnd();
}
// SUBSTITUTE
Dag Dag::substitute(const SubstitutionList& substList) const
{
 // RETURN DAG
 Dag tmpDag;
 // LIST OF FEATURES IN THIS DAG
 FeatureSet features=list_features();
 // STEP THROUGH LIST OF FEATURES FROM THIS DAG
 for (SetOfIter<Feature> feature(&features); feature; ++feature)
 // ADD TO RETURN DAG THE FEATURE AND ANY VALUE LISTED FOR IT IN THE
 // SUBSTITUTION LIST, AS FOLLOWS:
 // 1) IF THERE IS A SUBSTITUTION FOR IT AND THE VALUE IS A
 // VARIABLE, ADD WHATEVER EXISTS IN THE SUBSTITUTION LIST
 // FOR THAT VARIABLE,
 // 2) IF THERE IS NO SUBSTITUTION FOR IT AND THE VALUE IS A
 // VARIABLE, ADD THE VARIABLE,
 tmpDag.add(feature(),value(feature())->substitute(substList));
 // RETURN THE CREATED DAG
 return tmpDag;
}
// OUTPUT OPERATOR FOR A DAGVALUE
void DagValue::write(ostream& os, int level) const
{
 _d.write(os, level);
}
bool DagValue::unify(const DagValue& value, SubstitutionList& subst, 
 ValuePtr& result) const
{
 Dag unification;
 // CALL GLOBAL FUNCTION, PASSING 2 DAGs
 if (::unify(_d, value._d, subst, unification)) {
 result = DagValue(unification);
 return TRUE;

 }
 else {
 return FALSE;
 }
}
ValuePtr DagValue::substitute(const SubstitutionList& substList) const
{ return DagValue(_d.substitute(substList)); }
// GLOBAL UNIFY FUNCTION
bool unify(const Dag& dag1, const Dag& dag2, SubstitutionList& subst,
 Dag& result)
{
 // GET SET OF ALL FEATURES FROM BOTH DAGS
 FeatureSet features = dag1.list_features() + dag2.list_features();
 {
 for (SetOfIter<Feature> feature(&features); feature; ++feature) {
 // TRY TO UNIFY VALUES WHERE FEATURE IS COMMON TO BOTH DAGS
 if (dag1.contains(feature()) && dag2.contains(feature())) {
 // cout << "Attempting to unify feature " << feature()
 //  << " in both dags." << endl;
 ValuePtr unified_value; // HOLD THE RESULT OF SUCCESSFUL UNIFICATION

 // CALL VALUE::UNIFY THROUGH VALUEPTR WRAPPER, BUILD UP SUBSTITUTION 
 // LIST & RESULTING DAG
 if (dag1.value(feature())->unify(*(dag2.value(feature())), subst, 
 unified_value)) {
 result.add(feature(), unified_value);
 }
 else { // FAILED TO UNIFY VALUES FOR FEATURE COMMON TO BOTH DAGS
 cout << dag1.value(feature()) << " and " << dag2.value(feature()) 
 << " did not unify." << endl;
 return FALSE;
 }
 }
 else { // ADD FEATURE-VALUE WHERE EXISTS IN ONE DAG ONLY
 if(dag1.contains(feature()))
 result.add(feature(), dag1.value(feature()));
 else
 result.add(feature(), dag2.value(feature()));
 }
 }
 }
 Dag result_with_subst;
 {
 // DO VARIABLE SUBSTITUTIONS BUILT UP IN PREVIOUS LOOP
 for (SetOfIter<Feature> feature(&features); feature; ++feature) {
 result_with_subst.add(feature(), 
 result.value(feature())->substitute(subst));
 }
 }
 result = result_with_subst;
 return TRUE;
}
End Listings












Above-Real-Time Training and the Hyper-Time Algorithm


Altering time in human-computer interfaces 




Dutch Guckenberger, Liz Guckenberger, Frank Luongo, Kay Stanney, Jose
Sepulveda 


Dutch is the senior software engineer for ECC International and teaches
graduate simulation courses at the University of Central Florida. Liz is a
research associate with UCF, while Kay and Jose are associate professors in
UCF's Industrial Engineering Department. Frank is a software engineer at ECC.
They can be contacted at dutchg@pegasus.cc.ucf.edu.


Historically, the rate of information presentation for a computer application
is chosen and hard-coded by the original software developer. The programmer
tries to select a rate that is not too fast for the novice, nor too slow for
the expert, and rarely allows variability in the rate of presentation. The net
effect of this "developer selection" is that no group is completely happy, and
we are forced to adapt to the machine. The hyper-time algorithm (HTA) empowers
the user to dynamically control the rate of information presentation. HTA is a
modular addition to existing applications.
HTA makes it possible for you to alter the flow of "simulated time" to benefit
the user of a computer or simulator. Slower than real time can be used for
novice users or to emphasize a particular section of interest. Faster than
real time can be used for experts or persons "time surfing" (analogous to
TV-channel surfing) over uninteresting portions of information. NASA and
U.S.-military research has utilized HTA to demonstrate that "Above Real Time
Training" (ARTT) improves human performance, increases retention, increases
training-device effectiveness, and decreases stress. On the flip side, slowing
down the rate of information presentation also has benefits, particularly when
it comes to debugging, emphasizing important points, and helping disadvantaged
learners. 
Over the years, we've converted seventeen different simulators and six
different computer applications so that they can perform above/below-real-time
operations using the hyper-time algorithm. More recently we've developed a
control for varying the video rate of MPEG video presentations, and we're
currently working with Sigma Design to synchronize faster-than-real-time
audio. Once we have combined faster-than-real-time video and audio, HTA-MPEG
will make it possible for you to watch a two-hour movie in 80 minutes, while
having a more intense involvement and better memory recall.
Interestingly, advertisers have been aware of the nonobvious benefits of above
real time for many years. When time-compressed speech is used in radio
advertisements, people like the commercials better and remember them better
(MacLachlan and LaBarbera, 1978). The TV-advertising industry notes similar
results using time-compressed commercials (MacLachlan and LaBarbera, 1979;
MacLachlan and Seigel, 1980; and Riter, Balducci, and McCollum, 1982). For
instance, do you remember the FedEx commercials with a fast-talking actor?
Have you ever noticed how fast the motion is in MTV videos, or watched the
fast motion of the old silent movies? What you probably didn't notice was that
your comprehension did not suffer. Humans are able to quickly adapt to
different rates of information presentation. It's unfortunate that the
advertising community is using our above-real-time adaptability to put slogans
and trivia into our brains, while educational TV programs and computer-based
training have yet to capitalize on these techniques. Shouldn't we be using
this technique to better educate our children? ARTT could be used to make
education more enjoyable, while helping students retain more information.
(We're currently developing above-real-time MPEG playback for education and
training purposes. Educators interested in testing or applying the technique
can contact us for more information.) 
A less-obvious benefit is that HTA supports a new method of training in
simulators. ARTT, the focus of our research for the last six years, is a
multidisciplinary research program with theoretical support garnered from
neuroscience, cognitive psychology, human-computer interaction, and learning
theories, as well as applied research. ARTT research has received three
national awards--National Security Industrial Association, best paper 1992;
NAVAIR 4th Airborne Weapons Conference 1993, best technical paper; and Link
Foundation Advanced Simulation and Training Fellowship 1993--94 (Guckenberger,
et al., 1992, 1993, 1994). ARTT research has been supported by NASA's Dryden
Flight Research Center (DFRC), ECC International, the University of Central
Florida, the U.S. Navy, the Link foundation, and the U.S. Army. Currently
pending is support from the U.S. Air Force, a commercial airline company, the
Chicago Cubs, and the Cleveland Indians (for ARTT batting practice). We
mention this simply to illustrate that HTA and its application in ARTT have
widespread applicability. For the purposes of this article, we'll emphasize
applications and try to restrict theoretical considerations. If you're
interested in the intrinsic time adaptability of humans, a good place to start
is Chapters 1 and 2 of Human-Computer Interaction, by Card, Moran, and Newell
(Lawrence Erlbaum Associates, 1983). The variable cognitive processor-rate
principle and the variable perceptual processor-rate principle of the
human-information processor model are starting points for understanding ARTT
phenomena.


Above-Real-Time Background


Simulators utilizing ARTT allow you to "over-train" in the time dimension. In
simplest terms, ARTT pseudostresses individuals in safe, simulator
environments, preparing them for additional real-world stresses not present in
the simulation. Research has shown enormous benefits in performance and
retention of ARTT-trained tasks; tank gunnery has shown 50 percent higher
performance, and the accuracy of F-16 pilots improved 28 percent in performing
emergency procedures under stress. The ARTT pilots not only performed the
emergency procedure with near-100 percent accuracy, they followed by removing
the stress themselves (in this experiment, the stress was enemy MIGs): The
ARTT pilots killed six times as many MIGs as the control real-time pilots
(Guckenberger et al., 1992, 1993).
ARTT refers to a training paradigm that places the operator in a simulated
environment that functions at faster than normal time. In the case of
air-combat maneuvering, a successful tactical air intercept which might
normally take five minutes is compressed into two or three minutes. All
operations of the intercept would correspondingly be accelerated--airspeed,
turn and bank velocities, weapons flyout, and performance of the adversary. In
the presence of these time constraints, the pilot would be required to perform
the same mission tasks to the same performance criteria as he would in a
real-time environment. Such a training paradigm represents a departure from
the intuitive, but not often supported, feeling that the best practice is
determined by the training environment with the highest fidelity. ARTT can be
implemented economically on existing simulators. It is important to realize
that ARTT applications require the simulated time to change, not the update
rate. Over 20 years ago, flight-test engineers recognized that if you could
program a simulator to operate in "fast time," you could give test pilots a
more accurate experience or "feel" of real-world stresses that would be
present in the aircraft (Kolf, 1973; Hoey, 1976).
The bulk of the original support for ARTT in simulators came from NASA
reports. During the X-15 program in the late 1960s, researchers at NASA's DFRC
needed a mechanism to address the X-15 test pilots' post-flight comments of
being "always behind the airplane" and "could never catch up" (Kolf, 1973).
Clearly, there were some differences between the perceived time in the
well-practiced simulator flights and that in the experimental aircraft. The
first time NASA used fast-time simulation was toward the end of the X-15
program. Pilots compared practice runs at various time constants with flights
they had already flown. A fast-time constant of 1.5x felt closest to their
flight experience and was successfully implemented in the lifting-body
programs, but lack of funding prevented the program from fully developing the
capability. Nevertheless, NASA's test pilots at DFRC have endorsed the use of
fast-time simulation as part of the training process. It is important to note
that DFRC's Jack Kolf is the father of ARTT in simulators. He recognized the
problem, fostered a successful solution, and implemented ARTT for NASA test
pilots. 
Vidulich, Yeh, and Schneider (1983) examined time compression as an aid in
training a basic, high-performance air-traffic-control skill. One group
practiced an intercept with a target plane traveling at 260 knots. The second
group practiced the intercept at 5200 knots--20 times real time! Both groups
were then tested in real time. The time-compressed group performed certain
aspects of the skill significantly better; in other areas, their performance
was the same as the real-time group. 
ARTT benefits have been extended to virtual-reality environments, where
college students were able to perform 40 percent faster and with less workload
and stress than conventionally trained, control VR subjects (Guckenberger,
Stanney, Mapes, 1993). 
The concept of time surfing may be useful for observing information hitherto
obscured by low repetition rates. For example, it may be possible to
identify enemy traffic routes, supply areas, headquarters, and communication
centers by fast-time playback of information from a God's-eye view of the
battlefield. The traffic patterns and key crossroads would be easily
distinguished. The fast-time playback would blur the positions of individual
vehicles into a moving line, which would not only give direction, but also the
traffic load, as revealed by the intensity of the line. A good analogy is the
fast-time playback of cloud movements during the weather report you see every
night on TV. Time-lapse radar images also show the internal-structure elements
as the weather patterns move. 
The key point is that the rate of information presentation varies for the
benefit of the user. The control of this rate can be assigned to the user, the
instructor, or intelligent-tutor software that matches the rate of information
presentation to a user's current performance level. The simplest case for an
intelligent tutor is a lookup table that selects a rate of information
presentation as indexed by current-performance score. 
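In C, such a tutor might be sketched as follows; the score bands and rates here are hypothetical, chosen only to illustrate the table-driven approach:

```c
/* Hypothetical lookup table mapping a 0..100 performance score to an
   ARTT time-rate. Both the 25-point bands and the rates are
   illustrative, not taken from any fielded trainer. */
static const double artt_rate_for_band[] = { 0.5, 1.0, 1.33, 1.6, 2.0 };

double select_artt_rate(int score)
{
    int band = score / 25;          /* collapse the score into a band */
    if (band < 0) band = 0;         /* clamp out-of-range scores */
    if (band > 4) band = 4;         /* a perfect 100 falls in the top band */
    return artt_rate_for_band[band];
}
```

As the trainee's score rises, the table hands back a faster rate of information presentation; replacing the table is all it takes to tune the tutor.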


Manipulating Time


To manipulate time, you must adopt an entirely new way of thinking about it.
Once you understand HTA, you'll see that update rate and hardware requirements
are unaffected--you're merely altering "simulated time." 
For example, consider the case of a real-time vehicle simulator. Somewhere
deep in the heart of the software is an assigned value (usually hardcoded) for
the hardware clock-tick value, TICK_VALUE. The physics model calculates the
next frame's x,y,z location from the present x,y,z, plus the Velocity_Vectors,
multiplied by the TICK_VALUE. By dynamically assigning different values to
TICK_VALUE, you can alter the flow of simulated time without affecting the
hardware-update rate, without changing the physics model, and without changing
the I/O rates; that is, you change only the time-frame integration value. (The
HTA implementations for controlling video-disc and CD-ROM interfaces are a
little more complex, although not unduly so.) 
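A minimal sketch of this idea, assuming a hypothetical 60-Hz hardware frame and the TICK_VALUE role described above:

```c
/* Advance one position component by one hardware frame. Only the
   simulated-time increment (the TICK_VALUE of the text) scales with
   the ARTT rate; the hardware-update rate, the physics model, and the
   I/O rates are untouched. */
double step_position(double pos, double vel, double artt_rate)
{
    const double base_tick = 1.0 / 60.0;        /* 60-Hz hardware frame */
    double tick_value = base_tick * artt_rate;  /* simulated seconds/frame */
    return pos + vel * tick_value;
}
```

With artt_rate at 1.0 the simulation runs in real time; at 1.33 each frame integrates a third more simulated time, and the world correspondingly speeds up.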
The HTA implementation should not affect normal real-time use of the computer
application or simulation. If the HTA is implemented as a command-line
parameter, then it's an initialization issue. If it's implemented as a dynamic
variable, then the cost to the real-time process is one extra comparison per
loop. 
The real work is done in the alter_time_rate() function (see Example 1),
which is platform dependent. If the platform utilizes a video disc or similar
per-frame encoded device, alter_time_rate() modifies the presentation of the
frames in faster than real time by skipping frames and in slower than real
time by increasing the display time of some frames. If the platform does its
own computer-generated imagery, alter_time_rate() has only to alter the
TICK_VALUE variable as described earlier. If the application is a word
processor that goes as fast as the calculations can be made, a timer module is
added to allow alter_time_rate() to control the rate of information
presentation. If the platform is an MPEG playback board like Sigma Design's
ReelMagic Board, the application interface can be used to allow
alter_time_rate() to control the rate of video information presentation. If the
simulation is to run at the same rate of information presentation for the
entire simulated mission, the ARTT rate can be done as a simple initialization
function; see Listing One, page 100. Listing Two is source code to change
Sigma Design's ReelMagic MPEG playback board to run video at above (and below)
real time. (In the future, we'll examine ways to add audio to HTA-MPEG,
covering RAPID-COMmunication, a patented method developed by the U.S. Air
Force that has demonstrated higher throughput and improved retention of
computer-displayed information.) 


Implementing ARTT 


To illustrate how you can implement HTA, we'll examine the VideoDisc
Interactive Gunnery Simulator (VIGS) built by ECC International for the U.S.
Army. We chose this particular simulator as an example to emphasize that the
computation requirements are not affected by implementing ARTT.
The VIGS unit--an 8086-based PC running under MS-DOS with ECC-proprietary
video, sound, and I/O boards--was designed to teach basic gunnery skills. The
trainer uses realistic battlefield scenarios created using an image generator
and recorded onto a video laser disc. Scene and target information for each
frame of every battlefield scenario are garnered from the image generator, and
stored on disk. The video-playback system uses a Hitachi laser-disc player
which offers several different playback modes, including normal playback (30
fps), 2x playback (60 fps), and a step function which allows the video disc to
be stepped through one frame at a time. The video player is controlled via an
RS-232 port.
The ARTT experiment required some changes to the instructor interface of the
system. Upon the initialization of each lesson, the instructor was presented
with an option screen like Figure 1 to select the operating mode of the
machine. The instructor used a numeric keypad to select a menu option. If the
ARTT mode was selected, a screen similar to Figure 2 appeared, letting the
instructor set up the experiment parameters. The first five selections in
Figure 2 are self-explanatory. The random-mode selection presented the subject
with random-ARTT variables, while the sequential-mode selection presented the
subject with increasing ARTT variables. 
During the real-time-operation mode, the video player was placed at the
normal-playback rate (30 fps), and the loop times for ballistic calculations,
target tracking, and scene movement were synchronized to 30 Hz. This was done
by writing an interrupt handler which was triggered by a 60-Hz interrupt
signal generated by the video board. All target, scene, and some ballistic
information was stored in tables indexed by frame numbers (remaining ballistic
information was based on gun elevation and target range). These frame numbers
were synchronized to reflect the relative frame number of the video-disc scene
being displayed. Target hits or misses were determined by accessing these
tables and performing dynamic equations. 
The ARTT implementation of this trainer was greatly simplified by realizing
the value of keeping ballistic calculations based in "real time" while causing
target and scene movement to accelerate. Manipulating the target and scene
time constant was a simple task: The video-disc playback rate and updates of
the frame-number variable were simply manipulated during the 60-Hz update
rate. Using 30 fps as the normal-playback rate, the different update rates
were calculated using the equations in Example 2. Example 2(a), for instance,
determines the amount of time the ARTT-adjusted scenario will run, while
Example 2(b) determines the necessary frame variable and video-player update
rate by dividing the total number of available frames by the ARTT scenario
time. 
By using the ratio of ARTT-update rate to 60 Hz, we developed an algorithm to
synchronize the video-playback rate with the frame-number variable used to
index the information arrays. Example 2(c) shows the calculations used for the
1.33x ARTT constant. Assuming a 60-second, real-time scenario, the 1.33
ARTT-adjusted scenario runs approximately 45 seconds.
A 60-second scenario contains 1800 frames of information, resulting in Example
2(d). Substituting Hz for fps, we obtain the ratio 40 Hz (required ARTT update
rate)/60 Hz (video interrupt), or 2/3. To implement this ratio, the video
player was placed in step mode and issued a step command on 2 out of every
three 60-Hz interrupt loops. The frame variable was incremented every time the
video player was stepped. Similar calculations were used on the 1.6X and 0.5X
ARTT constants, yielding ratios of 4/5 and 1/4, respectively. Figure 3 shows
an associated flow diagram while Example 3 presents pseudocode. Listing One is
the video-player driver. 
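The Example 2 arithmetic generalizes to any ARTT constant. The following sketch, assuming the article's 60-second, 30-fps scenario, reproduces the ratios quoted above:

```c
/* Steps per 60-Hz interrupt pass needed for a given ARTT constant,
   assuming a 60-second real-time scenario recorded at 30 fps. */
double artt_step_ratio(double artt_const)
{
    double real_secs  = 60.0;                    /* real-time scenario length */
    double frames     = real_secs * 30.0;        /* 30 fps -> 1800 frames */
    double artt_secs  = real_secs / artt_const;  /* Example 2(a) */
    double update_fps = frames / artt_secs;      /* Example 2(b) */
    return update_fps / 60.0;                    /* ratio to the 60-Hz loop */
}
```

Evaluating this at 1.33 gives roughly 2/3, and at 1.6 and 0.5 it gives the 4/5 and 1/4 ratios used in the VIGS implementation.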
Although the sample code presented here was specific to the VIGS and the
video-disc player, adapting it to similar applications using video disc or
CD-ROM formats should be straightforward.


Conclusion



ARTT research has demonstrated that humans are time adaptable, and therefore
capable of sustained performance and learning at much higher levels than
conventionally accepted. In this context, real time--as treated by typical
approaches to software design and human-computer interaction--is an artificial
barrier, a self-imposed limit, and an incomplete paradigm.
Note that a thorough theoretical treatment of ARTT is scheduled for
publication by Rochester Press later this year on behalf of the Link
Foundation Fellowship in Advanced Simulation and Training. If interested, you
can obtain a draft copy and additional information by contacting us.


References


Card, S.K., T.P. Moran, and A. Newell. The Psychology of Human-Computer
Interaction. Hillsdale, NJ: Lawrence Erlbaum Assoc., 1983.
Guckenberger, D., K.C. Uliano, and N.E. Lane. The Application of Above
Real-Time Training for Simulators: Acquiring High-Performance Skills.
Presented at the 14th Interservice/Industry Training Systems and Education
Conference. San Antonio, TX, 1992.
------. Training High-Performance Skills Using Above Real-Time Training. NASA
Technical Report Contract NAG-2-750. 1993.
Guckenberger, D., K. Stanney, and D. Mapes. Virtual Time: Adding the Fourth
Dimension to Virtual Reality. Presented at the 15th Interservice/Industry
Training Systems and Education Conference, Orlando, FL, 1993. 
Guckenberger, D., K. Stanney, and N.E. Lane. The Effects of Above Real-Time
Training (ARTT) in an F-16 Simulator, 1993.
Hoey, R.G. "Time Compression as a Means for Improving the Value of Training
Simulators." Unpublished manuscript, 1976. 
Kolf, J. "Documentation of a Simulator Study of an Altered Time Base."
Unpublished manuscript, 1973. 
MacLachlan, J. and P. LaBarbera. "Time-Compressed Speech in Television
Commercials." Journal of Advertising Research (August 1978).
------. "Time-Compressed Speech in Radio Advertising," Journal of Marketing
(January 1979).
MacLachlan, J. and M.H. Siegal. "Reducing the Costs of TV Commercials by Use
of Time Compressions." Journal of Marketing Research (February 1980).
Matin, E., K.R. Boff, and R. Donovan. "Raising Control/Display Efficiency with
Rapid Communication Display Technology." Proceedings of the Human Factors
Society 31st Meeting, 1987.
Matin, E. and K.R. Boff. "Information Transfer Rate with Serial and
Simultaneous Visual Display Formats." Human Factors, 1988. 
------. "Human Machine Interaction with Serial Visual Displays." Proceedings
of the Society for Information Displays (SID). Las Vegas, NV, May 1990.
Riter, C.B., P.J. Balducci, and D. McCollum. "Time Compressions: New
Evidence." Journal of Advertising Research, 1982.
Schneider, W. "Training High-Performance Skills: Fallacies and Guidelines."
Human Factors, 1985.
Thompson, M. "General Review of Piloting Problems Encountered During
Simulation Flights of the X-15." Presented at the 9th Annual Meeting of the
Society of Experimental Tests Pilots, 1965.
Vidulich, M., Y.Y. Yeh, and W. Schneider. "Time-Compressed Components for
Air-Intercept Control Skills." Proceedings of the 27th Meeting of the Human
Factors Society, 1983.
Example 1: Initialization and HTA.
initialization() {
 ...
 /* initialize ARTT variables */
 current_artt_rate = 1.0;
 old_artt_rate = 1.0;
}
main_loop() { ...
 /* get the artt-rate if input from user, otherwise lookup list */
 ....
 /* if artt_rate is different from last pass change rate */
 if (current_artt_rate != old_artt_rate ) {
 alter_time_rate(current_artt_rate);
 old_artt_rate = current_artt_rate;
 }
 ....
}
Figure 1: Sample VIGS operator screen.
M1 GUNNERY TRAINER
ECC INTERNATIONAL
Select Mode of Operation
0: Normal
1: ARTT
2: DEMONSTRATION
Figure 2: Sample VIGS ARTT menu screen.
ARTT MENU SCREEN
Time Mode For Trials:
Mode Legend:
1: REAL TIME
2: 2X
3: 1.3X
4: 1.6X
5: 0.5X
6: RANDOM SELECT
7: SEQUENTIAL SELECT
Your Choice?
Example 3: Pseudocode for typical ARTT-based application-flow diagram.
/* HTA skipped-frame pseudocode */
/* Initialization of HTA_Skip */
HTA_Skip_mode =1
/* User Input */
User selects mode: 1=1.0, 2=2.0, 3=1.33, 4=1.6, 5=0.5, 6=3.0
/* Error Check User Input */
If (error_check(mode) == TRUE) then
 - Report error
 - Re-prompt for input
 - Offer quit option
Case mode
 1: play 1.0x @ 30 fps (skip every other 60-Hz pass)
 2: play 2.0x @ 60 fps, increment
 3: play 1.33x @ step every 2 out of 3 60-Hz passes
 4: play 1.6x @ step every 4 out of 5 60-Hz passes
 5: play 0.5x @ step once every 4 60-Hz passes
 6: play 3.0x @ step 3 times every 2 60-Hz passes
/*for Videodisc updates */
Example 2: Calculating ARTT-based application-update rates.
(a)
60 seconds / ARTT Constant = ARTT Scenario Time

(b)
Total Available Frames / ARTT Scenario Time = ARTT Update Rate

(c)
60 seconds / 1.33 = 45 seconds

(d)
1800 frames / 45 seconds = 40 fps
Figure 3: Typical ARTT-based application-flow diagram.

Listing One 

{ /*
*****************************************************************
** ECC INTERNATIONAL CORPORATION
*****************************************************************
** %BEGIN_HEADER
** %NAME: play.c
** %PURPOSE: Above Real-Time Modified Version of VIGS
** %DEPENDENCIES:
** -----
** DEVELOPMENT ENVIRONMENT
** COMPILER: Pascal 
** OPERATING SYSTEM: DOS
** COMPUTER: PC
** COMPILATION INSTRUCTIONS: uses Borland MAKE. Type 'make'
** TARGET ENVIRONMENT
** CPU: 8086
** HARDWARE NAME: VIGS 
** OPERATING SYSTEM: n/a
** MACHINE DEPENDENCIES: n/a
** %CHANGE HISTORY:
** REV: 3.0 DATE: 08-07-91 PROGRAMMER: Frank Luongo

** DESCRIPTION: Added Above Real-Time Modifications per Hyper-Time Algorithm
** %END_HEADER
*****************************************************************
*/
}
 ..... 
Procedure Play;
 {code for Hitachi laserdisc driver} 
 const 
 PLAY_1 = 37; 
 PLAY_2 = 102; 
 STEP = 36; 
 begin 
 Case mode_number of 
 1{1X} : 
 { a 30Hz flag is set in the interrupt handler} 
 if v30flg then begin 
 if frame_cnt = 1 then begin 
 {wait for Tx ready} 
 while ((inport($3fd) and $20) <> $20 ) do;
 {since loop time of program is considerably less than 30 ms 
 we use the play command for 1X instead of the step command.}
 outport($3f8,PLAY_1); 
 {only send play command once so not to reset frame_cnt}
 frame_cnt := frame_cnt + 1; 
 end; 
 {increment frame number variable to sync with videodisc player}
 frame_num := frame_num + 1; 
 end; 
 2:{2X} 
 begin 
 if frame_cnt = 1 then begin 
 {wait for Tx ready} 
 while ((inport($3fd) and $20) <> $20 ) do;
 {since loop time of program is less than 16 ms we use play 
 command for 2X instead of the step command.}
 outport($3f8,PLAY_2); 
 {we only send the play command once so we do not need to reset 
 frame_cnt} 
 frame_cnt := frame_cnt + 1; 
 end; 
 {increment the frame number variable every 60 Hz to maintain sync 
 with the videodisc player} 
 frame_num := frame_num + 1; 
 end; 
 3:{1.33X} 
 begin 
 {keep track of the number of 60Hz passes} 
 frame_cnt := frame_cnt + 1; 
 {issue step command and inc frame_num every 2 out of 3 60 Hz passes}
 if frame_cnt < 4 then begin 
 {wait for Tx ready} 
 while ((inport($3fd) and $20) <> $20 ) do;
 {send the step command out the serial port} 
 outport($3f8,STEP); 
 {increment the frame variable to maintain sync} 
 frame_num := frame_num + 1; 
 end else 
 {reset frame count} 

 frame_cnt := 1; 
 end; 
 end; 
 4:{1.6X} 
 begin 
 {keep track of the number of 60Hz passes} 
 frame_cnt := frame_cnt + 1; 
 {issue step command and inc frame_num every 4 out of 5 60 Hz passes}
 if frame_cnt < 6 then begin 
 {wait for Tx ready} 
 while ((inport($3fd) and $20) <> $20 ) do;
 {send the step command out the serial port} 
 outport($3f8,STEP); 
 {increment the frame variable to maintain sync} 
 frame_num := frame_num + 1; 
 end else 
 {reset frame count} 
 frame_cnt := 1; 
 end; 
 5:{0.5X} 
 begin 
 {keep track of the number of 60Hz passes} 
 frame_cnt := frame_cnt + 1; 
 {issue step command and inc frame_num every 1 out of 4 60 Hz passes}
 if frame_cnt > 3 then begin 
 {wait for Tx ready} 
 while ((inport($3fd) and $20) <> $20 ) do;
 {send the step command out the serial port} 
 outport($3f8,STEP); 
 {increment the frame variable to maintain sync} 
 frame_num := frame_num + 1; 
 end else 
 {reset frame count} 
 frame_cnt := 1; 
 end; 
 end;{case} 
end; 




Listing Two
 
//*****************************************************************
// ECC INTERNATIONAL CORPORATION
//*****************************************************************
// %BEGIN_HEADER
// %NAME: artt_dos.c
// %PURPOSE: Above Real Time Video Demonstration: 
// Demonstration of Above Real-Time MPEG Playback on Sigma Design's 
// ReelMagic Board. Modified Sigma Design example code for demonstration.
// Note: If compiling with compilers that do not use Borland's BGI (like
// Visual C++), be sure to comment out the indicated lines.
// This example demonstrates how to get the position in the appropriate 
// time format and how to vary the playback speed buffers.
// At the dos prompt, type "artt_dos <MPEG FileToPlay>"
// Use up and down keys to change the speed, escape to exit

// Note : link it with FMPFCTS.OBJ. Use bat file Dutch.bat to load fmpdrv
// and play nfl file and unload driver
// %DEPENDENCIES:
// Important: Ensure fmpdrv.exe TSR is running prior to executing program.
// FMPFCTS.OBJ 
// <stdio.h>
// <stdlib.h>
// <conio.h>
// <bios.h>
// "types.h"
// "fmpdrv.h"
// "fmpmacs.h"
// "fmpfcts.h" 
// DEVELOPMENT ENVIRONMENT
// COMPILER: Turbo C or Visual C++
// OPERATING SYSTEM: DOS
// COMPUTER: PC
// COMPILATION INSTRUCTIONS: uses Borland MAKE. Type 'make'
// TARGET ENVIRONMENT
// CPU: 486
// HARDWARE NAME: PC with MPEG Playback
// OPERATING SYSTEM: n/a
// MACHINE DEPENDENCIES: n/a
// %CHANGE HISTORY:
// REV: 1.0 DATE: 06-07-94 Software Engineer: Dutch Guckenberger
// DESCRIPTION: Header installation splayer.c modifications
// DATE: 07-03-94 Software Engineer: Dutch Guckenberger
// DESCRIPTION: Modified Sigma Design's Example code provided by Dennis 
// Gutridge to compile and run on MicroSoft's Visual C++. Key 
// was turning off audio prior to ARTT replay.
// %END_HEADER
//
//
//**************************************************************************

#include <stdio.h>
#include <stdlib.h>
#include <conio.h>
#include <bios.h>

#include "types.h"
#include "fmpdrv.h"
#include "fmpmacs.h"
#include "fmpfcts.h"

#define KEY_UP -72
#define KEY_DOWN -80
#define KEY_ESC 27

#define SPEED_DIV 50
#define SPEED_MIN 1
#define SPEED_MAX 200
#define SPEED_STEP 1

// Error function - Writes Msg and stop the program
void Error(char *Msg,int ExitCode)
{
 fprintf(stderr,"%s",Msg);
 exit(ExitCode);
}

void main(int argc,char *argv[])
{
 BYTE hStream;
 BOOL Done=FALSE;
 BOOL NewSpeed=TRUE;
 int Speed=SPEED_DIV;
 int ch;
 DWORD d;
 if (argc<2)
 Error("Specify a file to play.\n",1);
 // Locate the driver
 if (!FindDriver())
 Error("Driver not found.\n",2);
 // Re-init the driver
 FMPInit();
 // Open the file
 hStream=FMPOpen(FMPF_FILE,(DWORD)(LPSTR)argv[1]);
 // if hStream is null, the file has not been properly opened
 if (!hStream)
 {
 Error("Error while opening the file.\n",3);
 }
 // has the file been recognized ?
 if (FMPGet(hStream,FMPI_STM_TYPE)==FMPF_UNKNOWN)
 {
 FMPClose(hStream);
 Error("The file format is unknown.\n",4);
 }
 // set the destination window
 FMPSet(hStream,FMPI_VID_DEST_SIZE,MAKEDWORD(352,240));
 FMPSet(hStream,FMPI_VID_DEST_POS ,MAKEDWORD((704-352)/2,(400-240)/2));
 // set the time format to frames
 FMPSet(hStream,FMPI_STM_TIME_FMT,FMPF_FRAMES);
 // if it's a system stream, let's unselect the audio channels
 // because speed variation will not work properly with audio .
 if (FMPGet(hStream,FMPI_STM_TYPE)==FMPF_GROUP)
 {
 int i;
 BYTE aStream;
 BYTE n=FMPGet(hStream,FMPI_GRP_NB);
 for (i=1;i<=n;i++)
 {
 aStream=FMPCommand(FMP_GROUP,hStream,FMPF_GETFMPF_INDEX,i);
 if (FMPGet(aStream,FMPI_STM_TYPE)==FMPF_AUDIO)
 FMPCommand(FMP_GROUP,hStream,FMPF_UNSELECT,aStream);
 }
 }
// Following line commented out for Visual C++ Version
// clrscr();
 printf("Use up and down arrows to change the speed. Escape to quit\n");
 printf("Frame :\nTime :\nSpeed Ratio :");
 // Play the file
 FMPPlay(hStream,FMPF_END_REPEAT,0);

 while (!Done)
 {
// Following line commented out for Visual C++ Version
// gotoxy(13,2);
 FMPSet(hStream,FMPI_STM_TIME_FMT,FMPF_FRAMES);

 d=FMPGet(hStream,FMPI_STM_POSITION);
 if (d!=(DWORD)-1)
 printf("%7lu",d);
// Following line commented out for Visual C++ Version
// gotoxy(9,3);
 FMPSet(hStream,FMPI_STM_TIME_FMT,FMPF_HMSF);
 d=FMPGet(hStream,FMPI_STM_POSITION);
 if (d!=(DWORD)-1)
 printf("%02d:%02d:%02d %02d",HIBYTE(HIWORD(d)),
 LOBYTE(HIWORD(d)), HIBYTE(LOWORD(d)),LOBYTE(LOWORD(d)));
// Following line altered for Visual C++ Version
 if (kbhit())
 {
 ch=getch();
 if (!ch) ch=-getch();
 switch (ch)
 {
 case KEY_UP : Speed+=SPEED_STEP;NewSpeed=TRUE;break;
 case KEY_DOWN : Speed-=SPEED_STEP;NewSpeed=TRUE;break;
 case KEY_ESC : Done=TRUE;break;
 }
 }
 if (NewSpeed)
 {
 Speed=max(SPEED_MIN,min(SPEED_MAX,Speed));
 FMPSet(hStream,FMPI_STM_SPEED,MAKEDWORD(Speed,SPEED_DIV));
// Following line commented out for Visual C++ Version
// gotoxy(15,4);
 printf("%.2f\n",Speed/(float)SPEED_DIV);
 NewSpeed=FALSE;
 }
 }
 // close the stream
 FMPClose(hStream);
}


A POP3 Mail Client using WinSock


Encapsulate the API you need--and ignore the rest




Robert A. Duffy


Robert is vice president of engineering at TeacherSoft Corp., where he
develops Internet-access software and browser technology. He can be contacted
at raduffy@teachersoft.com.


As the Internet continues to grow exponentially, the demand for internetworked
applications is at an all-time high, with no slowdown in sight. There are now
many opportunities for developers to implement a new generation of
client/server applications. When Windows 95 is released with its increased
emphasis on connectivity, the need for internetworked applications will only
heighten.
In this article, I'll present a simple C++ foundation for Internet client
applications running on Microsoft Windows, based on classes that encapsulate
the Windows Sockets API. Building on a simple socket class, I'll demonstrate
how to implement a mail client that understands the Post Office Protocol
(POP), a powerful Internet protocol for processing e-mail now available on
most Internet mail servers. My mail client is constructed using Borland's
ObjectWindows Library (OWL). Although my classes comprise a small piece in the
grand scheme of Internet development, they can serve as a foundation for
more-sophisticated applications. 


The Windows Sockets API


Internet client applications such as Mosaic, archie, ftp, telnet, and finger
are built on the TCP/IP suite of network protocols in conjunction with
higher-level protocols such as HTTP, FTP, SMTP, SNMP, NNTP, POP, and so on. On
UNIX, the low-level TCP/IP protocols are usually encapsulated by a library
implementing an API such as Berkeley sockets or TLI. Developing equivalent
Internet applications under Windows involves an API known as "Windows
Sockets." Windows Sockets has become the standard API for Internet development
under Windows and is now being ported to other platforms such as OS/2 and the
Macintosh. 
The Windows Sockets API, sometimes known as "WinSock," was defined in 1992 by
Martin Hall, Mark Towfiq, and several other engineers representing companies
such as JSB, Microdyne, Microsoft, FTP Software, and NetManage. Version 1.1 of
the spec was written in January 1993 and has been implemented by many vendors.
There is also a popular shareware implementation, known as "Trumpet Winsock,"
available at many ftp sites on the Internet. For additional background
information on Windows Sockets, refer to the article "Untangling the Windows
Sockets API," by Mike Calbaum, Frank Porcaro, Mark Ruegsegger, and Bruce
Backman (DDJ, February 1993). Likewise, the article, "Building an Internet
Global Phone," by Sing Li (Dr. Dobb's Sourcebook on the Information Highway,
Winter 1994), presents a discussion of Mark Clouden's freely available
WSNETWRK library, a Windows DLL that encapsulates and expands upon the
functionality of the WinSock API.
The design of Windows Sockets is based on Berkeley sockets, the networking
facility found in classic Berkeley UNIX. As with Berkeley sockets, the Windows
Sockets API is not too large or difficult to understand. One difference
between the two results from the fact that the design of Berkeley sockets
assumes a multithreaded operating system. Because of this, if your code uses
only the Berkeley-equivalent API calls, it may cause your Windows application
to block for unreasonable periods of time while waiting for specific network
requests to be fulfilled. 
To address this problem, the WinSock designers added a set of asynchronous
extensions which provide the additional functionality needed to build a
responsive network application under Windows. The extensions are all named
with a "WSA" prefix. Except for WSAStartup() and WSACleanup(), the services
are entirely optional. The WSAStartup() function initializes the socket
system, while WSACleanup() closes everything down. Other than these two
Windows-specific extensions, Windows Sockets behaves almost identically to
Berkeley sockets running on UNIX--except, of course, for the lack of true
multithreading. Even so, you can still easily port a lot of existing UNIX code
to Windows and convert necessary portions to the asynchronous model as needed.
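A minimal sketch of the required bracketing follows; error handling is reduced to a single check, and the code is, of course, Windows-specific:

```c
#include <winsock.h>   /* Windows Sockets 1.1 declarations */

int run_with_winsock(void)
{
    WSADATA wsa;
    /* Request version 1.1, the level of the spec this article targets. */
    if (WSAStartup(MAKEWORD(1, 1), &wsa) != 0)
        return 0;                  /* no usable WinSock implementation */
    /* ... create sockets and do network work here ... */
    WSACleanup();                  /* mandatory counterpart to WSAStartup */
    return 1;
}
```

Every other WSA-prefixed call is optional; only this startup/teardown pair must appear in every WinSock program.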


A Thin Wrapper


My first goal in this article is to design a C++ class that provides a thin
wrapper around the Windows Sockets API. This class does not encapsulate the
entire API, because I think it is best to service only the areas you need for
a specific job. If you need added functionality, you can derive
more-specialized classes from the base API.
After presenting the socket class, I'll show how applications can use this
base-level functionality by defining a child class that provides access to
Internet e-mail using POP. POP provides a high-level API for storage and
forwarding of mail, and it is a good example of how to extend our basic socket
class.
The vanilla socket class provides initialization code and basic connection
features. Using this class, you can construct a socket, connect it to a known
host, and read and write data to that host. All interaction with the socket
must be managed by the developer. The base socket class, TAsyncSocket, is
shown in Listing One . Only excerpts from the source code accompany this
article. The complete listings are available electronically; see
"Availability," page 3. 
The public interface for TAsyncSocket is small. The class provides member
functions for establishing an asynchronous connection and for synchronous
sending and receiving. The synchronous methods are handy for very small pieces
of data that are normally cached by the WinSock system and do not usually
cause blocking (although it is never guaranteed that the socket will not
block).
The other public function of note is LastError(), which returns the last error
encountered by the socket. It is simply an encapsulation of WSAGetLastError().
Basic time-out error handling is also provided. If the socket system blocks
for a specific amount of time, the blocking call is canceled, and the socket
is closed. More-robust error handling is a must for a complete set of tools,
but is beyond the scope of this discussion. 


Managing Socket Information


Managing the information between the socket owner and the socket can be
problematic. While the object-oriented nature of the socket makes it reusable,
it hides some of the details we might find useful. For instance, what happens
when an error is encountered or when the socket needs to read or write data? A
mechanism to deal with those situations is found in the TSocketCatch class,
which provides a communications path between the socket classes and the other
classes or the code that manages them. One parameter of the TAsyncSocket class
is a TSocketCatch pointer. TAsyncSocket uses this class to pass information
back to its manager. Listing Two presents the TSocketCatch class. It's pretty
simple, but it provides the basic mechanism for passing error information back
to higher-level code. Here again, you can add more-sophisticated functionality
in a derived class. The class for catching POP socket information is a good
example; see Listing Three . The TPOPSocketCatch class adds notification
methods for message retrieval and for the number of messages to be retrieved. 


The POP Socket


The POP protocol is text based, similar to other higher-level Internet
protocols such as SMTP. Our POP socket class provides support for retrieving
mail and hides most of the complexity of the POP protocol. The examples and
classes are based on protocol version POP3, specified in the Internet document
RFC1081. This document, along with other Internet specifications, is available
via ftp from ds.internic.net in the rfc directory. In this case, the file is
rfc1081.txt, and it is worth studying. 
The POP protocol steps through a series of states based on information the
client and server exchange. Commands are ASCII based and made up of a keyword,
possibly followed by an argument and terminated with a CR/LF pair. The success
or failure of a POP command is indicated with either the positive "+OK" or
the negative "-ERR" response. Additional relevant information may follow. As with
commands, all responses are terminated with a CR/LF pair. Some responses can
be multiline, but those cases are clearly indicated in RFC1081.
Listing Four is the POP socket class, derived from TAsyncSocket. We extended
the basic socket significantly. The key member function is RetrieveMail(),
which provides a single call to retrieve all mail for a specific user. To
achieve this, the socket goes through several states. During the life of a POP
connection, the POP server and client coexist in different states based on the
information passed between them. The states are as follows: Connection,
Authorization, Transaction, and Update.
Once you call RetrieveMail(), a series of events begins that ends with either
mail being delivered or a "No Mail" message being returned. This, of course,
assumes that the host name, user name, and password provided are all valid. If
not, you'll get an error message. 


The Connect State


In a typical RetrieveMail mail session, the first state entered is the
Connect state. As you can see from Listing Five, when RetrieveMail() is
called, it sets up string holders for the host, user, and password
information, and then calls Connect(). The Connect() function (see Listing
Six) calls the parent function TAsyncSocket::Connect() (see Listing Seven).
Notice we provide the POP port and protocol information, simplifying the
connect function at this level. 
It is TAsyncSocket::Connect() that actually creates the socket, via a call to
the Windows Sockets function socket(). If socket() returns a valid socket
identifier, Connect() starts the event chain rolling by calling
TAsyncSocket::AsyncGetHostByName().

The AsyncGetHostByName function is in Listing Eight. This function requests
that an SM_HOSTNAME message be sent when name resolution has completed. This
is key because the member function that handles that message is virtual. This
call gets us back into the TAsyncPOPSocket realm--specifically, its host name
handler; see Listing Nine.
Again, we call our parent to take advantage of its built-in functionality--in
this case, the parent function is TAsyncSocket::SmHostName(). If all goes
well, we post an FD_READ using WSAAsyncSelect() and wait for the final stage of
the connection state. The SmPopConnect() routine in Listing Ten verifies that
we have connected to a POP server by checking for the standard "+OK" POP
reply. If so, the authorization process is started, via a call to Authorize().


The Authorization State


Authorization consists of sending the user and password information to the POP
server. The process is started by a call to TAsyncPOPSocket::Authorize(),
whose implementation is in Listing Eleven. This calls WSAAsyncSelect(), which
results in a data-ready (FD_READ) message at SmPopAuthorize() in Listing
Twelve. The final stage of authorization is to pair the user name with a
valid password. This routine posts the password to the server and sets up
SmPopPassword() to check on the result. As shown in Listing Thirteen,
SmPopPassword() checks for a POP_OK response; if one is received, we enter the
transaction state.
Thus far, the discussion has focused on getting the POP session to the
transaction state. For the sake of space, I will leave the rest of the code
browsing up to you. To thoroughly understand the sequence of events in this
protocol, I suggest you step through the code with a debugger. 
Once your code is in the Transaction state, your program can check on mail
stats, retrieve and delete mail, and possibly invoke some extended commands,
based on your POP server's support for these extensions. The listings show the
source for a complete mail-retrieval session.
After the Transaction state, the next state is Update. A "Quit" command posted
to the POP server moves you into the Update state. All transactions that have
taken place are committed and the POP server will accept new authorization
information if desired. If your application has finished processing user mail,
your code should close the socket at this point.
In summary, to retrieve mail, a mail client must:
1. Acquire POP mail host, user name, and password from the user.
2. Create and connect the socket, then initiate the mail transaction based on
the gathered information.
3. For each mail message to be retrieved, open a new window and display the
message to the user.
The source code gives complete details for each step. In implementing the mail
client, I used the Application Expert in the Borland IDE to generate a shell
MDI application. From there, I added the bits of socket code needed to
retrieve all mail for a specific user from a host and then show it in a simple
text window. You can easily extend this program to provide support for message
saving, printing, and other useful services.


Conclusion


The classes presented here only cover the specific areas needed to get a basic
socket up and running. By staying away from much of the clutter involved in
Windows Sockets, we can focus on the task of retrieving mail using the POP3
protocol.
You can extend these socket classes to support protocols other than POP. One
area you might investigate is a generic text socket that handles the
line-oriented interface to many Internet protocols. Another area that needs
attention is real error handling. The comments in the code point out where
error-handling code is needed. You can build exception handling into the code
or go about it in a more traditional way. In either case, I've included a
couple of classes in the socket.cpp file that you may find useful. 

Listing One 

class _OWLCLASS TAsyncSocket : public TWindow
{
public:
 TAsyncSocket(TWindow* pParent,TSocketCatch* pCatcher);
 ~TAsyncSocket();
 int Connect(const char* pHost, int nPort, int nProtocol);
 int Disconnect();
 int SyncSend(const char far* pData, int nLen);

 int SyncRecv(char far* pData, int nMaxLen);
 int SyncSendLine(const char* pData);
 int LastError();
protected:
 void EvTimer(uint uTimerId);
 void SetupWindow();
 void CleanupWindow();
 int AsyncGetHostByName(const char* pServerName);
 virtual LRESULT SmHostName(WPARAM,LPARAM);
 virtual LRESULT SmDataReady(WPARAM,LPARAM){return TRUE;};

 // data members
 char* pWorkBuffer; // work buffer
 HANDLE hAsync; // handle used for async Winsocket calls
 TSocketCatch* pCatcher; // ptr to our info catcher
 BOOL bValid; // true if socket is initialized
 BOOL bConnected; // true if socket is connected
 static BOOL bBlocking; // true if socket is blocking
 WSADATA wsaData; // Winsocket data
 HINSTANCE hSockInstance; // HINSTANCE for socket
 SOCKET sock; // socket identifier
 hostent* pHost; // ptr to host entry
 protoent* pProto; // ptr to proto entry
 struct hostent Host; // instance of hostent

 struct protoent Proto; // instance of protoent
 struct sockaddr_in inetAddr; // instance of sockaddr_in structure
 struct sockaddr sa; // instance of sockaddr structure
 DECLARE_RESPONSE_TABLE(TAsyncSocket);
};



Listing Two

class _OWLCLASS TSocketCatch
{
public:
 virtual void WinSocketError(int nError, const char* pErrText){};
};



Listing Three

class _OWLCLASS TPOPSocketCatch : public TSocketCatch
{
public:
 virtual void MessageRetrieved(int nID, const char* pFilename){};
 virtual void MessageCount(int nCount){};
};



Listing Four

class _OWLCLASS TAsyncPOPSocket : public TAsyncSocket
{

public:
 enum {NONE,RETRIEVE,SEND};
 TAsyncPOPSocket(TWindow* pParent = NULL, TPOPSocketCatch* pC = 0);
 ~TAsyncPOPSocket();
 int Connect(const char* pHost);
 int Hangup();
 int Authorize(const char* pName, const char* pPassword);
 int RetrieveMail(const char* pHost, const char* pUser, 
 const char* pPassword);
 void RetrieveNext();
 long MessageTotal();
 int GetMessageCount();
 int RetrieveMessages();
 int Quit();
protected:
 int MailResponse();
 virtual LRESULT SmHostName(WPARAM,LPARAM);
 LRESULT SmPopConnect(WPARAM,LPARAM);
 LRESULT SmPopAuthorize(WPARAM,LPARAM);
 LRESULT SmPopPassword(WPARAM,LPARAM);
 LRESULT SmPopMessageCount(WPARAM,LPARAM);
 LRESULT SmPopGetMessage(WPARAM,LPARAM);
 LRESULT SmPopStartMessage(WPARAM,LPARAM);
 
 // data members

 string sServer;
 string sUser;
 string sPassword;
 TPOPSocketCatch* pMyCatcher;
 bool bAuthorized;
 string* pWorkString;
 long lMessageTotal;
 long lMessageCount;
 int nMessagesPending;
 char* pCurrentMessage;
 ofstream out;

 DECLARE_RESPONSE_TABLE(TAsyncPOPSocket);
};



Listing Five

int TAsyncPOPSocket::RetrieveMail(const char* pHost, const char*
pUser, const char* pPassword)
{
 sServer = pHost;
 sUser = pUser;
 sPassword = pPassword;
 return Connect(sServer.c_str());
}



Listing Six

int TAsyncPOPSocket::Connect(const char* pHost)
{
 return TAsyncSocket::Connect(pHost,POP_PORT,PF_INET);
}



Listing Seven

int TAsyncSocket::Connect(const char* pHost,int nPort,int nProtocol = PF_INET)
{
 memset(&inetAddr,0,sizeof(inetAddr)); // cleanup structure
 inetAddr.sin_family = nProtocol; // set protocol(almost always PF_INET)
 inetAddr.sin_port = htons(nPort); // setup port
 sock = socket(nProtocol,SOCK_STREAM,0); // create the socket, 
 // passed protocol, STREAM,
 // no protocol specification (0)
 if (sock != INVALID_SOCKET) // socket was created ok
 {
 return AsyncGetHostByName(pHost); // acquire the host
 // asynchronously
 }
 else
 {
 return LastError(); // return WSAError
 }
}




Listing Eight

int TAsyncSocket::AsyncGetHostByName(const char* pHost)
{

 hAsync = WSAAsyncGetHostByName( HWindow, SM_HOSTNAME, pHost,
 pWorkBuffer, WORKBUFF_SIZE);
 if (hAsync == 0) // if we did not receive a handle
 return LastError(); // return socket error
 else // otherwise
 return 0; // return success
}


Listing Nine
LRESULT TAsyncPOPSocket::SmHostName(WPARAM wp, LPARAM lp)
{
 TAsyncSocket::SmHostName(wp,lp); // call parent 
 if (bConnected) // normal connection was ok
 { // so start our series of commands

 WSAAsyncSelect(sock,HWindow,SM_POPCONNECT,FD_READ);
 } //
 else // connect did not happen
 { // if we have a catcher,
 if (pCatcher) // then let them know
 pCatcher->WinSocketError(LastError(),"");
 } 
 return TRUE; 
}



Listing Ten

LRESULT TAsyncPOPSocket::SmPopConnect(WPARAM /* wp */, LPARAM lp)
{
 WSAAsyncSelect(sock,HWindow,0,0); // turn off async msg's
 if (HIWORD(lp) == 0 && LOWORD(lp) == FD_READ) // check for error
 { 
 if (MailResponse() == POP_OK) // did we get an OK?
 { // yes, authorize the account
 Authorize(sUser.c_str(),sPassword.c_str());
 }
 else 
 return TRUE; // ERROR HANDLING needed here
 }
 return TRUE; 
}



Listing Eleven

int TAsyncPOPSocket::Authorize(const char* pName, const char* pPassword)
{

 if (!bConnected) // error catch for no connection
 {
 // ERROR HANDLER needed here.
 } 
 string s = POP_USER;
 s += pName;
 SyncSendLine(s.c_str());
 WSAAsyncSelect(sock,HWindow,SM_POPAUTHORIZE,FD_READ);
 return 0;
}


Listing Twelve

LRESULT TAsyncPOPSocket::SmPopAuthorize(WPARAM /* wp */, LPARAM lp)
{
 WSAAsyncSelect(sock,HWindow,0,0); // turn off async msgs
 if (HIWORD(lp) == 0 && LOWORD(lp) == FD_READ) // check for error
 { //
 if (MailResponse() == POP_OK) // did we get POP_OK
 { // yes

 string s = POP_PASS; // format password
 s += sPassword; // string
 SyncSendLine(s.c_str()); // and send it
 // response goes to SmPopPassword
 WSAAsyncSelect(sock,HWindow,SM_POPPASSWORD,FD_READ); 
 return TRUE; //
 } 
 // else needs ERROR HANDLER
 } // 
 return TRUE; // ERROR HANDLER 
}



Listing Thirteen

LRESULT TAsyncPOPSocket::SmPopPassword(WPARAM /* wp */, LPARAM lp)
{
 WSAAsyncSelect(sock,HWindow,0,0); // turn off async msgs
 if (HIWORD(lp) == 0 && LOWORD(lp) == FD_READ) // check for error
 { //
 if (MailResponse() == POP_OK) // check for POP_OK
 { //
 bAuthorized = TRUE; // got it, set flag
 GetMessageCount(); // start message retrieval
 return TRUE; //
 } //
 else //
 return TRUE; // ERROR HANDLER
 } //
 return TRUE; // ERROR HANDLER 
}









Borland C++ 4.5 and OLE 2.0 Programming


Upgrading a familiar toolset




Ted Faison


Ted, who has written several books and articles on C++ and Windows, is
president of Faison Computing, a firm that develops C++ applications, class
libraries, and custom controls for Windows. Ted can be reached on CompuServe
at 76350,1013.


In the C++ wars, Watcom, Symantec, Borland, Microsoft, and others have been
battling for the hearts, minds, and pocketbooks of programmers. Since the
release of Microsoft's Visual C++, the primary weapon in this battle has been
the addition of code-generation tools and increased functionality within
application frameworks. Borland C++ 4.0, for example, featured a revamped
ObjectWindows Library (OWL) and eliminated nonstandard C++
notation to support Windows event handling. However, Visual C++ 1.5 included
features such as database and OLE support that Borland C++ wouldn't have for
some time. With the introduction of Borland C++ 4.5 (BC++ 4.5), Borland is
back on the front lines, particularly when it comes to creating OLE-enabled
applications. In fact, BC++ 4.5 allows you to create OLE servers and
containers with the same ease as non-OLE applications. For developers writing
database applications in 4.5, Borland is providing an add-on database package.
In this article, however, I'll focus on the OLE additions to BC++ 4.5, along
with other interesting changes to the toolset. In doing so, I'll draw
occasional parallels to Microsoft's recently released Visual C++ 2.0 and
provide a comparison of OWL and the Microsoft Foundation Class (MFC) library.
(For details on VC++ 2.0, see "Building an OLE Server Using Visual C++ 2.0,"
by John LaPlante, DDJ, February 1995.) Finally, I'll show how easy it is to
develop a simple OLE server.


BC++ 4.5 Arsenal


Borland C++ 4.5 requires about 100 Mbytes of disk space and is delivered on
CD-ROM (or 28 3.5-inch diskettes). It includes everything you need to create
fully polished Windows applications--an integrated development environment,
profilers, several text-editor environments (Borland Classic, Brief, or
Epsilon), and the Resource Workshop resource editor. Resource Workshop allows
you to work with standard Windows controls, enhanced Borland controls, 16- and
32-bit VBX controls, and third-party custom-control DLLs. (In comparison,
Microsoft's AppStudio supports only standard Windows controls and 16-bit VBX
controls.) Although Resource Workshop can be invoked from the integrated
environment, it runs as a separate program. (In contrast, the version of
AppStudio that ships with VC++ 2.0 runs in-place. When you run AppStudio, you
don't leave the Visual Workbench environment at all.)
Borland also provides several debugging tools: an integrated debugger that
lets you debug programs without leaving the development environment; the
stand-alone Turbo Debugger; and a remote debugger requiring a separate
computer connected by a serial link. The postmortem utility Winspector
allows you to track down problems when a general-protection fault (GPF)
occurs. Winspector captures the information that leads to the fault and
creates a report indicating the modules and functions involved. Winspector
works with an application's symbol table, translating arcane addresses into
symbols.
New to BC++4.5 is the OpenHelp system, which lets you configure how the
various help files are searched. With over a dozen help files to deal
with--Windows, DOS, OWL, OLE, Resource Workshop, and so on--it's nice to be
able to configure the IDE help system to search only specific files. OpenHelp
is a help-system manager that uses named groups of help files called "search
ranges." The default search range, called "BC 4.5," includes all the help
files. You can create your own search ranges to use only selected help files.
For example, you might create a range called "OWL and OCF" that includes only
C++, OWL, and OCF help files. You can create as many search ranges as you
want. To run OpenHelp, you bring up the regular Help system by using the Help
menu or pressing Ctrl-F1 on a selected word. From the Help toolbar, you
click SelectAll, and the OpenHelp dialog box appears. Perhaps the most useful
feature of OpenHelp is its ability to handle searches of regular expressions.
To find all the occurrences of words beginning with the string WM_GET, click
the Regular Expression option and specify the search string WM_GET.


The ObjectComponents Framework


For Windows programmers, the single most-useful component of BC++ 4.5 (apart
from the compiler) is OWL 2.5, with its support for OLE 2.0. Although it is
backward compatible with OWL 2.0, OWL 2.5 lets you create OLE servers,
containers, server/containers, and automated OLE programs. You can create
servers both as DLLs and as EXEs. DLL servers are handy when you want to
minimize the overhead of function calls between your application and a
stand-alone OLE server running as a separate EXE application. With a DLL
server, the OLE server runs in the same process space as your application,
obviating the need for function calls to cross process boundaries.
OWL 2.5 was extended to support OLE by enlisting the services of an entirely
new OLE class hierarchy, the "ObjectComponents Framework" (OCF). OCF
completely encapsulates low-level OLE interfaces and details, providing an
exterior C++ look-and-feel that lets you work at a much-higher level of
abstraction than the OLE API. Using OCF, Borland was able to incorporate full
OLE support into OWL 2.5 with relatively minimal additions to OWL itself. (In
comparison, MFC was OLE enabled by the addition of some 20,000 lines of code.)
OWL was extended by creating a few new classes, such as TOleDocument and
TOleWindow, and by using code that manipulated a series of OCF objects. The
hierarchy of the OCF classes used by OWL is shown in Figure 1. (OCF has
several other classes used internally, but since OWL has no knowledge of them,
I won't discuss them.) OCF doesn't completely encapsulate OLE on its own.
Borland also provides BOCOLE (Borland ObjectComponents OLE), a DLL that wraps
some of the lower-level details of OLE. BOCOLE allows you to access all of the
OLE interfaces and functionality, but in addition defines a set of interfaces
of its own. The idea was to alleviate the so-called "impedance mismatch"
between application objects and OLE objects. Applications deal with entities
like documents, views, and windows. OLE is concerned with class factories,
data objects, and memory allocators. Borland created interfaces in BOCOLE that
match some of the abstractions used in applications. Consequently, BOCOLE has
interfaces like IBApplication, IBDocument, and IBContainer. All the interfaces
are derived from the standard OLE interface IUnknown and begin with the
letters IB (Interface Borland). Figure 2 shows the entire class hierarchy of
the BOCOLE interfaces.
When you create OLE servers and containers using the AppExpert facility, you
get OWL applications that use new OWL classes like TOleDocument and
TOleWindow. These classes have internal OCF objects to handle OLE. The OCF
classes make calls to BOCOLE, which in turn calls OLE interfaces. The good
news is that everything is transparent to you; you never need to get involved
with the OLE details unless you want to change one of the OLE mechanisms built
into OWL.


MFC versus OWL 


New C++ programmers frequently ask questions about application frameworks. The
bottom line is that the framework you prefer depends on the tasks your
application performs, your programming style, and how object oriented you are.
In short, the choice is subjective. I prefer OWL because of its excellent
container classes, great GDI encapsulation, multiple inheritance, and the
large number of classes it supports. On the other hand, MFC 3.0 supports a few
things OWL doesn't--ODBC database programming, OCX controls, mini-framed
windows, dockable windows, and tabbed dialog boxes. Those differences may be
important to you. Listings One through Thirty-one show a number of common
programming operations, coded in both MFC and OWL. In each listing, "(a)"
refers to MFC code, while "(b)" refers to OWL.
From the outset, MFC was designed with C programmers in mind. It provides a
relatively thin encapsulation of underlying Windows API functions, often
forgoing C++ features that would have made life easier. Consider creating a
pen. Using Windows API code with C, you do something like Example 1(a).
Because pens are most often created with the PS_SOLID style, a width of "1,"
and the color "black," it would have made sense for MFC to use C++ default
arguments. Instead, a pen must be created as in Example 1(b). The OWL approach
shown in Example 1(c) exemplifies how default arguments allow you to use a
simpler notation. Declaring a TPen object creates not only the object, but
also a corresponding Windows pen. The pen is automatically initialized with
default arguments, which you can always change if necessary. 
When you use a pen to draw a line, Windows API code looks something like
Example 2(a). Note that the MFC code in Example 2(b) is almost identical. When
you are done using the pen, MFC requires you to deselect it from the device
context, before the pen is destroyed by going out of scope. By comparison, OWL
GDI objects are smart, because they track the TDC object they are selected
into and automatically deselect themselves before destroying the associated
Windows GDI object; see Example 2(c). If you have a lot of GDI code in your
application, using OWL will simplify things considerably, saving you from
tracking GDI memory leaks due to undeleted objects.


Creating an OLE Application


To show how to get an OLE app up and running, I've created a simple OLE server
which I'll test by embedding it into a Word for Windows 6.0 document.
Developing an OLE server or container with BC++ 4.5 couldn't be easier.
AppExpert has been beefed up to include support for all OLE application types.
You can create servers, containers, server/containers, and automated
applications. The first step in creating any OLE app is always AppExpert.
Using the Project|AppExpert menu item, bring up the AppExpert Options dialog (see
Figure 3) from which you can create servers as either DLLs or EXEs. The latter
are more versatile, because they can be used from both 16- and 32-bit
applications. 16-bit server DLLs can only be used with 16-bit applications.
32-bit DLLs are restricted to 32-bit applications. By clicking the Generate
button, I created a new project named MYTEST, containing nearly 40 files.
Building a project with AppExpert is only the starting point; invariably, you
will need to add some code of your own to make a useful application. AppExpert
allows you to create applications with or without the Doc/View model. By
default, AppExpert will use the Doc/View model, whereby the data used in a
window is stored in a class derived from the OWL class TDocument. The data is
not displayed by the document object, but by a viewer object, derived from the
OWL class TView. A document manager is created by the application object to
handle the connection between a document and a view. The viewer is actually a
child window that occupies the client area of a parent window. In an MDI app,
the parent window is an MDI child window derived from class TMDIChild. In an
SDI app, the parent window is a regular window derived from TFrameWindow.
If you select one of the OLE options with AppExpert, your application will use
the Doc/View model regardless of whether you checked the Doc/View option or
not, because OLE requires an application to support independent entities for
its data and the displaying of the data.
I tested MYTEST by embedding it in a Word 6.0 document. Using Word's
Insert|Object menu, I opened the OLE dialog box, selected the OLE object type
MYTEST, and inserted it into my document. When you double-click the MYTEST
object to activate it in place, Word's menus are merged with those of MYTEST. The text
AppExpert. To change what MYTEST objects display, you need to add your own
code to the Paint function in the server's TOleView-derived class. 
The painting code created by AppExpert for the MYTEST example program is shown
in Listing Thirty-two. Note the statement dc.TextOut(0, 30, "mytest OLE
Server"); at the end of the listing. To change what MYTEST objects display,
replace this statement with whatever code you need. Typically, the data
contained in the view's document is accessed, using the TDocument pointer data
member Doc.


Final Comments


Although most developers are still shying away from OLE 2.0 support in their
programs, I think that will begin to change later this year, with the release
of Windows 95. Borland C++ 4.5 gives you all the support you need for OLE,
with one exception: OLE controls. The good news is that OLE controls aren't
used by any applications I know of yet. However, this will soon change, and
they will become essential with Windows 95. Presumably, Borland will update
OWL to support not only OLE controls, but all the new gadgets and widgets in
Windows 95.



For More Information


Borland C++ 4.5
Borland International
100 Borland Way
Scotts Valley, CA 95066-3249
408-431-1000
$499.95
Figure 1 The OCF classes used directly by OWL.
Figure 2 The complete BOCOLE hierarchy.
Example 1: (a) Creating a pen using C and the Windows API; (b) MFC does not
take advantage of default arguments; (c) OWL's use of default arguments allows
for simpler notation.
(a)
HPEN pen;
pen = CreatePen(PS_SOLID, 1,
 RGB(0, 0, 0) );

(b)
CPen pen;
pen.CreatePen(PS_SOLID, 1,
 RGB(0, 0, 0) );

(c)
 TPen pen;
Example 2: (a) Making Windows API calls from C to draw a line; (b) drawing a
line using MFC; (c) drawing a line using OWL.
(a)
void DrawLine(HDC dc)
{
 HPEN pen;
 pen=CreatePen(PS_SOLID, 1, RGB(0,0,0) );
 HPEN pOldPen = SelectObject(dc, pen);
 MoveTo(dc, 10, 10);
 LineTo(dc, 20, 30);
 SelectObject(dc, pOldPen);
}

(b)
void CMyWnd::DrawLine(CDC& dc)
{
 CPen pen;
 pen.CreatePen(PS_SOLID, 1, RGB(0,0,0) );
 CPen* pOldPen = dc.SelectObject(&pen);
 dc.MoveTo(10, 10);
 dc.LineTo(20, 30);
 dc.SelectObject(pOldPen);
}

(c)
void TMyWnd::Line(TDC& dc)
{
 TPen pen;
 dc.SelectObject(pen);
 dc.MoveTo(0, 100);
 dc.LineTo(100, 20);
}
Figure 3 The AppExpert Options dialog box.

Listing One: Creating a window. 


(a) CMyWnd* myWnd = new CMyWnd; myWnd->Create(...);

(b) TMyWindow* w = new TMyWindow(...);



Listing Two: Creating an MDI frame window.

(a) class CMyWnd : public CMDIFrameWnd {...};
class CMyApp : public CWinApp {
public:
 // ...
 virtual BOOL InitInstance() {
  CMyWnd* w = new CMyWnd;
  if (!w->LoadFrame(IDRES))
   return FALSE;
  w->ShowWindow(m_nCmdShow);
  w->UpdateWindow();
  m_pMainWnd = w;
  return TRUE;
 }
};

(b) class TMyWnd : public TMDIFrame {...};
class TMDIFileApp : public TApplication {
public:
 void InitMainWindow() {
  Frame = new TMDIFrame(..);
  Frame->Attr.AccelTable = IDRES;
  Frame->SetMenuDescr(...);
 }
};



Listing Three: Creating an SDI frame window.

(a) class CMyWnd : public CFrameWnd {...};
class CMyApp : public CWinApp {
public:
 // ...
 virtual BOOL InitInstance() {
  CMyWnd* w = new CMyWnd;
  if (!w->LoadFrame(IDRES))
   return FALSE;
  w->ShowWindow(m_nCmdShow);
  w->UpdateWindow();
  m_pMainWnd = w;
  return TRUE;
 }
};

(b) class TMyWnd : public TFrameWindow {...};
class TSDIFileApp : public TApplication {
public:
 void InitMainWindow() {
  Frame = new TFrameWindow(...);
  Frame->Attr.AccelTable = IDRES;
  Frame->SetMenuDescr(...);
 }
};



Listing Four: Creating Doc/View templates.

(a) class CMyWnd : public CMDIChildWnd {..};
class CMyApp : public CWinApp {
public:
 // ...
 virtual BOOL InitInstance() {
  AddDocTemplate(new CMultiDocTemplate(IDRES,
   RUNTIME_CLASS(CMyDoc), RUNTIME_CLASS(CMyWnd),
   RUNTIME_CLASS(CMyView)));
  CMyWnd* w = new CMyWnd;
  if (!w->LoadFrame(IDRES))
   return FALSE;
  w->ShowWindow(m_nCmdShow);
  w->UpdateWindow();
  m_pMainWnd = w;
  return TRUE;
 }
};

(b) DEFINE_DOC_TEMPLATE_CLASS(TMyDocument, TMyView, MyTemplate);
MyTemplate btpl("My files", "*.txt", 0, "TXT", dtAutoDelete);
class TMyApp : public TApplication {
public:
 // ...
 void InitMainWindow() {
  DocManager = new TDocManager(dmSDI | dmMenu);
 }
};



Listing Five: Iterating over child windows.

(a) void CMyWnd::Iterate()
{
 for (CWnd* w = GetTopWindow(); w != NULL; w = w->GetNextWindow()) {
  // use child window 'w'
 }
}

(b) static void f(TWindow* w, void*)
{
 // ...do something with 'w'
}
void TMyWindow::g()
{
 ForEach(f);
}


Listing Six: Locating a child window.

(a) CWnd* CMyWnd::FindChild()
{
 for (CWnd* w = GetTopWindow(); w != NULL; w = w->GetNextWindow()) {
  // see if child window found
  if (w is the right window)
   return w;
 }
 return 0;
}

(b) static BOOL f(TWindow* win, void*)
{
 if (win satisfies some condition)
  return TRUE;
 else
  return FALSE;
}
void TMyWindow::g()
{
 TWindow* first = FirstThat(f);
 TWindow* last = LastThat(f);
}



Listing Seven: Finding active MDI child window.

(a) class CMyWnd : public CMDIFrameWnd {
 // ...
public:
 void f() {
  CMDIChildWnd* w = MDIGetActive();
  if (!w) return;
  // use w ...
 }
};

(b) class TMDIFileApp : public TApplication {
public:
 // ...
 TMDIClient* Client;
protected:
 void f() {
  TMDIChild* w = Client->GetActiveMDIChild();
  if (!w) return;
  // use w ...
 }
};



Listing Eight: Initializing controls in a dialog box.

(a) class CMyDlg : public CDialog {
public:
 // ...
 //{{AFX_DATA(CMyDlg)
 int m_Value1;
 int m_Value2;
 //}}AFX_DATA
protected:
 DECLARE_MESSAGE_MAP()
};
void CMyDlg::DoDataExchange(CDataExchange* pDX)
{
 CDialog::DoDataExchange(pDX);
 DDX_Text(pDX, IDC_EDIT1, m_Value1);
 DDV_MinMaxInt(pDX, m_Value1, -10, 20);
 DDX_Text(pDX, IDC_EDIT2, m_Value2);
 DDV_MinMaxInt(pDX, m_Value2, 0, 100);
}

(b) struct {
 // transfer buffer
 // ...
} Buffer;
class TMyDlg : public TDialog {
public:
 // ...
 TMyDlg(...) {
  // .. create controls
  TransferBuffer = &Buffer;
 }
};




Listing Nine: Bitmapped buttons.

(a) class CMyDlg : public CDialog {
public:
 enum {IDD = IDD_BITMAPDLG};
 CMyDlg();
 // ...
protected:
 CBitmapButton button1;
};
CMyDlg::CMyDlg() : CDialog(CMyDlg::IDD)
{
 if (!button1.LoadBitmaps("Up", "Down", "Focus")) {
  TRACE("Problem!");
  AfxThrowResourceException();
 }
}

(b) no code necessary



Listing Ten: Creating a pen.

(a) CPen pen;pen.CreatePen(PS_SOLID, 1, RGB(0,0,0));

(b) TPen pen(TColor(0, 0, 0) );



Listing Eleven: Drawing a line.

(a) void CMyWnd::Line(CDC& dc)
{
 CPen pen;
 pen.CreatePen(PS_SOLID, 1, RGB(0,0,0) );
 CPen* pOldPen = dc.SelectObject(&pen);
 dc.MoveTo(10, 10);
 dc.LineTo(20, 30);
 dc.SelectObject(pOldPen);
}

(b) void TMyWnd::Line(TDC& dc)
{
 TPen pen(TColor(0, 0, 0) );
 dc.SelectObject(pen);
 dc.MoveTo(0, 100);
 dc.LineTo(100, 20);
 dc.RestorePen();
}



Listing Twelve: Painting with a brush.

(a) void CMyWnd::Box(CDC& dc)
{
 CBrush brush(RGB(0, 0, 0) );
 CBrush* pOldBrush = dc.SelectObject(&brush);
 dc.Rectangle(30, 30, 100, 100);
 dc.SelectObject(pOldBrush);
}

(b) void TMyWnd::Box(TDC& dc)
{
 dc.SelectObject( TBrush(TColor(0,0,0)) );
 dc.Rectangle(0, 20, 30, 400);
 dc.RestoreBrush();
}



Listing Thirteen: Creating fonts.

(a) void CMyWnd::Font(CDC& dc)
{
 LOGFONT lf;
 memset(&lf, 0, sizeof(lf));
 lf.lfHeight = 20;
 lf.lfWeight = FW_BOLD;
 strcpy(lf.lfFaceName, "Arial");
 CFont font;
 font.CreateFontIndirect(&lf);
}

(b) void TMyWnd::Font(TDC& dc)
{
 LOGFONT lf;
 memset(&lf, 0, sizeof(lf));
 lf.lfHeight = 20;
 lf.lfWeight = FW_BOLD;
 strcpy(lf.lfFaceName, "Arial");
 TFont font(&lf);
}



Listing Fourteen: Displaying bitmaps.

(a) void CMyWnd::DrawBM(CDC& dc)
{
 CBitmap bm;
 bm.LoadBitmap("MYBITMAP");
 CBitmap* pbmOld;
 CDC dcMem;
 dcMem.CreateCompatibleDC(&dc);
 pbmOld = dcMem.SelectObject(&bm);
 dc.BitBlt(100, 100, 50, 50, &dcMem, 0, 0, SRCCOPY);
 dcMem.SelectObject(pbmOld);
 dcMem.DeleteDC();
}

(b) void TMyWnd::Draw(TDC& dc)
{
 TBitmap* bm = new TBitmap(*GetModule(), "ID");
 TMemoryDC memoryDC(dc);
 memoryDC.SelectObject(*bm);
 TRect rect(0, 0, 40, 40);
 dc.BitBlt(rect, memoryDC, TPoint(0,0), SRCCOPY);
}



Listing Fifteen: Creating an array.

(a) CByteArray myArray;

(b) TIArrayAsVector<int> myArray(5, 0, 5);




Listing Sixteen: Copying an array.

(a) CByteArray myArray;  // array to be copied
CByteArray copyArray;    // array copied into
for (int i = 0; i < myArray.GetSize(); i++)
 copyArray[i] = myArray[i];

(b) TVectorImp<int> myArray;
TVectorImp<int> copyArray;
for (int i = 0; i < myArray.Count(); i++)
 copyArray[i] = myArray[i];



Listing Seventeen: Adding elements to an array.

(a) CByteArray myA;
BYTE value = 2;
myA.Add(value);

(b) TIArrayAsVector<int> myA(5, 0, 5);
int value = 5;
myA.AddAt(&value, 5);



Listing Eighteen: Removing elements from an array.

(a) CByteArray myArray;
myArray.RemoveAt(10);

(b) TIArrayAsVector<int> myArray(5, 0, 5);
myArray.Detach(3);



Listing Nineteen: Searching an array for an item.

(a) CByteArray myArray;
int FindItem(BYTE value)
{
 for (int i = 0; i < myArray.GetSize(); i++) {
 if (myArray[i] == value)
 return i;
 }
 return -1;
}

(b) TIArrayAsVector<int> myArray(5, 0, 5);
int value = 5;
int index = myArray.Find(&value);



Listing Twenty: Deleting the items in an array.

(a) CStringArray myArray;
void DeleteArray()
{
 // CString elements are destroyed by RemoveAll; no per-item delete is needed
 myArray.RemoveAll();
}

(b) TIArrayAsVector<int> myArray(5, 0, 5);
myArray.Flush();



Listing Twenty-one: Creating a list.

(a) CStringList myList;

(b) TListImp<string> myList;



Listing Twenty-two: Copying a list.

(a) CStringList myList;   // list to copy
CStringList copyList;     // list copied to
void CopyList()
{
 POSITION pos = myList.GetHeadPosition();
 while (pos)
 copyList.AddTail(myList.GetNext(pos));
}

(b) TListImp<string> myList;
TListImp<string> copyList;
static void DoCopy(string& s, void*)
{
 copyList.Add(s);
}
void f()
{
 myList.ForEach(DoCopy, 0);
}


Listing Twenty-three: Adding items to a list.

(a) CStringList myList;
myList.AddTail("Hello");
myList.AddHead("Good-bye");

(b) TListImp<string> myList;
string s("Test");
myList.Add(s);



Listing Twenty-four: Removing items from a list.

(a) CStringList myList;
void RemoveItem(CString& target)
{
 POSITION pos = myList.GetHeadPosition();
 while (pos) {
 CString& str = myList.GetNext(pos);
 if (str == target)
 myList.RemoveAt(pos);
 }
}

(b) TListImp<string> myList;
string s("Test");
myList.Detach(s);



Listing Twenty-five: Searching a list for an item.

(a) CStringList myList;
BOOL HasString(CString& target)
{
 POSITION pos = myList.GetHeadPosition();
 while (pos) {
 CString& str = myList.GetNext(pos);
 if (str == target)
 return TRUE;
 }
 return FALSE;
}

(b) TListImp<string> myList;
string s("Test");
if (myList.Find(s)) {
 // the item was found...
}


Listing Twenty-six: Deleting all the items in a list.

(a) CStringList myList;
void DeleteList()
{
 // CString elements are destroyed by RemoveAll; no per-item delete is needed
 myList.RemoveAll();
}

(b) TListImp<string> myList;
myList.Flush();



Listing Twenty-seven: Creating a dictionary.

(a) CMapStringToOb myMap;

(b) // create a hashable class
class HashString : public string {
public:
 HashString() : string() {}
 HashString(const char* s) : string(s) {}
 unsigned HashValue() const { return hash(); }
};
void f()
{
 typedef TDDAssociation<HashString, HashString> symbol;
 TDictionaryAsHashTable<symbol> Dictionary;
}



Listing Twenty-eight: Copying a dictionary.

(a) CMapStringToOb myMap;   // map to copy
CMapStringToOb copyMap;     // map copied to
POSITION pos = myMap.GetStartPosition();
while (pos) {
 CString string;
 CObject* pObject;
 myMap.GetNextAssoc(pos, string, pObject);
 copyMap.SetAt(string, pObject);
}

(b) typedef TDDAssociation<HashString, HashString> symbol;
typedef TDictionaryAsHashTable<symbol> dictionary;
dictionary myTable;
dictionary copyTable;
static void DoCopy(symbol& s, void*)
{
 copyTable.Add(s);
}
void f()
{
 myTable.ForEach(DoCopy, 0);
}



Listing Twenty-nine: Adding items to a dictionary.

(a) CMapStringToOb myMap;
CString string;
CObject* pObject;
myMap.SetAt(string, pObject);

(b) symbol s(HashString("K"), HashString("V"));
myTable.Add(s);



Listing Thirty: Removing items from a dictionary.

(a) CMapStringToOb myMap;
void RemoveItem(CString& str)
{
 CObject* pObject;
 if (!myMap.Lookup(str, &pObject))
 return;
 myMap.RemoveKey(str);
 delete pObject;
}

(b) symbol s(HashString("K"), HashString("V"));
myTable.Detach(s);



Listing Thirty-one: Searching a dictionary for an item.


(a) CMapStringToOb myMap;
BOOL HasItem(CString& str)
{
 CObject* pObj;
 return myMap.Lookup(str, &pObj);
}

(b) symbol s(HashString("K"), HashString("V"));
symbol* r = myTable.Find(s);
if (r) {
 // found...
}


Listing Thirty-two:

// Paint routine for Window, Printer, and PrintPreview for a TOleView client.
void mytestOleView::Paint (TDC& dc, bool erase, TRect& rect)
{
 mytestApp *theApp = TYPESAFE_DOWNCAST(GetApplication(), mytestApp);
 if (theApp) {
 // Only paint if we're printing and we have something 
 // to paint, otherwise do nothing.
 if (theApp->Printing && 
 theApp->Printer && 
 !rect.IsEmpty()) {
 // Use pageSize to get the size of the window to render into.
 // For a Window it's the client area; for a printer, it's the
 // printer DC dimensions; for print preview, it's layout window.
 TSize pageSize(rect.right - rect.left, rect.bottom - rect.top);
 TPrintDialog::TData &printerData = theApp->Printer->GetSetup();
 // Compute the number of pages to print.
 printerData.MinPage = 1;
 printerData.MaxPage = 1;
 TOcView* ocView = GetOcView();
 // Default TOcPart painting
 TRect clientRect = GetClientRect();
 TRect logicalRect = clientRect + (TSize&) ocView->GetOrigin();
 for (TOcPartCollectionIter i(GetOcDoc()->GetParts()); 
 i; 
 i++) {
 TOcPart& p = *i.Current();
 if (p.IsVisible(logicalRect)) {
 TRect r = p.GetRect();
 r -= ocView->GetOrigin();
 // Draw the embedded object
 p.Draw(dc, r, clientRect);
 if (p.IsSelected()) {
 TUIHandle handle(r, TUIHandle::HandlesIn |
 TUIHandle::Grapples |
 TUIHandle::HatchBorder, 5);
 handle.Paint(dc);
 } 
 else {
 TUIHandle handle(r, TUIHandle::HatchBorder, 5);
 handle.Paint(dc);
 }
 }
 }
 // INSERT>> Special printing code goes here.
 } else {
 TOleView::Paint(dc, erase, rect);
 // INSERT>> Normal painting code goes here.
 }
 dc.TextOut(0, 30, "mytest OLE Server");
 }
}


Serialization and MFC


Extending MFC for cross-platform portability 




Chane Cullens


Chane is the product manager for Wind/U at Bristol Technology and can be
contacted at chane@bristol.com.


Serialization is the process of writing or reading one or more objects to or
from a persistent-storage medium, such as a disk file, which is generally
referred to as the "application database." The Microsoft Foundation Class
Library (MFC) supplies built-in support for serialization in the class
CObject. Thus, all classes derived from CObject can take advantage of
CObject's serialization protocol. 
The basic idea of serialization is that an object should be able to write its
current state, usually indicated by the value of its member variables, to
persistent storage. Later, the object can be re-created by reading (or
"deserializing") the object's state from the storage. Serialization handles
all the details of object pointers and circular references to objects that are
used when you serialize an object. A key point is that the object itself is
responsible for reading and writing its own state; thus, the object is
responsible for implementing most of the cross-platform portability.
MFC uses an object of the CArchive class as an intermediary between the object
to be serialized and the storage medium. This object is always associated with
a CFile object, from which it obtains the necessary information for
serialization, including the filename and whether the requested
operation is a read or a write. The object that performs a serialization
operation can use the CArchive object without regard to the nature of the
storage medium. A CArchive object uses overloaded insertion (<<) and
extraction (>>) operators to perform writing and reading operations.
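The pattern can be sketched without MFC at all. In the following stand-alone C++ sketch, MiniArchive, Point32, and RoundTrip are hypothetical names (not MFC classes) that illustrate how an object serializes its own state through an archive-like intermediary with overloaded << and >> operators:

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// A toy stand-in for CArchive: objects write themselves through the
// archive without knowing anything about the storage medium.
class MiniArchive {
public:
    explicit MiniArchive(bool storing) : storing_(storing), pos_(0) {}
    bool IsStoring() const { return storing_; }

    // Insertion: append the value's bytes to the buffer.
    MiniArchive& operator<<(std::uint32_t v) {
        const unsigned char* p = reinterpret_cast<const unsigned char*>(&v);
        buf_.insert(buf_.end(), p, p + sizeof v);
        return *this;
    }
    // Extraction: read the next value back out of the buffer.
    MiniArchive& operator>>(std::uint32_t& v) {
        std::memcpy(&v, &buf_[pos_], sizeof v);
        pos_ += sizeof v;
        return *this;
    }
    // Switch the same archive from storing to loading mode.
    void Rewind() { storing_ = false; pos_ = 0; }
private:
    bool storing_;
    std::vector<unsigned char> buf_;
    std::size_t pos_;
};

// The object serializes its own state, mirroring the CObject protocol.
struct Point32 {
    std::uint32_t x, y;
    void Serialize(MiniArchive& ar) {
        if (ar.IsStoring()) ar << x << y;
        else                ar >> x >> y;
    }
};

// Store an object into an archive, then load a fresh copy back out.
inline Point32 RoundTrip(Point32 in) {
    MiniArchive ar(true);
    in.Serialize(ar);
    ar.Rewind();
    Point32 out = {0, 0};
    out.Serialize(ar);
    return out;
}
```

The object never touches the buffer directly, which is what lets an archive class swap storage media, or byte orders, underneath it.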
In this article, I'll examine how MFC's serialization mechanism allows objects
to persist between runs of the program in a cross-platform environment. While
the MFC serialization mechanism was not specifically designed to support
cross-platform development, it has been extended by Microsoft and other
WinAPI/MFC suppliers (including Wind/U on UNIX from Bristol Technology, the
company I work for) to support a wide variety of non-Intel platforms.


Cross-Platform Issues


The big problem in designing a cross-platform application database is coping
with the byte-ordering differences (Little- versus Big-endian) of various
operating systems and CPUs. Since some newer CPUs allow byte ordering to be
specified by the operating system, you should never assume a byte ordering
based only on the CPU type. For example, when running NT on the MIPS CPU, byte
ordering matches Intel x86 style; when running UNIX, byte ordering is swapped.
Table 1 lists the processors on which Win32 is available, along with their
byte ordering, and other CPU/OS combinations.
Little-endian byte order places the least-significant byte (LSB) first and the
most-significant byte (MSB) last. In Figure 1, which shows the bit and byte
layout for four bytes of data, a pointer to a 4-byte integer contains the
address of the LSB of that integer (bits 0--7). Adding 1 to the pointer value
causes it to point to the next-higher byte of the value (bits 8--15), and so
forth. 
Big-endian byte ordering places the MSB first, followed by the next MSB, and
so on, with the LSB last. In Figure 2, which shows this Big-endian layout, a
pointer to an integer value contains the address of the integer's MSB. The
individual ordering of bits within each byte does not differ between
processors. Therefore, writing a single byte of data is the same on each
machine, but if you write a larger object (such as a 4-byte integer) the bytes
will be swapped.
There are several ways to address the byte-ordering issue. One is to write
ASCII data to tagged streams, where every item is marked with type, size, and
byte-ordering information. Another option is to convert the binary data to an
architecture-neutral format in the application database. Then, convert the
data to an architecture-specific format in the application. In an MFC
application, a variation of this architecture-neutral format can be used where
the neutral format is Little-endian byte ordering. This way, no additional
code is needed for Windows NT or Windows 95 applications, but UNIX or
Macintosh applications rely on MFC to byte-swap data that is being read or
written.
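The Little-endian-neutral approach can be sketched in portable C++ (PutLE32 and GetLE32 are illustrative names, not MFC functions): because shifts operate on values rather than on memory, the same byte stream is produced on any host, where a raw memory copy would not be:

```cpp
#include <cstdint>

// Write a 32-bit value into buf in Little-endian order, byte by byte.
// Shifting acts on the value, not on its memory layout, so the output
// is identical on Little- and Big-endian hosts alike.
inline void PutLE32(unsigned char* buf, std::uint32_t v) {
    buf[0] = static_cast<unsigned char>(v);        // LSB first
    buf[1] = static_cast<unsigned char>(v >> 8);
    buf[2] = static_cast<unsigned char>(v >> 16);
    buf[3] = static_cast<unsigned char>(v >> 24);  // MSB last
}

// Reassemble the value from a Little-endian byte stream.
inline std::uint32_t GetLE32(const unsigned char* buf) {
    return  static_cast<std::uint32_t>(buf[0])
         | (static_cast<std::uint32_t>(buf[1]) << 8)
         | (static_cast<std::uint32_t>(buf[2]) << 16)
         | (static_cast<std::uint32_t>(buf[3]) << 24);
}
```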


MFC Objects and Serialization


MFC supports serialization of CObject-derived objects, MFC objects not derived
from CObject, and most native C++ types. Table 2 lists the MFC objects that
support serialization.
On the Macintosh and UNIX RISC systems, MFC archives are byte-swapped by
default. Byte-swapped archives are always kept in Little-endian order.
Byte-swapping is performed whenever a WORD, DWORD, LONG, float, or double is
read from or written to an archive--unless the Read or Write member functions
are used. Read and Write never byte-swap. To prevent archives from being
byte-swapped, use the bNoByteSwap mode flag when creating the archive; for
instance, CArchive ar(pfile, CArchive::store | CArchive::bNoByteSwap);. Using
this statement on UNIX or the Macintosh creates an archive that stores data in
the Big-endian byte order, rather than swapping bytes and storing them in the
Little-endian order.


Fundamental Types


On Little-endian architectures, the serialization functions for the
fundamental types are implemented as inline functions in afx.inl; see Listing
One. Thus, on Little-endian architectures, the bytes for the fundamental
types are efficiently inserted or extracted into the archive buffer. Since
BYTE data is not impacted by the architecture, it is always inlined. 
For Big-endian architectures, most serialization functions are implemented in
arccore.cpp, and the bytes must be swapped both before the data is inserted
into the archive buffer and after the data is extracted from it. To improve
efficiency and readability, static arrays are defined at the top of
arccore.cpp to serve as temporary storage for the swapped bytes; see Listing
Two.
The 16-bit WORD type is byte-swapped using the functions in Listing Three. In
both functions, the statement *(WORD*)&wAfx=w; fills the static structure with
the unswapped bytes. Then the bytes are swapped one at a time using the
WordBits array within the static structure. 
The 32-bit LONG type is byte-swapped using the functions in Listing Four, in
which the DWORD CArchive operator is called, since the DWORD is also a 32-bit
integer. 
The 32-bit DWORD type is byte-swapped using the two functions in Listing Five.
As with the WORD implementations, the statement *(DWORD*)&dwAfx=dw; is used
to fill the static structure with the four unswapped bytes. Then the bytes are
swapped one at a time using the DwordBits array within the static structure. 
The 32-bit float type is byte-swapped using the two functions in Listing Six.
Like the DWORD implementations, the statement *(float*)&fAfx=f; fills the
static structure with the four unswapped bytes. Then the bytes are swapped one
at a time using the FloatBits array within the static structure. 
The 64-bit double type is byte-swapped using the two functions in Listing
Seven. Again, the statement *(double*)&dAfx=d; fills the static structure
with the eight unswapped bytes. Then the bytes are swapped one at a time using
the DoubleBits array within the static structure. This code allows you to
create cross-platform apps with the major fundamental data types: WORD, LONG,
DWORD, float, and double. 


MFC Simple Value Types


Along with the fundamental data types, most framework applications use
MFC-supplied simple value types; see Table 3. To archive these simple value
types, the framework builds on the byte-swapped fundamental types already
discussed. This layering on top of the fundamental types is the key to
serializing your own objects in a cross-platform application database.
The CPoint class is derived from the POINT structure, and the archive
operators are implemented for the Windows C POINT struct. Since CPoint objects
are derived from POINT structures, CPoint objects are archived using the POINT
operators. Listing Eight contains the portable POINT input and output
operators. 
The portable version of these operators simply serializes two 4-byte LONG
types (x and y) instead of directly serializing eight bytes of untyped data
using the CArchive Read and Write members. The implementations of RECT and
SIZE are similar to POINT in Listing Nine. The portable version for RECT
structures simply serializes four 4-byte LONG types; the nonportable version
serializes 16 bytes of untyped data. Similarly, the portable version of SIZE
serializes two 4-byte LONG types, and the nonportable version serializes eight
bytes of untyped data. 
At first glance, CString objects seem inherently portable, but the object
contains a length that needs to be byte-swapped, and serialization must deal
with the interoperability of ANSI and Unicode strings. (See arccore.cpp in the
MFC source distribution for the nifty serialization implementation of
CString.)
The header in a serialized CString contains information on both the length and
type of the string, ANSI or Unicode. This information is encoded in the first
few bytes of the archive as in Table 4. Unicode strings contain a leading
0xFF, 0xFFFE before the length information. (Unicode generally should not be
used in cross-platform applications since it is not supported on all
architectures.)
The CTime and CTimeSpan objects store their time component as a time_t, which
is defined as a long. Thus, serializing these time objects is as easy as
casting the time component to a portable fundamental data type, the DWORD.
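A minimal sketch of that cast, assuming the time value fits in 32 bits (TimeToArchive and TimeFromArchive are hypothetical helpers, not MFC functions):

```cpp
#include <cstdint>
#include <ctime>

// Archive a time value as a fixed 32-bit unsigned integer (DWORD-like),
// so every platform reads and writes the same four bytes. This assumes
// the value fits in 32 bits, which holds for Unix-epoch dates before 2038.
inline std::uint32_t TimeToArchive(std::time_t t) {
    return static_cast<std::uint32_t>(t);
}
inline std::time_t TimeFromArchive(std::uint32_t dw) {
    return static_cast<std::time_t>(dw);
}
```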

When serializing these MFC simple value types, your application database
remains cross-platform compatible. Any required byte-swapping is handled by
the framework, hidden from the application developer.


MFC Collection Classes


Collection classes (including their serialization) are used throughout MFC and by
most developers using MFC. MFC has a wide variety of collection
classes--arrays, lists, and maps--in both template-based (MFC 3.0) and
non-template-based implementations. The cross-platform serialization
portability of these classes varies. Table 5 enumerates the template classes,
noting if they should be used for architecture-independent application
databases. Note that all MFC 3.0 collection classes typecast the integer
number of elements in the collection (m_nSize) to a 16-bit unsigned short (WORD)
when archiving. Therefore, serialization of collections with more than 64K
items will fail.


Template Collection Classes


Template-based collection classes introduced with MFC 3.0 can hold a variety
of objects in arrays, lists, and maps. These collection classes are templates
whose parameters determine the types of the objects stored in the aggregates.
To make these classes serializable, the Serialize and SerializeElements helper
functions have been incorporated into the template implementations. Table 6
lists the MFC template collection classes.
The CArray, CList, and CMap classes call SerializeElements to store collection
elements to or read them from an archive. The default implementation of the
SerializeElements helper function is not portable and does either a bitwise
write from the objects to the archive or a bitwise read from the archive to
the objects, depending on whether the objects are being stored in or retrieved
from the archive. For a cross-platform application database, you must override
SerializeElements. Listing Ten is the default, nonportable SerializeElements
function from afxtempl.h. 
When serializing CObject-derived objects that include IMPLEMENT_SERIAL, an
implementation modeled after Listing Eleven will keep your database portable.
Instead of a single untyped ar.Read or ar.Write, each object in Listing Eleven
is serialized using its own serialization ability. In this case, the
overridden input or output operator is used to create a portable database for
each object in the array. While Listing Eleven makes sense for CArray template
collections where pKids is the head of the array, it is not obvious that this
could work for CList or CMap collections where there is not an array of
elements. Yet, work it does! The implementations of CArray, CList, and CMap
Serialize are good examples of portable Serialize functions. For a closer look
at how SerializeElements works for CList and CMap, see Listing Twelve and
afxtempl.cpp in the MFC source directory.
The three basic steps in CArray::Serialize are:
1. Serialize the parent class, CObject, for CArray.
2. Serialize local member data, the array size m_nSize.
3. Serialize the element data using SerializeElements.
While SerializeElements is called a different number of times for each
collection type (CArray = once, CList = the number of items, CMap = 2x the
number of items), the purpose is the same--to let you control what is archived. Thus, a
cross-platform database can be created using the MFC template-based
collections.
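The element-by-element idea behind an overridden SerializeElements can be sketched without MFC (StoreElements, LoadElements, and the Pt struct are hypothetical stand-ins): each field of each element is written LSB-first, so the stream is identical on every architecture, whereas a bitwise write of the raw array, the default behavior, would depend on host byte order:

```cpp
#include <cstdint>
#include <vector>

struct Pt { std::uint32_t x, y; };

// Portable analogue of an overridden SerializeElements for an array of
// points: every field of every element is encoded LSB-first.
inline void StoreElements(std::vector<unsigned char>& out, const Pt* p, int n) {
    for (int i = 0; i < n; ++i)
        for (std::uint32_t v : {p[i].x, p[i].y})
            for (int s = 0; s < 32; s += 8)
                out.push_back(static_cast<unsigned char>(v >> s));
}

// Decode the stream back into elements, reversing the LSB-first layout.
inline void LoadElements(const std::vector<unsigned char>& in, Pt* p, int n) {
    std::size_t k = 0;
    for (int i = 0; i < n; ++i) {
        std::uint32_t f[2];
        for (int j = 0; j < 2; ++j) {
            std::uint32_t v = 0;
            for (int s = 0; s < 32; s += 8)
                v |= static_cast<std::uint32_t>(in[k++]) << s;
            f[j] = v;
        }
        p[i].x = f[0];
        p[i].y = f[1];
    }
}
```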


Modifying Scribble for Cross-Platform Serialization


Scribble is a sample application found in the samples/mfc/scribble/step6
directory on the MFC 3.0 distribution. Since Scribble uses the MFC
serialization architecture, you can modify it for portability. The main change
required is to override the SerializeElements helper function for the
CArray<CPoint, CPoint> template array; see Listing Thirteen. Since the
standard Scribble application does not define a SerializeElements for a CArray
of CPoints, the default SerializeElements is used. It simply saves an untyped,
nonportable byte stream. The locations in the CPoint structures need to be
byte-swapped. Creating the SerializeElements function to serialize the CPoint
array point-by-point allows for byte-swapping. While this function has no
direct dependencies to Scribble, it can be added to the bottom of
scripdoc.cpp.
With the addition of a single function, Scribble's application database has
become a cross-platform application database. Now, scribbles created on
Windows 95, Windows NT, Macintosh, and UNIX share a common database
format--one that the Windows versions have already been writing. Furthermore,
this format requires no changes in the MFC DLL shipped with Visual C++ on
Windows.


Summary


Creating a cross-platform application database has always been a tedious task
full of trade-offs. However, using the MFC serialization functions and
creating a compact binary database with Little-endian byte ordering has never
been easier. Not only can you create a portable database with MFC
serialization on Windows with few changes to your source code, but you can
also use that same portable database for UNIX and Macintosh applications.
Figure 1: Little-endian byte ordering for four bytes of data.
bits: [ 7 6 5 4 3 2 1 0 ] Byte 0 < Address
bits: [ 15 14 13 12 11 10 9 8 ] Byte 1
bits: [ 23 22 21 20 19 18 17 16 ] Byte 2
bits: [ 31 30 29 28 27 26 25 24 ] Byte 3
Figure 2: Big-endian byte ordering for four bytes of data.
bits: [ 31 30 29 28 27 26 25 24 ] Byte 0 < Address
bits: [ 23 22 21 20 19 18 17 16 ] Byte 1
bits: [ 15 14 13 12 11 10 9 8 ] Byte 2
bits: [ 7 6 5 4 3 2 1 0 ] Byte 3
Table 1: Byte-ordering for different CPU/OS combinations.
Processor OS Order 
Alpha All Little-endian
HP-PA NT Little-endian
HP-PA UNIX Big-endian
Intel x86 All Little-endian
Motorola 680x0 All Big-endian
MIPS NT Little-endian
MIPS UNIX Big-endian
PowerPC NT Little-endian
PowerPC non-NT Big-endian
RS/6000 UNIX Big-endian
SPARC UNIX Big-endian
Table 2: MFC Objects that support serialization. (The initial version of MFC
3.0 for the Macintosh does not support byte-swapping in the CPoint, CRect, or
CSize serialization functions.) 
 Object Win32 Macintosh UNIX 
BYTE (unsigned char) Yes Yes Yes

double Yes Yes Yes
DWORD (unsigned long) Yes Yes Yes
float Yes Yes Yes
LONG (long) Yes Yes Yes
WORD (unsigned short) Yes Yes Yes
int No No No
CObject Yes Yes Yes
CPoint Yes Partial Yes
CRect Yes Partial Yes
CSize Yes Partial Yes
CString Yes Yes Yes
CTime Yes Yes Yes
CTimeSpan Yes Yes Yes
Table 3: MFC simple value types. 
Function Description 
CPoint Encapsulates the Windows POINT structure, which
 contains x- and y-coordinates.
CRect Encapsulates the Windows RECT structure, which
 contains left, top, right, and bottom coordinates.
CSize Encapsulates the Windows SIZE structure, which
 contains a relative coordinate or position.
CString A variable-length sequence of characters. The
 characters are of type TCHAR, which is a 16-bit
 character for Unicode and an 8-bit character
 for normal ASCII applications.
CTime Represents an absolute time and date.
CTimeSpan Represents a relative time span of approximately
 68 years.
Table 4: CString header information.
String length Encoding 
Less than 255 Byte 1 contains string length.
Less than 65535 Byte 1=0xFF; bytes 2 and 3 are a WORD
 containing the string length.
Greater than 65534 Byte 1=0xFF; bytes 2 and 3=0xFFFF;
 bytes 4, 5, 6, and 7 are a DWORD
 containing the string length.
Table 5: Template classes.
Type Portable Reason 
CByteArray Yes Reads/writes byte-array bits. Since
 the data is bytes, it is not
 impacted by byte-swapping.
CDWordArray Partial Reads/writes DWORD array bits. Since
 DWORD storage is architecture
 dependent, data must be
 byte-swapped.*
CObArray Yes Loops through all objects in the
 array, serializing each
 object in turn.
CPtrArray NA Does not support serialization.
CStringArray Yes Loops through all CString objects
 in the array, serializing
 each object in turn.
CWordArray Partial Reads/writes WORD array bits. Since
 WORD storage is architecture
 dependent, data must be
 byte-swapped.
CUIntArray NA Does not support serialization.
CObList Yes Loops over all CObject pointers
 in the list, serializing
 each object in turn. The
 CObject-derived object is
 responsible for portability.
CStringList Yes Loops through all CString objects
 in the list, serializing
 each object in turn.
CPtrList NA Does not support serialization.
CMapPtrToPtr NA Does not support serialization.
CMapPtrToWord NA Does not support serialization.
CMapStringToOb Yes Loops over all elements in the
 map, serializing each key and
 value individually. The CObject
 value relies on the object's own
 serialization to be portable.
CMapStringToPtr NA Does not support serialization.
CMapStringToString Yes Loops over all elements in the
 map, serializing each key and
 value CString individually.
CMapWordToOb Yes Loops over all elements in the
 map, serializing each key and
 value individually.
CMapWordToPtr NA Does not support serialization.
* Check latest production documentation. The initial release of MFC 3.0 on the
Macintosh did not byte-swap the array, but MFC 3 on UNIX did.
Table 6: MFC template collection classes.
Class Description 
CArray Stores elements in an array.
CMap Maps keys to values.
CList Stores elements in a linked list.
CTypedPtrList Typesafe collection that stores pointers to
 objects in a linked list.
CTypedPtrArray Typesafe collection that stores pointers to
 objects in an array.
CTypedPtrMap Typesafe collection that maps keys to values;
 both keys and values are pointers.

Listing One 

_AFX_INLINE CArchive& CArchive::operator<<(BYTE by)
 { if (m_lpBufCur + sizeof(BYTE) > m_lpBufMax) Flush();
 *(UNALIGNED BYTE*)m_lpBufCur = by; m_lpBufCur += sizeof(BYTE); return *this;}
#if !defined(_MAC) && !defined(_WU_BIG_ENDIAN)
_AFX_INLINE CArchive& CArchive::operator<<(WORD w)
 { if (m_lpBufCur + sizeof(WORD) > m_lpBufMax) Flush();
 *(UNALIGNED WORD*)m_lpBufCur = w; m_lpBufCur += sizeof(WORD); return *this; }
_AFX_INLINE CArchive& CArchive::operator<<(LONG l)
 { if (m_lpBufCur + sizeof(LONG) > m_lpBufMax) Flush();
 *(UNALIGNED LONG*)m_lpBufCur = l; m_lpBufCur += sizeof(LONG); return *this; }
_AFX_INLINE CArchive& CArchive::operator<<(DWORD dw)
 { if (m_lpBufCur + sizeof(DWORD) > m_lpBufMax) Flush();
 *(UNALIGNED DWORD*)m_lpBufCur=dw;m_lpBufCur += sizeof(DWORD); return *this;}
_AFX_INLINE CArchive& CArchive::operator<<(float f)
 { if (m_lpBufCur + sizeof(float) > m_lpBufMax) Flush();
 *(UNALIGNED _AFX_FLOAT*)m_lpBufCur = *(_AFX_FLOAT*)&f; 
 m_lpBufCur += sizeof(float); return *this;
 }
_AFX_INLINE CArchive& CArchive::operator<<(double d)
 { if (m_lpBufCur + sizeof(double) > m_lpBufMax) Flush();
 *(UNALIGNED _AFX_DOUBLE*)m_lpBufCur = *(_AFX_DOUBLE*)&d; 
 m_lpBufCur += sizeof(double); return *this; }

#endif
_AFX_INLINE CArchive& CArchive::operator>>(BYTE& by)
 { if (m_lpBufCur + sizeof(BYTE) > m_lpBufMax)
 FillBuffer(sizeof(BYTE) - (UINT)(m_lpBufMax - m_lpBufCur));
 by = *(UNALIGNED BYTE*)m_lpBufCur; m_lpBufCur += sizeof(BYTE); return *this;
}
#if !defined(_MAC) && !defined(WU_BIG_ENDIAN)
_AFX_INLINE CArchive& CArchive::operator>>(WORD& w)
 { if (m_lpBufCur + sizeof(WORD) > m_lpBufMax)
 FillBuffer(sizeof(WORD) - (UINT)(m_lpBufMax - m_lpBufCur));
 w = *(UNALIGNED WORD*)m_lpBufCur; m_lpBufCur += sizeof(WORD); return *this; }
_AFX_INLINE CArchive& CArchive::operator>>(DWORD& dw)
 { if (m_lpBufCur + sizeof(DWORD) > m_lpBufMax)
 FillBuffer(sizeof(DWORD) - (UINT)(m_lpBufMax - m_lpBufCur));
 dw = *(UNALIGNED DWORD*)m_lpBufCur;m_lpBufCur += sizeof(DWORD);return *this;}
_AFX_INLINE CArchive& CArchive::operator>>(float& f)
 { if (m_lpBufCur + sizeof(float) > m_lpBufMax)
 FillBuffer(sizeof(float) - (UINT)(m_lpBufMax - m_lpBufCur));
 *(_AFX_FLOAT*)&f = *(UNALIGNED _AFX_FLOAT*)m_lpBufCur; 
 m_lpBufCur += sizeof(float); return *this; }
_AFX_INLINE CArchive& CArchive::operator>>(double& d)
 { if (m_lpBufCur + sizeof(double) > m_lpBufMax)
 FillBuffer(sizeof(double) - (UINT)(m_lpBufMax - m_lpBufCur));
 *(_AFX_DOUBLE*)&d = *(UNALIGNED _AFX_DOUBLE*)m_lpBufCur; 
 m_lpBufCur += sizeof(double); return *this; }
_AFX_INLINE CArchive& CArchive::operator>>(LONG& l)
 { if (m_lpBufCur + sizeof(LONG) > m_lpBufMax)
 FillBuffer(sizeof(LONG) - (UINT)(m_lpBufMax - m_lpBufCur));
 l = *(UNALIGNED LONG*)m_lpBufCur; m_lpBufCur += sizeof(LONG); return *this; }
#endif



Listing Two

#if defined(_MAC) || defined(WU_BIG_ENDIAN)
struct _AFXWORD
{
 BYTE WordBits[sizeof(WORD)];
};
struct _AFXDWORD
{
 BYTE DwordBits[sizeof(DWORD)];
};
struct _AFXFLOAT
{
 BYTE FloatBits[sizeof(float)];
};
struct _AFXDOUBLE
{
 BYTE DoubleBits[sizeof(double)];
};



Listing Three

CArchive& CArchive::operator<<(WORD w)
{
 if (m_lpBufCur + sizeof(WORD) > m_lpBufMax)

 Flush();

 if (!(m_nMode & bNoByteSwap))
 {
 _AFXWORD wAfx;
 *(WORD*)&wAfx = w;

 ASSERT(sizeof(WORD) == 2);

 BYTE* pb = m_lpBufCur;
 *pb++ = wAfx.WordBits[1];
 *pb = wAfx.WordBits[0];
 }
 else
 {
 *(WORD FAR*)m_lpBufCur = w;
 }
 m_lpBufCur += sizeof(WORD);
 return *this;
}
CArchive& CArchive::operator>>(WORD& w)
{
 if (m_lpBufCur + sizeof(WORD) > m_lpBufMax)
 FillBuffer(sizeof(WORD) - (UINT)(m_lpBufMax - m_lpBufCur));
 w = *(WORD FAR*)m_lpBufCur;
 m_lpBufCur += sizeof(WORD);

 if (!(m_nMode & bNoByteSwap))
 {
 _AFXWORD wAfx;
 *(WORD*)&wAfx = w;

 ASSERT(sizeof(WORD) == 2);

 (*(_AFXWORD*)&w).WordBits[0] = wAfx.WordBits[1];
 (*(_AFXWORD*)&w).WordBits[1] = wAfx.WordBits[0];
 }
 return *this;
}



Listing Four

CArchive& CArchive::operator<<(LONG l)
{
 ASSERT(sizeof(LONG) == sizeof(DWORD));
 return operator<<((DWORD) l);
}
CArchive& CArchive::operator>>(LONG& l)
{
 ASSERT(sizeof(LONG) == sizeof(DWORD));
 return operator>>((DWORD&) l);
}



Listing Five


CArchive& CArchive::operator<<(DWORD dw)
{
 if (m_lpBufCur + sizeof(DWORD) > m_lpBufMax)
 Flush();

 if (!(m_nMode & bNoByteSwap))
 {
 _AFXDWORD dwAfx;
 *(DWORD*)&dwAfx = dw;

 ASSERT(sizeof(DWORD) == 4);

 BYTE* pb = m_lpBufCur;
 *pb++ = dwAfx.DwordBits[3];
 *pb++ = dwAfx.DwordBits[2];
 *pb++ = dwAfx.DwordBits[1];
 *pb = dwAfx.DwordBits[0];
 }
 else
 {
 *(DWORD FAR*)m_lpBufCur = dw;
 }
 m_lpBufCur += sizeof(DWORD);
 return *this;
}
CArchive& CArchive::operator>>(DWORD& dw)
{
 if (m_lpBufCur + sizeof(DWORD) > m_lpBufMax)
 FillBuffer(sizeof(DWORD) - (UINT)(m_lpBufMax - m_lpBufCur));

 dw = *(DWORD FAR*)m_lpBufCur;
 m_lpBufCur += sizeof(DWORD);

 if (!(m_nMode & bNoByteSwap))
 {
 _AFXDWORD dwAfx;
 *(DWORD*)&dwAfx = dw;

 ASSERT(sizeof(DWORD) == 4);

 (*(_AFXDWORD*)&dw).DwordBits[0] = dwAfx.DwordBits[3];
 (*(_AFXDWORD*)&dw).DwordBits[1] = dwAfx.DwordBits[2];
 (*(_AFXDWORD*)&dw).DwordBits[2] = dwAfx.DwordBits[1];
 (*(_AFXDWORD*)&dw).DwordBits[3] = dwAfx.DwordBits[0];
 }

 return *this;
}



Listing Six

CArchive& CArchive::operator<<(float f)
{
 if (m_lpBufCur + sizeof(float) > m_lpBufMax)
 Flush();

 if (!(m_nMode & bNoByteSwap))

 {
 _AFXFLOAT fAfx;
 *(float*)&fAfx = f;

 ASSERT(sizeof(float) == 4);

 BYTE* pb = m_lpBufCur;
 *pb++ = fAfx.FloatBits[3];
 *pb++ = fAfx.FloatBits[2];
 *pb++ = fAfx.FloatBits[1];
 *pb = fAfx.FloatBits[0];
 }
 else
 {
 *(_AFXFLOAT FAR*)m_lpBufCur = *(_AFXFLOAT FAR*)&f;
 }
 m_lpBufCur += sizeof(float);
 return *this;
}
CArchive& CArchive::operator>>(float& f)
{
 if (m_lpBufCur + sizeof(float) > m_lpBufMax)
 FillBuffer(sizeof(float) - (UINT)(m_lpBufMax - m_lpBufCur));

 *(_AFXFLOAT FAR*)&f = *(_AFXFLOAT FAR*)m_lpBufCur;
 m_lpBufCur += sizeof(float);

 if (!(m_nMode & bNoByteSwap))
 {
 _AFXFLOAT fAfx;
 *(float*)&fAfx = f;

 ASSERT(sizeof(float) == 4);

 (*(_AFXFLOAT*)&f).FloatBits[0] = fAfx.FloatBits[3];
 (*(_AFXFLOAT*)&f).FloatBits[1] = fAfx.FloatBits[2];
 (*(_AFXFLOAT*)&f).FloatBits[2] = fAfx.FloatBits[1];
 (*(_AFXFLOAT*)&f).FloatBits[3] = fAfx.FloatBits[0];
 }

 return *this;
}



Listing Seven

CArchive& CArchive::operator<<(double d)
{
 if (m_lpBufCur + sizeof(double) > m_lpBufMax)
 Flush();

 if (!(m_nMode & bNoByteSwap))
 {
 _AFXDOUBLE dAfx;
 *(double*)&dAfx = d;

 ASSERT(sizeof(double) == 8);

 BYTE* pb = m_lpBufCur;
 *pb++ = dAfx.DoubleBits[7];
 *pb++ = dAfx.DoubleBits[6];
 *pb++ = dAfx.DoubleBits[5];
 *pb++ = dAfx.DoubleBits[4];
 *pb++ = dAfx.DoubleBits[3];
 *pb++ = dAfx.DoubleBits[2];
 *pb++ = dAfx.DoubleBits[1];
 *pb = dAfx.DoubleBits[0];
 }
 else
 {
 *(_AFXDOUBLE FAR*)m_lpBufCur = *(_AFXDOUBLE FAR*)&d;
 }

 m_lpBufCur += sizeof(double);
 return *this;
}
CArchive& CArchive::operator>>(double& d)
{
 if (m_lpBufCur + sizeof(double) > m_lpBufMax)
 FillBuffer(sizeof(double) - (UINT)(m_lpBufMax - m_lpBufCur));

 *(_AFXDOUBLE FAR*)&d = *(_AFXDOUBLE FAR*)m_lpBufCur;
 m_lpBufCur += sizeof(double);

 if (!(m_nMode & bNoByteSwap))
 {
 _AFXDOUBLE dAfx;
 *(double*)&dAfx = d;

 ASSERT(sizeof(double) == 8);

 (*(_AFXDOUBLE*)&d).DoubleBits[0] = dAfx.DoubleBits[7];
 (*(_AFXDOUBLE*)&d).DoubleBits[1] = dAfx.DoubleBits[6];
 (*(_AFXDOUBLE*)&d).DoubleBits[2] = dAfx.DoubleBits[5];
 (*(_AFXDOUBLE*)&d).DoubleBits[3] = dAfx.DoubleBits[4];
 (*(_AFXDOUBLE*)&d).DoubleBits[4] = dAfx.DoubleBits[3];
 (*(_AFXDOUBLE*)&d).DoubleBits[5] = dAfx.DoubleBits[2];
 (*(_AFXDOUBLE*)&d).DoubleBits[6] = dAfx.DoubleBits[1];
 (*(_AFXDOUBLE*)&d).DoubleBits[7] = dAfx.DoubleBits[0];
 }

 return *this;
}



Listing Eight

_AFXWIN_INLINE CArchive& AFXAPI operator<<(CArchive& ar, POINT point)
{
#ifndef _MAC
 ar.Write(&point, sizeof(POINT));
#else 
 ar << point.x << point.y;
#endif
 return ar; 
}

_AFXWIN_INLINE CArchive& AFXAPI operator>>(CArchive& ar, POINT& point)
{
#ifndef _MAC
 ar.Read(&point, sizeof(POINT));
#else
 ar >> point.x >> point.y;
#endif
 return ar;
}



Listing Nine

_AFXWIN_INLINE CArchive& AFXAPI operator<<(CArchive& ar, RECT rect)
{
#ifndef _MAC
 ar.Write(&rect, sizeof(RECT));
#else 
 ar << rect.left << rect.top << rect.right << rect.bottom;
#endif
 return ar; 
}
_AFXWIN_INLINE CArchive& AFXAPI operator>>(CArchive& ar, RECT& rect)
{
#ifndef _MAC
 ar.Read(&rect, sizeof(RECT));
#else
 ar >> rect.left >> rect.top >> rect.right >> rect.bottom;
#endif
 return ar;
}
_AFXWIN_INLINE CArchive& AFXAPI operator<<(CArchive& ar, SIZE size)
{
#ifndef _MAC
 ar.Write(&size, sizeof(SIZE));
#else 
 ar << size.cx << size.cy;
#endif
 return ar; 
}
_AFXWIN_INLINE CArchive& AFXAPI operator>>(CArchive& ar, SIZE& size)
{
#ifndef _MAC
 ar.Read(&size, sizeof(SIZE));
#else
 ar >> size.cx >> size.cy;
#endif
 return ar;
}



Listing Ten

template<class TYPE>
void AFXAPI SerializeElements(CArchive& ar, TYPE* pElements, int nCount)
{
 ASSERT(AfxIsValidAddress(pElements, nCount * sizeof(TYPE)));

 // default is bit-wise read/write
 if (ar.IsStoring())
 ar.Write((void*)pElements, nCount * sizeof(TYPE)); // untyped write
 else
 ar.Read((void*)pElements, nCount * sizeof(TYPE)); // untyped read
}



Listing Eleven

class CStroke : public CObject { . . . };
CArray< CMyKidsObject, CMyKidsObject& > kidsArray;

void SerializeElements( CArchive& ar, CMyKidsObject* pKids, int nCount )
{
 for ( int i = 0; i < nCount; i++, pKids++ )
 {
 // Serialize each CMyKidsObject object
 if ( ar.IsStoring() )
 ar << pKids;
 else
 ar >> pKids;
 }
}



Listing Twelve

template<class TYPE, class ARG_TYPE>
void CArray<TYPE, ARG_TYPE>::Serialize(CArchive& ar)
{
 ASSERT_VALID(this);

 CObject::Serialize(ar);
 if (ar.IsStoring())
 {
 ar << (WORD) m_nSize;
 }
 else
 {
 WORD nOldSize;
 ar >> nOldSize;
 SetSize(nOldSize);
 }
 SerializeElements(ar, m_pData, m_nSize);
}



Listing Thirteen

void SerializeElements( CArchive& ar, CPoint* pPoints, int nCount )
{
 for ( int i = 0; i < nCount; i++, pPoints++ )
 {
 // Serialize each CPoint object

 if ( ar.IsStoring() )
 ar << *pPoints;  // CPoint is not CObject-derived; archive the value
 else
 ar >> *pPoints;
 }
}

























































PROGRAMMING PARADIGMS


Vision and Revision




Michael Swaine


Most programmers who have had any experience with visual programming would
agree that it is a fine idea. A fine idea for somebody else, some would say,
but a fine idea. It's not a new idea; visual programming dates back at least
to 1946, when Herman Goldstine and John von Neumann put it forth as a model of
programming. It's arguably as old as that other visual technique,
flowcharting. It's been around a long time. The idea, that is. Implementations
of that idea are something else again. Some programmers, no doubt the same
wits who would say it's a fine idea for somebody else, would claim that visual
programming today is nothing but an idea. But that's demonstrably wrong; there
are plenty of visual-programming languages (VPLs) around, or at least they
claim to be visual. Some are even named Visual Something-or-Other. But most
VPLs get faulted for losing their visual nature when you get to the guts of
the program. They are visual only when functioning as interface builders.
That's not really the idea.


The Visual Vision


The basic idea probably occurs to every programmer at some point. You write
your thirty-fifth flowchart and it occurs to you that it would be peachy if
you could stop there. Compile the flowchart and run it. That was pretty much
Goldstine and von Neumann's idea. Or, you sketch the structure of your program
on the back of a Denny's place mat, and as you fold it up with the mustard
blot on the inside so it doesn't stain your shirt pocket, you realize that
essentially the whole program is there in that sketch, that all that remains
to be done is to implement it; and you think, not for the first time, that
somebody ought to write a place-mat compiler.
Place-mat compilers. Executable flowcharts. That was Goldstine and von
Neumann's idea. Today, the visual programming idea is more likely to be framed
in object-oriented terms, flowcharts being sort of dated. That's nice, because
an object-oriented visual programming language seems more natural, more
realizable, than a procedural visual programming language. The structure of a
program written in C++ is inherently more visualizable than the structure of a
program written in 1960s-vintage spaghetti-code Basic.
The object-oriented paradigm ought to make visual programming languages easier
to design. That seems obvious.
But as obvious as the idea is, nobody seems to be writing that place-mat
compiler for you. The visual-programming languages that exist all seem to have
a hole in the middle.
It's like that classic S. Harris cartoon. A professor is standing at the
blackboard, which is covered with equations. A bunch of equations on the left,
a bunch of equations on the right. Linking them is the chalked comment in the
center of the board: "...and then a miracle happens."
Visual-programming languages can look awfully good on the outside, but when
you get to the inside, the visualness evaporates and they shrug and say,
"...and then a miracle happens."
What's needed, apparently, is a VPL in which the V runs deep, in which the
object-oriented structure of a program can be fully visualized and the visual
aspects map fully onto the desired functionality. Then, too, it would be nice
if the resulting visually designed programs were competitive with good C
programs in speed and memory and disk-space demands, and so forth.


A Visible Influence


Is it possible that Prograph CPX from Prograph International (Halifax, NS) is
that happy marriage of object orientation and visual programming?
Prograph has become hot stuff in the past year. Fans of the language, among
whom you have to number Apple's Kurt Schmucker, are raving about it.
Speed comparable to C++ code, rather than to Smalltalk or compiled 4D. Truly
clean and object oriented, yet viable as a production environment. Remarkably
easy program development and maintenance. Lets you do things you simply
couldn't have done before. Order-of-magnitude increase in programmer
productivity. The C of the next decade. Those are the claims, not of the
company, but of its most satisfied customers.
I've mentioned Prograph in this space twice before, but then I was talking
about an earlier version of the language. The latest is reputed to be a
radical revision.
The product was formerly just Prograph. Now it's Prograph CPX, the "CPX"
meaning cross platform, they say. It's currently available for UNIX, Windows,
and Macintosh, including native PowerMac machines, and has players for moving
applications across platforms. In addition to cross-platform support, Prograph
CPX now includes a complete application framework, object editors, and a
project structure that lets multiple programmers work together simultaneously
on a project.
The new features add a lot to the ease of use and power of the development
environment. And it doesn't hurt that they cut the price last year from $1495
to $695.
But the basic appeal of Prograph CPX is the appeal of OOP inside a VPL. That's
Prograph's message, and it seems to be getting across.
One sign of Prograph's growing influence is the fact that a healthy cottage
industry has grown up to supply third-party development tools for Prograph CPX
developers. For example:
Tangent Systems (San Diego, CA) is selling tools for Fourier analysis,
engineering graphics, and multimedia sound management.
Breathing Software (Nashville, TN) has a suite of animation classes.
Davies Bosch Associates (Newport Beach, CA) offers a Prograph class library
that lets developers integrate D8, its object-oriented database engine, into
their Prograph applications.
RKP Software (Oakton, VA) has a set of interface tools, like calendars and
specialized buttons.
EveryDay Objects (Merrillville, IN) is distributing HotDAM!, a set of tools
for DAM/DAL, ODBC, and SQL database communication tools, including a complete
application-source-code example.
StoneTablet Publishing (Portland, OR) has a library of functions that replace
a portion of the Mac Toolbox dealing with lists. Last year StoneTablet made
its library accessible from Prograph CPX.
There are also many independent consultants and code shops doing custom
development using Prograph.


What Vision Looks Like


The development environment of Prograph contains several distinctive visual
features.
First, icons represent the components of a project. Each Prograph project
consists of one or more Sections, each of which corresponds to a file on disk.
The icon for a section looks like a cube viewed corner-on, and each visible
face of the cube represents a different aspect of the section. There is a
Classes face, a Methods face, and a Persistents face. Each face of the icon is
separately clickable, to display the classes, methods, or persistents of the
section. (There is also a text-view version of these multiple-part icons.)
Inheritance is represented by lines connecting icons for classes.
The multiple-part icon approach crops up elsewhere: Icons for classes have two
sides, the Methods side and the Attributes side. Clicking on the Attributes
side brings up a window in which the class's attributes and their values are
listed. Clicking on the Methods side brings up a visual representation of the
methods of the class. As with the Section icons, there is also a text-view
version of these multiple-part icons.
The difficult thing is representing the methods, of course. It's easy to see
how you can visualize objects using labeled icons, and how you can picture the
inheritance structure using lines linking the icons. But making a picture of a
method is the tough part.
Prograph uses data-flow diagrams.
The data-flow diagram for a given (universal or class) method consists of one
or more cases. A case is, visually, a window containing an input bar at the
top, an output bar at the bottom, and icons representing operations between.
The operation icons have inputs and outputs connected either to other icons or
to the input or output bars via datalinks, which are lines.
Operations can be primitives supplied with the language or external code, or
they can be user defined. Allowed operations include: constant, accessing a
persistent, producing a new instance of a class, getting and setting
attributes, evaluating string and math functions and expressions, and controls
and matches. User-defined operations are built visually by hooking up
primitives.

Another visual feature of the environment is the ability to attach comments to
icons, a feature common to many VPLs.
A nice feature of the editor is that third-party universal methods can be
added to the editor's Tools menu by simply choosing the Install Tool item in
that menu.
Finally, there is an interpreter as well as a compiler, so you can get
immediate feedback on your designs as you implement them, but without the
penalties that a strictly interpreted language imposes.


Inner Vision


A few other points about the language: It's probably impossible for any
language to be fully visual if it has to interface with other languages.
Prograph lets you tie C or Pascal code into your Prograph applications, and
naturally that C or Pascal code isn't presented visually. 
The latest version includes a manual on developing externals. Externals can be
either XPrims, external primitives written in C such that they can be called
from Prograph, or XDefs, external references written in either C or Pascal and
integrated into the Prograph environment via the Prograph C Interface or the
Prograph Pascal Interface. Mac Toolbox calls are regarded as XDefs.
Having two types of externals means that you can bring in prewritten routines
or use C for particular routines if you think it will give you an advantage.
At least the Mac version (the only one I've seen) has exceptional tools for
interface building. There is a collection of over 100 Application Building
Classes (ABCs) that let the developer build interfaces by drag-and-drop
programming. The ABCs are supported by individual editors (ABEs) written in
Prograph that include full source code, so you can customize them.
So is Prograph CPX the real thing? The real OOP VPL?
It's interesting that the programming model is not simply OOP, but OOP
augmented by the dataflow model. One consequence: Operations are expected to
operate when data arrives at their inputs. This leaves the flow of execution
ambiguous in those cases where data arrives at the inputs of several
operations simultaneously. In such cases, the Prograph interpreter chooses an
operation at random from those ready to execute. It's possible to override
this action, but it's the default behavior.
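As a toy illustration of that firing rule (this is emphatically not Prograph's implementation, just a C++ sketch of the semantics described, with invented names): an operation becomes ready when all of its inputs have arrived, and when several operations are ready at once, one is picked at random.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// An operation fires once all its inputs have arrived; ties among
// simultaneously ready operations are broken at random, so execution
// order among them is unspecified.
struct Op { int needed; int arrived; bool done; };

inline int fire_one(std::vector<Op>& ops)   // returns index fired, or -1
{
    std::vector<int> ready;
    for (std::size_t i = 0; i < ops.size(); ++i)
        if (!ops[i].done && ops[i].arrived >= ops[i].needed)
            ready.push_back(static_cast<int>(i));
    if (ready.empty())
        return -1;
    int pick = ready[std::rand() % ready.size()];   // random tie-break
    ops[pick].done = true;
    return pick;
}
```

When only one operation is ready the choice is forced; the nondeterminism the column mentions appears only when the ready set has more than one member.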
Whatever the ultimate judgment on Prograph, it will also be something of a
judgment on VPL as a paradigm. Because Prograph pushes the paradigm hard. The
visual nature of the language runs deep. The power of the language is
significant; this is not a niche tool, but a contender for your primary
programming language. The stakes are high, and if Prograph doesn't validate
the visual-programming paradigm, its failure to do so could make the paradigm
a harder sell in the future.
And if Prograph does validate the visual-programming paradigm, it could be a
signal of where programming languages are going in the next decade.


The Invisible Programming Language


Next topic: AppleScript, which I think has finally arrived.
I really ought to supply you with a smooth transition here, but I think I'd
better owe you one. Because AppleScript is almost the opposite of what we've
been talking about, nearly the antithesis of a visual-programming language.
Apple not only didn't provide a visual-programming environment for AppleScript
when it introduced the system-level scripting language, it barely provided a
programming environment. The AppleScript script editor was it.
And there was another barrier to visibility with AppleScript: Its vocabulary
included a core lexicon, extended by suites of application-category-specific
words defined by vendor committees. There was no single place to look to learn
the vocabulary.
With invisible languages, the more documentation the better. Enter Danny
Goodman.
Danny Goodman wrote one of the most successful books on any programming tool,
certainly one of the all-time best-sellers in the Macintosh area, in his
HyperCard book.
In 1993, Danny turned his explanatory skills to AppleScript, producing an
excellent general book on Apple's system-level scripting language. Recently,
the second edition of Danny Goodman's AppleScript Handbook (Random House,
1993) came out, and this edition could be the real starting point for a lot of
AppleScript scripters.
AppleScript has been around for several years, but it's been handicapped by
lack of support from third-party applications and the absence of development
tools. Advanced developers needed sophisticated editing and debugging tools,
and more casual scripters needed a front end, something like HyperCard.
Without support from third-party applications, the power of AppleScript to let
the user control applications from the outside and the power for applications
to work together via AppleScript weren't realizable. Without the third-party
support, there could be no market for AppleScript-based add-ons to existing
applications. Without third-party support, AppleScript was just a tool for
scripting the Finder.
And the Finder wasn't scriptable.
A lot has changed lately. Many applications now boast some form of AppleScript
support, including most of the major applications. There are decent
AppleScript editor/debuggers. HyperCard now supports AppleScript so fully
that, thanks to the open scripting architecture, stack developers can program
their stacks using AppleScript in place of HyperTalk. Since HyperCard can now
produce standalones, this means that the stack developer who learns
AppleScript can produce stand-alone applications embodying AppleScript
functionality with a HyperCard interface. FaceSpan, distributed by Apple to
AppleScript developers, is a tool that lets developers put a Mac-like
interface on their scripts with less effort than programming in HyperTalk, and
with less overhead than HyperCard. With System 7.5, Apple installs a
collection of scripts under the Apple menu, making AppleScript look like
something Apple actually expects people to use.
And the Finder is now scriptable.
Just as Danny's HyperCard book was, in effect, the documentation for
HyperCard, Danny's AppleScript book is the documentation for AppleScript, and
the second edition is a timely update. It doesn't require, but does assume,
System 7.5. It has new chapters on scripting third-party applications: a
chapter each on FileMaker Pro, Excel, Word, WordPerfect, MacWrite Pro,
Touchbase Pro, HyperCard, and QuarkXPress, each chapter including useful
scripts; plus a general chapter on what to think about in scripting
third-party applications and a brief chapter on where to get help in working
with third-party apps. There are new chapters on building user interfaces for
your scripts, third-party scripting tools, and third-party scripting
additions.
Scripting additions, or "OSAXen," are script components. Danny includes many
of the best available OSAXen on the companion disk. In all, with the scripts
from all the chapters of the book and the OSAXen, there are 235 files on the
disk.
Danny's book also has a new chapter on scripting the scriptable Finder. It's
solid information, but there's another book you're less likely to find on the
bookstore shelves that may be more useful in scripting the Finder. Heizer
Software (Pleasant Hill, CA) has begun publishing programming books, and the
first out is Scripting the Scriptable Finder (Heizer Software, 1995), by Steve
Michel. Steve is one of the most savvy and committed AppleScript scripters
around. Happily, he's also a good writer. You may have seen his columns in
MacWeek in years past, when MacWeek covered scripting better than it does now.
Scripting the Scriptable Finder is the key to creating scripts that tame the
Mac's operating system.
And that beast, having grown huge and out of control over the past ten years,
can stand some taming.



























C PROGRAMMING


The Standard Template Library, Visual C++ Training, Text-Search Wrap-up




Al Stevens


Last month I interviewed Alexander Stepanov, whose Standard Template Library
(STL) was accepted in 1994 as a major part of the Standard C++ library. STL is
a library of container-class templates and algorithmic-function templates.
This discussion (which I adapted from the last chapter of my book Teach
Yourself C++, Fourth Edition) provides an overview of STL.
STL's rationale is found in its generic-programming model. Given one set of
data types, another set of container types, and a third set of algorithms, the
amount of software to be developed with traditional, nongeneric C++ methods is
a product of the number of elements in the three sets. If you have integer,
Date, and Personnel objects to contain in lists, queues, and stacks, and you
need insert, extract, and sort algorithms for each, then there are 27 (3x3x3)
traditional C++ algorithms to develop. With traditional templates, you can
define the containers as generic classes and reduce the number to nine
algorithms: three algorithms for each of the three containers. If, however,
you design the algorithms as templates that perform generic operations on
parameterized containers, then there are only three algorithms to write, and
that orthogonal organization is the underlying basis for STL's
generic-programming model.
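That arithmetic can be made concrete with a small sketch. This is not STL code, just an illustration of the generic-programming claim, written with today's std:: spellings (the 1995 STL predates the std namespace); insert_sorted is an invented name.

```cpp
#include <cassert>
#include <list>
#include <vector>

// One generic routine serves every container/element pairing, instead of
// one hand-written routine per pairing (the 3-vs-27 arithmetic above).
template <typename Seq, typename T>
void insert_sorted(Seq& s, const T& value)
{
    typename Seq::iterator it = s.begin();
    while (it != s.end() && *it < value)
        ++it;
    s.insert(it, value);   // every sequence supports insert-at-iterator
}
```

One template body now serves vector, list, and deque alike, for any element type that supplies operator<.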
That argument is a simplification of the STL rationale, but it hints at larger
advantages, ones that cannot be ignored. First, if class template containers
are sufficiently generic, they can support any user-defined data type that
meets their requirements with respect to operator overloading and behavior.
You can contain any data type within any kind of supported container without
having to develop custom container code. Second, if the algorithms are
sufficiently generic, you can use them to process containers of objects of
user-defined data types. Third, you introduce new containers by conforming to
the rules of STL. The existing algorithms automatically work with the new
containers. Finally, conforming new algorithms work with all present
containers and contained data types.
If you stick to the rules, you can add to any of the three components that
make up STL--the containers, the algorithms, and the contained data
types--and all existing components will automatically accept the new addition
and work seamlessly with it.


The Standard Containers


STL supports several container types categorized as sequences and associative
containers. Access to containers is managed by a hierarchy of iterator objects
that resemble C++ pointers. Iterators point to objects in the containers and
permit the program to iterate through the containers in various ways.
The containers all have common management member functions defined in their
template definitions: insert, erase, begin, end, size, capacity, and so on.
Individual containers have member functions that support their unique
requirements.
A standard suite of algorithms provides for searching, copying, reordering,
transforming, and performing numeric operations on the objects in the
containers. The same algorithm is used to perform a particular operation for
all containers of all object types.


Sequences


A sequence is a container that stores a finite set of objects of the same type
in a linear organization. An array of names is a sequence. You would use one
of the three sequence types--vector, list, or deque--for a particular
application, depending on its retrieval requirements.
A vector is a sequence that you can access at random. You can append entries
to and remove entries from the end of the vector without undue overhead.
Insertion and deletion at the beginning or in the middle of the vector take
more time because they involve shifting the remaining entries to make room or
to close up the deleted object space. A vector is an array of contiguous
objects with an instance counter or pointer to indicate the end of the
container. Random access is a matter of using a subscript operation.
A list is a sequence that you can access bidirectionally and in which you can
insert and delete anywhere without undue performance penalty. Random access
requires forward or backward iteration to the target object. A list consists of
noncontiguous objects linked together with forward and backward pointers.
A deque is like a vector, except that it allows fast inserts and deletes at
the beginning as well as the end of the container. Random inserts and deletes
take more time.
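The tradeoffs among the three sequence types can be sketched briefly (today's std:: spellings; the helper names are invented for illustration).

```cpp
#include <cassert>
#include <deque>
#include <list>
#include <vector>

// Each sequence makes a different operation cheap; choose by access pattern.
inline std::vector<int> make_vector()
{
    std::vector<int> v;
    v.push_back(10); v.push_back(20); v.push_back(30);  // cheap at the end
    return v;                                           // v[1] is O(1)
}

inline std::deque<int> make_deque()
{
    std::deque<int> d;
    d.push_back(2);
    d.push_front(1);   // cheap at the front too, unlike vector
    return d;
}

inline std::list<int> make_list()
{
    std::list<int> l;
    l.push_back(1); l.push_back(3);
    std::list<int>::iterator it = l.begin();
    ++it;              // iterate to the insertion point...
    l.insert(it, 2);   // ...then link in without shifting any elements
    return l;
}
```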


Associative Containers


Associative containers provide for fast, keyed access to the objects in the
container. They are constructed from key objects and a compare function that
the container uses to compare objects. Associative containers consist of set,
multiset, map, and multimap containers. You would use associative containers
for large dynamic tables that you can search sequentially or at random.
Associative containers use tree structures to organize the objects rather than
contiguous arrays or linked lists. These structures support fast random
retrievals and updates.
The set and multiset containers contain objects that are key values. The set
container does not permit multiple keys with the same value; the multiset
container does.
The map and multimap containers contain objects that are key values. They
associate each key object with another parameterized type object. The map
container does not permit multiple keys with the same value; the multimap
container does.
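The set/multiset and map/multimap distinctions can be shown in a few lines (modern std:: spellings; unique_count, multi_count, and lookup are invented names).

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>

// set rejects duplicate keys; multiset keeps them.
inline std::size_t unique_count()
{
    std::set<int> s;
    s.insert(5);
    s.insert(5);        // duplicate is ignored
    return s.size();
}

inline std::size_t multi_count()
{
    std::multiset<int> m;
    m.insert(5);
    m.insert(5);        // duplicate is kept
    return m.size();
}

// map associates each key with a second, parameterized value type.
inline int lookup(const std::string& key)
{
    std::map<std::string, int> ages;
    ages["ada"] = 36;
    ages["bob"] = 42;
    std::map<std::string, int>::const_iterator it = ages.find(key);
    return it == ages.end() ? -1 : it->second;   // fast keyed retrieval
}
```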


Iterators


Iterators provide a common method of access into containers. They resemble and
have the semantics of C++ pointers. In fact, when the parameterized type is a
built-in C++ type (int, double, and so on), the associated iterators are C++
pointers.
Each container type supports one category of iterator depending on the
container's requirements. The categories are: Input, Output, Forward,
Bidirectional, and Random Access. STL defines a hierarchy of iterators, as
shown in Figure 1.
Each iterator category has all the properties of those above it in the
hierarchy. Those properties specify the behavior that the iterator must
exhibit in order to support the container. Iterators are "smart" pointers.
They are permitted to have values that represent one of a set of defined
states. These states are listed and explained in Table 1.
Iterators can be initialized, incremented, and decremented, and their bounds
can be limited by the current extent of the containers. If you can cause one
iterator to be equal to another by incrementing the first, the second iterator
is reachable from the first. The two iterators are also known to refer to the
same container and can therefore define a range of objects in the container.
Iterators can be set as the result of a search of the container or by
subscripted reference into the container. Containers include member functions
that return iterators that point to the first object and the past-the-end
object position. Iterators are the objects with which STL algorithms work.
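A small sketch of a range bounded by two iterators, where the second must be reachable from the first (sum_range is an invented name, not an STL algorithm; modern spellings).

```cpp
#include <cassert>
#include <vector>

// begin() and end() bound the container; end() is the past-the-end
// position, reachable from begin() by incrementing, so the pair
// [first, last) names a range of objects in the same container.
inline int sum_range(std::vector<int>::const_iterator first,
                     std::vector<int>::const_iterator last)
{
    int total = 0;
    for (; first != last; ++first)   // last must be reachable from first
        total += *first;
    return total;
}
```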


Algorithms


Algorithms perform operations on containers by dereferencing iterators. Each
algorithm is a function template parameterized on one or more iterator types.
Algorithms are the backbone of STL. Table 2 lists the standard algorithms
provided with STL.
Algorithms accept iterators as arguments. The iterators tell the algorithm on
which object or range of objects in a container to operate.

To build and sort a vector container of pseudorandom integers, you could use
the program in Listing One which instantiates a vector container of integer
objects named "vct." Then it uses the vector template-class member function
insert to insert random numbers into the vector container. It displays the
container's contents, retrieving each of the objects by using the overloaded
[] operator. Next, the program uses the STL sort algorithm to sort the
container's contents. The sort function accepts a range of objects expressed
as a pair of iterators. The second iterator must be reachable from the first.
The container begin and end member functions return iterators that refer to
the first object in the container and the past-the-end position of the
container. This pair constitutes a range that represents the entire container.
Not every algorithm works with every container. The hierarchy of iterator
types maintains this relationship. It is inefficient, and therefore illegal,
to apply some algorithms to some data structures. The association of a
container with its supporting algorithms is controlled by the container's
iterator types. You cannot, for example, use the sort algorithm, which
requires random iterators, to sort a list, which uses bidirectional iterators.
The sort algorithm expects iterators to point to memory-adjacent objects so
that it can rearrange the objects in a contiguous array. Iterators for lists
do not have that property. Each one points to an entry whose physical address
is coincidental to the other objects in the list. Their logical juxtaposition
in the list is represented by hidden list pointers instead of by a physical
position in a contiguous array. The sort algorithm would be extremely
inefficient if it tried to sort such a data structure. If you replace vector
with list in the program in Listing One, two kinds of compiler errors occur.
First, the [] operator does not work for iterating through and extracting
objects. The list<T> container class does not overload the [] operator. You
have to use iterators for that operation. The other errors occur deep down
inside STL functions when STL tries to subtract and compare iterators by using
operators that are not overloaded. Here you have to be somewhat familiar with
the underlying library. The errors are typical C++ cryptic compiler messages
relating to the parameterized usage that STL tries to instantiate. "Illegal
structure operation," for example. The real clue is the error message: 
Could not find a match for
'__insertion_sort(list<int>::iterator,
undefined)'
which tells you that you can't sort a list. It follows that if you want to
invent a container that can be sorted with the generic sort algorithm, the
container must include an iterator derived from the STL random-access
iterator.
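The workaround the text implies can be sketched as follows: the generic sort requires random-access iterators, so it compiles for vector but not for list, while list supplies a member sort suited to its bidirectional iterators (sort_any is an invented name).

```cpp
#include <algorithm>
#include <cassert>
#include <list>
#include <vector>

// std::sort demands random-access iterators; vector qualifies, list does not.
inline void sort_any(std::vector<int>& v) { std::sort(v.begin(), v.end()); }

// list's own member sort works through its linked structure instead.
inline void sort_any(std::list<int>& l)   { l.sort(); }
```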


Predicates


Algorithms accept predicates, which are function-object arguments. A function
object overloads operator(); you pass it to an algorithm as a
callback-function argument. The algorithm calls the predicate for each object
that it processes from the container. In some cases, the predicate is a bool
function that returns true or false to tell the algorithm whether to select
the object. In other cases, the predicate processes the objects that the
algorithm finds and returns an object of the type in the container. STL
provides a set of standard arithmetic, comparison, and logical function
objects that you can use as predicates. 
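A minimal predicate sketch in today's spellings (GreaterThan and count_above are invented names): the function object overloads operator(), and the algorithm calls it back once per element.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// A bool predicate: count_if calls operator() for each element and
// tallies those for which it returns true.
struct GreaterThan
{
    int bound;
    explicit GreaterThan(int b) : bound(b) {}
    bool operator()(int x) const { return x > bound; }
};

inline long count_above(const std::vector<int>& v, int bound)
{
    return std::count_if(v.begin(), v.end(), GreaterThan(bound));
}
```

The standard comparison function objects (greater, less, and friends) bound to a value serve the same role without a hand-written struct.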


Allocators


Allocators are STL objects that encapsulate information about the container's
storage medium, specifically, the compiler's memory models. Each container
template uses an embedded allocator object to allocate memory for objects in
the container. The behavior of the allocator object is parameterized for the
type, and users can override the allocator object by defining their own.
In its current implementation, STL disables the _new_handler pointer; if
memory is exhausted by an allocator, it displays a message on cout and calls
exit(1). This behavior might not be appropriate for a program that manages
its own memory, does not use standard console devices, or needs a more orderly
exit to release interrupt vectors, for example. Current versions of STL do not
throw exceptions, and the committee is looking into that. In the meantime, you
can install custom allocators if you need different memory-exhaustion
strategies.
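A toy custom allocator, written against the modern standard allocator interface (which is spelled differently from the 1995 HP STL's memory-model machinery); a real replacement would add a gentler out-of-memory strategy than printing a message and exiting.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Counts allocation requests; a production version would catch bad_alloc
// or install its own recovery policy instead of letting allocation fail hard.
template <typename T>
struct CountingAlloc
{
    typedef T value_type;
    static std::size_t calls;

    CountingAlloc() {}
    template <typename U> CountingAlloc(const CountingAlloc<U>&) {}

    T* allocate(std::size_t n)
    {
        ++calls;
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) { ::operator delete(p); }
};
template <typename T> std::size_t CountingAlloc<T>::calls = 0;

template <typename T, typename U>
bool operator==(const CountingAlloc<T>&, const CountingAlloc<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const CountingAlloc<T>&, const CountingAlloc<U>&) { return false; }
```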


STL Summarized


STL consists of generic containers with iterators and algorithms that operate
on those containers through their iterators. STL is almost a different
programming model--another paradigm, if you will. It differs from pure
object-oriented theory by separating the data from the functions. Algorithms
are not encapsulated in classes. They are not methods. They are function
templates. Their binding to the data occurs as a function of their orthogonal
relationship to their parameterized iterators.
The generic-programming model is an important paradigm, and it adds to,
instead of replaces, what you already know. It will get a lot of publicity in
the coming months. But it isn't a bandwagon, so don't start jumping
prematurely. Just as object-oriented programming did not replace structured
programming, generic programming does not offset object-oriented programming.
Problem-domain class design still involves abstraction, encapsulation,
polymorphism, and inheritance. Generic programming applies quite specifically
to the design and development of container data structures, those collections
that manage and hold objects of your object-oriented classes. Two different
things.
You can experiment with STL by downloading it via anonymous ftp from
butler.hpl.hp.com under /stl and using Borland C++ 4.5 or IBM's CSet++
compiler for OS/2.


SD Training Intensives


STL's introduction of "generic programming" into our lexicon reminds me that
programming keeps getting harder. It was never supposed to do that. They
promised. Productivity tools are meant to increase productivity and make our
lives easier. But each new language, paradigm, library, framework, tool,
utility, and operating platform adds one more layer of complexity to the
things that a programmer needs to know. It is not always clear that the
empowerment gained justifies the effort and knowledge required to gain it.
In December, I attended the Software Development Training Intensives classes
conducted in Orlando by Bruce Eckel and Richard Hale Shaw. The Orlando
sessions were part of a multicity, traveling road show, the details of which
are advertised in DDJ and other Miller Freeman publications.
Bruce's two days are a total immersion into C++ for C programmers. Already
knowing C++, I attended only the last part of that session. My attention
turned to the audience. They were spellbound and a little glassy-eyed after
two intensive days of nonstop, wall-to-wall, brain-soaking C++. Bruce uses his
new book, Thinking in C++ (Prentice Hall, 1995) for the classroom textbook,
and I spent some time later with that book. Bruce's treatment of iostreams is
the most comprehensive I've seen in an introductory book.
Richard's two days are devoted to Windows programming with Visual C++ 2.0 and
the Microsoft Foundation Classes, and I attended both days. I had already run
the VC++ 2.0 tutorial at home and wanted to know more. Without a doubt, those
two days were the most educational I've ever spent learning anything. Not only
is Richard a master teacher, but the subject is compelling. With respect to my
earlier concerns about programming getting harder, let me say here without
reservation that the Visual C++ development environment empowers the
programmer far beyond the effort required to climb the learning slope. The
learning curve is steep, but you come out of it with a lot of development
power at your fingertips. If you are about to plunge into Windows 95 or NT
development and need to learn the fundamentals in a hurry, there is no better
way to do it than to spend two days with Richard Hale Shaw.


The Text-Search Project


I'll wrap up the text-search project this month by discussing the C
source-code module of the interface between the Visual Basic (VB) front end
and the Windows DLL that implements the search engine. There are several other
C source files in the engine to support query parsing, database retrievals,
decompression, and so on. Most of that code comes from past column projects,
so I won't discuss it again. All the source code is available in a
configuration that builds this project. I'll tell you how to get it at the end
of this discussion.
Listing Two, bibsrch.c, serves two purposes. I tested the engine as a
stand-alone DOS program. The source code includes a compile-time directive
that compiles either a Windows DLL or the stand-alone test program. The test
program includes a stubbed text-mode front end to exercise all the functions.
This approach lets me use the more-familiar DOS environment to test the
program before it becomes a mysterious DLL. Later, when I know more about
Windows programming, that approach will not be necessary.
The first part of the program is the stub, which includes a table of chapter
names for the query responses. Next is a main function that opens the database
and fires off a simple, menu-driven interactive session. Each menu selection
calls one of the engine's interface functions and displays the result. 
When you compile the source program to be included in the DLL, the stub code
does not get compiled. Instead, the program compiles standard LibMain and WEP
functions to handle the DLL's startup and exit code. There is also a
Windows-specific RegisterAppl function that the VB front end calls every time
the user runs the program. If a copy is already running, the function does not
try to open the database again. The VB front end can tell from the return
value that a copy of the program is already running. In that case, the program
calls into the previously running copy and exits instead of trying to execute
multiple copies.
Several utility functions compile for both the DOS and DLL versions of the
program. The ComputeDocNo function converts book, chapter, and verse variables
into a document number. The DocNotoBCV function converts them back. The
ReadVerse and ReadVerseText functions read text for the current verse and any
user-added annotations. Following that are three front-end interface
functions--GetVerse, NextVerse, and PrevVerse--declared with type qualifiers
FAR _export PASCAL. Those conventions identify DLL functions that outside
programs can call. When the program is compiled to run under DOS, those tokens
are #defined to a null string.
Each interface function has parameters that are far pointers to memory that
will receive the text of the selected verse and any annotation. Other
parameters are pointers to the book, chapter, and verse integer
specifications. The front end passes these values to GetVerse by ensuring that
the arguments point to valid values. The NextVerse and PrevVerse functions
determine the values for themselves based on the next or previous logical
document in the database. Then they return the values to the front end through
the argument pointers. That way the front end does not need to keep track of
how many verses are in every chapter and how many chapters are in every book.
The SetRetrievalMode function tells the engine whether the user is currently
navigating through verses or annotations.
Other front-end interface functions are in other source-code modules. Table 3
lists these functions. All of the functions may be assumed to have the FAR
_export PASCAL qualifier.
The LexicalScan function initiates a query, parsing and processing the query.
The function returns False if the query is invalid. In this case, the offset
argument is set to the offset in the query expression where the syntax checker
found the problem. If the expression is okay, the function copies the number
of hits into the caller's memory. The phrase argument is True if the caller is
requesting a phrase search rather than a Boolean search. The difference is
that in the former case, all the words in the expression are assumed to be
text words to be searched with And operators. The first pass of a phrase
search builds a hit list of documents that contain all the words in the phrase
without concern for whether the hit is a coincidence or actually contains the
phrase.
When LexicalScan returns, the query retrieval is finished, and the DLL has a
hit list to process. If the caller is processing a phrase search, a call to
the SearchPhrase function tells the engine to scan the documents in the hit
list to see if the phrase really exists.
With a hit list built, the front end calls NextFoundVerse successively to
retrieve the book, chapter, and verse numbers for each of the documents in the
hit list. The front end knows how many calls to make from the count of hits
returned by the LexicalScan function. After retrieving all the document
specifications from the hit list, the front end calls EndSearch to terminate
the search. It is up to the front end to call GetVerse to retrieve the text
and annotation of any particular verse. The Bible application displays a list
of found verses in a list box and retrieves and displays the text of only the
currently selected verse in the list box.
The engine keeps a record of the most-recently retrieved document so that
subsequent calls to NextVerse and PrevVerse retrieve the correct document.


Source Code and Database


The database and Visual Basic source code for the Bible application and the C
source code for the text engine are free. You can download them from the DDJ
Forum on CompuServe, from DDJ Online, and via anonymous ftp; see
"Availability," page 3.
If you cannot get to one of the online sources, send two high-density 3.5-inch
diskettes and an addressed, stamped mailer to me at Dr. Dobb's Journal, 411
Borel Avenue, San Mateo, CA 94402, and I'll send you the source code and
database. It's free, but if you care to support my Careware charity, include a
dollar for the Brevard County Food Bank.

Figure 1: STL iterator hierarchy.
Table 1: STL iterator states.

Iterator State     Meaning
Singular           The iterator's value does not dereference any object in
                   any container. (The iterator could be uninitialized or
                   set to a logical null value.)
Dereferenceable    The iterator points to a valid object in the container.
Past-the-end       The iterator points to the first object position beyond
                   the last object in the container.
Table 2: Standard STL algorithms.
Nonmutating sequence operations: for_each, find, find_if, adjacent_find,
count, count_if, mismatch, equal, search.

Mutating sequence operations: copy, copy_backward, swap, swap_ranges,
transform, replace, replace_if, replace_copy, replace_copy_if, fill, fill_n,
generate, generate_n, remove, remove_if, remove_copy, remove_copy_if, unique,
unique_copy, reverse, reverse_copy, rotate, rotate_copy, random_shuffle,
partition, stable_partition.

Sorting operations: sort, stable_sort, partial_sort, partial_sort_copy,
nth_element, lower_bound, upper_bound, equal_range, binary_search, merge,
inplace_merge, includes, set_union, set_intersection, set_difference,
set_symmetric_difference, push_heap, pop_heap, make_heap, sort_heap, min,
max, min_element, max_element, lexicographical_compare, next_permutation,
prev_permutation.

Generalized numeric operations: accumulate, inner_product, partial_sum,
adjacent_difference.
Table 3: Search-engine interface functions.
int LexicalScan(char *exp, int *offset, unsigned *hits, int phrase);
void SearchPhrase(unsigned *hits);
void NextFoundVerse(int *Book, int *Chapter, int *Verse);
void EndSearch(void);
int AddNote(char far *Note);
void DeleteNote(void);

Listing One 

#include <iostream.h>
#include <iomanip.h>
#define __MINMAX_DEFINED
#include <stdlib.h>
#include <algo.h>
#include <vector.h>
int main()
{
    int dim;
    // --- get the number of integers to sort
    cout << "How many integers?\n";
    cin >> dim;
    // --- a vector of integers
    vector<int> vct;
    // --- insert values into the vector
    for (int i = 0; i < dim; i++)
        vct.insert(vct.end(), rand());
    // --- display the random integers
    cout << "\n----- unsorted -----\n";
    for (i = 0; i < dim; i++)
        cout << setw(8) << vct[i];
    // --- sort the array with the STL sort algorithm
    sort(vct.begin(), vct.end());
    // --- display the sorted integers
    cout << "\n----- sorted -----\n";
    for (i = 0; i < dim; i++)
        cout << setw(8) << vct[i];
    return 0;
}



Listing Two

/* bibsrch.c -- dll for bible windows application */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <conio.h>
#include <ctype.h>
#include "textsrch.h"
#include "table.c"

HFILE fp = HFILE_ERROR;
static HFILE fn = HFILE_ERROR;
unsigned DocNo = FIRSTDOCNO - 1;
char DatabasePath[255];

#ifndef DOS_VERSION
HCURSOR WtCurs;
HCURSOR PCurs;
#endif

static int RetrievalMode;
#ifdef DOS_VERSION
/* ===================================================
 DOS Query Program-specific code
 =================================================== */
static char *BookName[] = {
 "Genesis",
 "Exodus",
 "Leviticus",
 "Numbers",
 "Deuteronomy",
 "Joshua",
 "Judges",
 "Ruth",
 "I. Samuel",
 "II.Samuel",

 "I. Kings",
 "II. Kings",
 "I. Chronicles",
 "II. Chronicles",
 "Ezra",
 "Nehemiah",
 "Esther",
 "Job",
 "Psalms",
 "Proverbs",
 "Ecclesiastes",
 "Song of Solomon",
 "Isaiah",
 "Jeremiah",
 "Lamentations",
 "Ezekiel",
 "Daniel",
 "Hosea",
 "Joel",
 "Amos",
 "Obadiah",
 "Jonah",
 "Micah",
 "Nahum",
 "Habakkuk",
 "Zephaniah",
 "Haggai",
 "Zechariah",
 "Malachi",
 "Matthew",
 "Mark",
 "Luke",
 "John",
 "The Acts",
 "The Romans",
 "I. Corinthians",
 "II. Corinthians",
 "Galatians",
 "Ephesians",
 "Philippians",
 "Colossians",
 "I. Thessalonians",
 "II. Thessalonians",
 "I. Timothy",
 "II. Timothy",
 "Titus",
 "Philemon",
 "To the Hebrews",
 "Epistle of James",
 "I. Peter",
 "II. Peter",
 "I.John",
 "II. John",
 "III. John",
 "Jude",
 "Revelation"
};
int FAR _export PASCAL GetVerse(
 char far *Text, char far *Note,

 int Book, int Chapter, int Verse);
void FAR _export PASCAL NextVerse(
 char far *Text, char far *Note,
 int *Book, int *Chapter, int *Verse);
void FAR _export PASCAL PrevVerse(
 char far *Text, char far *Note,
 int *Book, int *Chapter, int *Verse);
static void Display(int bk, int chap, int verse, char *text)
{
 printf("\nDocNo: %u ", DocNo);
 printf("%s %d:%d\n", BookName[bk-1], chap, verse);
 printf("%s", text);
}
static void doquery(void)
{
 static char query[300];
 unsigned hits;
 int c = 0, offset, phrase;
 int bk, chap, verse;
 static char text[512];
 static char note[257];
 while (c != 'x') {
 c = 0;
 while (c != 'p' && c != 'q' && c != 'n' && c != 'x') {
 printf("\nP-hrase, Q-uery, N-avigate, or e-X-it: ");
 c = getch();
 putch(c);
 }
 if (c == 'n') {
 printf("\nN-ext, P-revious, B-ook/chapter/verse: ");
 c = 0;
 while (c != 'p' && c != 'b' && c != 'n') {
 c = getch();
 putch(c);
 }
 if (c == 'b') {
 printf("\nBook/chapter/verse: ");
 scanf("%d %d %d", &bk, &chap, &verse);
 *text = '\0';
 GetVerse(text, note, bk, chap, verse);
 Display(bk, chap, verse, text);
 }
 else if (c == 'n') {
 NextVerse(text, note, &bk, &chap, &verse);
 Display(bk, chap, verse, text);
 }
 else if (c == 'p') {
 PrevVerse(text, note, &bk, &chap, &verse);
 Display(bk, chap, verse, text);
 }
 }
 else if (c != 'x') {
 phrase = c == 'p';
 if (phrase)
 printf("\nEnter Phrase:\n");
 else
 printf("\nEnter Query:\n");
 gets(query);
 if (!LexicalScan(query, &offset, &hits, phrase))

 printf("\n\aSyntax error");
 else {
 int c;
 printf("\n%d hits. Continue? ", hits);
 c = getch();
 putch(c);
 if (tolower(c) == 'y') {
 int ct = 0;
 if (phrase && hits) {
 SearchPhrase(&hits);
 printf("\n%d more hits. Continue? ", hits);
 c = getch();
 putch(c);
 if (tolower(c) != 'y')
 hits = 0;
 }
 while (hits--) {
 putchar('\n');
 NextFoundVerse(&bk, &chap, &verse);
 *text = '\0';
 GetVerse(text, note, bk, chap, verse);
 Display(bk, chap, verse, text);
 ++ct;
 if ((ct % 4) == 0) {
 printf("\n[more...]");
 if (getch() == 27)
 break;
 }
 }
 }
 EndSearch();
 }
 }
 }
}
int main()
{
 strcpy(DatabasePath, "bible.dat");
 fp = OpenDataFile();
 if (fp != HFILE_ERROR) {
 strcpy(DatabasePath, "");
 if ((fn = OpenNotes(DatabasePath)) != HFILE_ERROR) {
 doquery();
 _lclose(fn);
 }
 }
 _lclose(fp);
 return 0;
}
/* ===================================================
 End of DOS Query Program-specific code
 =================================================== */
#else
/* ===================================================
 Windows DLL-specific code
 =================================================== */
static HINSTANCE hInst = NULL;
int FAR PASCAL LibMain(HINSTANCE hInstance, WORD wDatSeg,
 WORD cbHeapSize, LPSTR lpCmdLine)

{
 if (hInst == NULL) {
 hInst = hInstance;
 if (cbHeapSize > 0)
 UnlockData(0);
 }
 return 1;
}
int FAR PASCAL WEP(int nParameter)
{
 if (fp != HFILE_ERROR)
 _lclose(fp);
 if (fn != HFILE_ERROR)
 _lclose(fn);
 fp = fn = HFILE_ERROR;
 return 1;
}
int FAR _export PASCAL RegisterAppl(HANDLE hWnd, char *path)
{
 static HANDLE semaphore = 0;
 HANDLE sem = semaphore;
 if (semaphore == 0) {
 int plen;
 semaphore = hWnd;
 strncpy(DatabasePath, path, 255);
 plen = strlen(DatabasePath);
 strcat(DatabasePath, "\\bible.dat");
 fp = OpenDataFile();
 if (fp != HFILE_ERROR) {
 *(DatabasePath+plen) = '\0';
 fn = OpenNotes(DatabasePath);
 }
 }
 return sem;
}
/* ===================================================
 end of Windows DLL-specific code
 =================================================== */
#endif
/* Compute the book/chapter/verse spec from a document number */
void DocNotoBCV(unsigned docno, int *book, int *chapter, int *verse)
{
 unsigned char *tb = BCVtable;
 int chapterct;
#ifdef DOS_VERSION
 DocNo = docno;
#endif
 *book = *verse = 0;
 while (*verse == 0) {
 (*book)++;
 *chapter = 0;
 chapterct = *tb++;
 while (chapterct--) {
 (*chapter)++;
 if (docno <= *tb) {
 *verse = docno;
 break;
 }
 docno -= *tb++;

 }
 }
}
/* Compute a document number from a book/chapter/verse spec */
static int ComputeDocNo(int book, int chapter, int verse)
{
 unsigned char *tb = BCVtable;
 int chapterct;
 unsigned int docno = 0;
 if (book == 0 || chapter == 0 || verse == 0)
 return 0;
 /* ---- get to the book ---- */
 while (--book) {
 /* ---- bypass chapters in the preceding books ---- */
 chapterct = *tb++;
 while (chapterct--)
 docno += *tb++;
 }
 chapterct = *tb++;
 /* ---- test valid chapter ---- */
 if (chapter > chapterct)
 return 0;
 /* ------ get to the chapter ---- */
 while (--chapter)
 docno += *tb++;
 /* ---- test valid verse ---- */
 if (verse > *tb)
 return 0;
 /* ---- build document number ---- */
 DocNo = docno + verse;
 return 1;
}
/* Read the verse text specified by the current document number */
void ReadVerseText(char far *Text)
{
 /* ---- get offset and bit from DocNo ---- */
 struct versercd vr;
 long offset = sizeof(struct versercd);
 offset *= DocNo-1-(FIRSTDOCNO-1);
 _llseek(fp, offset+BCVOFFSET, SEEK_SET);
 _lread(fp, &vr, sizeof(struct versercd));
 GetLine(Text, vr.offset, vr.bit);
}
/* Read verse text and note text specified by the current document number */
static void ReadVerse(char far *Text, char far *Note)
{
 ReadVerseText(Text);
 GetNote(Note);
}
/* Get the next verse for the user */
void FAR _export PASCAL NextVerse(
 char far *Text, char far *Note,
 int *Book, int *Chapter, int *Verse)
{
 if (RetrievalMode)
 NextNotedDocNo();
 else {
 if (DocNo == MAXVERSE)
 DocNo = FIRSTDOCNO;

 else
 ++DocNo;
 }
 if (DocNo != FIRSTDOCNO - 1) {
 ReadVerse(Text, Note);
 DocNotoBCV(DocNo, Book, Chapter, Verse);
 }
}
/* Get the previous verse for the user */
void FAR _export PASCAL PrevVerse(
 char far *Text, char far *Note,
 int *Book, int *Chapter, int *Verse)
{
 if (RetrievalMode)
 PrevNotedDocNo();
 else {
 if (DocNo > FIRSTDOCNO)
 --DocNo;
 else
 DocNo = MAXVERSE;
 }
 if (DocNo != FIRSTDOCNO - 1) {
 ReadVerse(Text, Note);
 DocNotoBCV(DocNo, Book, Chapter, Verse);
 }
}
/* Get the verse specified by the user in a book/chapter/verse spec */
int FAR _export PASCAL GetVerse(
 char far *Text, char far *Note,
 int Book, int Chapter, int Verse)
{
 if (ComputeDocNo(Book, Chapter, Verse)) {
 ReadVerse(Text, Note);
 return 1;
 }
 return 0;
}
/* Set the retrieval mode (next note/next verse) for next/previous */
void FAR _export PASCAL SetRetrievalMode(int mode)
{
 RetrievalMode = mode;
}





















ALGORITHM ALLEY


Computing the Day of the Week




Kim S. Larsen


Kim has a PhD in computer science and is primarily interested in databases,
algorithmics, and data structures. He can be contacted at Ærtebjerggårdvej
31, DK-5270 Odense N, Denmark or at kslarsen@imada.ou.dk.


Introduction 
by Bruce Schneier
Thirty days have September, April, June, and November.... So the song goes. Or
did you learn to count the months forward and backwards on your fingers, with
even-numbered fingers being the short months? Mnemonics such as these work
well for people (I recited that silly song well into adulthood) but are less
intuitive for computers. Of course, you can write a program that manually
computes the day of the week for any date, using a whole lot of If-Then-Elses,
or a few Case statements.
However, I like the technique Kim Larsen presents in this month's "Algorithm
Alley" because it approaches the problem from another direction. There isn't
really a ready-made mathematical formula for computing the day of the week
for any given date, but maybe we can cobble one together. The result works,
and you end up with an easy-to-program formula for computing the day of the
week automatically.
By the way, if you've developed a clever new algorithm, or come up with a new
twist on an old idea, I'd love to hear from you. Please contact me at
schneier@chinet.com, or just drop a note to me at the DDJ offices.
Have you ever wondered how your computer knows that today is Wednesday? Even
if your machine has been down and you specify a new date when starting it up
again, it immediately knows which day of the week it is.
When you were a kid, you probably saw tables with entries for the date, the
month, and the year. You added up a few numbers and another table gave you the
day of that date. Of course, such tables could be hardwired into your
computer's operating system. However, there exists a simple formula for
computing the correct day of the week. This formula takes up very little
space, whereas a collection of tables covering just a few hundred years would
take up quite a bit.
If your machine is not capable of computing the day of the week, then you can
use this formula in your own programs and applications.


Creating a Formula


The starting point for the formula is a date represented by the variables D,
M, and Y. For example, for the date March 1, 1994, D=1, M=3, and Y=1994. Our
goal is to compute a number between 0 and 6, where 0 represents Monday, 1
represents Tuesday, 2 represents Wednesday, and so on.
It turns out that March 1, 1994 is a Tuesday, so the formula D mod 7 would
actually work for the rest of the month of March. For example, the 18th is a
Friday, and 18 mod 7=4, which represents Friday. (Remember that integer
division and modulo are closely connected. For example, 26 divided by 7 is 3
with the remainder 5. This means that the integer division of 26 by 7 equals
3, and 26 modulo 7 (abbreviated 26 mod 7) equals 5. This also implies that 19
mod 7=12 mod 7=5 mod 7=5. In fact, this works similarly for negative numbers,
so -2 mod 7=5, -9 mod 7=5, and so on.
More formally, it can be shown that for any integers n and k>0, n can be
written as n=qk+r in exactly one way, where q and r are also integers and
0 ≤ r < k. Now q is defined to be the integer division of n and k (written
n/k), and r is defined to be n mod k.)
What about April? Well, if March 1 is a Tuesday, then April 1 is a Friday. So,
the formula needs to be shifted. There are 31 days in March, and since 31 mod
7=3, the formula that would work in April is (D+3) mod 7. Of course, the same
problem arises when we go from April to May, except that the shift will be 2,
since April has only 30 days. Table 1 lists the shift information for all
months. Note that in order to obtain as much regularity as possible, the short
month of February (and, hence, also January) has been moved to the end.
Example 1(a) is a formula that imitates the pattern depicted in the shift
column. The division is integer division, so the result is rounded down to
the nearest integer. The interesting values for this function are given in
Table 2. Intuitively, when M increases by one (going from one month to the
next), 2M increases by two, and 3(M+1)/5 increases by one three times out of
five, which is just what we need to imitate the repeated pattern 3,2,3,2,3 of
shifts (indicated by the curly brackets to the right of the table). Notice
that since we are working modulo 7, going from 6 to 2 is an increase of 3
(counting 6,0,1,2).
We have now found a formula to adjust our calculations correctly when we go
from one month to the next, and we want to add this formula to our first
attempt, namely D mod 7. The only problem is getting it to start out right.
Again using March 1, 1994 (that is, M=3), notice that in Example 1(b) 8 mod
7=1, so we must subtract 1 when the whole formula is put together. Working
modulo 7, this is the same as adding 6, since --1 mod 7=6 mod 7=6.
Now the formula in Example 1(c) will work for the rest of the year. In fact,
since we have placed January and February at the end of Table 1, the formula
will also work for these two months in 1995, provided that we refer to these
as months 13 and 14. This is because they start a new, though incomplete,
3,2,3,2,3 sequence. To make a nicer formula, we adopt the convention from now
on of treating January and February as the months 13 and 14 of the previous
year.


Incorporating the Year


In going from one year to the next, we observe that March 1, 1995 is a
Wednesday. This means that changing to a new year should have the effect of
adding one to our formula. That is easy: We simply add Y to what we already
have. Again, we have to make sure things start out right. Since 1994 mod 7=6,
we must subtract 6 when we combine Y with the formula we already have. Example
2(a) then becomes a new and better formula.
Our next problem occurs with 1996--a leap year. March 1 is a Friday, not a
Thursday, as our formula would currently predict. So we need to add one every
time we enter a leap year. The rule is that a year is a leap year if it is
divisible by four, except that years divisible by 100 are only leap years if
they are also divisible by 400. In effect, we add Y/4-Y/100+Y/400 to what we
already have. Again, we must make sure that we start out correctly. Since
(1994/4-1994/100+1994/400) mod 7=(498-19+4) mod 7=483 mod 7=0, no
adjustments need to be made, and Example 2(b) is our final formula. This
formula works indefinitely (unless we change calendar systems). As an
example, let us try July 4, 2000: (4+2*7+3(7+1)/5+2000+2000/4-2000/100+
2000/400) mod 7=(4+14+4+2000+500-20+5) mod 7=2507 mod 7=1, so it is a
Tuesday.
This also works backwards in time; however, we switched to the current
calendar system on Thursday, September 14, 1752, so it does not work for dates
earlier than this. But if we try the standard "where were you when..." date of
November 22, 1963, we find:
(22+2*11+3(11+1)/5+1963+1963/4-1963/100+1963/400) mod 7=
(22+22+7+1963+490-19+4) mod 7=2489 mod 7=4, which is a Friday.
These formulas have been implemented in Example 3, a C program for computing
the day of the week automatically.
Table 1: Shift information for each month.

Month    Days   Shift
March     31     --
April     30     3
May       31     2
June      30     3
July      31     2
Aug.      31     3
Sept.     30     3
Oct.      31     2
Nov.      30     3
Dec.      31     2
Jan.      31     3
Feb.      28     3
Table 2: A function for imitating the shift pattern.

Example 1: Creating a formula that works for the year 1994.
Example 2: Extending the formula to account for different years.
Example 3: The formula expressed as a C program.
/* Computing day of the week from the date. It is assumed that input */
/* represents a correct date. */
#include <stdio.h>
char *name[] = { "Monday",
 "Tuesday",
 "Wednesday",
 "Thursday",
 "Friday",
 "Saturday",
 "Sunday"
 };
int main(){
    int D,M,Y,A;
    printf("Day: "); fflush(stdout);
    scanf("%d",&D);
    printf("Month: "); fflush(stdout);
    scanf("%d",&M);
    printf("Year: "); fflush(stdout);
    scanf("%d",&Y);
/* January and February are treated as month 13 and 14, */
/* respectively, from the year before. */
    if ((M == 1) || (M == 2)){
        M += 12;
        Y--;
    }
    A = (D + 2*M + 3*(M+1)/5 + Y + Y/4 - Y/100 + Y/400) % 7;
    printf("It's a %s.\n",name[A]);
    return 0;
}
































UNDOCUMENTED CORNER


Inside the Pentium FDIV Bug




Tim Coe


Tim, who received his BSEE and MSEE from MIT in 1986, is a chip designer at
Vitesse Semiconductor. He can be contacted at coe@vitsemi.com.


Introduction 
by Andrew Schulman 
I take out a cheap pocket calculator--actually a $19.95 Roget's Thesaurus &
Spell Checker, to which Seiko threw in an 8-digit calculator "for free"--and
divide 4195835 by 3145727. I get the answer 1.3338204. How do I know this is
correct? By multiplying 1.3338204 and 3145727. This should give the result
4195835. 
Of course, it doesn't. Division often produces a result whose decimal
representation is inexact. On my cheap calculator, 1.3338204*3145727 yields
not 4195835, but 4195834.8--the answer is too low by 0.2. Close enough for
government work, perhaps, but to get a better answer, we need more digits. 
My $20.00 calculator offers only eight digits. But my $3000 Pentium-based Dell
Dimension XPS P60, which (according to the Pentium Processor User's Manual)
complies with IEEE Standard 754 for Binary Floating-Point Arithmetic,
guarantees 15-16 significant decimal digits (double precision). On a Pentium,
then, you should be able to produce a far more accurate answer than on a cheap
8-digit calculator.
Unfortunately, as even David Letterman's audience has heard by now ("how about
some defective Pentium salsa to go with those defective Pentium chips?"), the
Pentium floating-point divider has a bug. If you do a floating-point divide
(FDIV) of 4195835.0 by 3145727.0, you get the answer 1.333739068902.
Multiplying this by 3145727 yields, not 4195835 or even the 4195834.8 produced
by the calculator, but 4195579--a full 256 too low! Whereas the cheap
calculator's answer would require only additional digits to gain precision,
the Pentium's answer is simply wrong starting in the fifth digit. This is less
precision than even IEEE single-precision numbers are supposed to have.
Given that the division operation has a trivially simple algorithm which we
all learned in school, how is it that the Pentium sometimes divides
incorrectly? The Pentium manual page for FDIV simply shows DEST←DEST/SRC,
reinforcing the idea that division is a trivial, atomic operation. However,
division is anything but simple if you want to do it quickly. (Growing up, I
could always tell when my father was doing division on his Friden or Marchant
calculator, because the whole house would shake for about five minutes.)
Entire books, journals, and conferences are devoted to discovering newer,
faster ways of performing the seemingly simple arithmetic operations,
particularly division.
In this month's "Undocumented Corner," Tim Coe shows that the Intel Pentium
uses a division algorithm, first discovered in 1958, called "radix 4," or
"SRT." Even to many programmers, the idea that there is a division
algorithm--that a complex piece of software runs "inside" the simple-looking /
operator or FDIV instruction--is a revelation. It's also quite revealing that
we're talking about an algorithm that dates back to 1958: Many of the ideas in
"cutting edge" products are really quite old. At any rate, there is a small
bug (basically, a 0 appears several places in a table where a +2 ought to be)
in the Pentium's implementation of the SRT divider.
Tim's article is a model of reverse engineering. Not only did he use the
pattern of numbers whose reciprocals the Pentium calculates incorrectly to
deduce the division algorithm used by the Pentium, but he constructed a model
that predicted which other divisions would fail. He then confirmed these
predictions on an actual Pentium. Most so-called "software engineering" is
never like this, alas. Tim has demonstrated a genuinely scientific approach to
software analysis.
As Tim points out, thanks are due to Dr. Thomas Nicely, not only for
uncovering this bug, but also "for providing this window into the Pentium
architecture." Flaws reveal far more than success, and the full story of the
Pentium debacle is far from over. Intel lost nearly half a billion dollars
because it tried to cover up the bug. The company recently instituted new
policies to inform customers about future processor defects.
You can expect further "Undocumented Corners" on the Pentium processor. For
some time, I have been trying to put together a piece on the undocumented
"Appendix H" features of the Pentium, particularly its Virtual-8086 mode
extensions. If you have any comments on this or any other important,
undocumented, or buggy interface, please contact me on CompuServe at
76320,302.
On October 30, 1994, Professor Thomas Nicely sent e-mail to several people
(including Andrew Schulman) regarding a bug in the Pentium divider. For
example, he wrote, 1 divided by the prime number 824,633,702,441 (a twin-prime
pair with 824,633,702,443) "is calculated incorrectly (all digits beyond the
eighth significant digit are in error)." Dr. Nicely provided several other
values for which the Pentium produces an incorrect reciprocal, noting that the
bug can be observed "by calculating 1/(1/x) for the above values of x. The
Pentium FPU will fail to return the original x (in fact, it will often return
a value exactly 3072=6*0x0200 larger)."
Schulman (who didn't have a Pentium machine at the time) forwarded the e-mail
to Richard Smith of Phar Lap Software, asking him if he knew anything about
the bug. After verifying the bug, Smith reposted Nicely's message to the
CompuServe Canopus forum hosted by Will Zachmann.
Alexander Wolfe at the Electronic Engineering Times saw this post and
contacted Terje Mathisen in the comp.sys.intel Internet newsgroup. Mathisen
also verified the bug and wrote a small program to test for it. Mathisen then
posted his work, a brief excerpt from which appears in Listing One, to
comp.sys.intel, starting the thread "Glaring FDIV bug in Pentium!" Andreas
Kaiser saw this and promptly wrote a program to do reciprocals of random
numbers and let it run for a day. On November 4, he posted the divide failures
he saw back to comp.sys.intel. Wolfe's article, the first published
description of the FDIV bug, appeared on the front page of EE Times (November
7, 1994).
Kaiser wrote that he had performed roughly 25,000,000,000 reciprocals and that
the division was usually correct. He knew that the exponent did not matter: If
X fails, then X*(2^N) will also fail, so he divided each failure by 2 until
odd. The 23 numbers shown in Listing Two were the failing reciprocals he
found; I will refer to these numbers as "Kaiser's list."
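Kaiser's normalization step is simple to restate in code. The sketch below (the function name is mine) strips factors of two from a failing value, since a failure at X implies a failure at X*(2^N):

```c
#include <stdint.h>

/* Reduce a failing operand to its odd part: the binary exponent does not
   matter, so only the odd part of each failing value is interesting. */
uint64_t odd_part(uint64_t x) {
    while (x != 0 && (x & 1) == 0)
        x >>= 1;               /* divide by 2 until odd */
    return x;
}
```

For example, odd_part(824633702441ULL * 4) reduces back to Nicely's prime itself.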


Radix 4 and SRT Division


At that time, I was considering buying a Pentium machine and was following
several PC-related newsgroups. On November 6, the two snippets in Listings One
and Two flowed across my terminal. 
There was a pattern in Kaiser's list. As a floating-point designer, I wondered
what could be derived about the Pentium divider design from that pattern, so I
started writing the numbers out in binary. Listing Three shows Nicely's prime
and one number from Kaiser's list in binary. Analysis of the numbers from
Kaiser's list reveals that all but two of them are of the form in Figure 1(a),
where J and K are integers greater than or equal to 0 and delta is a real
number that has varying ranges depending on J, generally between 0 and 1.
The 2^K factors common to all the terms in Figure 1(a) reflect the
arbitrariness of the exponent in the occurrence of an error. The 2^(2*J)
factors common to the terms that express the deviation of an operand from a
binary scaled 3 indicate that the Pentium divider must be an iterative divider
that computes two bits of the quotient per cycle. This is because this
deviation must be creating some specific pattern in the remainder that acts
like a key--unlocking the bug and releasing it to do its damage.
That this key can be multiplied by a somewhat arbitrary power of 4 (note:
2^(2*J) = 4^J) and still unlock the bug means two things: 1. On each cycle, the
Pentium is multiplying the key by a factor of 4; and 2. greater values of J
represent this key starting deeper in the remainder and therefore reaching the
point where it unlocks the bug on a later cycle. To multiply the remainder by
4 on each cycle, two bits of quotient must be generated.
Two bits/cycle is in rough agreement with the quoted 39 cycles per
extended-precision division from the Pentium data book: 32 cycles are needed
to generate roughly 64 bits of quotient, with a few extra bits to allow for correct
rounding and a couple of additional cycles required to set the divide up and
finish it off. 
The technical name for this type of divider is "radix 4," which essentially
means the operation is performed in base 4. The longhand decimal division
taught in school is a radix 10 iterative divide algorithm. The algorithm
selects an appropriate quotient digit on each iteration and then recalculates
the remainder and quotient according to Figures 1(b) and 1(c).
An "appropriate quotient digit" is defined as a digit for which the remainder
after the application of equation Figure 1(b) is both greater than a geometric
sum along the radix of the least-possible quotient digit, times the divisor,
and less than a geometric sum along the radix of the greatest-possible
quotient digit, times the divisor; see Figures 1(d), 1(e), and 1(f).
There are several multiplies in Figures 1(b) and 1(c). Hardware to implement
multiplies by anything other than a power of 2 is very expensive both in chip
area and time. Multiplications by positive and negative powers of two are just
shifts and inverts. Having radix 4 and the possible quotient digits of -2, -1,
0, +1, and +2 meets the multiplication criteria nicely.
Having five possible digits in a radix 4 divider brings us to what is known as
an "SRT divider," named after its three independent discoverers--D. Sweeney
(IBM), J.E. Robertson (University of Illinois), and K.D. Tocher (Imperial
College of London). The Robertson and Tocher papers were published in 1958.
Selecting a quotient digit when the number of possible digits equals the radix
leaves no margin for error. By turning around the equations in Figures 1(b)
and 1(c) for longhand radix 10, we get the equation in Figure 1(g). Making
perfect digit selections would be very expensive in terms of hardware. What
the originators of the SRT algorithm proposed was that, by having more
possible digits than the radix, a certain amount of slop in quotient-digit
selection would be recoverable. Turning around the equations in Figures 1(b)
and 1(c) for radix 4 and the digit set -2, -1, 0, +1, +2 leads to the equation in
Figure 1(h).
Now a quotient digit is uniquely determined only when the remainder is well
within the two bounds; near a bound, either of two adjacent digits will do. A
reasonable implementation of the equation in Figure 1(h) can be
achieved using only a limited number of the most-significant bits of both the
divisor and the remainder. In hardware, this can be realized with a table,
comparators, or random logic. The Pentium uses a lookup table (I found this
out from Intel much later in the game); my models use functionally equivalent
sets of comparisons. Listing Four is an example of longhand division in base
4.
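The recurrence in Figures 1(b) and 1(c) can be sketched as a toy in C. Note this uses the non-redundant digit set 0..3 rather than the Pentium's redundant -2..+2 set, so it is longhand radix 4, not SRT; the function name and interface are mine:

```c
#include <stdint.h>
#include <assert.h>

/* Longhand radix-4 division: each iteration selects one base-4 quotient
   digit (0..3), recalculates the remainder, and rescales it by the radix,
   as in Figures 1(b) and 1(c). Requires dividend < divisor; returns
   floor(dividend * 4^ndigits / divisor), i.e., the first ndigits radix-4
   digits of the fractional quotient. */
uint64_t radix4_divide(uint32_t dividend, uint32_t divisor, int ndigits) {
    assert(dividend < divisor && ndigits > 0 && ndigits <= 16);
    uint64_t rem = dividend, q = 0;
    for (int i = 0; i < ndigits; i++) {
        rem *= 4;                          /* multiply remainder by the radix */
        uint64_t digit = rem / divisor;    /* quotient digit, always 0..3 */
        rem -= digit * divisor;            /* recalculate the remainder */
        q = q * 4 + digit;                 /* append the digit to the quotient */
    }
    return q;
}
```

For example, radix4_divide(1, 3, 4) yields 85, whose base-4 digits 1111 are the first four fractional digits of 1/3. An SRT divider differs precisely in that its digit need only be approximately right; the redundant digit set lets later iterations absorb the slop.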


History


Having made the multiplies in the equation in Figure 1(b) easy to perform, the
next issue is how to perform the add. The most conceptually straightforward
way to do this is with a regular adder. But regular adders that produce a
normal sum have to deal with the problem of propagating carries. 
For example, examine the two adds in Listing Five. Note that changing only
one input bit resulted in many of the output bits changing through the
propagation of a carry. A regular adder must handle all possible occurrences
of this situation anywhere in the add, and it must handle this situation
correctly. Propagating carries is expensive in terms of both hardware and
time.
Carry-save adders are often used in hardware floating-point design to perform
adds in such a manner that carries need not be propagated. Carry-save adders
can be of several different types. One of the simpler types that is adequate
for the remainder calculation in a radix 4 divider is known as a "3-to-2
carry-save adder." At each individual bit position, the carry-save adder takes
three bits (one from each of the input operands) and computes their sum. This
sum is expressed as a sum bit, which has the same significance as the input
bits, and a carry bit, which has twice the significance of the input bits. The
truth table in Table 1 illustrates the logic performed at each individual bit
position of the carry-save adder.
The input words to the carry-save adder are the old-remainder sum word,
old-remainder carry word, and digit*divisor for the given cycle. The outputs
of the carry-save adder are the new-remainder sum word (shifted left 2 to
reflect the multiplication by the radix) and the new-remainder carry word
(shifted left 2 to reflect the multiplication by the radix and shifted left 1
more to reflect the extra significance of the carry bits). Table 1 is very
easy to implement in hardware.
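Table 1's per-bit logic is just parity and majority, so it can be applied to whole words at once; a minimal sketch (names are mine):

```c
#include <stdint.h>

/* 3-to-2 carry-save addition over whole 64-bit words. The sum word is the
   per-bit parity of the three inputs; the carry word is their per-bit
   majority, shifted left one position to reflect its doubled significance.
   No carry ever propagates across bit positions. */
void csa3to2(uint64_t a, uint64_t b, uint64_t c,
             uint64_t *sum, uint64_t *carry) {
    *sum   = a ^ b ^ c;
    *carry = ((a & b) | (a & c) | (b & c)) << 1;
}
```

Adding the sum word and carry word with an ordinary carry-propagate adder recovers a+b+c; the divider defers that expensive propagation until the very end.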
The true remainder value on any given cycle is the sum of the remainder sum
word and the remainder carry word. When a normal carry-propagate adder is used
to calculate the remainder, there is only one way to represent a remainder of
a given value. When a carry-save adder is used to calculate the remainder, the
way in which the true remainder value is apportioned between the sum word and
the carry word depends upon the history of carry-save adds performed. 
In particular, I had noticed a while back that long, coincident sequences of
1s in both the sum word and the carry word occurred following a very specific
history. To get a coincident sequence of 1s of length N+1 in the remainder at
the end of a cycle, there must have been a coincident sequence of 1s in the
remainder of at least length N at the beginning of a cycle and the digit*
divisor must have a sequence of consecutive 1s of at least length N aligned
with the sequence in the remainder. The initial remainder, which is just the
dividend in the sum word, has all 0s in the carry word.
If a very specific history is necessary to create a pattern, that pattern will
be extremely rare. All sources indicated that divide errors were extremely
rare (1 in 10^10 for random divides quoted from Intel in the November 7 EE
Times and 1 in 10^9 for random reciprocals from Kaiser's list). However, the
only way I could imagine the error being this rare was that the Pentium
divider was using a carry-save adder to do the remainder calculation and that
the long, specific, and therefore rare history associated with the buildup of
long coincident sequences of 1s in carry-save remainders was involved with the
failure.

Terje Mathisen had provided the quotients that resulted from taking the
reciprocal of Nicely's prime on both the Pentium and the 486. I already knew
the divisor and initial remainder (1 for a reciprocal), and from the quotients
I could extract the digit sequences, giving me the history of digit*divisor.
It turns out that multiple digit sequences are possible (see Listing Six) that
can give the same quotient, and indeed there was some ambiguity near the cycle
where I surmised the error was occurring. 
When I saw the long sequences of -1 digits, I wrote a simple model of a divider
with the digit sequence hardwired to +1, followed by endless -1s, and a
carry-save adder to do the remainder recalculation. The bit patterns that
developed included large (>= 5 bits long near the failure), coincident
sequences of 1s in the remainder. I started running the numbers included in
Kaiser's list and numbers near them that I surmised were not failing. I noted
two conditions had to be met at the beginnings of cycles 14 and 15 for numbers
near Nicely's prime to fail.
The first condition was associated with the selection of -1 as the quotient
digit on cycle 14 and could be expressed thus: The sum of the eight
most-significant bits in my representation of the remainder sum word and the
remainder carry word (chopped remainder sum, note that chopping of the 56
least-significant bits comes first and summing comes second) at the beginning
of the cycle could not be greater than 250 or less than 239. Since I was
carrying five bits to the left of the binary point (four are adequate, but I
left plenty of room) and representing everything in two's-complement format,
this corresponds to a chopped remainder sum between -17/8 and -6/8, inclusive.
The second condition was associated with the value of the remainder on cycle
15, the cycle after the last selection of a -1 digit for the quotient. I
determined that to get an error, the value of the chopped remainder sum had to
be 30 (30/8 if the binary point is taken into account) and the first three
bits in both the remainder sum word and the remainder carry word that were
chopped off (six bits total, of significance ranging from 2^-4 to 2^-6) all had to
be 1. (I found out three weeks later that this condition, while empirically
correct about whether an error would occur, is actually a consequence of the
root cause of the error.)
The 5-bit-long sequence of coincident 1s in the carry-save remainder at bit
positions 2^-4 to 2^-8, and at least one 1 at bit position 2^-9, at the beginning of
cycle 14 is the requirement to generate the conditions tested for at the
beginning of cycle 15.
Exactly the same conditions applied to other numbers on Kaiser's list, but
they applied to cycles 12 and 13; J=2 in Figure 1(a) for Nicely's prime.


Move it Up


The pileup of 1s in the remainder that was a precondition for the error didn't
start until about halfway through the long sequence of -1 digits. A long
sequence of 1s was necessary in the divisor for the pileup. When doing
reciprocals, both this sequence and the starter pattern associated with the
1149 term in Figure 1(a) had to have the divisor as their root source. I knew
that hardware dividers were by no means restricted to doing reciprocals, so I
wondered whether I might be able to move the starter pattern into the dividend
and just use the divisor as a bed upon which pileups of 1s could grow. I
thought I might be able to get the pileup to start growing in the first cycle,
thereby meeting the conditions I had determined necessary for an error in the
seventh or eighth cycle.
After playing with the divisors and dividends, I determined that the following
dividend and divisor constituted the smallest pair of integers that would
induce the error on the earliest cycle possible:
hex 800bf6 / bffffc
decimal 4195835 / 3145727
I had been doing all my modeling on a Sun IPC running UNIX, and I had no
access to a Pentium. After predicting the above divide would fail, I drove
down to my local CompUSA where I asked a salesperson how to load up Windows
calculator on a Pentium (I had never used Windows before). I did the above
divide, then multiplied back by the divisor, and what do you know--the answer
was off by 256. This represented an error of 1 part in ~16000. Listing Seven
provides an algebraic analysis of the relationship between the numbers on
Kaiser's list and the above ratio.
I mulled this over for a couple of days and then wrote an abbreviated
description of my reasoning. On November 14, I posted the reasoning and
program (TWOTHIRDS.C, available electronically; see "Availability," page 3)
along with an explanation of the output of TWOTHIRDS.C and ITERATOR.C to
comp.sys.intel. 


More of the Same


It turns out I was not the only one writing the numbers on Kaiser's list in
binary. Dik Winter had also been doing so and queried Andreas Kaiser about the
two entries on his list that don't match the equation in Figure 1(a); Kaiser's
response is shown in Listing Eight.
To address these final two cases, the divider model had to be fleshed out with
a full quotient-digit-selection algorithm and a fully capable calculator of
digit*divisor. The second item was trivial to accomplish (do the appropriate
shift, invert, or clear). The creation of a quotient-digit selector required
the application of theory and the appropriate choices where the algorithm
allowed multiple possibilities for the quotient digit; see Figure 1(h).
Due to the nature of the condition for failure on cycle 14 for numbers near
Nicely's prime, the approximation of the remainder used to determine the
quotient digit had to be the previously defined chopped remainder sum. The
quotient digit of -1 selected for the chopped remainder sum of 250 (-6/8) could
equally well have been assigned to 0. I took this to mean that I should choose
the quotient digit farthest away from 0 consistent with correct divider
operation versus a given combination of remainder and divisor. 
So what approximation of the divisor would be appropriate to use in
quotient-digit selection? My analysis convinced me that I needed the first six
bits of the divisor. But it turns out there was an error in my analysis; only
five bits of the divisor are necessary for quotient-digit selection. This
error led me to believe that the full divider model would not handle certain
divisors correctly, but this turned out to be wrong; the error had no adverse
impact on the divider model's ability to predict errors.
The quotient-digit selector is implemented in two parts. Once, at the
beginning of the divide, the most-significant six bits of the divisor are
checked against several thresholds to pick the thresholds to be checked
against the remainder. At the beginning of each cycle of the divide the
chopped remainder sum is checked against these thresholds to determine the
appropriate quotient digit, and digit*divisor is subsequently calculated.
I ran the final two cases from Kaiser's list through the model and saw that
big pileups of 1s and conditions very similar to the original second condition
for failure were occurring on corresponding cycles for these divides. The only
differences in the conditions for failure were different values for the
chopped remainder sum. The requirement for six 1s in the most-significant
chopped-off bits of the remainder remained the same.
To model the occurrence and amount of the error, a check for the conditions
for failure was inserted into the program; if an error condition was detected,
the program would ask the user whether a correct or incorrect result was
desired. The amount that appeared to be incorrectly subtracted from the
remainder on the cycle of interest was too small to be due to a remainder
overflow. Also, the amount subtracted off the remainder was 3 for divisors
beginning with 0x8f and 4 in other cases. 
I could not conceive of a specific error in the logic design that would result
in this somewhat bizarre specification and consequences of failure. So I
attributed the error to some complexity (unknown to me) used to speed up digit
selection in the Pentium divider's quotient-digit selector and remainder
calculator. To model the amount of error precisely, I simply subtracted the
appropriate amounts from the remainder when an error condition was
encountered.
I posted my more-complete divider model to comp.sys.intel on November 16. At
this point, the following divisors and digit sequences near the error had been
found to produce errors: 
0xbf.. -1 +2
0x8f.. -1 +2
0xa7.. -2 +2
Upon examination of the quotient-digit-selection algorithm, the following
divisors and digit sequences appeared to create very similar situations:
0x8f.. -2 +2
0xbf.. -2 +2
0xd7.. -2 +2
0xef.. -1 +2
0xef.. -2 +2
I played around with the possible combinations of dividend and divisor that
would appear to induce failure and discovered that the following fairly simple
conditions would generate error conditions along the -2 +2 digit sequence:
(intdividend - deltadividend) /
(intdivisor - deltadivisor)
intdivisor = 3, 9, 15, 21, 27
intdividend > 0
10^-8 > deltadividend/intdividend >
deltadivisor/intdivisor > 0
Either intdividend modulo intdivisor=intdivisor/3 or intdividend modulo
intdivisor= 2*intdivisor/3 must hold (which one depends on the relative binary
exponents of the operands); for example, 6.9999995/2.99999999. These operands
were simple enough to generate on the fly. I also constructed a test case for
the remaining -1 +2 digit sequence case and made another trip to CompUSA. All
of the cases listed earlier produced divide errors. I also tested a couple of
cases halfway between these cases on the outside chance that they would
produce divide errors; these divides were performed correctly.
I updated the divider model to test for and correctly model all the divide
errors that I knew of and posted my results along with some general
characteristics of operands that were at risk and an analysis of what the
probabilities of error actually were. The post went out to comp.sys.intel on
November 20. The divider model available electronically (ITERATOR.C) has more
extensive comments and improved variable names; the original error test and
modeling are included but commented out.


A Final Insight


Over the two weeks following my November 20 posting, I was contacted by Cleve
Moler of The Mathworks and became involved in his and Mathisen's effort to
produce an efficient and accurate software workaround to the Pentium divide
problem. We determined that in order to achieve accuracy, the software patch
would have to do a divisor check prior to each division to determine if the
divide had any risk of failing. If this check came back positive, both the
divisor and dividend would be scaled by 15/16 and the divide would then be
performed as normal. This scaling was guaranteed to map any at-risk divisor
into a divisor that was not at risk. My part in this effort was to determine
what constituted an at-risk divisor.
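The shape of such a workaround can be sketched as follows. The helper names and the simplified at-risk test (only the four mantissa bits that select one of the five flawed P-D rows) are my reconstruction for illustration, not Intel's released patch; note also that scaling a double by 15/16 can round in the last place unless carried out in wider precision:

```c
#include <stdint.h>
#include <string.h>

/* Sketch of the FDIV software workaround idea: if the divisor's leading
   mantissa bits match one of the five at-risk patterns, scale both
   operands by 15/16 before dividing. Scaling both by the same factor
   leaves the true quotient unchanged but maps the divisor away from the
   flawed P-D table entries. Assumes IEEE-754 binary64 doubles. */
static int divisor_at_risk(double d) {
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    unsigned top4 = (unsigned)((bits >> 48) & 0xF); /* 4 bits after implicit 1 */
    /* Flawed table rows begin 1.0001, 1.0100, 1.0111, 1.1010, 1.1101 */
    return top4 == 0x1 || top4 == 0x4 || top4 == 0x7 ||
           top4 == 0xA || top4 == 0xD;
}

double safe_div(double x, double y) {
    if (divisor_at_risk(y)) {
        x *= 15.0 / 16.0;
        y *= 15.0 / 16.0;
    }
    return x / y;
}
```

A four-bit test like this over-triggers (most divisors matching these rows are harmless), which costs only an unnecessary scaling; a production patch must also worry about exactness of the scaling and about underflow at the bottom of the exponent range.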
I had been investigating the possible quotient-digit sequences and divisor-bit
sequences that could cause an error by using my model of the Pentium divider.
In particular, I had been trying to construct erroneous divides with the least
possible consecutive 1s in the divisor starting at bit position 2^-5. I had been
able to construct an error with only eight consecutive 1s (bit positions 2^-5 to
2^-12) but no fewer. So we decided that a bit mask checking for 1s in these
positions along with a table lookup to ensure that the first five bits of the
divisor were also at risk would be a good divisor check.
At this point Moler, Mathisen, and I got together with the Intel compiler
group and Peter Tang to jointly produce a software patch that Intel would then
distribute to all compiler and assembler vendors. We immediately concluded
that a divisor check based only upon empirical results would not be of
sufficient quality. What was needed was closed-form proof that a certain
number of 1s was required to address the flaw in Intel's P-D (Partial
Remainder-Divisor) table. This proof would use the design itself, not
empirical results, as its starting point. 
On December 2, I received a copy of the Intel white paper on the flaw, but I
had a difficult time reconciling the flaw in the P-D table with my empirically
determined conditions for failure. I initially thought that I would need to
know some additional design details to produce a proof. Later that day,
however, I realized that what I really needed was better insight into the
mechanism of failure.
For at-risk divisors just less than 3/2 (first five bits 1.0111), my
empirically determined conditions for failure were a chopped remainder sum of
30 (30/8) and six 1s in the highest bits of the chopped-off portion of the
carry-save remainder. The flawed P-D entry for this divisor was associated
with a chopped remainder sum of 31 (31/8).

The insight was that my model was detecting the error one cycle before the
error actually occurred! Meeting the model's condition for failure on
cycle[flaw-1] always resulted in the selection of a +2 digit and the addressing
of the flaw on cycle[flaw]. In addition, this was the only way to reach the
flaw. Using this insight, Tang and I were able to produce a closed-form proof
that showed that six 1s in bit positions 2^-5 to 2^-10 in the divisor were
required to address the flaw. I was later able to construct a divide failure
that had a 0 in bit position 2^-11, showing that my originally determined eight
1s was incorrect.
After addressing the flaw, which incorrectly selects a 0 digit instead of a
+2, the remainder overflows. This gives the appearance that 16 has been
subtracted from the remainder at the end of cycle[flaw]. For divisors
beginning with 1.0100, 1.0111, 1.1010, and 1.1101, this results in a remainder
that abides by the equation in Figure 1(f), and the divide continues normally.
For a divisor beginning with 1.0001, the remainder is out of the bounds of the
equation in Figure 1(f) at the beginning of cycle[flaw+1]. A 0 digit is
selected again on this cycle and the remainder overflows again in the opposite
direction. Since this overflow occurred on the next cycle, it is a factor of 4
less than the first overflow, leading to a net effective subtraction from the
remainder of 12. The remainder is then back in bounds, and the divide proceeds
normally.
With the exception of the five flawed P-D entries, no other design details are
relevant to these divide errors other than those that were reverse-engineered.
Intel's P-D table is functionally identical to my thresholding mechanism in
selecting quotient digits. Modification of the model to reflect the flawed P-D
entries actually made my modeling of the error conditions simpler. A model of
the flaw is included in ITERATOR.C.
I would like to thank Cleve Moler, Terje Mathisen, Peter Tang, and the forces
at Intel for involving me in their effort to create a software workaround to
this problem; Andreas Kaiser for providing such incredibly valuable
information; and my employer, Vitesse Semiconductor, for supporting my
involvement in this firestorm. Most of all, I would like to thank Dr. Thomas
Nicely for opening this window into the Pentium's divider design.


Bibliography


Atkins, Daniel E. "Higher-Radix Division Using Estimates of the Divisor and
Partial Remainders." IEEE Transactions on Computers, 1968.
Koren, Israel. Computer Arithmetic Algorithms. Englewood Cliffs, NJ: Prentice
Hall, 1993.
Omondi, Amos R. Computer Arithmetic Systems: Algorithms, Architecture and
Implementations. Englewood Cliffs, NJ: Prentice Hall, 1994.
Sharangpani, H.P. and M.L. Barton. "Statistical Analysis of Floating Point
Flaw in the Pentium Processor (1994)." Intel Corp., November 30, 1994.
(Available from http://www.intel.com/product/pentium/white11/index.html.)
Figure 1 Equations describing (a) the common features of the failing
reciprocals; (b - f) the correct operation of dividers in general; (g) radix
10; and (h) radix 4 SRT dividers.
Table 1 Carry-save adder truth table.

Listing One 

Pentium (60 & 90)
 8.24633702441000E+0011 = 4026BFFFFFB829000000 824633702441
 1.00000000000000E+0000 = 3FFF8000000000000000 1
 1.21265962489116E-0012 = 3FD7AAAAAADFDB8E4CCB 1/824...
 9.99999996274710E-0001 = 3FFEFFFFFFF000000001 (1/824..)*824...
486DX
 8.24633702441000E+0011 = 4026BFFFFFB829000000
 1.00000000000000E+0000 = 3FFF8000000000000000
 1.21265962940867E-0012 = 3FD7AAAAAAEA8638FB73
 1.00000000000000E+0000 = 3FFF8000000000000000



Listing Two

 3221224323 12884897291 206158356633
 824633702441 1443107810341 6597069619549
 9895574626641 13194134824767 13194134826115
13194134827143 13194134827457 13194138356107
13194139238995 26388269649885 26388269650425
26388269651561 26388276711601 26388276712811
52776539295213 52776539301125 52776539301653
52776539307823 52776553426399 



Listing Three

1011111111111111111111111011100000101001 = 824633702441
1011111111111111111110111000001000110111101101 = 52776539295213



Listing Four

In base 4 representation the operands are:

dividend 1.00000113323 X 2^22
divisor 1.13333333332 X 2^21

step number ==> 1 2 3 4 5 6 7 8 9

plus digits ==> 1.0 0 0 0 0 0 2 ? <== Add these two numbers
minus digits ==> -0.1 1 1 1 1 1 0 ? <== to create the quotient
 -------------------------
 1.13333333332 1.0 0 0 0 0 1 1 3 3 2 3 [dividend=remainder 1]
 -1.1 3 3 3 3 3 3 3 3 3 2 (minus 1*divisor)
 -----------------------
 -1.3 3 3 3 2 2 0 0 0 3 0
 1.1 3 3 3 3 3 3 3 3 3 2 (minus -1*divisor)
 -----------------------
 -1.3 3 3 2 2 0 0 0 3 2 0
 1.1 3 3 3 3 3 3 3 3 3 2 (minus -1*divisor)
 -----------------------
 -1.3 3 2 2 0 0 0 3 2 2 0
 1.1 3 3 3 3 3 3 3 3 3 2 (minus -1*divisor)
 -----------------------
 -1.3 2 2 0 0 0 3 2 2 2 0
 1.1 3 3 3 3 3 3 3 3 3 2 (minus -1*divisor)
 -----------------------
 -1.2 2 0 0 0 3 2 2 2 2 0
 1.1 3 3 3 3 3 3 3 3 3 2 (minus -1*divisor)
 -----------------------
 -0.2 0 0 0 3 2 2 2 2 2 0
 1.1 3 3 3 3 3 3 3 3 3 2 (minus -1*divisor)
 -----------------------
 3.3 3 3 0 1 1 1 1 1 2 0 [remainder 8]
 -2.3 3 3 3 3 3 3 3 3 3 0 (minus 2*divisor)
 -----------------------
 3.3 3 0 1 1 1 1 1 3 0 0 [remainder 9]
 ??.? ? ? ? ? ? ? ? ? ? ? (minus ??*divisor)
 -----------------------
 ??.? ? ? ? ? ? ? ? ? ? ?
plus digits ==> 1.0 0 0 0 0 0 2 0 0 0...
minus digits ==> -0.1 1 1 1 1 1 0 0 0 1...
 -------------------------
 3.3 3 0 1 1 1 1 1 3 0 0
[The Pentium's Choice :-)] 0.0 0 0 0 0 0 0 0 0 0 0 (minus 0*divisor)
 -----------------------
[overflow remainder and 3 3.3 0 1 1 1 1 1 3 0 0
wrap around to negative ==>] -0.0 3 2 2 2 2 2 1 0 0 0
[The division continues] 0.0 0 0 0 0 0 0 0 0 0 0 (minus 0*divisor)
 -----------------------
 -0.3 2 2 2 2 2 1 0 0 0 0
 1.1 3 3 3 3 3 3 3 3 3 2 (minus -1*dvsr.)
 -----------------------
 2.1 1 1 1 1 2 3 3 3 2 0
 .
 .
plus digits ==> 1.0 0 0 0 0 0 2 2 2 2...
minus digits ==> -0.1 1 1 1 1 1 0 0 0 0...
 -------------------------
 3.3 3 0 1 1 1 1 1 3 0 0
[Correct Answer] -2.3 3 3 3 3 3 3 3 3 3 0 (minus 2*divisor)
 -----------------------
 3.3 0 1 1 1 1 1 3 1 0 0
 -2.3 3 3 3 3 3 3 3 3 3 0 (minus 2*divisor)
 -----------------------
 3.0 1 1 1 1 1 3 1 1 0 0
 -2.3 3 3 3 3 3 3 3 3 3 0 (minus 2*dvsr.)
 -----------------------

 0.1 1 1 1 1 3 1 1 1 0 0
 .
 .



Listing Five

 10011010111011001 10011010111011001
 + 1000101000101001 + 1000101000100001
 ^ ^
 ----------------- -----------------
 11100000000000010 11011111111111010


Listing Six

Cycle number 13 14 15

Pentium ==>
 +1 -1 (ten -1's) -1 -1 +2 0 0 ...
 or
 +1 -1 (ten -1's) -1 0 -2 0 0 ...

Correct ==>
 +1 -1 (ten -1's) -1 -1 +2 +2 +2 ...
 or
 +1 -1 (ten -1's) -1 0 -1 -1 -2 ...



Listing Seven

All but two of the numbers posted by Andreas Kaiser
(including Nicely's prime) had the form:

3*(2^(K+30)) - 1149*(2^(K-(2*J))) - delta*(2^(K-(2*J)))

Normalize this expression to a number in [1,2) by
dividing by 2^(K+31):

3/2 - 1149*(2^(-31 - 2*J)) - delta*(2^(-31 - 2*J))

Delta has to be generally between 0 and 1 but these bounds
vary with J. Taking note that 1149 = 1152 - 3 = (9/8)*1024 - 3,
the above can be restated as:

3/2 - (3/2)*((3/4)*(2^(-21 - 2*J)) - delta*(2^(-31 - 2*J)))
 ---------------------x-----------------------

The restrictions on delta are now that it must be in general between
4/3 and 2. It turns out the upper limit on delta of 2 varies very
little with J but the lower limit of 4/3 varies greatly with J. (For
J large it goes towards a limit of 0 and for J negative it is greater
than 2, i.e., no failures.)

It is now clear that the criteria for failure of a divide is an
at-risk divisor (an at-risk divisor is one with a bit sequence
appropriate for the buildup of a pile of 1's for a given quotient

digit selection sequence; for this digit sequence this means ~19
consecutive 1's) coupled with a specific digit selection sequence
during the divide. A specific digit selection sequence is precisely
equivalent to getting a specific quotient. Do a simple one term
Taylor series expansion to get a simple expression for the quotient:

(1 + y)/(3/2 - (3/2)*x) = (2/3)*((1 + y)/(1 - x)) ~=
(2/3)*(1 + y)*(1 + x) ~=
(2/3)*(1 + y + x)

The variable named 'y' is 0 in the initial reciprocals. The reason
there are no failures for J negative in the above expression for 'x'
is that the divisor becomes not sufficiently at risk. If more of the
contribution to the quotient is moved from 'x' to 'y' J can be
brought negative and while still maintaining an at risk divisor:

y + x = (3/4)*(2^(-21 - 2*J)) - delta*(2^(-31 - 2*J))
0 < x < 2^-21  (the restriction associated with the lower bound on delta)

For the pair 4195835/3145727 the analysis works as follows:

4195835 = 2^22 + 1531 = 2^22 + (3/4)*2^11 - 5
 = (2^22)*(1 + (3/4)*(2^-11) - (5/2)*(2^-21))
3145727 = 3*(2^20) - 1 = (2^21)*((3/2) - (2^-21))
 = (2^21)*(3/2)*(1 - (2/3)*(2^-21))
===>
y = (3/4)*(2^-11) - (5/2)*(2^-21)
x = (2/3)*(2^-21)
y + x = (3/4)*(2^-11) - (11/6)*(2^-21)

This corresponds nicely to the above equation with
J = -5 and delta = 11/6.



Listing Eight

dik@cwi.nl writes in article <CytpMv.6D2@cwi.nl>:

> 1000111111111111111000110101111000010101000100 = 9895574626641
> 1010011111111111111101101101011000010010100000 = 1443107810341
> 1011111111111111111110111000001000110111101101 = 52776539295213
>
> Except for the first two there is a common definite pattern:
> a leading 10, followed by a bunch of 1's, followed by 0111000001.
> If the random numbers are random enough this seems to be
> significant. I would like to see verification of the first
> two numbers listed (perhaps a transcription error or some-such?).

No error of mine, but the results for these two numbers have a
significantly longer correct mantissa than the results for all others:

 9895574626641
9.895574626641000e+12 = 402A8FFFE35E15100000
1.000000000000000e+00 = 3FFF8000000000000000
1.010552734661427e-13 = 3FD3E38E6622AB7F2614
9.999999999998295e-01 = 3FFEFFFFFFFFFFD00000
 1443107810341

1.443107810341000e+12 = 4027A7FFF6D612800000
1.000000000000000e+00 = 3FFF8000000000000000
6.929489209567026e-13 = 3FD6C30C3B66AAA79320
9.999999999999858e-01 = 3FFEFFFFFFFFFFFC0000



























































PROGRAMMER'S BOOKSHELF


More on Win95: Inquiring Minds Want to Know




Al Stevens


Al is a DDJ contributing editor and can be reached on CompuServe at
71101,1262.


In last month's "Programmer's Bookshelf," I examined Inside Windows 95, by
Adrian King. Unauthorized Windows 95, by DDJ contributing editor Andrew
Schulman, is the second book to be published about Windows 95 and, with
typical candor and humor, Schulman tells the other side of the story. By
examining Windows 95 internals as only he can, Schulman dismantles most of the
Microsoft party line about what Windows 95 really is, while at the same time
reassuring us that not only will Windows 95 dominate the desktop over the next
several years, but that it is a good operating system to boot (and to boot).
This is clearly a book for systems programmers. You have to care about all
those arcane constructs that start with the letter V--VMM, VxD, VXDLDR, and
the like. I do, but I won't get into the details here. While purporting to
reveal heretofore untold secrets about the unreleased operating system,
Schulman's book goes beyond that simple tabloid appeal. It is an excellent
study in the reverse engineering of a complex piece of software. Most
programmers never have to do that, but everyone should understand how it
works, and Schulman's blow-by-blow description of how he took Win95 apart is
fascinating. His past experience with dissecting DOS and Windows 3.x helped
him a lot, as did his association with authors who write books with
"Undocumented" and "Internals" in their titles. He brought to the task a solid
suite of diagnostic tools and a thorough understanding of how things work and
how to find out how things work.
I find, therefore, three threads running through Unauthorized Windows 95: the
deep and heavy technical descriptions of how things work within Windows 3.1,
Windows 95, and DOS; Schulman's hacker techniques, which explain how he viewed
the system's internals; and the ever-present reminder that Microsoft
represents Windows 95 one way, while Schulman's investigation reveals a quite
different picture.
Who needs to know what this book reveals about Windows 95? I can think of
several reasons to want or need this knowledge. A programmer who uses
undocumented features to extract that last ounce of performance from an OS
benefits from this knowledge. (Take care, though. Hitching your wagon to
unpublished behavior in an unreleased OS has perils.) Programmers with the
hacker drive just want to know how things work. Others like to see giant
Microsoft caught with its pants down. Some like to feed the ubiquitous
conspiracy frenzies. There are other reasons, too. Schulman points out many
issues that programmers need to know about. For example, DOS programmers who use INT
2Fh need to know that DOS has Windows-specific hooks into a 2Fh interrupt
handler, and Schulman provides details.
Which Microsoft claims does Schulman assail? I'll discuss that shortly. First,
however, be aware that he sometimes associates King's words with Microsoft by
quoting the King book and treating the words as though they were Microsoft
claims. You have to watch for that. In other places, Schulman uses as a source
the Windows 95 Reviewers Guide that beta testers received. In any event, some
of these claims are King's, some are Microsoft's, some are from both, and most
have been echoed in the trade press. With all that ballyhoo and without
dissenting opinions, readers might believe the claims, and Schulman is not
about to let that happen.
King does not pretend to be a Microsoft spokesman, even though he apparently
worked with their sanction and without the constraints of a nondisclosure
agreement (NDA). King's book is published by Microsoft Press and is mostly
pro-Windows 95, but he gives at least enough criticisms to suggest that the
facts presented are as King understands them; they are not official Microsoft
positions. Maybe they represent what Microsoft told King, trusting that he
would believe them and dutifully report the details and positions that
Microsoft wanted to convey. Certainly he did not probe the software to the
extent that Schulman did.
Among other things, Schulman asserts and proves that Windows 95 does not
bypass MS-DOS, that the shell does not use OLE, that the OS is not a complete
rewrite, that the OS does not consist of closely integrated parts, that MS-DOS
is Windows-aware, and that Windows 95 makes extensive use of existing 16-bit
code, even when running Windows 95 alone with no command processor and no
16-bit Windows or DOS applications.
How important to a developer are these journalistic expositions? I'm not sure
because I don't know what difference they make to well-behaved programs
written to run under Windows 95. As I said earlier, the value of this work is
found as much in the details of Schulman's hacking techniques as in the
conclusions that he draws. What difference does it make, for example, that
Windows 95 is not a complete rewrite and that it uses 16-bit code? Who cares
about those details if the API works as advertised and users rush to install
Windows 95 and buy new applications? I don't know who cares; I'm not sure I
do, but it certainly makes fascinating reading.
The point of this book seems to be that Microsoft is putting out a great
operating system but that to support some vague marketing objective about
perception of the product, the company is misrepresenting the internal
details. Schulman implies that Microsoft has used King and the press as
conduits to perpetuate this misinformation. What marketing objective does
Schulman think Microsoft is pushing? It is a complex subject, but here is an
example. According to Schulman, Microsoft is promoting Windows 95 as a new
integration of the user interface and the underlying operating system. This
picture is painted to distance Windows 95 from the Windows 3.x/MS-DOS
environment, which is perceived as an operating system that runs an unrelated
operating environment--a "thing on a thing." By emphasizing the integration of
the two parts into a single operating system, Microsoft wants to build an
image of Windows 95 as an integrated OS as easy to use as the Macintosh and as
rugged as OS/2. These, of course, are perceptions. Not everyone thinks the Mac
is a breeze and that OS/2 holds together all that well, but most users,
particularly those who use Windows, have had to listen to those opinions for
years and probably believe them. Microsoft wants to instill the notion that
Windows 95 is as integrated as those other guys. Schulman's point is that
there is nothing new about that integration. The Windows 95 integration is
very much like the Windows/DOS integration. Once again, this is an issue of
how Microsoft represents its products to users rather than a technical issue
of interest to programmers. Certainly we are interested in how it works. But
how much do we care about how its image is marketed? If he is right, if
Microsoft is misleading the public, what good does it do to expose them at
such length to programmers? The users, not the programmers, need to know when
they are being misled, and this book is too technical for the typical user who
would be influenced by Microsoft's marketing ploys.
Schulman's agenda permeates the book. He occasionally uses a practice that is
common when strongly held opinions are advanced: He reports events that
support his position and ignores those that do not. For example, to illustrate
that the press misunderstands and misreports the facts, he quotes Ray Valdés
in the March 1994 Dr. Dobb's Developer Update newsletter. Ray said, "Chicago's
shell uses OLE 2.0 extensively," a claim that Schulman assails. There are many
such quotes throughout the book to suggest that Microsoft misleads the press
and controls the news. But Schulman ignores a subsequent article by Ray in the
July 1994 issue of the same newsletter, where he said, 
...one myth promulgated by Microsoft that should be dispelled is Chicago's
relationship with OLE. OLE has little to do with the Beta1 version of
Chicago, which is supposed to be "95 percent code complete." 
This point, reported by Ray in July, is one of Schulman's exposés in November.
Ray goes on to explain that Microsoft's shell developers had no choice--OLE
was not ready when they needed it, so they implemented an emulation--a
reasonable explanation, but one that Schulman ignores. 
Its agenda notwithstanding, the book will do well. I expect that by the time
you read this review it will have been a bestseller for several months,
primarily because it discusses in detail the kinds of things that interest
programmers, but also because the author is known and respected for his
loyalty to the technology and to the readers without concern for the interests
of the establishment. As you will see, this loyalty almost cost him the
ability to take early shots at future releases of anything.
Windows 95 involves two developer issues that Schulman addresses, but that he
does not relate to each other. First, he tells how Microsoft traditionally
enhances its operating-system products to include features previously provided
by other vendors. DOS's memory management was added when add-on products such
as QEMM and 386Max did well. There was the Stacker compression lawsuit. DOS
has gradually gained a better command-line interpreter, disk-management
utilities, virus detectors, a full-screen editor, a task swapper, and so on.
Each improvement cut into the revenues of third-party add-on vendors. Windows
95 continues that tradition with HyperTerminal, Microsoft Exchange, and so on.
If another developer comes up with something that enhances the operating
environment or an application, chances are that Windows 96 (or 95.5 or
whatever) will incorporate a similar feature, perhaps putting the developer
out of business. Schulman makes the point several times that Microsoft
consistently goes into direct competition with its own customers, wanting 100
percent of the general-purpose software market.
The second issue relates to certification. To use the Windows 95 logo on its
packaging, an application must conform to some new rules that Microsoft has
set out. Schulman discusses this subject briefly and questions Microsoft's
motives, suggesting that the requirement for applications to be NT-compatible,
for example, exists only to boost interest in NT, which isn't doing well in
the stores. But the issue raises questions that Schulman does not ask. Who
certifies applications? And when? Given Microsoft's reputation for hijacking
the ideas and markets of other developers, who in their right mind would
submit a new product for Microsoft review before running the first ad even if
third-party confidentiality is promised? 
A certain amount of controversy surrounds the publication of Unauthorized
Windows 95. Immediately following its appearance, Schulman lost his access
privileges to the private Windows 95 Beta forum on CompuServe. The Microsoft
sysops grant and deny that access, and the privilege is tied to the terms of
the beta NDA, which you must sign before you are allowed on the forum.
Schulman's removal set off a thread of more than 100 messages, mixed in their
opinions about the appropriateness of the action, but generally agreeing that
Microsoft has the right to remove anyone for any reason. Microsoft did not
participate in the thread and offered no explanation or other comments, but
they told Schulman that he was no longer eligible to receive pre-release
software.
At issue were the terms of the NDA. Until Microsoft Press released Inside
Windows 95, all beta testers were sworn to the usual strict oath of silence.
With the release of Beta2 (which coincided with publication of King's
Inside Windows 95), Microsoft relaxed the NDA, permitting beta testers/authors
to publish descriptions of the product and screen shots. Demonstrations of the
operating system are still verboten, as is the publication of benchmarks. All
beta testers are furthermore enjoined by the NDA from reverse engineering or
disassembling the software except where such restrictions are prohibited by
applicable law.
By his own admission in the book, Schulman both disassembled and reverse
engineered Windows 95. He shows disassembled DEBUG output of many Windows 95
functions in order to prove his points. He tells how he used Soft-ICE and some
of his own programs to set breakpoints and examine Windows 95 code and
internal data structures, all of which can be reasonably interpreted as being
in clear violation of the NDA. Mind, I'm not chastising Schulman for doing it.
That was his decision, and it resulted in a very good technical book for the
rest of us.
Were these intrusions enough to get Schulman the boot from the hallowed
grounds of beta privilege? Probably not on their own, and probably not if the
author had been anyone else. But Andrew Schulman is Undocumented Man,
unabashedly publishing undocumented details of any and all system-software
products, mostly those from Microsoft. He wrote the Windows
AARD-detection-code exposé in the September 1993 issue of Dr. Dobb's Journal.
He was Stac Electronics' expert consultant in its two-way lawsuit against
Microsoft. He is an outspoken critic of Microsoft in public discussions of the
FTC and Justice Department investigations into Microsoft's business practices.
He is alternately Mike Wallace, Ralph Nader, and Clark Kent, almost always
lined up in the ranks against Microsoft. In other words, Schulman was no doubt
already at or close to the top of the Bill Gates enemies list well before
publishing this book. And in this book he misses no opportunity to take shots
at Microsoft, further endearing him to the powers at Redmond.
Schulman compares Microsoft to a "school board trying to wriggle out of a
court order" when they assert that Windows 95 is integrated. Ironically, the
analogy returned to haunt him. Even if Microsoft cannot legally prevent
reverse-engineering activities, it, like Lester Maddox years ago, seems to
have reserved the right to refuse service to anyone for any reason--at least
as far as beta privileges go--and for a few days, Schulman bore the mark of
the ax handle. Happy ending. After a lot of publicity about Schulman's shabby
treatment, Microsoft relented and let him back into the forum.
Unauthorized Windows 95 is very well written. If you ignore the incessant and
tiresome Microsoft-bashing, you will find an abundance of solid, technical
data, the result of Schulman's incisive penetration of a complex software
system. It is a superb achievement to have uncovered as much detail given the
time available, and it is a testament to Schulman's writing skills that he
made such details so readable. Of course, Windows 95 is still in beta. It
remains to be seen whether the released version will warrant all the same
criticisms.
Unauthorized Windows 95
Andrew Schulman
IDG Books, 1994, 608 pp., $29.99
ISBN 1-56884-169-8






















SWAINE'S FLAMES


Behind the Interface


The PARC/Mac/Windows user interface has grown way too complicated. There have
been a few improvements from Apple since 1984, and at least one from
Microsoft, but in terms of usability it's been going downhill for a decade.
Anybody trying to design a replacement for this unwieldy GUI deserves our
respect. 
But it ain't easy.
In academic circles, you sometimes hear talk of "computer/user interface"
rather than just "user interface." The former is a better term, really. An
interface is the surface of contact between two realms; fully specifying the
interface requires that you specify the two realms. Although "computer/user
interface" isn't very specific, at least it reminds us that building an
interface starts by asking a two-slot question.
Interface what to what?
Obviously, a computer/user interface interfaces the user to the computer,
giving the user access to--what? To masses of information, apparently, and to
functionality that the user can use to massage that information. That's what
the user needs to access on every computer or computer network I've
encountered. The information typically resides on disks and other storage
media, and typically has an inherent hierarchical structure. There are
containers and things contained, and some of the things contained are
themselves containers. This hierarchy is certainly a feature of every modern
operating system, and is arguably a feature of the information itself.
The user interface to this information ought, then, to represent those things
and containers and their hierarchical relationship. You don't want to limit
the information by requiring the interface to know something about the
contents of those "things," so their representation should be generic, but
they need to be identifiable, so they probably need labels. And the user needs
to be able to look inside the containers to see what things they contain. Now,
an interface ought to be as thin as possible. It should place the user in as
close contact with the information as possible. That sounds like direct
manipulation.
Labeled, generic representations of things and containers, directly
manipulatable, openable containers--that sounds like nested icons that open to
windows. It's hard to think what else it could sound like. The other thing
that the user needs to be interfaced to is the functionality of the computer
and its software. For functionality, there may also be a hierarchy, but it's
not a literal hierarchy of containment. Rather, functions may fall into
logical groups, such as file-oriented commands, or may be appropriate only in
certain contexts, such as the commands that belong to one application rather
than to another or to the operating system. At any given moment, there is a
set of actions the user can take, and this set changes with context. And the
actions may be better represented by text than by pictures because verbs are
harder to picture than nouns.
That sounds like a dynamic menu bar. Let's see, dynamic menu bar, file and
folder icons, windows: That's the PARC/Mac/Windows interface! I've tried here
to show how hard it is to imagine a better interface to information and
functionality, and I've assumed that that's what we always want to interface
to. It may not be. Next month, an alternative model.
Michael Swaine, editor-at-large
MikeSwaine@eworld.com













































OF INTEREST
LeadTools Win32 4.0 is an update to Lead Technologies' imaging-software
development toolkit. LeadTools Win32 is a C/C++ toolkit that lets you add
file-format support, image compression, manipulation, and processing to any
Windows NT, Windows 95, or Win32s application. The toolkit provides 27
image-processing operations and supports over 42 file formats, including Kodak
Photo CD, PCX, GIF, TGA, BMP, TIFF, MAC, PICT, and JPEG. LeadTools can be used
as a Windows DLL, DOS library, or Visual Basic/Visual C++ custom control. The
royalty-free toolkit sells for $995.00.
Lead Technologies
900 Baxter Street
Charlotte, NC 28204
704-332-5532
Surety Technologies has announced the Digital Notary System, an
Internet-accessible system that securely certifies the contents of digital
documents and electronic records. The system works by affixing a secure
digital time-stamp without revealing the document's contents to a third party.
The Digital Notary System can be used to certify any digital document: word
processing files, text documents, database records, e-mail messages,
spreadsheet files, graphic images, audio or video recordings, or any other
kind of electronic files. The company claims that the Digital Notary System is
to electronic records what a notary public is to paper records. 
The Digital Notary System, which was developed and patented by Scott Stornetta
and Stuart Haber, is based on the one-way-hashing cryptographic technique. In
the certification process, the Digital Notary software creates a unique
"digital fingerprint" for each electronic document and sends the fingerprint
to a Digital Notary coordinating server over the Internet, a leased line, or
an ISDN connection. Based on information sent back by the server, the Digital
Notary software then issues time-stamped certificates for electronic documents
and stores the certificates in a local Digital Notary database. The entire
certification process takes only seconds, and the document stays on the user's
computer. No one can alter or backdate the document in the record. The Digital
Notary software has an API so that developers can integrate it with other
software in a network or other computing environments. 
A special single-user version, called the "Personal Edition for Windows," is
available over the Internet via ftp to ftp.surety.com or by downloading it
from Surety's World Wide Web home page at http://www.surety.com. The Personal
Edition software, account setup, and 50 certificates sells for $49.00. Users
can order additional Digital Notary certificates over the Internet for
$37.50/50 certificates. A Personal Edition for SunSPARC systems is
forthcoming.
Surety 
One Main Street
Chatham, NJ, 07928
201-701-0600 
info@surety.com
Pixar has released Classic Textures: Volume 2, a new edition of its
photographic-texture library. Available on ISO-9660-format CD-ROMs, the
library includes 100 new textures such as exotic marbles, milled metals,
flowers, and water. All images are 512x512, 24-bit TIFF format. The Adobe
Photoshop-compatible images are accessible to any Macintosh or Windows
application that handles TIFF files. Volume 2 of the CD-ROM sells for $169.00. 
Pixar
1001 West Cutting
Richmond, CA 94804
510-236-4000
Oracle has announced Oracle Objects for OLE, its first OLE-enabled interface
for database access to the Oracle7 relational database. The company claims
that this middleware, bundled with Personal Oracle7 and Oracle7 Workgroup
Server, simplifies database access from any OLE-enabled Windows app using OLE
2.0 Automation technology. To enhance performance, the software uses
client-side data caching to provide immediate access to currently viewed data.
Oracle Objects for OLE functionality can be encapsulated in Visual
Basic/Visual C++ custom controls. As a stand-alone tool, Oracle Objects for
OLE sells for $199.00.
Oracle 
Oracle Pkwy.
Redwood Shores, CA 94065
415-506-7000
ObjectSpace has announced STL<Toolkit>, its implementation of the C++ Standard
Template Library. Available for Windows, UNIX, and OS/2, STL<Toolkit> provides
multithread extensions, including read/write locking. The company claims this
is the only implementation of STL that is compatible with cfront-based
compilers. Providing the multithread support is the ThreadKit library, which
is bundled with STL<Toolkit>. The toolkit also includes examples, test suites,
and commented source code. 
The STL<Toolkit> is bundled free with ObjectSpace's ObjectSystems library, or
is available separately for $149.00 for use with C++ compilers from Borland,
Microsoft, and Symantec. ObjectSystems is a C++ framework for cross-platform
development that contains more than 120 classes. For more information on STL,
see Al Stevens' "C Programming" column on page 115 of this issue.
ObjectSpace 14881 Quorum Drive, Suite 400
Dallas, TX 75240
214-934-2496
Object International has released Together/C++ Version 1.2, a programming tool
that lets you work with the object model or code and, at any time and with the
touch of a button, make changes to one with the update reflected on the other.
Among other features, the new version of this object-modeling tool
automatically generates documentation for Windows help files, provides support
for additional object models (including attribute types, service parameters,
drag-and-drop, and the like), and supports language-specific features such as
configurable header files, private attributes, and public services by default.
Together/C++ Solo, a single-user version, sells for $997.00. Together/C++
Team, a multiuser version, is also available.
Object International
8140 N. MoPac 4-200
Austin, TX 78759
512-795-0202
info@oi.com
Neuron Data has released its C/S Elements++ package, which provides GUI
development and heterogeneous database access for object-oriented,
client/server applications. The environment includes a C++ API that uses its
own C++ libraries as well as third-party libraries. Additionally, the software
automatically generates C++ code and class definitions. C/S Elements++
initially supports Windows, Sun Solaris, SunOS, and HP 9000/HP-UX systems.
Macintosh, OS/2, and Windows NT support is forthcoming. Licenses for the
software start at $6850.00. 
Neuron Data
156 University Ave.
Palo Alto, CA 94301
800-876-4900
Trans-Ameritech has announced that Linux Plus BSD 4.0 is available on CD-ROM.
Linux Plus BSD 4.0, which is an extension of the Linux UNIX-like kernel,
includes the Slackware distribution, kernel versions after the September 1994
code freeze, the X Window system, and DOOM for Linux. The OS is bootable from
the CD-ROM, running UNIX files from a 15-Mbyte MS-DOS partition. The CD-ROM
sells for $39.95.
Trans-Ameritech Systems
2342A Walsh Ave.
Santa Clara, CA 95051
408-727-3883
The Echo is a newsletter for DOS programmers. Published on a monthly basis,
the publication includes coverage of all aspects of DOS, from Command script
files to Debug "peepholes." Subscriptions to the newsletter cost $12.36 per
year.
The Echo
8401 E. Desert Steppes Drive
Tucson, AZ 85710
Cosmic Software has announced availability of its Cosmic C-cross compilers for
the 68HC11 and 68HC16 microcontrollers. In addition to being ANSI C and ISO
Standard C compliant, the compilers provide compile-time type checking, C
run-time support, optimization, and support for EEPROM variables, inline
assembly, and C-level bank switching. Additionally, each compiler package
includes a source-level debugger, macro assembler, linker, object inspector,
and object-format converter. PC-hosted cross compilers sell for $1500.00,
while SunSPARC and HP9000/7000 hosted systems sell for $2500.00. 
Cosmic Software
100 Tower Office Park
Woburn, MA 01801
617-932-2556
CPI has announced a set of wavelet-compression routines for graphics
developers, as well as a 3-D game and virtual-reality engine for DOS/Windows
developers. The wavelet-compression routines are available as Photoshop
modules ($149.00) or C-library DLLs (for licensing). 
The real-time interactive 3-D game engine is a multithreaded, 32-bit engine
that includes texture mapping, six-degree freedom of movement, 2-D and 3-D
sprites, movie-format support (AVI), multiple light sources and types,
collision detection, MIDI and real-time audio effects and mixing, network
support, multiplayer support, and more.
CPI

1117 Cypress Street
Cincinnati, OH 45206
513-281-3222
Cornerstone for Windows, from BBN Software, is a data-analysis tool which lets
you visualize data relationships, then perform numerical analysis. The
software employs graphs, including xy line, scatter plots, box plots,
histograms, bar graphs, interval plots, and 3-D graphs, in the visualization
process. The software is built upon the ODBC standard so that it can be used
with a variety of relational databases. Cornerstone for Windows sells for
$995.00 per seat.
BBN Software Products
150 Cambridge Park Drive
Cambridge, MA 02140
617-873-5000
North Coast Software has released its Common Image Format Libraries, a set of
DLLs that read and write image files in a variety of formats. A 16-bit library
supports Windows 3.1, while 32-bit libraries are provided for Windows NT 3.5
and Windows 95. Additional libraries support Windows NT running on PowerPC,
Digital Alpha, and MIPS workstations. File formats supported include JPEG,
AVI, BMP, GIF, PCX, TIFF, DIB, RGB, IFF, RAS and TARGA. According to North
Coast, the Common Image Format Libraries provide a common API to all libraries
and make full use of the 32-bit platforms. The image libraries are based on
the same code used in Conversion Artist, North Coast Software's
image-conversion tool. The Common Image Format Libraries are priced at $500.00
for Windows 3.1 and $700.00 for Windows NT and Windows 95.
North Coast Software
265 Scruton Pond Rd.
Barrington, NH 03825
603-664-7871
Wolfram has introduced an add-on package to its Mathematica programming
environment, which aids engineers working on laser designs and other optical
systems. The package, called "Optica," features 3-D ray tracing, symbolic
parameter manipulation, publication-quality graphics, and a set of standard
optical components which can be customized. These components include lenses,
mirrors, prisms, optical fibers, screens, pinholes, and gratings. Predefined
surfaces include spherical, cylindrical, parabolic, and elliptical. Source
code is also included for both built-in components and functions, allowing
developers to design, model, and analyze virtually any optical system. 
Optica draws on Mathematica's capabilities for equation solving,
visualization, and animation. Mathematica is a high-level programming
environment that provides hundreds of built-in functions to perform a broad
range of mathematical operations and is capable of numeric, symbolic, and
graphical computation. Mathematica also allows users to create documents that
combine text, 2- and 3-D graphics, and animation to create technical reports,
blueprints, and the like.
Optica is available for Windows, Windows NT, Macintosh, Power Macintosh,
Silicon Graphics, SunSPARC, Solaris, NextStep, HP 700 Series, IBM RISC
System/6000, DEC MIPS, and OSF/1 AXP. The software sells for $695.00.
Wolfram Research
100 Trade Center
Champaign, IL 61820-7237
217-398-0700
With products such as Visual C++ completely abandoning their MS-DOS
constituencies, it is rare to find new development products for DOS
programmers--particularly high-end products. Nevertheless, Borland has quietly
begun shipping its Power Pack for DOS, a collection of development tools aimed
at high-end developers still working in MS-DOS. The product comprises 16- and
32-bit DOS extenders, and two older libraries updated for 16- and 32-bit
protected mode. It also includes the Turbo Vision C++ class library and the
Borland Graphics Interface (BGI) library.
The DOS extenders are based on the same technology that Borland has long used
in its protected-mode compilers and database tools. The extenders rely on DPMI
0.9, which, if absent, is simulated by a Borland utility. Borland has made
some unique extensions to the DOS extenders. The 16-bit extender uses the same
executable file format as Windows 3.x and OS/2 1.x. Because of this
compatibility, DOS programs can make calls to Windows DLLs or generate and use
Windows-compatible DLLs of their own. To make use of Windows DLLs, the Borland
run-time manager (shipped with the Power Pack) implements a subset of the
Windows API. This subset consists primarily of the memory-management routines
and some ancillary functions.
The 32-bit extender uses the Portable Executable (PE) format. It, too,
supports DLLs and implements a sizable portion of Win32--eliminating
Windows-specific functions, multithreading, and multitasking capabilities.
Under Windows NT, the 32-bit applications can only run in console mode. Both
extenders offer virtual memory with swapping to disk.
The BGI libraries are protected-mode versions of Borland's graphics
libraries, while Turbo Vision 2.0 is an update to Borland's C++ class library
for character-mode interfaces. The Power Pack can be seamlessly integrated
into Borland's C++ IDE, provided that the environment is version 4.02 or
later. The software sells for $99.95.
Borland International 
100 Borland Way 
Scotts Valley, CA 95066 
408-431-1000



































EDITORIAL


So Many Rules, So Little Time


Upon hearing that the Federal Register, a daily accounting of proposed
regulations, will grow to more than 70,000 pages this year, you begin to think
that maybe we're being hamstrung by too many rules and regulations. That
hasn't stopped legislators from coming up with more and more laws, however,
and Senator Jim Exon (D-NE) has introduced a ring-tailed tooter. 
On the surface, Exon's Communications Decency Act of 1995 (S.314) aims to
incorporate the concept of "digital communication" into the Communications Act
of 1934. But once the Cornhusker's finest gets on his white horse, Exon
proceeds to put the spurs to civil liberties by making Internet hosts, BBS
operators, in-house LAN supervisors, and online-services providers subject to
a $100,000 fine and two years in the pokey if an indecent message passes
through their systems. "I want to keep the information superhighway from
resembling a red-light district," Exon has said. This from someone who doesn't
even have an Internet e-mail address (come on, Jimbo, get with the program).
According to Exon, if someone in, say, France sends an obscene image over the
Internet to someone in, say, Korea and the message goes through an Internet
host in Nebraska, the owner of the U.S. host should be held liable. The
proposal doesn't take into consideration foreign languages or attempts to hide
objectionable information or images through encryption. It does conjure up
subjective definitions of what's obscene and visions of Big Brother sitting on
a network node, monitoring messages as they pass by. Exon's bill, which has
been sent to the Senate Committee on Commerce, Science, and Transportation,
reportedly has a good chance of passing.
Of course, you could ask Jake Baker if he thinks we need another law
regulating indecent messaging on the Internet. Baker, a University of Michigan
student, posted a story in the alt.sex.stories newsgroup describing abduction,
torture, mutilation, rape, and murder. Because Baker ill-advisedly used the
name of a woman in one of his classes, the university suspended him, a U.S.
attorney charged him with interstate transmission of a threat to injure, the
FBI arrested him, and a U.S. magistrate judge held him without bond.
Clearly, Baker has a big problem--and not just a legal one--and I'm as
disgusted as the next person at what he wrote. But still, you have to wonder
what poses a greater threat--sophomoric fantasies or senatorial delusions.
According to his mother, Baker heard the woman's name (which hasn't been
released) called out in a class of 200 students and used it in his story
because the last name constituted a sexual pun. It's interesting that the
original complaint didn't come from the woman, but from an Internet user in
Russia who complained to the university. Baker, who posted the story four
months prior to his arrest, has never met or approached the woman. If
convicted, he faces five years in jail. 
Another student, Daniel Bernstein of the University of California at Berkeley,
is putting his money where his mouth is when it comes to fighting
technology-related regulations. Bernstein, in concert with the Electronic
Frontier Foundation, has launched a federal lawsuit (Civil Action No.
C95-0582-MHP in Federal District Court for the Northern District of
California) seeking to bar the government from restricting publication of
cryptographic documents and software. Bernstein argues that encryption-related
export-control laws are "impermissible prior restraint on speech, in violation
of the First Amendment."
And speaking of lawsuits, Apple recently unleashed another legal broadside at
Microsoft, this time over QuickTime code that's present in Microsoft's Video
for Windows. At issue is low-level assembly code in a file called DCISVGA.DRV,
which deals with graphics chip registers.
Okay, I've taken potshots at Microsoft from time to time, most recently in the
March 1995 DDJ, where I chided the company over its Windows 95 logo-licensing
policy. At the time, I said that Microsoft should start treating third-party
developers as partners, rather than serfs.
Giving credit when it's due, Microsoft appears to be treating third parties as
partners (or at least acting as a benevolent father) in this instance. In
particular, Microsoft has said it will "insulate developers from the impact of
Apple's legal actions." In doing so, Microsoft sent a letter to developers
stating that "if Apple sues any developer over use and distribution of
Microsoft Video For Windows 1.1d, Microsoft will defend any such lawsuit."
Continuing in the spirit of credit giving, I'd like to extend thanks to Scott
Johnson of NTergaid, Mark Mallett of MV Communications, and Manny Sawit of Dr.
Dobb's Journal for the work they did in getting the Dr. Dobb's Journal World
Wide Web home page up and running. If you're into net surfing, stop by and
visit us at http://www.ddj.com. In addition to ftp links to source-code
listings that appear in Dr. Dobb's Journal and the Dr. Dobb's Sourcebook
series, you'll find selected articles from current and past issues, and the
complete text of the Dr. Dobb's Developer Update newsletter. We're also making
available samplers from the Dr. Dobb's CD-ROM Library, author guidelines, job
postings, subscription information, and e-mail contact for DDJ staff members.
Jonathan Erickson
editor-in-chief











































LETTERS


FFTs: Fast, Faster, and Wow! 


Dear DDJ,
In reference to "Faster FFTs" ("Algorithm Alley," February 1995), I would like
to offer a slightly faster version of the ShuffleIndex() function.
The new function rbi (reverse binary index) takes the fbi (forward binary
index) as its first parameter and pTable (a pointer to a table of bit masks)
as its second parameter. It operates by:
 Isolating each pair of symmetric bits { AND with each MASK in the table}
 Testing to see if the bits are the same { compare with MASK and ZERO }
 If both 0 or both 1 then nothing to do { continue }
 Else we need to flip the bits { EXCLUDE OR with MASK }
The table of bit masks is specific to each length of the binary index; see the
tables for length of 8 and 9 in Example 1(a). I would expect that in any
particular application, the length of the index would be fixed. If not, then
an array of pointers to bit-mask tables (indexed by the length of the FFT
binary index) could easily be built. The function rbi in Example 1(b) is about
20 percent faster than ShuffleIndex().
Ken Allen
pgmkra@ibi.com
Dear DDJ,
I enjoyed Iwan Dobbe's article on FFTs. I like to really squeeze cycles out of
algorithms, too. Example 1(c) is a faster way to do the bit-reverse shuffle. 
The basic idea is that the inner loop (where most of the time is spent) simply
toggles the 512 bit, the next-to-inner loop toggles the 256 bit, and so on.
Some more time could be saved by unrolling the inner loop and using pointers
instead of index arithmetic. I have never seen this method published.
Mike Dunlavey
70363.27@compuserve.com
Dear DDJ,
In "Faster FFTs," D.G.G. Dobbe gives a good overview of the factors affecting
numeric performance. However, there are two cheap tricks for improving FFT
speed even more, without resorting to assembly language. These tricks are most
useful for large FFTs (more than 16,384 points) and fast CPUs coupled to slow
RAM. 
The first trick is to improve the "locality of reference" so the L1 cache gets
used more efficiently. To do so, you can replace the separate Re[], Im[]
arrays with an array of structures; see Example 1(d). The inner loop of the
butterfly then becomes Example 1(e).
For pure compiled code (Watcom C 9.5), this change increases the speed of the
forward FFT by a factor of 1.46 on a 90-MHz Pentium. The main reason for the
improvement: When the Re and Im components are packed into a structure, a read
from slow DRAM is very likely to put both components on the same L1 cache
line, where they will be available for subsequent (fast) fetches. 
The second trick is to use less math. The "split radix" FFT (Duhamel and
Hollman, Electronics magazine, "Letters," January 1984) modifies the butterfly
to use fewer multiplications and additions. Combined with trick number 1, the
split radix is about 1.88 times faster than the classic butterfly for single
precision, and 1.68 times faster in double precision, even when sine and
cosine are explicitly calculated. A nice implementation of the split radix
method can be found in the file tfftdp.c at ftp.nosc.mil, in the /pub/aburto
directory. 
Harlan W. Stockman
hwstock@abq-ros.com


VC++ and NT


Dear DDJ,
I would like to comment on John LaPlante's article, "Building an OLE Server
Using Visual C++ 2.0" (DDJ, February 1995). First, I would like to say that
Visual C++ 2.0 is a good product, but I am sure we will see as many bug fixes
with it as with the Visual C++ 1.x series. 
John's statements, "NT's responsiveness and robustness..." and "NT's crash
resistance...", don't give 2.0 the credit it deserves. Since installing NT 3.5
and VC++ 2.0, I've written three separate apps. In doing so, I've had eight
crashes in two weeks due to VC++. One crash was from trying to open a source
file. These crashes would not be so bad if they didn't take the source files
out with them. NT leaves them with size 0 bytes. A chkdsk will turn them up at
the root (this is very helpful!).
As for the ClassWizard, it sometimes chokes up if the source files are not
saved prior to invoking it from the resource editor. This is especially
disheartening after you have added several members and functions, only to get
the pop-up error!
Anyway, for those who struggle through it, I have a few helpful hints:
 Install on a FAT file system, so you can recover easily from a crash.
 After a crash, delete the PAGEFILE.SYS, run chkdsk /f, and save the
FILE*.CHKs; you can recover any 0-byte files from these.
 NT will also trash your <username>.LOG, so it should also be deleted.
 Also, save all files before running ClassWizard.
Generally, it takes two reboots in a row to get the system fully back;
however, my hints usually help the first time. 
Jon Friedline
76557.1643@compuserve.com


The Win95 Logo


Dear DDJ, 
Isn't it true that IBM doesn't have rights to any of the Win32 source? If so,
then I think this might be the strongest reason yet for Microsoft's new logo
rules described by Jonathan Erickson in his March 1995 "Editorial." If
Microsoft can convince all the major players to release only Win32 versions of
their apps using OLE, long filenames, and other Windows 95- and NT-specific
features, then suddenly IBM and OS/2 are in trouble. IBM will have to choose
between clean-room engineering Win32 support (no mean feat) or living with
fewer new and supported apps that run under OS/2. 
Microsoft would clearly like to see more NT apps, but I'd bet it's even more
interested in cutting OS/2's supply lines. If it wanted to push NT, it could
announce a 50 percent price cut, and sales would surge overnight.
Lou Grinzo 
71055.1240@compuserve.com
Dear DDJ,
I read with much interest Jonathan Erickson's editorial on Microsoft's Win95
logo issue. I am a small developer of vertical-market database applications.
All of my clients are 20 to 50 workstations on Novell networks. My clients
understand the demand of workgroup applications, and I have a good track
record of delivering applications that fill this need.

Although many of these clients use Windows for word processing or
spreadsheets, none have their workgroup apps under Windows. The applications I
have developed are under Clipper, Clipper with Btrieve, C with Btrieve, C with
c-tree plus, and Foxpro for DOS.
The area Erickson is talking about in his article is the rarefied world of
mass-market applications--those spreadsheets, word processors, and e-mail
packages that can sell tens of thousands of copies.
In a nutshell, his beef has little or nothing to do with me and my client
base. The fact that Microsoft is prodding the marketplace to be more
streamlined just makes it easier for my clients to make a thousand-dollar
decision. This represents the everyday level of pain for the small
businessman, myself included.
I would like to get a Windows database project. Visual Basic, Foxpro for
Windows, dBase for Windows, or the new CA Visual Objects would be great. But my
clients don't see this as a cost-effective option for their workgroup needs.
Oracle, Sybase, Powerbuilder, are you kidding?
We are talking about workgroups with 30 386/16s with 4 Mbytes of RAM. We are
talking about a learning curve of months (with the associated downtime). I am
helping my clients move over to Windows one step at a time, replacing a few
machines with 486/33s and 8 Mbytes of RAM as budgets and time permit.
If Microsoft can give the warm and fuzzy that a move to a new machine will be
reasonably painless, just add some money and plug-n-play, all the better.
Better for my clients and better for me.
I'm paying for my own learning curve with my time and energy (and missed
opportunity). My clients expect me to have answers when they need the question
answered. Windows 95 is not yet a question they need answered. If Microsoft
can make the answer easier to live with (read "pay for"), my clients and I are
all the better served.
Just remember a few years ago: the landscape had a lot of mainframe and mini
iron, all incompatible with each other. They are being slowly squeezed out of
existence because of the leveling of the playing field with the PC revolution.
I can only see the same thing happening with software. And I am one
application developer who likes what I see.
Name Withheld
Dear DDJ,
Upon reading the "Editorial" in the March issue, I was struck by the
similarity of Microsoft's logo requirements with Apple's Macintosh
development-partner requirements of February 1984. 
Apple said that to qualify, your company had to have x amount of dollars in
the bank and y number of existing apps sitting on store shelves to "qualify"
to become a certified Macintosh development partner. This policy closed the
door on many early would-be Mac developers and, in my opinion, is a major
reason why the Mac never became the platform of choice.
Gee, I sure would have thought that Bill Gates would be smart enough to learn
from Steve Jobs's mistakes.
Andy Bentley 
71055.3060@compuserve.com
Dear DDJ,
I just got my copy of DDJ (March 1995), and already I'm smiling. First off, I
commend you for including Linus Torvalds in your "Excellence in Programming
Award." I am calling my provider with Minicom, a Telix-like communications
program running under Linux 1.1.59. I believe this operating system to be a
wonderful thing, and join you in congratulating the winners.
Also, Jonathan Erickson's "Editorial" made me smile. Why? Because finally
someone is talking about Windows and Microsoft without being afraid. One
magazine, which I will not name, called OS/2 better Windows than Windows,
rated it "Good," and went on to rate Windows better than OS/2. Weird. Erickson
notes that Microsoft should "start treating third-party developers as
partners, rather than serfs." I think that Windows is dead, and Bill doesn't
seem to realize that he's killing it. No matter, I develop only for DOS, OS/2,
and Linux. I commend you on your honest and great magazine!
Martin Brown 
martin@nezumi.demon.co.uk
Example 1: FFTs: Fast, faster, and wow!
(a)
static unsigned int MaskTable08[] =
{
 0x81, /* 10000001 */
 0x42, /* 01000010 */
 0x24, /* 00100100 */
 0x18, /* 00011000 */
 0x00 /*end of table*/
};
static unsigned int MaskTable09[] =
{
 0x101, /* 100000001 */
 0x082, /* 010000010 */
 0x044, /* 001000100 */
 0x028, /* 000101000 */
 0x00 /*end of table*/
};

(b)
unsigned int rbi(unsigned int fbi, unsigned int * pTable)
{
 unsigned int result, mask, temp;
 result = fbi;
 for(mask = *pTable; mask; mask = *(++pTable))
 {
 temp = fbi & mask; /* isolate symmetric bit pair */
 if (temp == 0 || temp == mask) /* are the symmetric bits the same? */
 continue; /* if so, nothing to change */
 result ^= mask; /* bits differ, so flip them */
 }
 return(result);
}

(c)
float real[1024], imag[1024], temp;
#define SWAP(a,i,j)(temp=a[i], a[i]=a[j], a[j]=temp)
int i=0, j=0, i0,i1,i2,i3,i4,i5,i6,i7,i8,i9;

#define TWICE(k) \
 for(i##k = 2; --i##k >= 0; j ^= (1<<k) )
TWICE(0) TWICE(1) TWICE(2) TWICE(3) TWICE(4)
TWICE(5) TWICE(6) TWICE(7) TWICE(8) TWICE(9)
{ if (j > i){
 SWAP(real,i,j);
 SWAP(imag,i,j);
 }
 i++;
 }

(d)
struct cmplx {float Re; float Im;} xc[SIZE], *cp;

(e)
tempr = Qr * (cp+index2)->Re - Qi * (cp+index2)->Im;
tempi = Qr * (cp+index2)->Im + Qi * (cp+index2)->Re;

(cp+index2)->Re = (cp+index1)->Re - tempr; /* For Re-part */
(cp+index1)->Re = (cp+index1)->Re + tempr;
(cp+index2)->Im = (cp+index1)->Im - tempi; /* For Im-part */
(cp+index1)->Im = (cp+index1)->Im + tempi;









































Implementing Loadable Kernel Modules for Linux


Loading and unloading kernel modules on a running system




Matt Welsh


Matt works with the Cornell University robotics and vision laboratory. He is
the author of two books on the Linux operating system, including Running Linux
(O'Reilly & Associates, 1995), and he is a contributor to Linux Journal. He
can be contacted at mdw@cs.cornell.edu.


The Linux operating system, a freely distributed UNIX clone developed via the
Internet, is an excellent platform for operating-systems research and
development. In fact, you can inspect, modify, and experiment with any part of
the system because the entire source code for the kernel, all of the basic
system utilities, and the libraries are freely available (they're covered by
the GNU General Public License). Linux runs primarily on PCs with Intel
386/486/Pentium processors, but ports are in the works for architectures such
as the DEC Alpha, Motorola 68000, PowerPC, and more. Apart from its
versatility for systems research, Linux is also a very stable and useful UNIX
implementation for the PC: It supports software such as emacs, the X Window
System, gcc, and much more (see the accompanying text box entitled "Getting
Linux").
One of the most important recent developments for Linux is dynamically loaded
kernel modules. The Linux kernel design is similar to that of classic UNIX
systems: It uses a monolithic architecture with file systems, device drivers,
and other pieces statically linked into the kernel image to be used at boot
time. While there are currently no plans to restructure Linux around the
microkernel architecture, the use of dynamic kernel modules allows you to
write portions of the kernel as separate objects that can be loaded and
unloaded on a running system.
In this article, I'll describe the dynamic-kernel-module implementation for
Linux, concentrating on the steps required to load a module on a running
system. The Linux implementation is fairly straightforward and could be
adapted to other UNIX systems that don't already provide this functionality.
Surprisingly enough, most of the necessary support is found not within the
kernel itself, but in the run-time loader. 


Overview


A kernel module is simply an object file containing routines and/or data to
load into a running kernel. (If multiple source files are used to build a
module, the corresponding object files can be prelinked into a single object
using ld -r.) When loaded, the module code resides in the kernel's address
space and executes entirely within the context of the kernel. Technically, a
module can be any set of routines, with the one restriction that two
functions, init_module() and cleanup_module(), must be provided. The first is
executed once the module is loaded, and the second, before the module is
unloaded from the kernel. Of course, programmers must also observe all of the
precautions and conventions used by kernel-level code when writing modules. 
Loading a module into the kernel involves four major tasks:
1. Preparing the module in user space. Read the object file from disk and
resolve all undefined symbols. A module may access only those symbols already
present in the running kernel. This "linking" step takes place within the
run-time module loader, a utility that runs in user space.
2. Allocating memory in the kernel address space to hold the module text,
data, and other relevant information.
3. Copying the module code into this newly allocated space and providing the
kernel with any information necessary to maintain the module (such as the new
module's own symbol table).
4. Executing the module initialization routine, init_module() (now in the
kernel).
Because the first step is the most complex, I'll focus on it in this article.
Once all external references in the module have been resolved, it can easily
be copied into space set aside by the kernel and executed from there.
There are several important issues to address when using the approach I've
just described, the first being symbol resolution. All of the external symbols
that the module can reference correspond to variables or routines in the
kernel. Symbols can be either "resident" (compiled into the original kernel
image) or "transient" (provided by other modules that are already loaded). For
the module loader to resolve all of these references, the kernel must provide
a list of valid symbols, copied to the module loader via a system call.
Instead of allowing modules to access and use all resident symbols, the kernel
provides a list of those variables and routines "stable" enough for modules to
employ. Otherwise, modules could depend too heavily on low-level aspects of
the kernel code, and thus break if those symbols were to change. Currently,
this is a static list found in one of the kernel source files, but changes are
planned to allow each portion of the kernel to provide its own part of the
resident symbol list. Similarly, when each module is loaded, it must
contribute a symbol table with entries for each symbol that it will provide to
other modules. 
Another issue to consider is intermodule dependencies. The current
implementation requires that modules with such dependencies be loaded in a
particular order; that is, for module A to use a symbol defined by module B,
module B must be loaded first. Similarly, the kernel must maintain reference
lists for each module so that a module cannot be unloaded until all modules
referencing it have been unloaded themselves. This mechanism can be used to
build module stacks. 
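The reference-list mechanism can be sketched in a few lines. This is a hedged illustration, not the kernel's actual structures: each loaded module keeps a count of how many other modules reference it, and unloading is refused while that count is nonzero. The names (struct mod, mod_ref, mod_try_unload) are hypothetical.

```c
#include <assert.h>

/* Hypothetical sketch: per-module reference counting, the mechanism
   behind "module stacks". Not the real kernel data structures. */
struct mod {
    const char *name;
    int refs;                     /* modules currently referencing this one */
};

static void mod_ref(struct mod *dep)   { dep->refs++; }  /* A starts using B */
static void mod_unref(struct mod *dep) { dep->refs--; }  /* A was unloaded  */

/* Unload is allowed (returns 1) only when nothing references the module. */
static int mod_try_unload(const struct mod *m)
{
    return m->refs == 0;
}
```

Under this scheme, loading module A against a symbol from module B bumps B's count, and B cannot be removed until A is gone.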
A third issue is version coherency. The system must be able to guarantee that
all symbols and data structures used in a module are identical to those used
in the running kernel. If the kernel's definition of a data structure differs
from that used in a module, the module could corrupt important kernel data and
real havoc could result. To deal with this, the current implementation
requires that modules only be loaded against the version of the kernel that
was running when they were compiled. Data from uname(2) is stored in the
module itself at compile time, and when loaded, this data must match the data
in the currently running kernel. At the time of this writing, a new design is
being tested which assigns version information individually to kernel symbols.
Although the current paranoid approach can be annoying to those who rebuild
kernels often, it is nearly foolproof.
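The "paranoid" check amounts to an exact string comparison. A minimal sketch, with illustrative names (the real insmod logic is more involved):

```c
#include <assert.h>
#include <string.h>

/* Hedged sketch of the version-coherency check: the kernel release string
   recorded in the module at compile time must match the running kernel's
   uname release exactly, or the load is refused. Names are illustrative. */
static int version_ok(const char *module_release, const char *running_release)
{
    return strcmp(module_release, running_release) == 0;
}
```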


The Module Loader


The Linux module loader, insmod, is responsible for reading an object file to
be used as a module, obtaining the list of kernel symbols, resolving all
references in the module with these symbols, and sending the module to the
kernel for execution. While insmod has many features, it is most commonly
invoked as insmod module.o, where module.o is an object file corresponding to
a module. In walking through the steps that insmod uses to load a module, I'll
point out the data structures and function prototypes used. (The entire source
is far too long to print here; source for insmod and related utilities, as
well as the entire Linux kernel, are available freely. See the accompanying
text box entitled "Getting Linux.")
Step 1. Open the object file and read it piece by piece. Linux systems use the
classic a.out object-file format (although ELF support is becoming available,
and the newest versions of the module utilities support it). The data
structures used by insmod, defined in both insmod and <a.out.h>, are in
Listing One. a.out-format object files are stored as a header, followed by
text and data segments, relocation information, the symbol table, and the
string table; see Figure 1. Each portion of the file is read and stored by
insmod for later use.
The symbol table is stored in the object file as an array of struct nlist. The
symbol names are actually found in the string table, located immediately after
the symbol table in the file. Each symbol-table entry contains the offset
(into the string table) of the associated name in the n_un.n_strx member.
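The offset-based naming scheme can be sketched as follows. This is a simplified illustration (the real struct nlist carries more fields, and the names here are not the exact a.out definitions):

```c
#include <assert.h>
#include <string.h>

/* Illustrative a.out-style symbol entry: the name is not stored inline
   but as an offset into the string table that follows the symbol table. */
struct sym_entry {
    unsigned long n_strx;    /* offset of the name within the string table */
    unsigned long n_value;   /* symbol value; a kernel address once resolved */
};

static const char *sym_name(const struct sym_entry *s, const char *strtab)
{
    return strtab + s->n_strx;   /* index the string table by the offset */
}
```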
Step 2. Read the symbol table. Within insmod, each symbol is read and inserted
into a binary tree (actually, a splay tree) to make symbol lookups for
relocation more efficient. The addresses of the symbols _init_module and
_cleanup_module, the module initialization and deletion functions, are saved
for later use.
Step 3. Resolve external references. insmod obtains the list of resident and
transient kernel symbols using the get_kernel_syms() system call; see Listing
Two, page 96. This call returns an array of struct kernel_sym, each entry of
which contains the name and kernel address of a kernel symbol. If the name
field begins with the # character, the address field contains the kernel
address of a struct module describing a previously loaded module. The entries
following those referring to a struct module contain the names and addresses
of symbols in that module. Kernel-resident symbols are followed by a "dummy"
entry with the name field #.
For example, let's say that two modules, gonzo and alice, were loaded in that
order. gonzo provides the transient symbols _gonzo_1 and _gonzo_2, while alice
provides the symbol _alice_3. The kernel symbol table will look like Table 1.
Note that modules are listed in reverse order of loading and that
kernel-resident symbols are listed last. This property allows modules to
override symbols provided by previously loaded modules or the kernel itself.
The entries containing addresses to struct module precede the symbols from the
corresponding module and allow insmod to keep track of the modules referenced
by the module to be loaded.
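Walking this table is straightforward. The sketch below, using the gonzo/alice layout just described, counts the modules that contributed symbols; the 60-byte name field mirrors the struct kernel_sym of the era, but treat the declarations as illustrative rather than the exact kernel headers:

```c
#include <assert.h>

/* Illustrative get_kernel_syms() table entry: a name beginning with '#'
   marks a module entry, and the lone "#" entry marks the kernel-resident
   symbols. Approximation of the real struct kernel_sym. */
struct ksym {
    unsigned long value;
    char name[60];
};

/* Count how many loaded modules contributed symbols to the table. */
static int count_modules(const struct ksym *tab, int n)
{
    int i, mods = 0;
    for (i = 0; i < n; i++)
        if (tab[i].name[0] == '#' && tab[i].name[1] != '\0')
            mods++;       /* "#gonzo"-style entries, but not the "#" dummy */
    return mods;
}
```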
Once the kernel symbols have been obtained, insmod looks up each one in the
splay tree constructed from external references made by the module, updating
the n_value member of each struct nlist entry in the tree as it is found with
the actual symbol value, which is a kernel address. If any references in the
tree are not resolved with the data from get_kernel_syms(), insmod complains
of an undefined symbol and exits.
Step 4. Relocate with kernel addresses by updating the addresses in the text
and data segments of the module that use the symbol values obtained from the
kernel. Sixty-four bits are stored for each address to be relocated, in a
struct relocation_info; see Listing One. Each address that refers to an external
symbol is updated using the relocation information, which is stored in the
object file after the data segment. Once this is complete, all external
references in the module point to the correct kernel addresses.
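The fixup itself reduces to writing a resolved address at a recorded offset. A hedged sketch: a real a.out struct relocation_info packs more fields (segment, size, PC-relative flags), so only the part the patch needs is shown, with illustrative names:

```c
#include <assert.h>
#include <string.h>

/* Minimal sketch of the Step 4 fixup: each relocation record gives an
   offset in the module image where the resolved kernel address must be
   written. Simplified relative to the real struct relocation_info. */
struct reloc {
    unsigned long r_address;     /* where in the image to patch */
};

static void apply_reloc(unsigned char *image, const struct reloc *r,
                        unsigned long kernel_addr)
{
    /* overwrite the placeholder with the resolved kernel address */
    memcpy(image + r->r_address, &kernel_addr, sizeof kernel_addr);
}
```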
Step 5. Allocate kernel memory for the module by calling the create_module()
system call; see Listing Two. Pass create_module() the name of the module (a
string generated from the name of the object file; for example, the module
gonzo.o will have the name gonzo) and the total size of the module (the sum of
the text, data, and BSS segment sizes).
Step 6. Load the module into kernel memory using the init_module() system
call; see Listing Two. This call takes a number of arguments, and insmod must
build up the associated data structures before making the call. The parameters
include the module name, the code (just a character array containing the text
and data segments), the size of the module code in bytes, a struct
mod_routines containing pointers to the module's init_module and
cleanup_module routines, and a struct symbol_table that describes the symbols
exported by the module and the other modules referenced. The struct
symbol_table is constructed from the symbol tree built up by insmod. As the
loader resolves references to kernel symbols, it keeps track of the modules
referenced in a list that becomes part of the struct symbol_table parameter.
Note that the init_module() system call is not the same as the init_module()
routine provided by the module.
This summarizes insmod operation. The program also includes many other
options; see the code and associated man pages for details. 


Kernel Details


Most of the work involved in loading a module takes place within insmod.
However, it is instructive to look at the implementation of the various system
calls used by the loader.
get_kernel_syms(). This call returns the table of kernel symbols that the
module may access. The system call takes a pointer to a memory buffer in user
space, where it will fill in the information. When passed a null pointer as an
argument, get_kernel_syms() returns the number of kernel symbols, which the
loader uses to allocate enough space to hold the previously described table
(an array of struct kernel_sym; see Listing Two). 
The kernel symbol table is generated by traversing a list of all loaded
modules in LIFO order. The kernel maintains a list of struct modules for each
loaded module, one element of which is the struct symbol_table passed in with
a previous call to init_module(). The kernel simply copies the names and
kernel addresses of each module symbol to the memory provided by the user
process (within the Linux kernel, the memcpy_tofs function is used for this).
Information on resident symbols is copied last. The struct symbol_table for
resident symbols can be found as a static list in one of the kernel source
files, ksyms.c.

create_module(). This is used to allocate memory for a kernel module. It is
passed the module name and the size, in bytes, required for the module. This
call first checks that the user making the call is root and that the
parameters are valid. Then memory within the kernel address space is allocated
for a struct module to represent this module, as well as for the module code
itself. The newly allocated struct module is added to the front of the linked
list of loaded modules, and various members of the structure are initialized.
The system call returns the kernel address at which to store the module code.
init_module(). This system call does most of the dirty work for loading a
module. The call takes several arguments: the module name, the location and
size of the module code in user space, a struct mod_routines pointing to the
module-initialization and cleanup functions, and the struct symbol_table
constructed for this module by insmod.
First, the struct module is located by name on the linked list of modules
maintained by the kernel. The module code is copied from user space into the
memory allocated by create_module, and the BSS portion is zeroed out. The
module cleanup-routine address is stored in the struct module for later use.
Next, the list of transient symbols to export for later module loads is
updated with the information contained in the struct symbol_table parameter.
The size element of this structure is read, kernel memory allocated, and the
structure copied into the kernel. Sanity checking is done to ensure that the
fields of this structure make sense.
The format of the struct symbol_table passed to init_module() is shown in
Figure 2. Note that the string table is stored immediately after the symbol
table itself in the memory passed to init_module(). struct symbol_table
contains an array of struct internal_symbols, with each entry holding the name
and address of a symbol exported by the module. The name field of this
structure is actually an offset address from the beginning of the structure,
pointing to the location of the actual string stored after the symbol table in
user memory. The string table doesn't show up on the definition of struct
symbol_table, but it's there. The size element includes the size of the
string table. In this way, the symbol table and associated names can be copied
from a single block of user memory. After copying the data, the kernel updates
each name field with the base address of the newly allocated symbol table, so
that the absolute address for each name will be correct.
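The offset-to-pointer fixup can be sketched as follows; struct sym and struct block are simplified stand-ins for the kernel structures, and fixup_names() is an invented helper, not kernel code:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified symbol entry: as described above, 'name' travels as an
 * offset from the start of the block and is fixed up after copying. */
struct sym {
    void *addr;
    char *name;   /* offset on the wire, absolute pointer after fixup */
};

/* One contiguous block: the symbol array followed by its string table. */
struct block {
    struct sym syms[2];
    char strings[32];
};

/* After copying the block to its new home, convert each stored offset
 * back into an absolute pointer by adding the block's base address. */
void fixup_names(struct block *b, int nsyms)
{
    int i;
    for (i = 0; i < nsyms; i++)
        b->syms[i].name = (char *)b + (unsigned long)b->syms[i].name;
}
```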
The last step is to update the list of references to other modules. The struct
module_ref array contains pointers to modules being referenced by the new
module. The kernel adds the new module to the dependency list for each
referenced module, after checking that each such module is in fact loaded. In
other words, each referenced module points back to the module being loaded.
The kernel won't allow a module to be unloaded unless this dependency list is
empty.
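The dependency bookkeeping might look like this minimal sketch; the structure fields follow struct module and struct module_ref from Listing Two, but add_ref() and can_unload() are inventions for illustration:

```c
#include <assert.h>
#include <stddef.h>

struct module;

struct module_ref {
    struct module *module;      /* the module doing the referencing */
    struct module_ref *next;    /* next reference in the list */
};

struct module {
    const char *name;
    struct module_ref *ref;     /* modules that depend on this one */
};

/* Record that 'user' references 'used': each referenced module's list
 * points back at the module being loaded. */
void add_ref(struct module *used, struct module *user,
             struct module_ref *link)
{
    link->module = user;
    link->next = used->ref;
    used->ref = link;
}

/* A module may be unloaded only when its dependency list is empty. */
int can_unload(const struct module *m)
{
    return m->ref == NULL;
}
```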
Once this is complete, the kernel executes the module's own init_module()
function. If this succeeds, the module's state flag is set to MOD_RUNNING and
the system call returns 0. The module is now loaded in the kernel, and its
code and data can be accessed. If at any point the loading procedure fails,
the module memory is freed and an appropriate error code returned.
The module's init_module() routine is generally used to initialize the
appropriate hooks that the rest of the kernel needs to access functions
provided by the module. For example, in the case of a device driver written as
a module, init_module() would register module routines in the table of
callbacks required for each device.
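As an illustration of this hook registration, here is a hypothetical callback table in the spirit of a device driver's init_module(); dev_ops, register_dev(), and the table size are inventions for this sketch, not kernel interfaces:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-device callback table the rest of the system
 * calls through. */
struct dev_ops {
    int (*open)(void);
    int (*read)(char *buf, int len);
};

#define MAX_DEVS 4
static struct dev_ops *dev_table[MAX_DEVS];

/* Install a module's hooks; refuse an occupied or invalid slot. */
int register_dev(int minor, struct dev_ops *ops)
{
    if (minor < 0 || minor >= MAX_DEVS || dev_table[minor] != NULL)
        return -1;
    dev_table[minor] = ops;
    return 0;
}

/* What a driver module's own init_module() might do. */
static int my_open(void) { return 0; }
static int my_read(char *buf, int len) { (void)buf; return len; }
static struct dev_ops my_ops = { my_open, my_read };

int module_init_example(void)
{
    return register_dev(1, &my_ops);   /* claim device slot 1 */
}
```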
delete_module(). This system call takes a single argument: the name of the
module to delete. It simply searches for the corresponding struct module by
name. If no modules reference the module to be deleted, the cleanup_module
routine is called, and all memory associated with the module is freed.
References to other modules are also cleared. The user program rmmod invokes
this system call to unload modules.


New Features


This implementation corresponds to the module utilities for Linux kernel
Version 1.1.67. As with many aspects of Linux, this code is constantly under
development, and new features are added weekly.
The newest version of the module utilities (for 1.1.85) includes support for
more-intelligent tracking of symbol versions. Instead of requiring modules to
run on the kernel under which they were compiled, version information is
attached to each kernel-resident symbol. When the kernel is compiled, the
source file ksyms.c, which contains a list of resident symbols to export, is
processed with gcc -E to expand the declarations of functions and data
structures. A 32-bit CRC checksum is generated from each expanded symbol, and
the output of each symbol name along with the CRC is written to the file
/usr/include/linux/modules/ksyms.ver. This checksum will change if the
declaration of the associated kernel symbol changes.
When individual modules are compiled, the symbol names and checksums in this
file are stored in a table. When insmod loads a module, the call to
get_kernel_syms() returns the list of kernel symbols, as before, along with
the CRC used in the running kernel. insmod checks that each checksum used when
the module was compiled corresponds to the checksum in the running kernel. If
the checksums don't match for any symbol, insmod won't allow the module to be
loaded.
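A sketch of the version check follows; the CRC-32 here uses the common 0xEDB88320 polynomial, which is an assumption on my part — the actual tool's CRC parameters may differ:

```c
#include <assert.h>
#include <string.h>

/* Illustrative 32-bit CRC over a symbol's expanded declaration text.
 * If the declaration changes, the checksum changes. */
unsigned long sym_crc(const char *decl)
{
    unsigned long crc = 0xFFFFFFFFUL;
    int k;
    for (; *decl; decl++) {
        crc ^= (unsigned char)*decl;
        for (k = 0; k < 8; k++)
            crc = (crc >> 1) ^
                  (0xEDB88320UL & (unsigned long)-(long)(crc & 1));
    }
    return crc ^ 0xFFFFFFFFUL;
}

/* insmod's test: refuse the load if any symbol's compile-time checksum
 * disagrees with the running kernel's. */
int versions_match(unsigned long compiled_crc, unsigned long running_crc)
{
    return compiled_crc == running_crc;
}
```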
There you have it--loadable kernel modules. Most of the code is quite
straightforward, but certain issues might not be so obvious. Again, I invite
readers interested in this design (and in improving upon it!) to grab the
code. Anyone is welcome to contribute to the development effort.
I'd like to thank the people responsible for the development of the module
code: Jon Tombs, Bas Laarhoven, Jacques Gelinas, Jeremy Fitzhardinge, and
especially Björn Ekwall, who gave me a great deal of information and help in
preparing this article.
Getting Linux
Linux is a popular, free, UNIX-like operating system for PCs. It currently
runs on 80386, 80486, and Pentium PCs, with ports for other systems underway. 
If you're interested in obtaining Linux or learning more, there are a number
of places to look. If you have Internet access, the ftp site
sunsite.unc.edu:/pub/Linux/docs is the first place to go. Users with WWW
access can look at the URL http://sunsite.unc.edu/mdw/linux.html. The first
documents to read are the Linux Frequently Asked Questions list, the
INFO-SHEET (which gives a technical introduction to the system), and the
META-FAQ (which outlines the other documents available). Others to look at
include the "HOWTO" documents that each detail a particular aspect of the
system, such as installation or network configuration, and the Linux
Documentation Project manuals. All of these are available at the Linux FTP
and WWW addresses given above.
To obtain Linux from the Internet, you need to select a "distribution," a set
of ready-to-install software packages. The most popular distribution is
Slackware, which can be obtained via ftp from
sunsite.unc.edu:/pub/Linux/distributions/slackware and consists of a set of
diskette images that you download and use to install the software on your own
system. Linux is installed on its own partitions on your drives, and it exists
independently of other operating systems such as MS-DOS, Windows, or OS/2. The
Linux Installation HOWTO document describes how to obtain and install this
distribution. 
Linux, and much of the software that runs on it, is covered by the GNU
General Public License, which permits vendors to sell the software, so Linux
is available from a number of software companies, usually on CD-ROM. The Linux
Developer's Resource, a CD-ROM set from InfoMagic (800-800-6613,
info@infomagic.com) has the contents of the Linux ftp sites, several
distributions, and documentation; it's updated every few months. If you don't
have Internet access, this is a good place to start.
There are several books about Linux. I wrote the Linux Documentation Project
manual, Linux Installation and Getting Started. It is available via the
Internet and from many commercial Linux vendors (including InfoMagic). The
Linux Bible is published by Yggdrasil (info@yggdrasil.com) and contains all of
the manuals and HOWTOs from the Linux Documentation Project in one book.
Linux: Unleashing the Workstation in Your PC has been published by
Springer-Verlag. My book, Running Linux, is available from O'Reilly &
Associates. 
The code described in this article is part of the Linux kernel sources, which
are available on any Linux system. Alternatively, you can grab the current
kernel source tree from sunsite.unc.edu:/pub/Linux/kernel/VERSION, where
VERSION is the latest version of the kernel. (By the time you read this, v1.2
will be the "stable" kernel version, with new development continuing on v1.3.)
The file linux/kernel/module.c in this tar file contains most of the
kernel-level module code. The file modules-1.1.67.tar.gz contains the module
utilities (insmod, rmmod, and so on) and some documentation. Again, a newer
version of this package will be available when you read this. 
--M.W.
Table 1: Example of kernel symbols returned by get_kernel_syms().
Name           Address
#alice         struct module describing alice.
_alice_3       Address of alice_3.
#gonzo         struct module describing gonzo.
_gonzo_1       Address of gonzo_1.
_gonzo_2       Address of gonzo_2.
#              Dummy struct module for resident symbols.
_verify_area   Resident symbol verify_area.
_do_mmap       Resident symbol do_mmap.
Figure 1: a.out object-file format.
Figure 2: struct symbol_table passed to init_module().

Listing One 

/* Header for a.out object and executable files. */
/* Data structures used for loading modules. */
struct exec {
 unsigned long a_info; /* Describes object file. */
 unsigned a_text; /* Length of text segment in bytes */
 unsigned a_data; /* Length of data segment */
 unsigned a_bss; /* Length of BSS segment */
 unsigned a_syms; /* Length of symbol table */
 unsigned a_entry; /* Entry point address */
 unsigned a_trsize; /* Length of text relocation info */
 unsigned a_drsize; /* Length of data relocation info */
};
/* The object file symbol table is an array of struct nlist. */
struct nlist {
 union {
 /* Only one of the following is available, based on
 * context. E.g., n_strx is used when the data is stored
 * in a file, n_name when in core.
 */
 char *n_name; /* Symbol name */
 struct nlist *n_next; /* Next symbol in list */
 long n_strx; /* Index into string table */
 } n_un;
 unsigned char n_type; /* Type of symbol, e.g., text or data */
 char n_other; /* Unused by the system, but useful for insmod */
 short n_desc; /* Used by symbolic debuggers */
 unsigned n_value; /* Address of this symbol */
};
/* Binary tree of symbols used for relocation. Defined by insmod. */
struct symbol {
 struct nlist n;
 struct symbol *child[2];
};
/* Relocation information stored in the module object file */
struct relocation_info {
 int r_address; /* Address to be relocated */
 unsigned int r_symbolnum:24; /* Index of symbol in symbol table */
 unsigned int r_pcrel:1; /* 1 for PC-relative offset */
 unsigned int r_length:2; /* Relocate (1<<r_length) bytes */
 unsigned int r_extern:1; /* 1 if relocating with addr of symbol */
 unsigned int r_pad:4; /* Unused */
};



Listing Two

/* Obtains list of symbols from kernel for module relocation */
/* System calls and data structures used by insmod. */
int get_kernel_syms(struct kernel_sym *table);

/* Allows kernel to allocate space for module */
int create_module(char *module_name, unsigned long size);

/* Sends module code and data to kernel, as well as init/cleanup
 * routines and symbol table used by module. */
int init_module(char *module_name, char *code, unsigned codesize,
 struct mod_routines *routines, struct symbol_table *symtab);

/* Removes module from kernel */
int delete_module(char *module_name);

/* An array of struct kernel_sym is returned by get_kernel_syms */
struct kernel_sym {
 unsigned long value; /* Symbol value */
 char name[SYM_MAX_NAME]; /* Symbol name */
};
/* The init and cleanup functions provided by the module */
struct mod_routines {
 int (*init)(void); /* Module init routine */
 void (*cleanup)(void); /* Module cleanup routine */
};
/* Symbol table passed to init_module */
struct symbol_table {
 int size; /* Total size, including string table */
 int n_symbols; /* Number of symbols */
 int n_refs; /* Number of module references */
 struct internal_symbol symbol[0]; /* Array of symbols; space
 * allocated elsewhere */
 struct module_ref ref[0]; /* Array of module references */
};
/* Symbols provided by the module */
struct internal_symbol {
 void *addr; /* Address of symbol */
 char *name; /* Name of symbol */
};
/* Reference to another module. */
struct module_ref {
 struct module *module; /* Module referenced */
 struct module_ref *next; /* Next module in list */
};
/* Kernel data structure describing a module */
struct module {
 struct module *next; /* Next module in list */
 struct module_ref *ref; /* List of modules referring to this one */
 struct symbol_table *symtab; /* Symbol table given to init_module */
 char *name; /* Name of module */
 int size; /* Size of module in (4K) pages */
 void* addr; /* Address of module code in kernel */
 int state; /* State (running, deleted, uninitialized) */
 void (*cleanup)(void); /* Cleanup function */
};





































Shared Memory and Message Queues


C++ classes for OS/2, AIX, and Windows NT




Richard B. Lam


Dick is a member of the research staff at IBM's T.J. Watson Research Center.
He can be contacted at rblam@watson.ibm.com.


In the article, "Communication Classes for Cross-Platform Development" (DDJ,
March 1995), I presented a method of separating a C++ class interface from the
underlying implementation details when writing cross-platform classes for
event and mutex semaphores. Although there are several ways to separate the
interface and implementation, I'll continue with the same approach, applying
it here to the cross-platform coding of named shared memory and message
queues. In doing so, I'll support interprocess communication (IPC) mechanisms
for OS/2, AIX, and Windows NT. 


Shared Memory


Shared memory is a single address space allocated as a block of memory by some
process (or thread). This process gives the memory to one or more additional
processes, and all processes then use the memory as if it were part of their
normal address space. To gain access to an existing shared-memory block,
processes can either be given a pointer or handle to the block, or they can
reference the block by a name agreed upon beforehand.
If the shared memory is unnamed, the memory pointer must be passed from the
creating process to any other process that wishes access to the shared memory.
This can be done using other forms of IPC such as DDE, a message queue, or a
pipe. In this article, I'll consider only named shared memory--a shared-memory
block with a specific name that allows any process which knows the name of the
block to gain access to the memory.
The interface to the generic shared-memory class is shown in Listings One and
Two. There are two constructors for ipcSharedMemory. One is used by the
process or thread which actually creates the memory block, and it takes the
name of the block and the desired size of the block in bytes as input
arguments. The second constructor is used by other processes or threads which
need access to an existing block, and this constructor requires only the block
name as a parameter.
Two member functions return the block name and a flag indicating whether the
process or thread creates ("owns") or accesses the block. The Pointer() member
function returns a void * pointer to the start of the memory block. The
implementation of the member functions simply refers to the corresponding
member functions in the implementation class osSharedMemory.
Listing Three is the header file used to create the implementation code for
shared memory on individual operating systems. The only difference in the
constructor arguments to osSharedMemory is the additional pointer to the
ipcSharedMemory interface class. This pointer is kept so that the myState
variable in the interface class can be modified by the implementation-level
member functions. Note that osSharedMemory is a friend of ipcSharedMemory so
the myState variable can be set directly in case an initialization error
occurs.
The implementation header file also defines a block id required for the AIX
and Windows NT implementations, along with CreateBlock(), OpenBlock(), and
CloseBlock() methods that call the corresponding operating-system-specific
shared-memory API functions.
The OS/2 implementation, os2shmem.C, is available electronically (see
"Availability," page 3). The OS/2 API requires that all named shared-memory
blocks have a name which starts with "\SHAREMEM\" (for example,
"\SHAREMEM\TEST", "\SHAREMEM\MYBLOCK", and so on). Thus, memPath is defined at
the top of the module as a constant string containing the name prefix, and
this is prepended to the block name passed to the constructors to form the
complete shared-memory-block name. The OS/2 API functions DosAllocSharedMem(),
DosGetNamedSharedMem(), and DosFreeMem() are called to create, access, and
close the memory block.
For AIX, the shared-memory API is handled similarly to the semaphore
implementation--ftok() is called on a unique filename to get a key for use by
the AIX IPC functions. In aixshmem.C (the AIX implementation is available
electronically) the constructors prepend the string "/tmp/" to the input block
name and then create a file with the full block name.
To create the block under AIX, the shmget() routine with an IPC_CREAT flag is
used, and the memory is attached to the process with the function shmat().
Accessing the block is carried out the same way, except that the creation flag
is omitted in the call to shmget(). The CloseBlock() member function calls
shmdt() to detach the shared memory from the process, and calls shmctl() to
remove the shared-memory id from the system. The osSharedMemory destructor
also deletes the temporary file created in the constructor if the
shared-memory owner is destroyed.
Windows NT implements shared memory via file
mapping, which allows you to treat a file as a block of memory. The
CreateBlock() member function in winshmem.C (available electronically) calls
the NT function CreateFileMapping() with an input handle argument of
0xFFFFFFFF. This tells the system to use the system swap file rather than an
actual disk file to create a file-mapping object. The function returns a
mapped file-object handle which is passed to MapViewOfFile() to get a pointer
to the block of shared memory. The OpenFileMapping() function is called to
access an existing shared-memory block; the memory is freed by calling
UnmapViewOfFile() and CloseHandle().
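For readers on a System V platform, the create/attach/detach/remove cycle the AIX implementation follows can be exercised directly. This sketch uses IPC_PRIVATE instead of an ftok()-derived key so that no temporary file is needed; it is a demonstration of the API calls, not the article's class code:

```c
#include <assert.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Create a shared-memory block, attach it, use it as ordinary memory,
 * then detach and remove the id, mirroring CreateBlock()/CloseBlock(). */
int shm_roundtrip(void)
{
    int id = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);
    if (id < 0)
        return -1;

    char *p = (char *)shmat(id, NULL, 0);    /* attach to this process */
    if (p == (char *)-1) {
        shmctl(id, IPC_RMID, NULL);
        return -1;
    }

    strcpy(p, "hello");                      /* use like normal memory */
    int ok = (strcmp(p, "hello") == 0);

    shmdt(p);                                /* detach from the process */
    shmctl(id, IPC_RMID, NULL);              /* remove the id */
    return ok ? 0 : -1;
}
```

A second process would pass the same key (normally obtained via ftok() on an agreed-upon filename) to shmget() without IPC_CREAT to attach to the existing block.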


Message Queues


The message queues I deal with here are distinct from, but similar in
function to, the event queues used in Windows or OS/2. Those event queues work
only for windowed applications, whereas message queues are also valid in
character-mode
sessions. OS/2 and AIX provide direct API support for message queues on the
same platform, but Windows NT provides an alternative mailslot API (also
available on other platforms) that I'll use. Mailslots can be used for
intersystem communications, but I will limit this ipcMessageQueue
implementation to IPC.
There is a distinction for message queues between the owner or creator of the
queue, which in general is the server process, and the clients that access the
queue. We allow clients of the queue write-only access, and queue owners
read-only access. Member functions in the ipcMessageQueue interface class (see
Listing Four and Listing Five) are also provided for queue owners to peek at
the number of messages currently waiting in the queue, and to purge the queue
of all messages.
In case direct access to the queue is required, the ID() member function
returns the operating-system-specific queue handle. Also, there is a Pid()
function that returns the process id of the queue owner. This is provided
because the OS/2 queue API functions require clients to know the queue owner's
process id in the call to DosOpenQueue(). Therefore, the ipcMessageQueue
constructor for clients includes both the queue name and the process id of the
server, while the server constructor only needs the queue name.
As before, the implementation details are delegated to an instance of
osMessageQueue, which calls the system-specific API functions. 
In addition to requiring the server-process id, the OS/2 queue API uses an
event semaphore that can inform the server that a client has posted data to
the queue. Consequently, the implementation-class declaration in Listing Six
includes a pointer to an ipcEventSemaphore, which is created in the server
constructor.
There are also three protected member functions declared in osMessageQueue for
creating, opening, and closing a queue.
The OS/2 implementation (os2queue.C, available electronically) of message
queues requires that all queue names begin with "\QUEUES\", so a constant
string variable (queuePath) is defined at the top of the OS/2 module. A call
to DosGetInfoBlocks() in the server constructor is used to retrieve the
server's process id, which is stored in the myPid variable. Finally, the event
semaphore is created (using the same name as the queue input argument) and
DosCreateQueue() is called. The client constructor simply forms the complete
queue name and calls DosOpenQueue() to access the queue.
The osMessageQueue::Read() member function resets the event semaphore and then
calls DosReadQueue() with either a DCWW_WAIT or DCWW_NOWAIT flag, depending on
the value of the Read() function's wait input argument. The data are then
copied to the input buffer, and the memory is freed. The Write() member
function sets up an unnamed shared-memory block, gives the server process
access to the block, and copies the data to be written into the block. The
shared-memory block pointer, rather than a message-buffer structure, is then
written directly to the queue. For each client write operation, the operating
system automatically posts the event semaphore used in DosReadQueue().
For AIX, the server process creates a file by prepending "/tmp/" to the input
name and calling creat()--this name is used in the ftok() function call (see
aixqueue.C, available electronically). The message queue is created or opened
through a call to msgget() and closed with msgctl(). The read/write operations
are handled by allocating a special msgbuf structure that is passed to
msgsnd() for writing and msgrcv() for reading.
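The msgget()/msgsnd()/msgrcv() round trip can be sketched as a standalone C fragment; the qmsg layout mirrors the msgbuf convention (a long message type followed by the payload), while the 64-byte payload size is an arbitrary choice for this example:

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

struct qmsg {
    long mtype;       /* message type, must be > 0 */
    char mtext[64];   /* payload */
};

/* Create a queue, send one message, read it back, remove the queue. */
int queue_roundtrip(char *out, size_t outlen)
{
    int qid = msgget(IPC_PRIVATE, IPC_CREAT | 0600);
    if (qid < 0)
        return -1;

    struct qmsg m = { 1, "status: ready" };
    if (msgsnd(qid, &m, strlen(m.mtext) + 1, 0) < 0) {
        msgctl(qid, IPC_RMID, NULL);
        return -1;
    }

    struct qmsg r;
    ssize_t n = msgrcv(qid, &r, sizeof(r.mtext), 1, 0);
    msgctl(qid, IPC_RMID, NULL);             /* clean up the queue id */
    if (n < 0)
        return -1;

    strncpy(out, r.mtext, outlen);
    return 0;
}
```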
Windows NT message queues are implemented using the mailslot API. Mailslots
must have a name that begins with "\\.\mailslot\" for use on the same machine,
so this string is defined at the top of the NT-implementation module (see
winqueue.C, available electronically). The CreateQueue() member function calls
the NT function CreateMailslot(), and the client constructor's OpenQueue()
function call accesses the mailslot through a call to CreateFile().
Reading and writing to the mailslot is handled simply by calls to the NT
functions ReadFile() and WriteFile(). The Peek() member function calls
GetMailslotInfo() to return the number of waiting messages, and Purge() reads
and discards each message until no further messages remain.


Test Programs


To test the shared-memory implementation, I've written a number of test
programs (which are available electronically). The file, mbtest.h, defines a
SharedVariables structure. The mbtest1.C program creates an ipcSharedMemory
block, "myblock," large enough to hold the SharedVariables structure. The
Pointer() function retrieves the pointer to the block of memory; the structure
fields are initialized and their values are printed. The mbtest2.C program can
then be started in a separate session to access the existing memory block,
change the values of the structure fields, and print the results.
In practice, processes should synchronize their access to shared-memory blocks
using a mutex semaphore. This controls access to the shared data to help
provide data integrity and consistency.
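One way to provide such a mutex on a System V platform is a semaphore with an initial count of one; this is a sketch of the idea, not the semaphore class API from the earlier article, and the helper names are inventions:

```c
#include <assert.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Callers must define semun themselves on most System V platforms. */
union semun { int val; };

/* Create a one-element semaphore set initialized to 1 (unlocked). */
int sem_create_mutex(void)
{
    int id = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (id >= 0) {
        union semun arg;
        arg.val = 1;
        semctl(id, 0, SETVAL, arg);
    }
    return id;
}

int sem_lock(int id)
{
    struct sembuf op = { 0, -1, 0 };   /* P: wait and decrement */
    return semop(id, &op, 1);
}

int sem_unlock(int id)
{
    struct sembuf op = { 0, +1, 0 };   /* V: increment */
    return semop(id, &op, 1);
}
```

Each process would call sem_lock() before touching the shared structure and sem_unlock() afterward.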
The QMsg structure, defined in the file qtest.h, represents the contents of
messages that will be written to the server process qtest1 in the file
qtest1.C. The server test program creates a message queue named myque and
starts the client program qtest2.C, passing the process id of the server as a
command-line argument to the client. The client then opens the queue and
writes several messages to the server, which reads messages and purges the
queue before ending.


Summary



Shared-memory blocks are perhaps the fastest IPC mechanism, especially for
transferring large structures between processes. However, they require careful
synchronization, or subtle bugs can occur in complex programs or systems.
Message queues are quite useful for one-way communications between a server
(say a display process) and a number of clients (data-collection processes,
for example). But the practical size of queue messages may be limited
(particularly on AIX) to small chunks of information.

Listing One 

// ****************************************************************************
// Module: ipcshmem.h -- Author: Dick Lam
// Purpose: C++ class header file for ipcSharedMemory
// Notes: This is a base class. It is the interface class for creating and
// accessing a memory block that is sharable between processes and threads.
// ****************************************************************************

#ifndef MODULE_ipcSharedMemoryh
#define MODULE_ipcSharedMemoryh

// forward declaration
class osSharedMemory;

// class declaration
class ipcSharedMemory {

friend class osSharedMemory;

public:

 // constructor and destructor
 ipcSharedMemory(const char *name, // unique name for creating block
 long blocksize); // requested size (in bytes)
 ipcSharedMemory(const char *name); // name of block to open
 virtual ~ipcSharedMemory();

 // methods for getting memory block parameters [name, pointer to the block,
 // and whether this is the owner (creator) of the block]
 char *Name() const;
 void *Pointer() const;
 int Owner() const;

 // class version and object state data types
 enum version { MajorVersion = 1, MinorVersion = 0 };
 enum state { good = 0, bad = 1, badname = 2, notfound = 3 };

 // methods to get the object state
 inline int rdstate() const { return myState; }
 inline int operator!() const { return(myState != good); }
protected:
 osSharedMemory *myImpl; // implementation
 state myState; // object state (good, bad, etc.)
private:
 // private copy constructor and operator= (define these and make them
 // public to enable copy and assignment of the class)
 ipcSharedMemory(const ipcSharedMemory&);
 ipcSharedMemory& operator=(const ipcSharedMemory&);
};
#endif




Listing Two


// ****************************************************************************
// Module: ipcshmem.C -- Author: Dick Lam
// Purpose: C++ class source file for ipcSharedMemory
// Notes: This is a base class. It is the interface class for creating and
// accessing a memory block that is sharable between processes and threads.
// ****************************************************************************

#include "ipcshmem.h"
#include "osshmem.h"

// ****************************************************************************
// ipcSharedMemory - constructor for creating

ipcSharedMemory::ipcSharedMemory(const char *name, long blocksize)
{
 // init instance variables
 myState = good;
 myImpl = new osSharedMemory(this, name, blocksize);
 if (!myImpl)
 myState = bad;
}
// ----------------------------------------------------------------------------
// ipcSharedMemory - constructor for accessing
ipcSharedMemory::ipcSharedMemory(const char *name)
{
 // init instance variables
 myState = good;
 myImpl = new osSharedMemory(this, name);
 if (!myImpl)
 myState = bad;
}
// ----------------------------------------------------------------------------
// ~ipcSharedMemory - destructor
ipcSharedMemory::~ipcSharedMemory()
{
 delete myImpl;
}
// ----------------------------------------------------------------------------
// Name - returns the name of the memory block
char *ipcSharedMemory::Name() const
{
 if (!myImpl)
 return 0;
 return myImpl->Name();
}
// ----------------------------------------------------------------------------
// Pointer - returns a pointer to the start of the memory block
void *ipcSharedMemory::Pointer() const
{
 if (!myImpl)
 return 0;
 return myImpl->Pointer();
}
// ----------------------------------------------------------------------------
// Owner - returns 1 if this is the owner (creator), and 0 otherwise
int ipcSharedMemory::Owner() const
{
 if (!myImpl)
 return 0;
 return myImpl->Owner();
}




Listing Three

// ****************************************************************************
// Module: osshmem.h -- Author: Dick Lam
// Purpose: C++ class header file for osSharedMemory
// Notes: This is a base class. It contains general implementation methods
// for memory blocks shared between processes and threads.
// ****************************************************************************

#ifndef MODULE_osSharedMemoryh
#define MODULE_osSharedMemoryh

#include "ipcshmem.h"

// class declaration
class osSharedMemory {

public:
 // constructor and destructor
 osSharedMemory(ipcSharedMemory *interface, const char *name, long blocksize);
 osSharedMemory(ipcSharedMemory *interface, const char *name);

 virtual ~osSharedMemory();

 // methods for getting memory block parameters [name, pointer to the block,
 // and whether this is the owner (creator) of the block]
 char *Name() const;
 void *Pointer() const;
 int Owner() const;
protected:
 ipcSharedMemory *myInterface; // pointer to the interface instance
 unsigned long myID; // id of memory block
 char *myName; // shared memory block name
 int isOwner; // flag indicating owner

 void *myBlock; // pointer to the memory block

 // methods for handling the memory block
 void CreateBlock(long blocksize);
 void OpenBlock();
 void CloseBlock();
private:
 // private copy constructor and operator= (define these and make them
 // public to enable copy and assignment of the class)
 osSharedMemory(const osSharedMemory&);
 osSharedMemory& operator=(const osSharedMemory&);
};
#endif





Listing Four

// ****************************************************************************
// Module: ipcqueue.h -- Author: Dick Lam
// Purpose: C++ class header file for ipcMessageQueue
// Notes: This is a base class. It is the interface class for creating and
// accessing a message queue that handles messages between processes.
// ****************************************************************************

#ifndef MODULE_ipcMessageQueueh
#define MODULE_ipcMessageQueueh

// forward declaration
class osMessageQueue;

// class declaration
class ipcMessageQueue {

friend class osMessageQueue;

public:
 // constructor and destructor
 ipcMessageQueue(const char *name); // unique name to create queue
 ipcMessageQueue(const char *name, // name of queue to open
 unsigned long powner); // process id of queue owner
 virtual ~ipcMessageQueue();

 // methods for accessing the queue and queue parameters [name, queue id,
 // queue owner process id, and whether this is the owner (creator)]
 char *Name() const;
 unsigned long ID() const;
 unsigned long Pid() const;
 int Owner() const;
 // read/write methods for the queue (only a queue owner may read from
 // the queue, and only queue clients may write to a queue)
 virtual int Read(void *data, long datasize, int wait = 0);
 virtual int Write(void *data, long datasize);

 // methods to examine and remove messages from the queue (owner only)
 virtual unsigned long Peek();
 virtual int Purge();

 // class version and object state data types
 enum version { MajorVersion = 1, MinorVersion = 0 };
 enum state { good = 0, bad = 1, badname = 2, notfound = 3, notowner = 4,
 notclient = 5, readerror = 6, writeerror = 7, badargument = 8 };
 // methods to get the object state
 inline int rdstate() const { return myState; }
 inline int operator!() const { return(myState != good); }
protected:
 osMessageQueue *myImpl; // implementation
 state myState; // object state (good, bad, etc.)
private:
 // private copy constructor and operator= (define these and make them
 // public to enable copy and assignment of the class)
 ipcMessageQueue(const ipcMessageQueue&);
 ipcMessageQueue& operator=(const ipcMessageQueue&);
};
#endif





Listing Five

// ****************************************************************************
// Module: ipcqueue.C -- Author: Dick Lam
// Purpose: C++ class source file for ipcMessageQueue
// Notes: This is a base class. It is the interface class for creating and
// accessing a message queue that handles messages between processes.
// ****************************************************************************

#include "ipcqueue.h"
#include "osqueue.h"

// ****************************************************************************
// ipcMessageQueue - constructor for server
ipcMessageQueue::ipcMessageQueue(const char *name)
{
 // init instance variables
 myState = good;
 myImpl = new osMessageQueue(this, name);
 if (!myImpl)
 myState = bad;
}
// ----------------------------------------------------------------------------
// ipcMessageQueue - constructor for clients
ipcMessageQueue::ipcMessageQueue(const char *name, unsigned long powner)
{
 // init instance variables
 myState = good;
 myImpl = new osMessageQueue(this, name, powner);
 if (!myImpl)
 myState = bad;
}
// ----------------------------------------------------------------------------
// ~ipcMessageQueue - destructor
ipcMessageQueue::~ipcMessageQueue()
{
 delete myImpl;
}
// ----------------------------------------------------------------------------
// Name - returns the name of the queue
char *ipcMessageQueue::Name() const
{
 if (!myImpl)
 return 0;
 return myImpl->Name();
}
// ----------------------------------------------------------------------------
// ID - returns the queue id
unsigned long ipcMessageQueue::ID() const
{
 if (!myImpl)
 return 0L;
 return myImpl->ID();
}
// ----------------------------------------------------------------------------

// Pid - returns the process id of the Queue owner
unsigned long ipcMessageQueue::Pid() const
{
 if (!myImpl)
 return 0L;
 return myImpl->Pid();
}
// ----------------------------------------------------------------------------
// Owner - returns 1 if this is the owner (creator), and 0 otherwise
int ipcMessageQueue::Owner() const
{
 if (!myImpl)
 return 0;
 return myImpl->Owner();
}
// ----------------------------------------------------------------------------
// Read - reads a message from the queue (queue owner only)
int ipcMessageQueue::Read(void *data, long datasize, int wait)
{
 if (!myImpl)
 return bad;
 return myImpl->Read(data, datasize, wait);
}
// ----------------------------------------------------------------------------
// Write - writes a message to the queue (queue clients only)
int ipcMessageQueue::Write(void *data, long datasize)
{
 if (!myImpl)
 return bad;
 return myImpl->Write(data, datasize);
}
// ----------------------------------------------------------------------------
// Peek - returns the number of entries in the queue
unsigned long ipcMessageQueue::Peek()
{
 if (!myImpl)
 return 0L;
 return myImpl->Peek();
}
// ----------------------------------------------------------------------------
// Purge - removes all entries from the queue
int ipcMessageQueue::Purge()
{
 if (!myImpl)
 return bad;
 return myImpl->Purge();
}




Listing Six

// ****************************************************************************
// Module: osqueue.h -- Author: Dick Lam
// Purpose: C++ class header file for osMessageQueue
// Notes: This is a base class. It contains general implementation methods
// for message queues for sending messages between processes.
// ****************************************************************************


#ifndef MODULE_osMessageQueueh
#define MODULE_osMessageQueueh

#include "ipcqueue.h"

// forward declaration
class ipcEventSemaphore;

// class declaration
class osMessageQueue {

public:
 // constructors and destructor
 osMessageQueue(ipcMessageQueue *interface, const char *name);
 osMessageQueue(ipcMessageQueue *interface, const char *name,
 unsigned long powner);
 virtual ~osMessageQueue();

 // methods for accessing the queue and queue parameters [name, queue id,
 // queue owner process id, and whether this is the owner (creator)]
 char *Name() const;
 unsigned long ID() const;
 unsigned long Pid() const;
 int Owner() const;

 // read/write methods for the queue (only a queue owner may read from
 // the queue, and only queue clients may write to a queue)
 virtual int Read(void *data, long datasize, int wait);
 virtual int Write(void *data, long datasize);

 // methods to examine and remove messages from the queue
 virtual unsigned long Peek();
 virtual int Purge();
protected:
 ipcMessageQueue *myInterface; // pointer to the interface instance
 unsigned long myPid; // process id of queue owner
 unsigned long myID; // id of queue
 char *myName; // queue name
 int isOwner; // flag indicating owner

 ipcEventSemaphore *mySem; // required for OS/2 only

 // methods for handling the message queue
 void CreateQueue();
 void OpenQueue();
 void CloseQueue();
private:
 // private copy constructor and operator= (define these and make them
 // public to enable copy and assignment of the class)
 osMessageQueue(const osMessageQueue&);
 osMessageQueue& operator=(const osMessageQueue&);

};
#endif








A Cross-Platform Binary Diff 


Seeing how one binary file differs from another




Kris Coppieters


Kris is manager of the service and support department of Logic, an
AppleCentre/IBM R/6000 VAR/Novell Partner. He can be reached on CompuServe at
100025,2724.


Binary file comparison is useful for many applications. One example is sending
updates of large files over a communications line: Instead of sending a
complete update each time, you could send the complete file once, then create
a diff file containing the differences between the original file and the
updated file. At the receiving side, this diff file could be used to update
the original file. Creating a diff file is processor and memory intensive.
Under DOS, such a process can easily exceed the 640K limit. On the other hand,
using a diff file to update the file is very lightweight and fast. In such
cases, it may be desirable to create the diff file on another platform and use
the resulting file under DOS.
Such were the requirements when I created BinDiff, a utility that
intelligently compares two versions of a binary file and creates a diff file.
The algorithm in BinDiff tries to find equal chunks in the two files being
compared. BinDiff then uses an indexing algorithm to find matching chunks so
equal chunks need not be in the same sequence in both files. BinDiff is built
from a single C source file that compiles on UNIX, OS/2, DOS, and the
Macintosh. A command-line user interface is used on the first three platforms,
while a point-and-click interface is used on the Mac. Because BinDiff is
insensitive to Endianness, diff files created on one platform can be used on
another.


Chunking the Binary File


Many diff utilities only work with pure text files because they depend heavily
on the concept of a "line" denoted by some kind of delimiter. Dividing files
into lines lets you index the lines in one file, then use the index to find
matches with lines in another file. A binary file, however, has no such
delimiter, making it harder to index. I chose an artificial line delimiter (a
single byte code between 0 and 255) to divide a binary file into fake lines,
which can be indexed and processed just like lines in a text file. 
The algorithm to choose a suitable delimiter involves simple statistics. For
each possible byte value, I calculate the mean length and standard deviation
of the lines in the file. From the byte values with a mean block length
between 20 and 80 bytes, I choose the one with the lowest standard deviation.
If there is no such byte, I gradually loosen the limits on the block length,
trying values from 20 to 130, 20 to 180, and so on. In the rare cases that
this does not help, 0 is used as the byte value. Choosing the lowest standard
deviation yields the byte that most evenly divides the file; the blocks are
more or less the same length. I tested this algorithm on a number of text
files, and in many cases, the most suitable delimiter coincided with the
actual line delimiter (CR or LF). 
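The statistics above can be sketched as follows. This is my own reconstruction, not BinDiff's actual source (the names line_stats and find_delimiter are mine); comparing variances gives the same ordering as comparing standard deviations, so the square root is skipped.

```c
#include <stddef.h>

/* For one candidate delimiter byte, compute the mean and variance of the
   lengths of the "lines" it would divide the buffer into. */
static void line_stats(const unsigned char *buf, size_t len,
                       unsigned char delim, double *mean, double *var)
{
    double sum = 0.0, sumsq = 0.0;
    size_t nlines = 0, linelen = 0, i;

    for (i = 0; i < len; i++) {
        linelen++;
        if (buf[i] == delim) {          /* this byte ends a "line" */
            sum   += (double)linelen;
            sumsq += (double)linelen * (double)linelen;
            nlines++;
            linelen = 0;
        }
    }
    if (nlines == 0) {                  /* delimiter absent: one huge "line" */
        *mean = (double)len + 1.0;
        *var  = 0.0;
        return;
    }
    *mean = sum / (double)nlines;
    *var  = sumsq / (double)nlines - *mean * *mean;
}

/* Pick the byte whose mean line length lies in [20, hi] with the lowest
   spread; loosen hi (80, 130, 180, ...) until a candidate exists, falling
   back to byte 0 as described in the text. */
static int find_delimiter(const unsigned char *buf, size_t len)
{
    int hi, v, best;
    double mean, var, best_var = 0.0;

    for (hi = 80; hi <= 230; hi += 50) {
        best = -1;
        for (v = 0; v < 256; v++) {
            line_stats(buf, len, (unsigned char)v, &mean, &var);
            if (mean >= 20.0 && mean <= (double)hi &&
                (best < 0 || var < best_var)) {
                best = v;
                best_var = var;
            }
        }
        if (best >= 0)
            return best;
    }
    return 0;                           /* rare fallback */
}
```

On a text file made of regular lines, this picks the actual line delimiter, matching the behavior reported above.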


Indexing and Matching


Once both files are divided into chunks, BinDiff builds an index and begins
matching chunks from both files. I've arbitrarily chosen to index the original
file (<file.old>). I read the file and create an unbalanced binary tree of
chunks. Each tree node contains a left and right pointer. In each node, I also
store some bytes from the file to speed up comparisons. The first few bytes to
compare do not have to be read from disk. I also add sequences of at least ten
equal bytes to the chunk-index tree, regardless of delimiters.
At this point, BinDiff can match the updated file (<file.new>) against the
index tree. The updated file is read and chunked using the same delimiter.
Each chunk or run of at least ten equal bytes is looked up in the tree. Every
match of sufficient length is expanded as much as possible and stored in a
linked list of matches. A match can be smaller than a complete chunk. If six
or more bytes match, the match is stored. Due to the file format used, matches
of less than six bytes increase the size of the diff file instead of reducing
it. I try to expand a match both forward and backward, because the two files
may already match prior to the location where the match is found, and they
might continue to match after the delimiter that ends the chunk; see Figure 1.
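A minimal sketch of that forward/backward expansion (the struct and names are mine; BinDiff's own ExtendMatch works through buffered file I/O rather than on whole in-memory buffers):

```c
#include <stddef.h>

/* A tentative match: old[opos..opos+len) equals new[npos..npos+len). */
typedef struct {
    size_t opos, npos, len;
} Match;

static void expand_match(const unsigned char *oldbuf, size_t oldlen,
                         const unsigned char *newbuf, size_t newlen,
                         Match *m)
{
    /* backward: the files may already match before the found position */
    while (m->opos > 0 && m->npos > 0 &&
           oldbuf[m->opos - 1] == newbuf[m->npos - 1]) {
        m->opos--;
        m->npos--;
        m->len++;
    }
    /* forward: the match may continue past the delimiter ending the chunk */
    while (m->opos + m->len < oldlen && m->npos + m->len < newlen &&
           oldbuf[m->opos + m->len] == newbuf[m->npos + m->len])
        m->len++;
}
```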


Distance, Expanding, and Enclosing


The "distance" of a match is the difference between the file positions in the
original and updated files. If a match is at the same position in both files,
for example, its distance is 0. A match is an "expanded" form of a smaller
match if the starting position of the smaller match is within the larger match
and if the distance of both matches is the same. A smaller match is "enclosed"
by a larger match if the corresponding smaller chunk in the updated file is
enclosed in the larger chunk. One match can enclose another without being
expanded: Enclosing matches have unequal distances; expanded matches have
equal distances.
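The three relations can be captured in a few lines; this self-contained sketch uses my own struct and names, not BinDiff's:

```c
/* A match: old[opos..opos+len) equals new[npos..npos+len). */
typedef struct {
    long opos, npos, len;
} Match;

/* distance: difference between positions in the original and updated files */
static long distance(const Match *m)
{
    return m->opos - m->npos;
}

/* big is an expanded form of small: same distance, and small's starting
   position (in the updated file) lies inside big */
static int is_expanded_form(const Match *big, const Match *small_m)
{
    return distance(big) == distance(small_m) &&
           small_m->npos >= big->npos &&
           small_m->npos < big->npos + big->len;
}

/* big encloses small: small's chunk in the updated file lies wholly inside
   big's chunk; the distances need not agree */
static int encloses(const Match *big, const Match *small_m)
{
    return small_m->npos >= big->npos &&
           small_m->npos + small_m->len <= big->npos + big->len;
}
```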
The linked list of matches is kept sorted on starting position in the updated
file. Before expanding the match, the linked list is checked. If the match is
already present in an expanded form, it is not expanded or added; if no
expanded form of the match is present, it is expanded and inserted into the
linked list. While inserting the new match, all matches enclosed by it are
removed from the list. The new match is better because it is bigger. After the
match list is built, the complete match list is reviewed. Overlapping matches
are cut off, and if a match drops below six bytes, it is removed from the
list.


Writing the Diff File


The diff file consists of a header and a sequence of tagged entries. The
header contains a signature, file sizes, checksums, and data specifically for
Macintosh files (the type and the creator of the updated file). Some tags are
used as headers to data included in the diff file. Other tags encode
references to data present in the original file. 
Finally, the diff file is written. The tags are four bits in size and are
contained in the lower four bits of a byte. The upper four bits, together with
zero, one, or two extra bytes, contain a chunk size. Following the tag is
either a block of literal data bytes or two, three, or four extra bytes that
encode a chunk's position within the original file.
A 4-bit field is used to encode sizes 1--16 bytes (size 0 is never used); a
12-bit field for sizes 17--4112 (4096+16); and a 20-bit field for sizes
4113--1,052,688. Larger chunks are encoded by using more than one tag.
Depending on the size and location of a chunk, I use either a short or a long
code: References to small chunks near the start of the file are encoded in
three bytes (4-bit tag and 4-bit size, two bytes for file position). The
largest reference to a big chunk near the end of the file can be seven bytes
(4-bit tag, 20-bit size, 4 bytes for file position), but most will not exceed
six.
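A sketch of the tag/size encoding, under the assumption that the biased size's low 4 bits share the tag byte and any extra size bytes follow it. The article gives only the field widths and ranges, so the exact bit layout here is my guess (and how a decoder tells the three widths apart, presumably by distinct tag values, is not shown):

```c
#include <stddef.h>

/* Encode one tag plus chunk size (size must be >= 1; size 0 is never used).
   Low nibble of out[0] is the tag; high nibble holds the low 4 bits of a
   biased size, with 0, 1, or 2 extra size bytes after it.
   Returns the number of bytes written. */
static size_t emit_tag(unsigned char *out, unsigned tag, unsigned long size)
{
    if (size <= 16) {                       /* 4-bit field, sizes 1-16 */
        unsigned long s = size - 1;
        out[0] = (unsigned char)((tag & 0x0f) | (s << 4));
        return 1;
    } else if (size <= 4112) {              /* 12-bit field, sizes 17-4112 */
        unsigned long s = size - 17;
        out[0] = (unsigned char)((tag & 0x0f) | ((s & 0x0f) << 4));
        out[1] = (unsigned char)(s >> 4);
        return 2;
    } else {                                /* 20-bit field, to 1,052,688 */
        unsigned long s = size - 4113;
        out[0] = (unsigned char)((tag & 0x0f) | ((s & 0x0f) << 4));
        out[1] = (unsigned char)(s >> 4);
        out[2] = (unsigned char)(s >> 12);
        return 3;
    }
}
```

Larger chunks would be split and emitted as more than one tag, as the text notes.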
The diff file contains two separate, sequential diff files. On DOS, UNIX, and
OS/2, the second diff file is always empty. On the Macintosh, the file
contains the diff information for the resource forks (if present).


Platform Specifics


Macintosh files differ in that they are composed of the data fork and the
resource fork. The data fork corresponds to a file on the other platforms. On
a low level, the resource fork can be seen as just another file. On a higher
level, it is used by the Macintosh OS for maintaining a database-like
structure of resources. In BinDiff, the resource fork is viewed as a second
file. To keep the code as portable as possible, I created new versions of
standard file I/O functions like fopen, fseek, fread, and fwrite. My version
of fopen has an extra option that lets you specify which fork to open.
Macintosh diff files from the data fork are usable on other platforms; those
from the resource fork are not.
On many UNIX platforms, a K&R C compiler is available. BinDiff uses double
headers: Each function has an ANSI header and a K&R header. If an ANSI
compiler is available on a particular UNIX platform, it can be activated with
one of the conditional compilation switches. Note that the UNIX version has no
progress bar--BinDiff simply displays a message after startup.


Putting it Together



BinDiff's complete C source code and project files are available
electronically; see "Availability," page 3. I've tested BinDiff with Symantec
C++ 7.0 and MetroWerks C++ 1.0 on the Macintosh, Borland C++ 3.1 under DOS,
Borland C++ 1.0 on OS/2, and a K&R C compiler on A/UX. For the Macintosh
version, you'll also need the file BinDiff.rsrc, which contains the resources
for a dialog-window layout. The Symantec project file should contain:
BinDiff.c, ANSI++, CPlusLib, and MacTraps. I put the library files in a
separate segment. I use B1dF as creator and .DIF as file type. Because the
standard file functions such as fopen and fseek are already present in ANSI++,
I define the corresponding functions FOPEN, FSEEK, and so on, and use macros
to convert lowercase functions to their uppercase variants (for example,
#define fclose(x) FCLOSE(x)). This prevents linking problems.
Using Borland C++ under DOS and OS/2, you must create a project that includes
BinDiff.c, and change the settings so that BinDiff.c is compiled in C++ mode.
On most UNIX systems, you can compile with the following command line: cc
bindiff.c -o bindiff -lm.
The source file contains multiple versions of the program; you can change
compilation variables according to the version being compiled. By setting
BDEXTR to 1 instead of 0, you compile a reduced version of BinDiff, called
"BdExtr," that contains only the code for applying a diff file to an original
file. Setting one of the values BorlandC, MacC, or StdUnixC to "1" identifies
the compilation platform. The routine ScanFile scans through a file and
calculates the mean block size and standard deviation for each byte value
0--255 if this byte were to be used as a delimiter. FindDelimiter checks the
tables built by ScanFile and chooses a suitable delimiter.
BuildTree scans a file and builds the chunk index tree. ExtendMatch extends a
match forward and backward, until the first nonmatching bytes or the file
limits. MatchFiles matches the second file to the first file's index tree.
DumpDiff writes the tagged diff file. Depending on the platform, part of the
routine is executed once (DOS/OS/2/UNIX) or twice (for both forks of a
Macintosh file). SubtractFiles is the highest-level routine to create a diff
from an original and an updated file. AddFiles is the highest-level routine to
apply a diff to an original file in order to create an updated file.


Conclusion


You can optimize BinDiff in several ways. For instance, the unbalanced tree
can become a balanced tree. This yields better performance with already-sorted
files (such as sorted text files). Next, consider the possibility of using the
zero delimiter when no good delimiter is found. Zero is probably one of the
worst possible choices, but the fallback is rarely triggered, normally only
for very small files.
Also, more data could be read into memory. Currently, only a very small part
of the file is read into the nodes of the tree, making the algorithm rather
dependent on disk I/O performance. When a lot of memory is available, more of
the file should be read into the tree. Another optimization is to use CRC
instead of the simple checksums used for checking the files. CRCs give more
security against using a diff on a nonmatching original file, and against diff
file corruption. Finally, the diff file could be compressed.
Figure 1 Matching both forward and backward.







MapMan: Building Windows Symbols Files


Rolling your own symbols for 16-bit Windows




Joseph Hlavaty


Joe is a systems programmer at a major hardware vendor. He is a graduate of
Georgetown University and currently lives and works in the Washington, DC
area. He can be contacted at jhlavaty@aol.com.


Almost every Windows programmer has wished for more symbols than those shipped
with the Windows SDK. Some of you might have even needed to debug another
application because it conflicts with your app. In this article, I'll present
a tool that lets you build .SYM files for any 16-bit Windows executable,
including the DLLs that make up Windows itself. I call this tool "MapMan,"
short for "Windows map-file manager."
MapMan works on any 16-bit application for the Windows operating environment.
I've used many Windows 3.1 binaries in my test suite, including Write,
ProgMan, ClipBrd, Notepad, Krnl386, and User. MapMan also works on Win-OS/2 2.x
and 3.0 applications (and can even be used on 16-bit OS/2 executables such as
EPM.EXE in OS/2 Warp). Although MapMan is currently a real-mode DOS program, I
intend to port the application to Windows.
In this article, I'll refer to a number of Windows features--DOS executable
headers, new executable headers, names tables, resident names tables, and the
like--which you may or may not be familiar with. For your convenience, a
discussion of these terms is provided electronically, along with the source
code, executables, and related MapMan files; see "Availability," page 3.
As most Windows programmers are aware, any procedure that will be called
external to the application must be exported. One such exported procedure is a
wndproc (or window procedure), which is not called directly by the
application, but rather by Windows. Exporting a function simply means adding
it to a few internal tables of the NE header, making it accessible to any
module, either by name or ordinal. This exporting process is much like adding
a chapter title to the table of contents of a book; without the chapter title
in the table of contents, it might be impossible to find the chapter by simply
skimming the book. Even if you were successful, you would probably find your
search very time consuming. Likewise, if Windows were not able to look up your
application's wndproc in your module's list of exported functions, it would be
difficult or impossible for Windows to call the procedure (to send it a
message, for example).
You can export functions by placing the function names in the Exports section
of an executable's module-definition (.DEF) file. Often, a compiler-dependent
keyword can be used in a function definition (for example, _export) to export
a function without a .DEF file entry. My source code uses the
compiler-independent Exports-section method to define exported entry points.
The sample .DEF in Example 1 contains exported functions, a Name field, a
Description field, and an Exports section, each of which is part of either the
resident or the nonresident names table in the NE header. The CODE and DATA
keywords, of course, define the application's segments, found in the segment
table in the NE header. And, yes, keywords such as EXETYPE, HEAPSIZE, and
STACKSIZE all resolve to sections of the NE header.
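A minimal module-definition file of the kind described might look like this (a hypothetical sketch; Example 1 itself is not reproduced here):

```
NAME        TRAPMAN
DESCRIPTION 'Sample Windows application'
EXETYPE     WINDOWS
CODE        PRELOAD MOVEABLE DISCARDABLE
DATA        PRELOAD MOVEABLE MULTIPLE
HEAPSIZE    1024
STACKSIZE   8192
EXPORTS
            MainWndProc
            AboutDlgProc
```

Here NAME and DESCRIPTION land in the resident and nonresident names tables, EXPORTS populates the entry table, and the remaining keywords resolve to the NE-header sections just mentioned.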


Map Files


A map file is an ASCII text file containing information that maps (or
identifies) pieces of a module by symbolic value to addresses in the module's
segments. Consider the map file in Figure 1, generated by the Microsoft 5.1
linker from object modules built with debugging information (the -Zi option).
This map file is linker specific; other linkers may generate different
files. (Table 1 provides an overview of the sections of a Microsoft linker
.MAP file.)
At the top of the .MAP file, you'll see the module name (TRAPMAN in Figure 1).
The second section of the .MAP file contains a description of the segments in
the application. I've condensed it since the original divided this small
application's two segments into over 30 pieces. The number to the left of the
":" is the segment number (in hex). This application has two segments: one of
type CODE, the other of type DATA. The first segment has 2bf2h bytes (decimal
11250), which is 25d0h+622h (the start of the last section plus the length of
the last section). The second segment has 0b3ah bytes (decimal 2874), which is
930h+20ah. 
The third section of the .MAP file gives the DGROUP of the module. Normally,
all code segments of an application are shared across multiple instances.
Windows will only load one copy of the code and read-only data for any and all
copies of the application in memory at any one time. In an application that
uses the DATA MULTIPLE keyword in its .DEF file, however, each instance of the
application will have its own private DGROUP segment (as the DGROUP is both
readable and writable). This third section denotes which segment number (in
hex) is the DGROUP. By Microsoft convention, the last segment in a module is
the DGROUP.
The fourth section of the .MAP file is the list of exported functions found in
this module. This application has two: one for the About box, another for the
main window. Both are wndprocs and must be exported so that Windows can call
them. Again, all offsets are in hex, so the About-box routine is actually 1564
bytes into the first segment of the application.
The fifth section of the .MAP file contains the public symbols sorted by name.
Public symbols are those symbols known to the linker. In other words, a public
symbol can be used within any executable module, not just the source file in
which it is defined. In C, for example, functions normally have such external
linkage.
This application was built with debug information, so the linker had much more
information than it normally would to put in the .MAP file. The About-box
routine is a Pascal routine, as required by Windows (the lack of a preceding
underscore hints at this). The routine is in the first segment at offset 61ch.
Nothing about exporting a function, however, requires the Pascal calling
convention. A CDecl function, for example, can be exported, but if you were to
export a CDecl wndproc, Windows would be unable to call the function
successfully. Windows normally assumes that exported functions have Pascal
calling conventions, that is, that the function called will clean its
parameters off the stack (usually with a RET N instruction). If the function
is CDecl, it assumes that the caller will clear parameters from the stack. The
stack will be left in an unstable state if the called function did not have a
void parameter list. A few Windows exported functions are CDecl by necessity
(wsprintf(), for example), as the Pascal calling convention doesn't support
variable argument functions.
The next line contains 0:0 for an address. The null value denotes this as a
far pointer requiring "fixup." At link time, the linker has no idea at what
segment:offset value the MESSAGEBOX routine will be found; it knows only that
MESSAGEBOX is in the USER module with ordinal 1. The Windows loader must
replace occurrences of MESSAGEBOX in the application with the appropriate
selector:offset to the function in memory.
The MYFARPROC and MYODS functions are actually assembly-language functions
(all uppercase with no leading underscore) marked as public symbols with the
PUBLIC keyword in the assembly source file.
The function _DPMIAllocateLDTDescriptors is C code (note the leading
underscore and use of case in the symbol name).
Lastly, notice the symbol __astart in the last line of this table. __astart is
the actual entry point--the first piece of code executed by Windows when it
launches a new instance of the application. As the double leading underscore
indicates, this is part of the C run-time library for my compiler. Double
underscores are used for public C-library functions to avoid name collisions
with nonlibrary source code. Standard library functions in C are an exception;
strlen(), for example, has only a single underscore in this table. The
About-box routine also appeared in the list of exported functions. All
exported functions can be thought of as public symbols, so the exports are
also in this list. 
The sixth section of the .MAP file contains symbols identical to those in
section five, but sorted by address (although the section is called Publics by
Value in the .MAP file). It's convenient that the linker gives it to us both
ways. If you've broken into a debugger because your app has just trapped and
you're staring at CS:IP=103f:0886, it's helpful to know that part of your .MAP
file is sorted by address. If you're trying to find the segment and offset of
one of your symbols to set a breakpoint by address in that same debugger,
you'll appreciate the .MAP file being sorted by name. (Who said you can't have
your cake and eat it, too?)
My version of MapSym, however, considers only the Publics by Value section
essential. For reasons I haven't explored, removing the Publics by Name
section makes the resulting .SYM file slightly smaller, and MapSym doesn't
complain. Remove the Publics by Value section and MapSym will refuse to build
a .SYM file, giving a message to relink the executable. Since the two tables
should differ only in order and not in content, there's probably no reason for
MapSym to look at both.
The seventh section of the .MAP file contains Program entry point at
0001:25E1. This segment:offset is that of the __astart() library function for
this application. Our WinMain() is not called directly by Windows. It will be
called by the C library code that, in this case, is the Windows entry point
for the application.


The MapMan Program


MapMan was built with the Microsoft C 6.x compiler and generates .MAP files
compatible with Microsoft linker .MAP files and the Microsoft MapSym utility
used to create symbol files. To create symbol files for a different linker,
you may need to modify MapMan's output to match your particular linker's
output. This should not be time consuming, as the largest piece of work in
MapMan consists of the functions specific to the NE header, which are compiler
and linker independent (at least for our purposes).
The future Windows version of this application will reuse all source code
except for that which is platform specific (in mapdutil.c). For this reason, I
have my own TYPEDEFs for BOOL, WORD, and other standard Windows types (this is
a DOS app, of course, and won't include any Windows header files). I also have
wrapper functions around all calls to the standard C library, such as
printf(). This is because the Windows version will not call printf(), but some
other function instead (probably a file-system related function, but it may
simply append to a buffer in memory). It won't be writing to STDOUT with
printf(). I intend to use this DOS version of MAPMAN.EXE as the stub for the
Windows version, making an app that will run in either DOS or Windows. 
Our task is to create a .MAP file similar in form and function to that
generated by a Microsoft linker and acceptable to the Microsoft MapSym
symbol-file generator; see Figure 2. Since executables are binary files,
you'll need to open the executable for binary read, then parse the MZ and NE
headers, if any are found.
The overall flow of the MapMan executable is simple, as evidenced by the
main() routine. First, any user-supplied arguments are processed. Then, if a
name was given, you allocate a buffer and attempt to load a file by that name
into the buffer for processing. Finally, you free the buffer and return to
DOS.
The LoadExe() routine is no more complicated: It opens the file (as binary)
and loads it into the previously allocated buffer. At this step, pBuffer (a
pointer to the start of the allocated buffer) points to the beginning of the
file. If the file is a valid Windows executable, then you'll find at the start
of the buffer an MZ (old-style) executable header.
You call the SetMZ() function to set the pointer to the MZ header (pMZ) and
validate the new pointer by checking its signature. If SetMZ() returns False,
then the file loaded has no valid MZ header. It might be a .COM file or simply
a data file. In any case, you can exit after warning the user.
If you have a valid MZ header, then you must also verify that you have loaded
a valid Windows executable. You do this by verifying that an NE header exists
after the MZ header. If the MZ header relocation-table address is less than
0x40 (64 decimal), then no Windows header exists. Once again, exit after
warning the user.
Otherwise, call SetNE() to set and validate the pointer to the NE header (pNE)
that you'll use for further processing. If this routine returns False, no
valid NE header exists and you exit (after warning the user). If you do have a
valid NE header, begin processing it to create the internal structures needed
to build the map file.
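The validation steps above can be sketched as follows, using the standard executable-header layout (the relocation-table offset lives at 0x18 and the NE-header offset at 0x3c); the function name is mine, not MapMan's:

```c
#include <stddef.h>

/* Returns the offset of the NE header within the loaded buffer, or 0 if the
   file is not a valid Windows (NE) executable. */
static long find_ne_header(const unsigned char *buf, size_t len)
{
    unsigned relocTable, neOffset;

    if (len < 0x40 || buf[0] != 'M' || buf[1] != 'Z')
        return 0;               /* no MZ header: a .COM file or plain data */

    /* MZ relocation-table offset; if it is below 0x40 there is no room
       for the NE-header pointer, hence no Windows header */
    relocTable = buf[0x18] | (buf[0x19] << 8);
    if (relocTable < 0x40)
        return 0;

    /* the dword at 0x3c points at the NE header; verify its signature */
    neOffset = buf[0x3c] | (buf[0x3d] << 8) |
               ((unsigned)buf[0x3e] << 16) | ((unsigned)buf[0x3f] << 24);
    if ((size_t)neOffset + 2 > len ||
        buf[neOffset] != 'N' || buf[neOffset + 1] != 'E')
        return 0;

    return (long)neOffset;
}
```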


Generating a Map File


Map-file generation is a two-stage process: First, you create internal
representations of the structures that you'll need from the NE; then you build
a .MAP file from them. The first stage of the process is found in the calls to
BuildResidentNamesTable(), BuildNonResidentNamesTable(), BuildEntryTable(),
and BuildSegmentTable(). The first three routines create and modify the list
of entry points pointed to by pEntryHead. The last one creates a list of
application segments pointed to by pSegmentHead. With just these two pointers,
you'll have most of the information needed to build a .MAP file.

In BuildResidentNamesTable(), the resident names table is reached through the
pResident pointer in the NE header structure. This pointer is relative to the
start of the NE header; the resident names table begins pResident bytes after
pNE. Remember, however, that the rules of C pointer arithmetic require you to
cast pNE to a pointer to char before adding pResident to it.
It's convenient to parse the resident names table first, so that the module
name needed to generate the .MAP file is the first element of our entry-point
list. This module name has an ordinal value of 0; it does not have a
corresponding entry in the entry table.
Windows applications often have no resident APIs. In such cases, the resident
names table still exists; its first and only entry contains the module name of
the executable. Remember that an API will show up in either the resident
names table or the nonresident names table, but not both.
Unfortunately, any structure written to map a resident names entry is
inherently unusable because there is a variable-length structure in the middle
of it! The structure begins with a length byte, n, followed by n characters--a
string (although it is not a valid C string because it is not terminated by a
null character) that names the exported function. A word follows the name
which gives the ordinal value (the number of the entry-table entry that
corresponds to this exported function).
For each entry in the table, you create a new entry-point node by calling
MakeEntryPointElement(). An entry-point node contains the following fields: a
pointer to a null-terminated function name, an API ordinal value, a segment
number, and an offset. Any value not yet known is set to INVALID_VALUE so that
you do not use it by mistake. Currently, segment number and offset are
invalid, as these values will not be known until we parse the entry table
later. The resident names table ends when a length byte of 0 is found, and in
this case the name and ordinal fields do not exist. The same holds for the
other tables discussed in this article--a length byte of 0 represents the end
of the current table, with none of the usual fields following.
There is only one nonstandard part to this function. We create a
null-terminated API name by overwriting the first byte of the ordinal number
in our buffer after first saving the value of the ordinal in our entry-point
element list. Of course, this requires that the pointer saved also point to
the first character of the string (the length byte must be skipped). Then you
won't have to reallocate a second piece of memory to hold a new
null-terminated character (or ASCIIZ) string. You need an ASCIIZ string so
that standard-library functions in DOS can be called--those that take strings
require their strings to be terminated with a null character.
BuildNonResidentNamesTable() parses the nonresident names table, which is
pointed to by the pNRes pointer in the NE header structure. The function is
nearly identical to BuildResidentNamesTable(). The only difference
is that the nonresident-names-table offset is based on the start of the MZ
header (that is, the start of the executable) to facilitate easy access. Once
the operating system has saved the pNRes offset to the table, it can reload
the table by opening the file again and seeking for pNRes bytes. In this case,
there is no need to find the NE, as the pointer goes directly to the table you
want.
BuildNonResidentNamesTable() also adds entries to the entry-point list. Once
this function completes, all exported entry points in the Windows module have
been placed on the entry-point list with their ordinals. The first entry in
the nonresident names table is the module description. Nonentry points, like
module name and module description, have a 0 ordinal and will not have an
entry in the entry table.
BuildEntryTable() builds the internal representation of the entry table, which
is pointed to by the pEntry pointer in the NE header structure. The pEntry
pointer is based on the start of the NE header. When an
application exports a function to another module, this entry table associates
the name and number of the exported function to an actual segment:offset in
the exporting module. The NE header also has the cbEntry field, which gives
the size in bytes of the entry table. The first entry in the entry table
corresponds to application ordinal 1. All entry-table elements are ordered
sequentially by ordinal.
An entry-table entry can consist of various combinations of four structures.
It begins with a single byte, which is the count of records in this entry (or
0 at the end of the table), and is followed by a byte that describes the
records in this entry. This second byte can have several meanings. If the
second byte is 0, then the first byte is a count of entries to skip, so this
count must be added to the current entry number to calculate a new entry
number. This optimization reduces the size of the entry table by replacing
anywhere from 1 to 255 entries with only two bytes. The VGA.DRV that comes with
my copy of Windows 3.1 has several such skip counts in its entry-table
entries.
The second reserved value is 0xFE (decimal 254), which denotes that this entry
is a data value. Such data values must be extracted via GetProcAddress() and
will not be available via symbolic name.
The last reserved value is 0xFF (decimal 255), which marks this group of entry
records as belonging to a movable segment. A movable entry-table entry
contains a byte of flags, an int 3fh instruction, the actual segment number,
and the offset of the particular entry. 
Any other values in the second byte give the segment number of the segment
containing these elements. This segment is a fixed (not movable) segment, and
the entry structure contains only flags and an offset.
Once you have extracted the information for the current entry, you call
GetEntryPointElement() to find the name associated with the current
entry-table index (remember, this is the same as the API ordinal). You then
update the entry-element list with the entry-point segment:offset from the
current entry-table record. Continue processing entry-table records until the
count is 0, or until you've read more bytes than the cbEntry field in the NE
header. Note that the count of records is often greater than 1 (a single
grouping can have multiple records, depending on how the ordinals are laid
out).
BuildSegmentTable() builds a segment table for the .MAP file. To do this, you
need to walk the segment table in the NE. This table is pointed to by pSegment
(based on the start of the NE header) and has a size of cbSegment*8. Each
segment record has four fields: offset, length, flags, and minimum size, and
there are cbSegment segment records in the segment table.
You allocate a block of memory large enough for the segment table, and the
pSegmentHead variable points to the start of that block. You then walk the
block by considering the pointer to point to an array of cbSegment segment
records. 


Building a .MAP File


The process of constructing a .MAP file is found in BuildMapFile(). There's no
real conceptual complexity, but the demands of Microsoft's MapSym symbol-file
generator complicate the process. For example, MapSym is unable to find the
entry point of a program if the .MAP file used does not contain the text
"Program entry point at" at the beginning of a line. Likewise, a section with
the heading Publics by Value must be in the .MAP file or MapSym will not
accept the .MAP file as valid for symbol-file generation. In the interest of
simplicity, MapMan does not sort the Publics by Name or Publics by Value
sections of the .MAP file.
As Table 1 shows, the map file requires the following sections in a specific
format: module name, segment table, DGROUP, exported entry points, public
symbols by name, public symbols by value, and application entry point. Figure
3 is an example .MAP file generated by MapMan for the application whose .MAP
file appears in Figure 1.
To generate .MAP files for other compilers, you'll need to update five
functions: four in mapllist.c (DumpEntryPointList(), DumpSegmentList(),
DumpPublicsByName(), and DumpPublicsByValue()), which refer to private data
structures found in that module; and the main function BuildMapFile() in
mapmkmap.c. 


Conclusion


I've found MapMan to be a useful Windows debugging tool by itself. But MapMan
also lets you modify the generated .MAP file to add new symbols for use in
your .SYM file. Just be sure to put your new symbols and addresses in the
Publics by Value section if you want MapSym to see them.
For example, I took hExeHead and CurTDB from the "THHook" section of
Undocumented Windows, by Andrew Schulman et al. (Addison-Wesley, 1992), and
added them to a .MAP file I generated for KRNL386.EXE with MapMan. You know
from the exports table that "THHook" is at 4:0218. From Undocumented Windows,
you find that hExeHead=THHook+0x04 or 0x21c and CurTDB is at THHook+0x10 or
0x228. Consequently, I added these lines (again, to the Publics by Value
section) to my map file, recompiled with MapSym, and ran my debugger.
0004:021C myhExeHead
0004:0228 myCurTDB
I could then dump hExeHead and CurTDB with complete symbols, using the names
that I had given in my generated .MAP file.
Example 1: A sample .DEF file.
NAME TRAPMAN
DESCRIPTION 'J. Hlavaty: Windows GP handler for debugging'
EXETYPE WINDOWS
PROTMODE
CODE LOADONCALL NONDISCARDABLE
DATA PRELOAD MULTIPLE
HEAPSIZE 1024
STACKSIZE 8096
EXPORTS
 MainWndProc @1
 About @2
Table 1: Microsoft linker .MAP file sections. Spacing, case, and terminology
in many of the section headers are mandatory. If MapSym cannot find the string
"Publics by Value", for example, it rejects the .MAP file with an obscure "No
public symbols" error message.
 Section Contents 
1 Module name of executable
2 Segment table
3 DGROUP
4 Exported entry points
5 Public symbols by name
6 Public symbols by value
7 Application entry point
Figure 1: A sample .MAP file.
 TRAPMAN
 Start Length Name Class
 0001:0000 013AEH TRAPMAN_TEXT CODE

 0001:13AE 001A4H DPMI_TEXT CODE
 0001:1560 01069H HANDLER CODE
 0001:25CA 00000H TRAPDATA_TEXT CODE
 0001:25D0 00622H _TEXT CODE
 0002:0000 00000H DATA DATA
(portions removed to conserve space)
 0002:0930 0020AH c_common BSS
 Origin Group
 0002:0 DGROUP
 Address Export Alias
 0001:061C About About
 0001:0322 MainWndProc MainWndProc
 Address Publics by Name
 0001:061C About
 0000:0000 Imp MESSAGEBOX (USER.1)
 0001:16C2 MYFARPROC
 0001:2595 MYODS
 0001:1422 _DPMIAllocateLDTDescriptors
 0001:25E1 __astart
 Address Publics by Value
(removed to conserve space)
Program entry point at 0001:25E1
Figure 2: The .MAP-file creation process.
1. Load header information of executable into memory or exit (on failure).
2. Verify valid MZ header or exit.
3. Verify valid NE header or exit.
4. Build internal representation of resident names entries.
5. Build internal representation of nonresident names entries.
6. Add entry-table information to internal representation from Steps #4 and
#5.
7. Build internal representation of segment table.
8. Write .MAP file format to STDOUT (can be redirected to a file).
Figure 3: MapMan-generated .MAP file for the same application as Figure 1.
 TRAPMAN
 Start Length Name Class
 0001:0000 02BF2H Seg1_TEXT CODE
 0002:0000 00B3AH Seg2_DATA DATA
 Origin Group
 0002:0 DGROUP
 Address Export Alias
 0001:0322 MAINWNDPROC MAINWNDPROC
 0001:061C ABOUT ABOUT
 Address Publics by Name
 0001:0322 MAINWNDPROC
 0001:061C ABOUT
 Address Publics by Value
 0001:0322 MAINWNDPROC
 0001:061C ABOUT
Program entry point at 0001:25E1















Lightweight Tasks in C


A robust implementation in a few lines of code




Jonathan Finger


Jonathan is a Boston-area consultant with many years of experience programming
in C and Mumps. He can be reached at jfinger@hmm.com.


Modern operating systems allow multiple processes to exist at the same time,
and each process can contain more than one thread of execution. For operating
systems that do not explicitly support threads, you can build a multithreading
package in a high-level language such as C, which allows for cooperatively
multitasked threads within a single process. 
One good commercial implementation of this is the Multi-C package from MIX
Software, which provides process scheduling, process synchronization, and
message passing. However, for a variety of reasons, you may want to roll your
own lightweight tasker in C. For instance, one project I was involved in
required porting a multiuser mailing-list management system from its original
minicomputer platform to a PC/DOS system. I decided to implement a
multitasking system in C to work in place of the original minicomputer
operating system. My system serves as the foundation for the multiuser
application, which consists of thousands of lines of production code. In this
article, I'll present a simplified version of my multithreading code. This
code is not a toy--the original version has been used extensively in a
commercial environment for several years. The code runs over DOS and can be
compiled with either the Watcom 10.0 32-bit compiler using the Tenberry
Software (formerly Rational Systems) DOS extender or the Microsoft C7.0 16-bit
compiler. 


Processes and Tasks


Conceptually, a process is a program in execution, plus its associated
environment. The operating system isolates processes so they don't interfere
with each other; often, processes can communicate with each other using
primitives provided by the operating system. Because of this, you can
completely define the state of a process by specifying the values of its
variables (global and local) and a unique point in the executing code.
In a multithreading system, each thread is effectively a copy of its original
process, but, unlike heavyweight processes, threads share resources and global
variables (and can therefore interfere with each other). On a multiprocessor
machine, different processors can run threads simultaneously. 
Switching between threads can be either cooperative or preemptive. In
cooperative task switching, a thread runs until it makes an explicit call to
swap_thread(), which allows other threads to run, eventually returning to its
caller. From the point of view of the calling thread, there is no perceptible
effect, except for the passage of time.
In preemptive task switching, a thread runs until some external event (such as
a timer interrupt or a device interrupt) causes the thread to be suspended and
another thread resumed.


Implementing Cooperative Task Switching


The trick to cooperative task switching in C is that for most machines and
compilers, only three things need to be set properly: the instruction pointer,
C stack, and machine registers.
The instruction pointer or program counter marks the current place in the
running code. The C functions setjmp() and longjmp() take care of the
instruction pointer and the registers, but they unwind the stack. To handle
the stack properly, our code must explicitly copy it to a "save" region for
each thread. In my version, these routines assume that the C stack is in
contiguous memory and that it grows downward, toward lower addresses; see
Listing One.
As you can see in the listing, the threads struct contains information about
individual threads. The start_new_thread() function creates a new thread and
stores a pointer to the supplied program in the threadp->function element of
the threads struct.
For this code to work, the compiler must implement local variables on a stack,
the stack must be contiguous, and the stack must grow down. Fortunately, these
conditions are satisfied on most machines and C compilers, but they are not
part of the C specification. It is pretty straightforward to modify my
implementation to account for stacks that grow up; your code can also test the
direction of stack growth and act accordingly.
The routines start_new_thread() and queue_thread() can be called from an
interrupt service routine as long as mutual exclusion is ensured. (For
example, if an interrupt occurs during execution of queue_thread() and the
interrupt handler itself calls queue_thread(), the queues may get corrupted.)
In MS-DOS, you can accomplish
simple mutual exclusion by masking out interrupts during critical sections of
code.


Implementing Preemptive Task Switching


Preemptive task switching is implemented by simply switching tasks at
predefined intervals. Usually, the process receives periodic interrupts from a
timer, and the multitasker arranges things so that the interrupt returns to
the next task. There are several ways of implementing this:
Under 16-bit MS-DOS, you can use assembly language to directly manipulate the
stack.
Under 32-bit MS-DOS, such stack manipulation may or may not be possible,
depending on the DOS extender you use.
On systems with C compilers that implement signal(SIGALRM,_), the signal is
usually implemented as a call, meaning that you can use the setjmp()/longjmp()
stack-copy tricks in Listing One.


Context Switch Walkthrough


To clarify the implementation, I'll walk through a context switch. For
simplicity, I'll assume two threads (A and B) are running and that all
task-switches occur with calls to swap_thread(). You should also assume that
thread A is running and thread B is waiting to run. The run-time stack is
shown in Figure 1.
The part of the run-time stack colored light pink contains thread A's local
variables and function return address. If thread B were running, B's local
variables and function return addresses would occupy the area colored light or
dark pink, but this area of memory has been copied to thread B's
thread[B].c_stack. (For this example, assume thread B is currently using more
stack than thread A.)
Thread A calls swap_thread() to allow other threads to run; swap_thread()
returns immediately if there is no other runnable thread. Here, swap_thread()
finds thread B as the next thread in the run queue, so thread A is moved from
the head of the run queue to the tail.
Next, swap_out_thread() is called. It saves any thread globals and copies the
area of the run-time stack colored light pink in Figure 1 to thread A's stack
save-area (thread[A].c_stack). Next, setjmp() sets the point where thread A
will resume execution. setjmp returns 0, so the return value from
swap_out_thread() is 0 and swap_in_thread() is then called.
The swap_in_thread() function sets up any global variables for thread B and
then copies the saved run-time-stack data to the run-time stack, so that the
area in Figure 1 colored light or dark pink now contains thread B's stack
info. This is why swap_in_thread() has a local variable called
UCHAR buffer[THREAD_SWAP_STACK_SIZE]. This guarantees that the locals used by
setup_thread_globals() are in a portion of the stack not overwritten by the
memcpy(). This area is a breeding ground for potential bugs, since an
optimizing compiler may realize that buffer[] is not used and leave the stack
pointer unchanged. If the compiler inlines setup_thread_globals() into
swap_in_thread(), it becomes apparent that buffer[] is not used.
The swap_in_thread() function now executes longjmp(). The
cpr->swap_thread_buff structure element was previously set for thread B by the
setjmp() call in swap_out_thread(), so control is transferred to the setjmp()
in swap_out_thread(). That setjmp() returns a nonzero value, so
swap_out_thread() returns to swap_thread() with a nonzero value. Since the
run-time stack now contains thread B's stack, swap_thread() returns to
thread B.



Conclusion


The setjmp() and longjmp() functions in C allow implementing a bare-bones
multitasker in only a few lines of code. This straightforward code can serve
as the basis for more full-featured implementations.
Figure 1: Diagram of the run-time stack when two threads are created.

Listing One 

/* Lightweight Multitasker in C -- by Jonathan Finger, 1995 (reformatted) */

#include <setjmp.h>
#include <conio.h>
#include <stdio.h>
#include <time.h>
#include <string.h>
#include <dos.h>

#define MAX_THREAD 10
#define THREAD_SWAP_STACK_SIZE 10000
typedef short int THREAD_NUM;
typedef unsigned char UCHAR;
typedef unsigned long ULONG;
typedef unsigned int UINT;
typedef unsigned short int USHORT;
typedef unsigned char * STR;

/*----------------------the threads structure-------------------------------*/
struct threads 
{
 THREAD_NUM thread_number;
 jmp_buf swap_thread_buff;
 UCHAR c_stack[THREAD_SWAP_STACK_SIZE];/* save area for c stack */
 size_t c_stack_size; /* save state info */

 THREAD_NUM volatile next_thread; /* forward chain for queues */
 THREAD_NUM volatile prev_thread; /* backward chain for some queues*/
 UCHAR volatile queue; /* current queue that thread is */
 /* on: 'R' means RUN, 'F' is Free */
 void (*function)(void);
 /* other info can be added here as needed */
};
typedef struct threads thrd;
thrd thread[MAX_THREAD];
thrd *cpr; /* pointer to the current running thread */
/*------------------------------------------------------------------------*/
jmp_buf new_thread_start_buff;
#define NO_THREAD (MAX_THREAD+100)
THREAD_NUM current_thread = NO_THREAD;
STR stack_swap_start;
/* note that the following variables have been declared volatile, since 
 * they may be altered by an interrupt service routine */
static THREAD_NUM volatile run_queue_head;
static THREAD_NUM volatile run_queue_tail;
/*------------------------Function prototypes-------------------------*/
void init_thread_table (void);
int start_new_thread (void (*program)(void));
void free_current_thread (void);
void setup_thread_globals (THREAD_NUM thread_no, STR buff);

void save_thread_globals (void);
int swap_out_thread (void);
void swap_in_thread (void);
void swap_thread (void);
void swap_thread_block (void); /*put curr thread to sleep on event*/
void queue_thread (THREAD_NUM thread_number);
void unqueue_thread (UCHAR new_queue);
static int get_free_thread_id (void);
void thread1(void)
{ while (1)
 { printf("\n\rthread 1");
 swap_thread();
 }
}
void thread2(void)
{ while (1)
 { printf("\n\rthread 2");
 swap_thread();
 }
}
void thread3(void)
{ while (1)
 { if (kbhit()) exit(0);

 printf("\n\rthread 3");
 swap_thread();
 }
}
main()
{ int i = 0;
 init_thread_table();
 run_queue_head = run_queue_tail = NO_THREAD;
 stack_swap_start = (STR) &i;
 if (!setjmp(new_thread_start_buff))
 { 
 /* starts three threads */
 start_new_thread(thread1);/*should error-check return value*/
 start_new_thread(thread2);
 start_new_thread(thread3);
 swap_thread_block();
 }
 (*(cpr->function))();
 free_current_thread();
}
void init_thread_table()
{ int i = 0;
 while (i < MAX_THREAD)
 { thread[i].thread_number = i;
 thread[i].queue = 'F';
 thread[i].next_thread = i + 1;
 thread[i].prev_thread = i - 1;
 i++;
 }
 thread[MAX_THREAD-1].next_thread = NO_THREAD;
}
int start_new_thread(void (*program)(void))
{
 THREAD_NUM thread_num;
 thrd *threadp;
 thread_num = get_free_thread_id();

 if (thread_num == NO_THREAD) return(NO_THREAD);
 threadp = &thread[thread_num];
 threadp->c_stack_size = 0;
 threadp->next_thread = NO_THREAD;
 threadp->function = program;
 memcpy(threadp->swap_thread_buff,
 new_thread_start_buff, sizeof(jmp_buf));
 /* the memcpy copies the contents of new_thread_start_buff 
 * to the thread's swap buff so that when swap_in_thread() 
 * calls longjmp(), control returns from the setjmp in main() */
 queue_thread(thread_num);
 return(thread_num);
}
static int get_free_thread_id()
{ int i = 0;
 while ((i < MAX_THREAD) && (thread[i].queue != 'F'))
 i++;
 if (i < MAX_THREAD) return(i);
 return(NO_THREAD);
}
void setup_thread_globals(THREAD_NUM thread_num, STR buff)
{ /*STR buff; needed to defeat optimizer */
 cpr = &thread[thread_num];
}
void save_thread_globals()
{
}
int swap_out_thread()
{ long int i;
 if (current_thread == NO_THREAD) return(0);
 save_thread_globals(); 
 i = stack_swap_start - ((STR)&i);
 cpr->c_stack_size = (size_t) i;
 memcpy(cpr->c_stack, ((STR)&i)+1, (size_t) i);
 return(setjmp(cpr->swap_thread_buff));
 /* the setjmp sets the return point where the thread will 
 * resume execution when longjmp() is called in swap_in_thread() */
}
void swap_in_thread()
{ UCHAR buffer[THREAD_SWAP_STACK_SIZE];
 /* make sure we are above (below) the swap stack */
 current_thread = run_queue_head;
 setup_thread_globals(current_thread, &buffer[0]);
 memcpy(stack_swap_start - cpr->c_stack_size + 1,
 cpr->c_stack, cpr->c_stack_size);
 longjmp(cpr->swap_thread_buff, 1);
 /* longjmp() transfers control back to setjmp() in swap_out_thread */
}
void swap_thread()
{ int next_thread;
 next_thread = thread[run_queue_head].next_thread;
 if (next_thread != NO_THREAD)
 { run_queue_head = next_thread;
 thread[run_queue_tail].next_thread = current_thread;
 run_queue_tail = current_thread;
 thread[current_thread].next_thread = NO_THREAD;

 if (!swap_out_thread()) swap_in_thread();
 /* if swap_out_thread() returns 0, this is a return from the
 * call to swap_out_thread and we call swap_in_thread. If it 
 * returns !0 then swap_out_thread is returning from longjmp()
 * in swap_in_thread and task switch has already occurred */
 }
}
void swap_thread_block() /*put current thread to sleep on event*/
{ while (run_queue_head == NO_THREAD) 
 continue;
 /* if this loop is entered and the run queue is empty, the process
 * idles until a thread is queued, either by an interrupt service 
 * routine or a signal handler */
 if (current_thread != run_queue_head
 && !swap_out_thread()) 
 {
 swap_in_thread();
 }
}
void queue_thread(THREAD_NUM thread_number)
{
 /* If run queue can be modified by an interrupt service routine or
 * signal handler then code must be added to assure mutual exclusion */
 if (run_queue_tail != NO_THREAD) 
 {
 thread[run_queue_tail].next_thread = thread_number;
 }
 else 
 {
 run_queue_head = thread_number;
 }
 run_queue_tail = thread_number;
 thread[thread_number].next_thread = NO_THREAD;
 thread[thread_number].queue = 'R';
}
void unqueue_thread(UCHAR new_queue)
{ THREAD_NUM thread_num;
 thread_num = run_queue_head;
 if ((run_queue_head = thread[thread_num].next_thread) == NO_THREAD)
 run_queue_tail = NO_THREAD;
 thread[thread_num].queue = new_queue;
}
void free_current_thread()
{ free_thread(current_thread); 
 unqueue_thread('F');
 swap_thread_block(); /* this will never return */
}
int free_thread(THREAD_NUM thread_num)

{ thrd *threadp = &thread[thread_num];
 if (threadp->queue == 'F') return(0);
 if (current_thread == thread_num) current_thread = NO_THREAD;
 return(1);
}









Windows 95 Common Controls


Building blocks for GUI development




Vinod Anantharaman


Vinod is a program manager for Microsoft Office. He can be reached at One
Microsoft Way, Redmond, WA 98052.


Windows 95 includes several GUI building blocks, collectively called "common
controls." These controls are reusable, pretested, standardized software
components ("objects") for constructing GUIs. In fact, the Windows 95 user
interface itself uses these controls extensively, primarily in the Explorer
(an application that groups together the functionality of the Windows 3.1
"managers"--File Manager, Program Manager, Control Panel, and Print Manager). 
Common controls are a set of standard GUI components supported by the
common-control DLL included with Windows 95. Each common control is a child
window that an application uses in conjunction with other windows to perform
input and output tasks. Many of these controls--toolbars, tree controls, spin
boxes, and progress indicators--have been used in applications for some time. 
The common-control DLL defines window classes for common controls. The window
class and corresponding window procedure for each control determine its
appearance, properties, and functionality. To ensure that the common-control
DLL is loaded, you need to call InitCommonControls. You typically create a
common control by specifying the name of the window class in a call to
CreateWindow or CreateWindowEx.
Because common controls are windows, an application can manipulate them using
standard Windows messages, such as WM_SETTEXT or WM_GETFONT. In addition, the
window class of each common control supports a set of control messages that an
application can use to manipulate the control. An application can use any of
the message sending or posting functions to pass messages to the control. Some
common controls have a set of macros that an application can use instead of
the sending or posting functions.
Common controls send notification messages to the parent window when events
(user input, for example) occur in the control. The application relies on
these notification messages to determine what action the user wants it to
take. Most common controls send notification messages in the form of WM_NOTIFY
messages. Figure 1 illustrates the interaction between the application window
and controls.
Typically, the lParam parameter of the WM_NOTIFY message is the address of an
NMHDR structure; see Example 1. The structure contains a notification code and
identifies the common control that sent the notification message. Each common
control has a corresponding set of notification codes. The common-control
library also provides notification codes that can be sent by more than one
type of common control. Table 1 describes the shared codes.


Image Lists


Image lists are bitmap managers coupled with APIs. They provide a means of
managing and drawing collections of bitmaps, allowing the app to get rid of a
lot of memory device contexts (DCs), bitmap handles, and possibly BitBlt code.
Image
lists have complex drawing mechanisms that can easily handle tedious graphics
operations such as drawing transparent bitmaps with masks. The graphical
elements for implementing drag-and-drop are provided. 
The internal means of storage used by the image list is a bitmap, and each
image is simply a portion of a bitmap. The image list always stores individual
images in one row.
Image lists are either nonmasked or masked. Nonmasked image lists consist of a
color bitmap that contains one or more images. Masked image lists consist of
two bitmaps of equal size: a color bitmap containing the images and a
monochrome bitmap containing a mask for each image in the first bitmap. When a
nonmasked image is drawn, it is copied into the target DC. When a masked image
is drawn, the bits of the image are combined with the bits of the mask,
typically to produce transparent areas in the bitmap where the background
color of the target DC shows through. You can specify several drawing styles
when drawing a masked image (for example, the image can be dithered to
indicate a selected object).


Toolbar Common Control


A toolbar control is a control window that contains one or more buttons; see
Figure 2(a). Typically, the buttons in a toolbar are made to correspond to
items in the application's menu, providing an additional, more-direct way for
the user to access an application's commands. Windows 95 provides an optional
toolbar control in all its Explorer windows.
The toolbar control supports docking and windowing functionality. In the
docked state, toolbars are aligned to an edge of the parent window. In a
windowed state, toolbars display the controls in a sizable palette window that
includes a title bar so users can move the control bar around.
Toolbar controls have built-in customization features, including a
system-defined customization dialog box that lets users insert, delete, or
rearrange toolbar buttons. An app determines whether the customization
features are available to users and controls the extent to which the user may
customize the toolbar. 
The CreateToolbarEx function creates a toolbar control and adds an initial set
of buttons to it. You can also create a toolbar control using CreateWindow or
CreateWindowEx by specifying the toolbar-control window class. This method
creates a toolbar that initially contains no buttons. You add buttons by using
the TB_ADDBUTTONS or TB_INSERTBUTTON message. 
The window procedure for a toolbar control automatically sets the size and
position of the toolbar window. The height is based on the height of the
buttons in the toolbar. The width is the same as the width of the parent
window's client area. The CCS_TOP and CCS_BOTTOM styles determine whether the
toolbar lies along the top or bottom of the client area; CCS_TOP is the
default.
The toolbar window procedure automatically adjusts the size of the toolbar
control whenever it receives a WM_SIZE or TB_AUTOSIZE message. An app should
send either of these messages whenever the size of the parent window changes
or after sending a message that requires the size of the toolbar to be
adjusted.
Only one class-specific style is associated with toolbar controls:
TBSTYLE_TOOLTIPS. When you specify this style, the toolbar creates and manages
a tooltip--a small pop-up window that contains a line of text describing a
toolbar button; see Figure 2(b). The tooltip is displayed only when the user
leaves the cursor on a toolbar button for approximately one second. It then
appears near the cursor.
An app that needs to send messages directly to the tooltip control can
retrieve its handle using the TB_GETTOOLTIPS message. An application can
replace the tooltip control of a toolbar using TB_SETTOOLTIPS.
Each button in a toolbar control can include a bitmapped image. A toolbar
stores the information that it needs to draw the images in an internal list.
When calling CreateToolbarEx, you specify a monochrome or color bitmap that
contains the initial images, and the toolbar adds the information to the
internal list of images. You can add additional images by using the
TB_ADDBITMAP message. Each image has a zero-based index (you use an image's
index to associate the image with a button). Windows 95 assumes that all of a
toolbar control's bitmapped images are the same size. You specify the size
when you create the toolbar using CreateToolbarEx. If you use CreateWindow or
CreateWindowEx to create a toolbar, the size of the images is set to the
default dimensions of 16×15 pixels. You can use TB_SETBITMAPSIZE to change the
dimensions of the bitmapped images, but you must do it before adding any
images to the internal list of images.
Each button can display a string in addition to, or instead of, an image. A
toolbar control maintains an internal list that contains all of the strings
available to toolbar buttons. You add strings to the internal list using
TB_ADDSTRING, specifying the address of the buffer containing the strings to
add.
A button's style determines how the button appears and how it responds to user
input. The TBSTYLE_BUTTON style creates a toolbar button that behaves like a
standard push button. A button that has the TBSTYLE_CHECK style is similar to
a standard push button, except that it toggles between the pressed and
nonpressed states each time the user clicks it. You can create groups of
toolbar buttons by using the TBSTYLE_GROUP or TBSTYLE_CHECKGROUP style,
causing a button to stay pressed until the user chooses another button in the
group. The TBSTYLE_SEP style creates a separator between buttons; a button
with this style does not receive user input.
Each button in a toolbar control has a current state. The toolbar updates a
button's state to reflect user actions, such as clicking the button. The state
indicates whether the button is currently pressed or not pressed, enabled or
disabled, hidden or visible, and so on. An application sets a button's initial
state when adding the button to the toolbar, and can change and query the
state by sending messages to the toolbar. An app can use TB_GETSTATE and
TB_SETSTATE to query and set the state of buttons; additional messages query
or set a particular button state.
Each button has a command identifier associated with it. When users select a
button, the toolbar control sends the parent window a WM_COMMAND message that
includes the command identifier of the button (IDM_CUT, IDM_PASTE, and so on).
The parent window examines the command identifier and carries out the command.

A toolbar control keeps track of its buttons by assigning each button a
zero-based position index. An application must specify the index of a button
when sending messages to retrieve information about the button or set the
button's attributes. Position indexes are updated as buttons are inserted and
removed. An app can retrieve the current position index of a button by using
TB_COMMANDTOINDEX. The message specifies the command identifier of a button,
and the toolbar window uses the identifier to locate the button and return its
position index.
All buttons in a toolbar control are the same size. CreateToolbarEx requires
you to set the initial size of the buttons when you create the toolbar. When
you use CreateWindow or CreateWindowEx to create a toolbar, the initial size
is set to the default dimensions of 24×22 pixels. You can use TB_SETBUTTONSIZE
to change the button size, but you must do so before adding any buttons to the
toolbar. TB_GETITEMRECT retrieves the current dimensions of the buttons. When
you add a string to the toolbar that is longer than any current toolbar
string, the width is automatically set to accommodate the longest string in
the toolbar.
You can give a toolbar control built-in customization features by specifying
the CCS_ADJUSTABLE style. The customization features allow users to drag a
button to a new position or remove it by dragging it off the toolbar. A
Customize Toolbar dialog box that allows users to add, delete, and rearrange
toolbar buttons is included. An app can display the dialog box by sending a
TB_CUSTOMIZE message to the control.


Status-Bar Common Controls


A status bar is a horizontal window at the bottom of a parent window that
displays contextual information, usually about the current state of what is
being viewed in the window; see Figure 2(c). The status window can be divided
into parts to display more than one type of information at once. Status bars
are commonly used for descriptive messages about a selected menu or toolbar
button, keyboard state, and time. Windows 95 includes an optional status bar
in each of its Explorer windows.
The default position of a status window is along the bottom of the parent
window, but you can position it along the top by specifying the CCS_TOP style.
The SBARS_SIZEGRIP style includes a sizing grip at the right end of the status
window that users can drag to resize the parent window.
A status window can have different parts, each displaying a different line of
text. You divide a status window into parts by sending the window an
SB_SETPARTS message. A status window can have a maximum of 255 parts (although
applications typically use far fewer). You can retrieve a count of the parts
and their coordinates in a status window by sending the window an SB_GETPARTS
message. To set the text of any part of a status window, send the SB_SETTEXT
message; to retrieve it, use SB_GETTEXTLENGTH and SB_GETTEXT.
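A brief sketch of this pattern (ID_STATUSBAR is a hypothetical control identifier):

```c
/* Sketch: create a status bar and divide it into three parts.
   Each element of aParts is the right edge of a part, in client
   coordinates; -1 extends the last part to the window's edge.
   ID_STATUSBAR is a hypothetical control identifier. */
HWND hwndSB = CreateStatusWindow(WS_CHILD | WS_VISIBLE | SBARS_SIZEGRIP,
                                 "Ready", hwndParent, ID_STATUSBAR);
int aParts[3] = { 120, 240, -1 };

SendMessage(hwndSB, SB_SETPARTS, 3, (LPARAM)aParts);
SendMessage(hwndSB, SB_SETTEXT, 0, (LPARAM)"Ready");
SendMessage(hwndSB, SB_SETTEXT, 2, (LPARAM)"12:00 PM");
```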

You can define individual parts of a status window to be owner drawn. Then,
for example, you can display a bitmap rather than text, or draw text using a
different font. To define a window part as owner drawn, send SB_SETTEXT to the
status window, specifying the part and the SBT_OWNERDRAW drawing technique.
When SBT_OWNERDRAW is specified, the lParam parameter is a 32-bit
application-defined value that the application can use when drawing the part.
For example, you can specify a font handle, a bitmap handle, a pointer to a
string, and so on. When a status window needs to draw an owner-drawn part, it
sends the WM_DRAWITEM message to the parent window, which is responsible for
drawing the part. 
You can put a status window into "simple mode" by sending it an SB_SIMPLE
message. This is useful for displaying help text for menu items while users
scroll through a menu. The string that a status window displays while in
simple mode is maintained separately from the strings it displays otherwise.
This means you can put the window in simple mode, set its text, and switch
back to nonsimple mode without modifying the text used in either mode, which
is convenient. Simple-mode status windows, however, do not support owner
drawing.


Trackbar Common Control


A trackbar control is a window containing a slider and optional tick marks;
see Figure 2(d). When users move the slider, the control sends notification
messages to indicate the change. This is useful when you want users to select
a discrete value or a set of consecutive values in a range. In Windows 95,
trackbars are used in several control-panel applets--to allow users to set
mouse-pointer speed, keyboard repeat rate, desktop area, and so on.
Trackbar controls can have either a vertical or horizontal orientation. They
can have tick marks on either side, both sides, or neither. They can also be
used to specify a range of consecutive values. These properties are controlled
by using trackbar styles, which you specify when you create the trackbar.
Trackbar controls notify their parent windows of user actions by sending
WM_HSCROLL messages (WM_VSCROLL, for vertical trackbars), underscoring their
kinship with scrollbars.
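A sketch of both halves, creation and notification handling (ID_TRACKBAR is a hypothetical control identifier, and the case fragment belongs in the parent's window procedure):

```c
/* Sketch: a horizontal trackbar with automatic tick marks and a
   range of 0 through 10. ID_TRACKBAR is hypothetical. */
HWND hwndTrack = CreateWindowEx(0, TRACKBAR_CLASS, "",
    WS_CHILD | WS_VISIBLE | TBS_HORZ | TBS_AUTOTICKS,
    10, 10, 200, 30, hwndParent, (HMENU)ID_TRACKBAR, hInst, NULL);

SendMessage(hwndTrack, TBM_SETRANGE, TRUE, MAKELONG(0, 10));
SendMessage(hwndTrack, TBM_SETPOS, TRUE, 5);

/* In the parent's window procedure: */
case WM_HSCROLL:
    if ((HWND)lParam == hwndTrack) {
        int pos = (int)SendMessage(hwndTrack, TBM_GETPOS, 0, 0);
        /* ...apply the new setting... */
    }
    break;
```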


Progress-Bar Common Controls


A progress-bar control is a window that an application can use to indicate the
progress of a lengthy operation such as file transfer or file copying. It
consists of a rectangle that is gradually filled, from left to right, with the
system highlight color as an operation progresses; see Figure 2(e).
A progress-bar control has a range and a current position. The range
represents the entire duration of the operation, and the current position
represents the progress the application has made toward completing it. The
window procedure uses the range and the current position to determine the
percentage of the progress bar to fill with the highlight color and to
determine the text, if any, to display within the progress bar. Because the
range and current position values are expressed as unsigned integers, the
highest possible range or current position value is 65,535. If you do not set
the range values, the system sets the minimum value to 0 and the maximum value
to 100.
A progress-bar control provides messages that you can use to set the current
position. PBM_SETPOS sets the position to a given value. PBM_DELTAPOS advances
the position by adding a specified value to the current position. PBM_SETSTEP
allows you to specify an increment for a progress-bar control. Subsequently,
whenever you send the PBM_STEPIT message to the progress bar, the current
position advances by the increment that you specified. The step increment
default is 10.
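For example, copying a known number of files maps naturally onto PBM_SETSTEP and PBM_STEPIT (ID_PROGRESS, nFiles, and CopyOneFile are hypothetical):

```c
/* Sketch: step a progress bar once per file copied.
   ID_PROGRESS, nFiles, and CopyOneFile are hypothetical. */
HWND hwndPB = CreateWindowEx(0, PROGRESS_CLASS, "",
    WS_CHILD | WS_VISIBLE, 10, 10, 300, 20,
    hwndParent, (HMENU)ID_PROGRESS, hInst, NULL);
int i;

SendMessage(hwndPB, PBM_SETRANGE, 0, MAKELPARAM(0, nFiles));
SendMessage(hwndPB, PBM_SETSTEP, 1, 0);    /* advance by 1 per step */
for (i = 0; i < nFiles; i++) {
    CopyOneFile(i);
    SendMessage(hwndPB, PBM_STEPIT, 0, 0);
}
```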


Tab Common Controls


A tab control is analogous to dividers in a notebook or labels in a file
cabinet; see Figure 2(f). Using a tab control, an app can define multiple
pages for the same area of a window or dialog box. Each page consists of a set
of information or a group of controls that the app displays when users select
the corresponding tab.
An app displays the current page in the display area. Typically, an app
creates a child window or dialog box, setting the window size and position to
fit the display area. Given the constraining window rectangle for a tab
control, you can calculate the bounding rectangle of the tab control's display
area using TCM_ADJUSTRECT. 
You can control specific characteristics of tab controls by specifying
tab-control styles. You can cause the tabs to look like buttons by specifying
the TCS_BUTTONS style. Tabs in this type of tab control should serve the same
function as button controls. That is, clicking a tab should carry out a
command instead of displaying a page. Because the display area in a button tab
control is typically not used, no border is drawn around it. 
By default, a tab control displays only one row of tabs. If not all tabs can
be shown at once, the tab control displays an up-down control so that the user
can scroll additional tabs into view. You can cause a tab control to display
multiple rows of tabs, if necessary, by specifying the TCS_MULTILINE style.
Tabs are left-aligned within each row unless you specify TCS_RIGHTJUSTIFY.
A tab control automatically sizes each tab to fit its icon, if any, and its
label. TCS_FIXEDWIDTH sizes all the tabs to fit the widest label, or you can
assign a specific width and height by using TCM_SETITEMSIZE. Within each tab,
by default, the control centers the icon and label with the icon to the left
of the label, but certain styles let you modify this default behavior.
When users select a tab, a tab control sends its parent window notification
messages in the form of WM_NOTIFY messages. The TCN_SELCHANGING notification
message is sent before the selection changes, and TCN_SELCHANGE is sent
afterward. You can process TCN_SELCHANGING to save the state of the outgoing
page or to prevent the selection from changing, perhaps displaying a message
box that explains why. TCN_SELCHANGE is typically processed by the app to display the incoming
page in the display area. This might entail changing the information displayed
in a child window. More often, each page consists of a child window or dialog
box. In this case, an app might process this notification by destroying or
hiding the outgoing child window or dialog box and by creating or showing the
incoming child window or dialog box.
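The dialog-per-page arrangement might be handled like this in the parent's window procedure (hwndPage and iCurPage are hypothetical application bookkeeping):

```c
/* Sketch: show the incoming page and hide the outgoing one.
   hwndPage[] and iCurPage are hypothetical application state. */
case WM_NOTIFY: {
    NMHDR *pnmh = (NMHDR *)lParam;
    switch (pnmh->code) {
    case TCN_SELCHANGING:
        return FALSE;   /* returning TRUE would veto the change */
    case TCN_SELCHANGE: {
        int iPage = TabCtrl_GetCurSel(pnmh->hwndFrom);
        ShowWindow(hwndPage[iCurPage], SW_HIDE);
        ShowWindow(hwndPage[iPage], SW_SHOW);
        iCurPage = iPage;
        break;
    }
    }
    break;
}
```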
Each tab can have an associated icon, which is specified by an index into an
image list for the tab control. When created, a tab control has no image list
associated with it. An app can create one using ImageList_Create and then
assign it to a tab control with TCM_SETIMAGELIST. To retrieve the handle of
the image list currently associated with a tab control, use TCM_GETIMAGELIST.
You can add images to a tab control's image list just as you would to any
other. However, an application should remove images using the TCM_REMOVEIMAGE
message instead of ImageList_Remove. This message ensures that each tab
remains associated with the same image. When a tab control is destroyed, it
destroys any image list associated with it unless you specify the
TCS_SHAREIMAGELISTS window style. This is useful if you want to assign the
same image list to multiple common controls.
If a tab control has the TCS_OWNERDRAWFIXED style, the parent window must
paint tabs by processing the WM_DRAWITEM message. The tab control sends this
message whenever a tab needs to be painted. And as with toolbar buttons, you
can use a tooltip control to provide a brief description for each tab in a tab
control. A tab control that has the TCS_TOOLTIPS style creates a tooltip
control when it is created, and when the tab control is destroyed, it destroys
the tooltip control.


List-View Common Controls


A list-view control is a window that displays a collection of items, each
consisting of an icon and a label. List views provide ways of arranging and
displaying individual items; see Figure 3. 
List-view controls can display their contents in four different views: large
icon, small icon, list, and report. You can change the view type after a
list-view control is created. To retrieve and change the window style, use
GetWindowLong and SetWindowLong. To control item arrangement in icon or small
icon view, specify LVS_ALIGNTOP (the default style), LVS_ALIGNBOTTOM,
LVS_ALIGNLEFT, or LVS_ALIGNRIGHT. You can change the alignment after a
list-view control is created. LVS_ALIGNMASK isolates the window styles that
specify the alignment of items. Additional window styles control other
options.
The icons for list-view items are contained in image lists, which you create
and assign to the list-view control. One image list contains the full-sized
icons used in icon view; another contains smaller versions of the same icons
for use in other views. You can also specify a third image list that contains
state images, which are displayed next to an item's icon to indicate an
application-defined state.
You assign an image list to a list-view control using the LVM_SETIMAGELIST
message, specifying whether the image list contains large icons, small icons,
or state images. You can use GetSystemMetrics to determine appropriate
dimensions for large and small icons in the system, and ImageList_Create to
create the image lists. You can retrieve the handle of an image list currently
assigned to a list-view control using LVM_GETIMAGELIST.
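A sketch of that setup, using the system metrics mentioned above (hwndLV is the list-view handle):

```c
/* Sketch: create large- and small-icon image lists at the system's
   preferred sizes and assign them to a list view. The same icons
   must be added to both lists, in the same order. */
int cx = GetSystemMetrics(SM_CXICON), cy = GetSystemMetrics(SM_CYICON);
HIMAGELIST himlLarge = ImageList_Create(cx, cy, ILC_COLOR, 4, 4);
HIMAGELIST himlSmall;

cx = GetSystemMetrics(SM_CXSMICON);
cy = GetSystemMetrics(SM_CYSMICON);
himlSmall = ImageList_Create(cx, cy, ILC_COLOR, 4, 4);
/* ...add the icons to both lists with ImageList_AddIcon here... */
ListView_SetImageList(hwndLV, himlLarge, LVSIL_NORMAL);
ListView_SetImageList(hwndLV, himlSmall, LVSIL_SMALL);
```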
The large- and small-icon image lists typically contain icons for each type of
list-view item. Obviously, you need not create both of these image lists if
only one icon size is used. If you create both image lists, they must contain
the same images in the same order because a single index is used to identify a
list-view item's icon in both image lists. The large- and small-icon image
lists can also contain overlay images designed to be superimposed on item
icons. If a state-image list is specified, a list-view control reserves space
to the left of each item's icon for a state image. An application can use
state images (such as checked and cleared check boxes) to indicate
application-defined item states.
By default, a list-view control destroys the image lists assigned to it when
it is destroyed, but if it has the LVS_SHAREIMAGELISTS window style, the app
is responsible for destroying the image lists when they are no longer in use.
You should specify this style if you assign the same image lists to multiple
list-view controls; otherwise, more than one control might try to destroy the
same image list.
Each item in a list-view control consists of an icon, label, current state,
and application-defined value. One or more subitems can also be associated
with each item. A subitem is a string that, in report view, can be displayed
in a column to the right of an item's icon and label (Size, Kind, and Modified
fields in Figure 3). All items in a list view have the same number of
subitems. Using list-view messages, you can add, modify, retrieve information
about, and delete items. You can also find items with specific attributes.
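Inserting an item and setting a subitem string might look like this (the label, icon index, and subitem text are illustrative):

```c
/* Sketch: insert one item and give it a subitem string that
   report view displays in column 1. hwndLV is the list view. */
LV_ITEM lvi;

memset(&lvi, 0, sizeof(lvi));
lvi.mask = LVIF_TEXT | LVIF_IMAGE | LVIF_PARAM;
lvi.iItem = 0;                 /* position index */
lvi.iSubItem = 0;              /* 0 means the item itself */
lvi.pszText = "readme.txt";
lvi.iImage = 0;                /* same index in both image lists */
lvi.lParam = 0;                /* application-defined value */
ListView_InsertItem(hwndLV, &lvi);
ListView_SetItemText(hwndLV, 0, 1, "4K");  /* subitem 1, e.g. Size */
```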
A callback item is a list-view item for which the application, rather than the
control, stores the text, icon, or both. Although a list-view control can
store these attributes for you, you may want to use callback items if your app
already maintains some of this information. The callback mask specifies which
item-state bits are maintained by the application, and it applies to the whole
control rather than to a specific item. The callback mask is zero by default,
meaning that the control tracks all item states. If an application uses
callback items or specifies a nonzero callback mask, it must be able to supply
list-view item attributes on demand. A list-view control requests any
information it needs to display an item by sending its owner window an
LVN_GETDISPINFO notification message. If item attributes or state bits
maintained by the app change, the list-view control sends its owner window an
LVN_SETDISPINFO notification that enables the application to update its
information. If you change a callback item's attributes, you can use
LVM_UPDATE to force the control to repaint the item. This message also
arranges the list-view control if it has the LVS_AUTOARRANGE style. You can
use LVM_REDRAWITEMS to redraw a range of items by invalidating the
corresponding portions of the list view's client area.
Columns control the way items and their subitems are displayed in report view.
Each column has a title and width, and it is associated with a specific
subitem, subitem zero being the item's icon and label. Unless the
LVS_NOCOLUMNHEADER window style is specified, column headers appear in report
view. Users can click a column header, causing LVN_COLUMNCLICK to be sent to
the parent window. Typically, the parent window sorts the list view by the
specified column when this occurs.
You can use list-view messages to arrange and sort items and to find items
based on their attributes or positions. Arranging repositions items to align
on a grid, but the indexes of the items do not change. Sorting changes the
sequence of items (and their corresponding indexes) and then repositions them
accordingly. You can arrange items only in icon and small icon views, but you
can sort items in any view. To arrange items, use LVM_ARRANGE. You can ensure
that items are arranged at all times by specifying LVS_AUTOARRANGE.
To sort items, use LVM_SORTITEMS to specify an application-defined callback
function that is called to compare the relative order of any two items. By
supplying an appropriate comparison function, you can sort items by label,
subitem, or any other property. You can ensure that a list-view control is
always sorted by specifying the LVS_SORTASCENDING or LVS_SORTDESCENDING window
style. You cannot supply a comparison function when using these window styles.
The list view sorts the items by label in ascending or descending order.
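The comparison function receives the application-defined lParam values of the two items being compared, plus the value passed with LVM_SORTITEMS. A sketch, assuming each item's lParam holds a file size:

```c
/* Sketch: sort items by their application-defined lParam values,
   here assumed to hold file sizes. The return value works like
   strcmp: negative, zero, or positive. */
int CALLBACK CompareBySize(LPARAM lp1, LPARAM lp2, LPARAM lpSort)
{
    return (int)lp1 - (int)lp2;   /* lpSort unused in this sketch */
}

/* Elsewhere: */
ListView_SortItems(hwndLV, CompareBySize, 0);
```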
You can find a list-view item with specific properties by using LVM_FINDITEM.
You can find a list-view item that is in a specified state and bears a
specified geometrical relationship to a given item by using the
LVM_GETNEXTITEM message. For example, you can retrieve the next selected item
to the right of a specified item.
Every list-view item has a position and size, which you can retrieve and set
using messages. You can also determine which item, if any, is at a specified
position. The position of list-view items is specified in view coordinates,
which are client coordinates offset by the scroll position.
Unless the LVS_NOSCROLL window style is specified, a list-view control can be
scrolled to show more items than can fit in the client area of the control.
A list-view control that has the LVS_EDITLABELS window style enables users to
edit item labels in place by clicking the label of an item that has the focus.
An application can begin editing automatically using LVM_EDITLABEL. The
list-view control notifies the parent window when editing begins and when it
is canceled or completed. When editing is completed, the parent window is
responsible for updating the item's label, if appropriate. During label
editing, you can get the handle of the edit control used for label editing by
using LVM_GETEDITCONTROL. To limit the amount of text users can enter, send
the edit control EM_LIMITTEXT. You can even subclass the edit control to mask
its input.


Tree-View Controls


A tree-view control is a window that displays a hierarchical list of items,
such as the files and directories on a disk; see Figure 4. Each item consists
of a label and an optional bitmapped image and can have a list of associated
subitems ("child items") that users can expand or collapse.
An item that has one or more child items is called a "parent item." A child
item is displayed below its parent item and indented to indicate that it is
subordinate to the parent. An item that has no parent is at the top of the
hierarchy and is a root item. 

Tree-view controls have a number of styles. The TVS_HASLINES style enhances
the graphic representation of a tree-view control's hierarchy by drawing lines
that link child items to their corresponding parent item. This style does not
link items at the root of the hierarchy. To do so, you need to combine the
TVS_HASLINES and TVS_LINESATROOT styles. 
Users can expand or collapse a parent item's list of child items by
double-clicking the parent item. A tree view that has the TVS_HASBUTTONS style
adds a button to the left side of each parent item, on which users can also
click to expand or collapse the child items. TVS_HASBUTTONS does not add
buttons to items at the root of the hierarchy. To do so, you must combine
TVS_HASLINES, TVS_LINESATROOT, and TVS_HASBUTTONS. Tree controls in Windows
95's Explorer windows have these three styles turned on, and use plus and
minus icons to indicate whether a parent item can be expanded or collapsed. As with
list-view controls, a TVS_EDITLABELS style lets users edit the labels of
tree-view items.
You add an item to a tree-view control by sending it the TVM_INSERTITEM
message. At any given time, the state of a parent item's list of child items
can be either expanded or collapsed. When the state is expanded, the child
items are displayed below the parent item; when it is collapsed, they are not.
The list automatically toggles between the expanded and collapsed states when
the user double-clicks the parent item or the button associated with it. An
application can expand or collapse the child items by using TVM_EXPAND. A
tree-view control sends the parent window a TVN_ITEMEXPANDING message when a
parent item's list of child items is about to be expanded or collapsed. The
notification gives an application the opportunity to prevent the change or to
set any attributes of the parent item that depend on the state of the list of
child items. After changing the state of the list, the tree view sends the
parent window a TVN_ITEMEXPANDED message.
When a list of child items is expanded, it is indented relative to the parent
item. You can set the amount of indentation by using TVM_SETINDENT or retrieve
the current amount by using TVM_GETINDENT.
A tree-view control allocates memory for storing each item; the text of the
item labels takes up a significant portion of this memory. If your application
maintains a copy of the strings in the tree-view control, you can decrease the
memory requirements of the control by specifying the LPSTR_TEXTCALLBACK value
instead of passing actual strings to the tree view. Using LPSTR_TEXTCALLBACK
causes the tree view to retrieve the text of an item's label from the parent
window whenever the item needs to be redrawn. To retrieve the text, the tree
view sends a TVN_GETDISPINFO message, and the parent window must provide the
appropriate information. 
An item's initial position is set when the item is added to the tree-view
control using TVM_INSERTITEM, which includes a TV_INSERTSTRUCT structure that
specifies the handle of the parent item and the handle of the item after which
the new item is to be inserted. The second handle must identify either a child
item of the given parent or TVI_FIRST, TVI_LAST, or TVI_SORT. The tree-view
control places the new item at the beginning or end of the given parent item's
list of child items when TVI_FIRST and TVI_LAST are specified. The tree-view
control inserts the new item into the list of child items in alphabetical
order based on the text of the item labels when TVI_SORT is specified. 
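Adding a root item and a child beneath it is a matter of filling the structure twice (hwndTV is the tree-view handle; the labels are illustrative):

```c
/* Sketch: add a root item, then a child kept in sorted order. */
TV_INSERTSTRUCT tvis;
HTREEITEM hRoot;

memset(&tvis, 0, sizeof(tvis));
tvis.hParent = TVI_ROOT;
tvis.hInsertAfter = TVI_LAST;
tvis.item.mask = TVIF_TEXT;
tvis.item.pszText = "Projects";
hRoot = TreeView_InsertItem(hwndTV, &tvis);

tvis.hParent = hRoot;           /* child of the item just added */
tvis.hInsertAfter = TVI_SORT;   /* alphabetize among siblings */
tvis.item.pszText = "notes.txt";
TreeView_InsertItem(hwndTV, &tvis);
```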
You can put a parent item's list of child items into alphabetical order by
using TVM_SORTCHILDREN, which includes a parameter that specifies whether all
levels of child items descending from the given parent item are also put into
alphabetical order.
TVM_SORTCHILDRENCB allows you to sort child items based on criteria that you
define; when you use this message, you specify an application-defined callback
function that the tree-view control can call to determine the relative order
of two child items.
A tree-view control notifies the parent window when the user wants to begin
dragging an item. The parent window receives a TVN_BEGINDRAG message when
users begin dragging an item with the left mouse button and a TVN_BEGINRDRAG
message when users begin dragging with the right button. An application can
prevent a tree-view control from sending these notifications by giving the
control the TVS_DISABLEDRAGDROP style. You obtain an image to display during a
dragging operation by using TVM_CREATEDRAGIMAGE. The tree-view control creates
a dragging bitmap based on the label of the item being dragged. Then the tree
view creates an image list, adds the bitmap to it, and returns the handle of
the image list. You must provide the code that actually drags the item. This
typically involves using the dragging capabilities of the image-list functions
and processing the WM_MOUSEMOVE and WM_LBUTTONUP (or WM_RBUTTONUP) messages
sent to the parent window after the drag operation has begun.
If items in a tree-view control are to be the targets of a drag-and-drop
operation, you need to know when the mouse cursor is on a target item. You can
find this out with the TVM_HITTEST message. To indicate that an item is the
target of a drag-and-drop operation, use the TVM_SETITEM message to set the
state to the TVIS_DROPHILITED value. An item that has this state is drawn in
the style used to indicate a drag-and-drop target.


Property Sheets


A property sheet is a window that allows the user to view and edit the
properties of an item; see Figure 5. For example, a spreadsheet application
can use a property sheet to allow users to set the font and other formatting
properties of a cell. A property sheet contains one or more overlapping child
windows called "pages," each containing control windows for setting a group of
related properties. Each page has a tab that users can select to bring the
page to the foreground of the property sheet.
A property sheet and the pages it contains are actually dialog boxes. The
property sheet is a system-defined modeless dialog box that manages the pages
and provides a common container for them. It includes a frame, title bar,
system menu, and the buttons OK, Cancel, Apply Now, and Help. The dialog-box
procedures for the pages receive notification messages when the user selects
the buttons. Each page in a property sheet is an application-defined modeless
dialog box that manages the control windows used to view and edit the
properties of an item. You provide the dialog-box template used to create each
page as well as the dialog-box procedure that manages the controls and sets
the properties of the corresponding item. 
A property sheet sends notification in the form of WM_NOTIFY messages to the
dialog-box procedure for a page when the page is gaining or losing the
activation and when the user chooses an OK, Cancel, Apply Now, or Help button.
The lParam parameter points to an NMHDR structure that includes the window
handle of the property-sheet dialog box. 
A property sheet must contain at least one page, but no more than the value of
MAXPROPSHEETPAGES. Each page has a zero-based index that the property sheet
assigns according to the order in which the page is added to the property
sheet. These indexes are used in messages that you send to the property sheet.
Each page has a corresponding icon and label; the property sheet creates a tab
for each page and displays the icon and label in the tab. If a property sheet
contains only one page, the tab for the page is not displayed. The dialog-box
procedure for a page must not call the EndDialog function. Doing so will
destroy the entire property sheet, not just the page. 
Before creating a property sheet, you must define one or more pages. This
involves filling a PROPSHEETPAGE structure with information about the page,
icon, label, dialog-box template, dialog-box procedure, and so on, and then
specifying the address of the structure in a call to CreatePropertySheetPage.
The function returns a handle of the HPROPSHEETPAGE type that uniquely
identifies the page. To create a property sheet, you specify the address of a
PROPSHEETHEADER structure in a call to PropertySheet; the structure defines
the icon and title for the property sheet and includes a pointer to an array
of HPROPSHEETPAGE handles. When PropertySheet creates the property sheet, it
includes the pages identified in the array. The order of the array determines
the order of the pages in the property sheet. Alternatively, you can specify
an array of PROPSHEETPAGE structures instead of an array of HPROPSHEETPAGE
handles when creating a property sheet. In this case, PropertySheet creates
handles for the pages before adding them to the property sheet. 
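A sketch of the second, structure-array form (IDD_PAGE1, IDD_PAGE2, and the dialog procedures are hypothetical application resources):

```c
/* Sketch: a two-page property sheet built from an array of
   PROPSHEETPAGE structures. The IDD_* templates and the dialog
   procedures are hypothetical application-defined resources. */
PROPSHEETPAGE psp[2];
PROPSHEETHEADER psh;

memset(psp, 0, sizeof(psp));
psp[0].dwSize = sizeof(PROPSHEETPAGE);
psp[0].hInstance = hInst;
psp[0].pszTemplate = MAKEINTRESOURCE(IDD_PAGE1);
psp[0].pfnDlgProc = Page1DlgProc;
psp[1] = psp[0];
psp[1].pszTemplate = MAKEINTRESOURCE(IDD_PAGE2);
psp[1].pfnDlgProc = Page2DlgProc;

memset(&psh, 0, sizeof(psh));
psh.dwSize = sizeof(PROPSHEETHEADER);
psh.dwFlags = PSH_PROPSHEETPAGE;   /* ppsp holds structures */
psh.hwndParent = hwndParent;
psh.pszCaption = "Options";
psh.nPages = 2;
psh.ppsp = psp;
PropertySheet(&psh);
```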
When a page is created, the dialog-box procedure for the page receives a
WM_INITDIALOG message. The message's lParam parameter points to the
PROPSHEETPAGE structure used to create the page. The dialog box can save the
information in the structure and use it later to modify the page.
The PropertySheet function automatically sets the size and initial position of
a property sheet. The position is based on the position of the owner window,
and the size is based on the largest page specified in the array of pages when
the property sheet was created. An application can add a page after creating a
property sheet by using the PSM_ADDPAGE message, but the size of the property
sheet cannot change after it has been created, so the new page must be no
larger than the largest page currently in the property sheet. An app removes a
page by using the PSM_REMOVEPAGE message. When you define a page, you can
specify the address of a ReleasePropSheetPageProc callback function that the
property sheet calls when it is removing the page. ReleasePropSheetPageProc
lets you perform cleanup operations for individual pages. 
When a property sheet is destroyed, it automatically destroys all of the pages
that have been added to the property sheet. The pages are destroyed in reverse
order from that specified in the array used to create the pages. To destroy a
page created by CreatePropertySheetPage but not added to the property sheet,
use DestroyPropertySheetPage.
You specify the title of a property sheet in the PROPSHEETHEADER structure
used to create the property sheet. An application can change the title after a
property sheet is created by using the PSM_SETTITLE message. By default, a
property sheet uses the name string specified in the dialog-box template as
the label for a page.
A property sheet can have only one active page at a time: the page at the
foreground of the overlapping stack of pages. Users activate a page by
selecting its tab; an application activates a page by using the PSM_SETCURSEL
message. The property sheet sends the PSN_KILLACTIVE message to the page that
is about to lose the activation. In response, the page should validate any
changes that the user has made to the page. If the page has invalid property
settings and requires additional user input before losing the activation, it
should prevent deactivation and also display a message box that describes the
problem and recommended action. The property sheet sends the PSN_SETACTIVE
message to the page gaining the activation before the page is visible. The
page should respond by initializing its control windows.
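In a page's dialog procedure, this activation handshake arrives as notification handling. The sketch below models only the decision logic described above; the PSN_* values and the Page structure are local stand-ins so the fragment compiles outside Windows, and a real page reports its answer through the DWL_MSGRESULT mechanism rather than a plain return value.

```c
#include <assert.h>

/* Stand-ins: real pages get PSN_SETACTIVE/PSN_KILLACTIVE from the
 * common-control header; the Page structure is hypothetical. */
#define PSN_SETACTIVE  1
#define PSN_KILLACTIVE 2

typedef struct {
    int value;        /* the property being edited */
    int initialized;  /* set when PSN_SETACTIVE initializes the controls */
} Page;

/* Returns nonzero to prevent deactivation (invalid settings), zero to
 * allow the page switch. */
int on_page_notify(Page *pg, int code)
{
    switch (code) {
    case PSN_SETACTIVE:
        pg->initialized = 1;  /* initialize the page's control windows */
        return 0;
    case PSN_KILLACTIVE:
        if (pg->value < 0)    /* invalid property setting */
            return 1;         /* block deactivation; a real page would
                                 also display a message box here */
        return 0;             /* changes validated; allow the switch */
    }
    return 0;
}
```

The essential point is that PSN_KILLACTIVE is the page's last chance to veto the switch, and PSN_SETACTIVE is where control initialization belongs.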
The property sheet sends the PSN_HASHELP notification to ask a page whether
it supports the Help button; if not, the button is disabled. When users
choose the Help
button, the active page receives the PSN_HELP notification message. The page
should respond by displaying help information, typically by calling the
WinHelp function.
The OK and Apply Now buttons are similar; both direct a property sheet's pages
to validate and apply the property changes that users have made. The only
difference is that the OK button causes the property sheet to be destroyed
after the changes are applied, whereas the Apply Now button does not.
When users choose the OK or Apply Now buttons, the property sheet sends the
PSN_KILLACTIVE notification to the active page, giving it an opportunity to
validate the user's changes. If the changes are valid, the property sheet
sends the PSN_APPLY notification to each page, directing them to apply the new
properties to the corresponding item. If users' changes are not valid, the
page can display a dialog box informing users of the problem. The Apply Now
button is initially disabled when a page becomes active, indicating that there
are not yet any property changes to apply. When the page receives user input
through one of its controls indicating that the user has edited a property,
the page should send the PSM_CHANGED message to the property sheet. The
message causes the property sheet to enable the Apply Now button. If the user
subsequently chooses the Apply Now button, the page should reinitialize its
controls and then send the PSM_UNCHANGED message to disable the Apply Now
button again.
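The edit-tracking protocol above can be sketched as a tiny state machine. The PSM_* values and the Sheet structure below are stand-ins defined locally; in a real page, the messages go to the property sheet via SendMessage, and the sheet owns the actual Apply Now button state.

```c
#include <assert.h>

/* Stand-ins for PSM_CHANGED/PSM_UNCHANGED (real values come from the
 * common-control header); Sheet is a hypothetical stand-in for the
 * property sheet's button state. */
#define PSM_CHANGED   1
#define PSM_UNCHANGED 2

typedef struct { int apply_enabled; } Sheet;

/* The property sheet's reaction to the two messages. */
void sheet_message(Sheet *s, int msg)
{
    if (msg == PSM_CHANGED)   s->apply_enabled = 1;  /* enable Apply Now */
    if (msg == PSM_UNCHANGED) s->apply_enabled = 0;  /* disable it again */
}

/* Page side: a control reported a user edit. */
void page_on_edit(Sheet *s)    { sheet_message(s, PSM_CHANGED); }

/* Page side: new properties were applied and controls reinitialized. */
void page_on_applied(Sheet *s) { sheet_message(s, PSM_UNCHANGED); }
```

The division of labor matters: the page only reports "edited" or "applied"; the sheet decides what the button looks like.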
The property sheet sends the PSN_RESET notification message to all pages when
the user chooses the Cancel button, indicating that the property sheet is
about to be destroyed. A page should use the notification to perform cleanup
operations.


Wizard Property Sheets


A "wizard" is a special type of property sheet that consists of a sequence of
dialog boxes that guide the user through the steps of an operation. Windows 95
uses wizard property sheets to help you set up printers, modems, and other
devices. In a wizard property sheet, the pages do not have tabs, and only one
page is visible at a time. Also, instead of having OK and Apply Now buttons, a
wizard property sheet has a Back button, Next or Finish button, and Cancel
button. Use the PSM_SETWIZBUTTONS message with the PSWIZB_BACK, PSWIZB_NEXT,
and PSWIZB_FINISH flags to tell the property sheet which buttons to enable. As
with standard property sheets, the Help button is included if the page
indicates that it has one in response to the PSN_HASHELP notification.
You create and initialize a wizard property sheet just as you would a standard
property sheet, except that you must include the PSH_WIZARD value in the
dwFlags member of the PROPSHEETHEADER structure. The system ignores the
property-sheet caption member; instead, it puts the label of the current page
in the title bar of the property sheet. When the user switches from one page
to the next, the system updates the title using the label of the current page.

Use the WIZ_CXDLG and WIZ_CYDLG values to set the sizes of the pages in your
wizard property sheet. Doing so ensures that the pages conform to the standard
page size for wizards.
The dialog-box procedure for a page in a wizard property sheet receives all
of the same notification messages as the procedure for a page in a standard
property sheet, plus
three additional messages: PSN_WIZBACK, PSN_WIZNEXT, and PSN_WIZFINISH. These
are received when users choose the Back, Next, or Finish button. When users
choose the Back or Next buttons, the property sheet automatically switches to
the previous or next page. The system destroys the wizard
property sheet when users click the Finish button.


Column Headings, Spin Boxes, and Rich-Text Boxes


You can use a column-heading control to display properties of a selected
object in a multicolumn list; see Figure 6(a). The control lets you choose
which property each column displays and sort the items in the list by that
property.
Spin boxes are text boxes that accept a limited set of discrete ordered input
values; see Figure 6(b). The buttons on the control allow users to increment
or decrement values in the text box.
Users can type a text value directly into the control or use the buttons to
change the value. Pressing the arrow keys also changes the value. You can use
a single set of spin-box buttons to edit a sequence of related text boxes; for
example, time as expressed in hours, minutes, and seconds. The buttons affect
only the text box that currently has the input focus.
A rich-text box provides the same basic text-editing capabilities as a
standard text box; in addition, it supports per-character font properties and
paragraph formatting; see Figure 6(c).


Conclusion


Windows 95 common controls are standardized, well-tested GUI building blocks
that give your applications a consistent look and feel that blends smoothly
with the Windows 95 user interface. Tightly integrated with wrapper classes in
MFC, they can be used in an object-oriented way for easy customization,
letting you build new features on top of them while avoiding costly
reimplementation of built-in functionality.
Figure 1 Interaction between applications and common controls.
Figure 2: (a) Toolbar; (b) tooltip; (c) status bar; (d) trackbar; (e) progress
indicator; (f) tab control.
Figure 3 List-view control in report view.
Figure 4 Tree-view control.
Figure 5 Property sheet.
Figure 6: (a) Column headings; (b) spin control; (c) rich-text box.

Example 1: NMHDR structure. 
typedef struct tagNMHDR {
 HWND hwndFrom; // handle of control sending message
 UINT idFrom; // identifier of control sending message
 UINT code; // notification code; see Table 1
} NMHDR;
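A parent window receives these notifications as WM_NOTIFY messages whose lParam points at an NMHDR; casting and switching on the code member is the usual dispatch pattern. In this sketch the HWND/UINT typedefs and NM_* values are placeholders so the fragment compiles outside Windows; an application uses the real definitions from the common-control header.

```c
#include <assert.h>

/* Placeholder definitions; an application uses the real HWND/UINT types
 * and the NM_* values from the Windows headers. */
typedef void *HWND;
typedef unsigned int UINT;
enum { NM_CLICK = 1, NM_RCLICK = 2 };  /* placeholder values */

typedef struct tagNMHDR {
    HWND hwndFrom;  /* handle of control sending message */
    UINT idFrom;    /* identifier of control sending message */
    UINT code;      /* notification code; see Table 1 */
} NMHDR;

/* What a WM_NOTIFY handler does: cast lParam to NMHDR* and switch on the
 * code. Returns a small tag here so the dispatch can be observed. */
int on_wm_notify(void *lParam)
{
    NMHDR *hdr = (NMHDR *)lParam;
    switch (hdr->code) {
    case NM_CLICK:  return 1;  /* left click in control hdr->idFrom */
    case NM_RCLICK: return 2;  /* right click */
    default:        return 0;  /* notification not handled */
    }
}
```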
Table 1: Shared notification codes.
Code Description 
NM_CLICK User has clicked the left mouse button within the control.
NM_DBLCLK User has double-clicked the left mouse button within the control.
NM_ENDWAIT The control has completed a lengthy operation.
NM_KILLFOCUS The control has lost the input focus.
NM_OUTOFMEMORY The control could not complete an operation because
 not enough memory is available.
NM_RCLICK User has clicked the right mouse button within the control.
NM_RDBLCLK User has double-clicked the right mouse button within the control.
NM_RETURN Control has the input focus, and the user has pressed the ENTER key.
NM_SETFOCUS Control has received the input focus.
NM_STARTWAIT Control has started a lengthy operation.



Bob as a Macro Processor Library


Turning a tiny OO language into a macro language




Brett Dutton


Brett recently completed his studies at Queensland University of Technology in
Brisbane, Australia. He now heads RAX Information Systems, a consulting firm
specializing in application software, and can be contacted at
bdutton@gil.ipswichcity.qld.gov.au.


How many applications have you written that have never needed modifications
because of user demands, suggestions, or customization? Not many, I'll bet.
Luckily, application macro languages give you and users a way of modifying and
customizing the application without recompiling. 
Macro languages are user-programmable interfaces to applications. They allow
users (and developers) to access internal functionality and to customize or
extend the application. Macro languages are used in many applications,
including emacs, AutoCAD, Brief, WordPerfect, Excel, and Lotus 1-2-3, to name
a few.
Besides adding the ability to customize, macro languages also provide benefits
such as easy prototyping, automated testing, distribution of multiple
configurations, smaller executables, better application design, and easier
upgrades. 
I recently added a macro language to a Syntax Directed Text Editor (STex) as
part of a project at Queensland University of Technology (QUT) in Brisbane,
Australia. Since STex will be used by first-year undergraduates, many factors
influenced the choice of macro language: ease of programming, ease of
implementation, and consistency with the programming models supported by
university teaching methods. 
Among the macro languages we considered were: 
XLisp. Originally written by David Betz and available on most archive sites,
XLisp made the list because it has a proven track record with AutoCAD.
However, Lisp is considered a little difficult for first-year undergraduates
to understand. 
ReXX. The scripting language for VM/CMS, OS/2, and Amiga, ReXX has
implementations available on most platforms, and again, a proven track record.
However, the project coordinator did not want to use ReXX.
Bob. Also written by David Betz and presented in Dr. Dobb's Journal ("Your Own
Tiny Object-Oriented Language," DDJ, September 1991), Bob is a readily
available language with a C/C++ syntax implemented in ANSI C. The modularity
concepts native to C/C++ made Bob particularly desirable to the university.
That the language is object oriented was also a bonus. 


Changes to Bob 


Bob was originally designed by David as a stand-alone interpreter. In 1994, it
was extended into a language for building online conferencing systems (see "An
Online Conferencing System Construction Kit," by David Betz in Dr. Dobb's
Information Highway Sourcebook, Winter 1994). Furthermore, David has since
written a version of Bob that runs as a Windows DLL; see the accompanying text
box entitled "Callable Bob." My extensions take a different course, adding an
API to the original Bob. However, to make my implementation of Bob complete,
I had to modify the original code. This article focuses on the
changes I made to the stand-alone Bob interpreter to turn it into a macro
processor library, and on how you can use that library. The complete source
code to the Bob macro processor library is available electronically; see
"Availability," page 3. For details on the Bob language and syntax, refer to
David's 1991 article. 
Originally, Bob was meant to be run only once. One of my first changes was to
make Bob reentrant: an initialization run at load-up can load another file,
which in turn runs another initialization, and so on. The position on the
stack is now maintained by the call/return set of functions rather than
being set up on the first call.
In addition, I provided remote function access by creating the new data type
DT_RCODE for remote functions. Data values of this type are handled in the
same way as strings, but are used to interface with application functions. The
remote functions are available to Bob after they have been registered with the
Bob API.
Next, I made the Bob macro processor run as its own process so that it works
independently of the application. Communication with the application is via
sockets. Bob uses two: an error socket that interrupts processing, and a
general socket for normal traffic.
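The article doesn't spell out the wire protocol, but the two-channel separation can be illustrated with POSIX socketpair: one connected pair for general traffic and a second for errors, so an error message never queues behind normal traffic. The BobChannels name and its helpers are illustrative, not part of the Bob library.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Two independent channels, as the Bob process uses: one for general
 * traffic, one for errors that interrupt processing. socketpair() gives a
 * connected descriptor pair for each channel. (Illustrative names.) */
typedef struct { int general[2]; int error[2]; } BobChannels;

int channels_open(BobChannels *c)
{
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, c->general) < 0) return -1;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, c->error) < 0)   return -1;
    return 0;
}

/* Send on one end of a pair; the peer reads it on the other end. */
long channel_send(int fd, const char *msg)
{
    return write(fd, msg, strlen(msg));
}

long channel_recv(int fd, char *buf, int len)
{
    long n = read(fd, buf, (size_t)len - 1);
    if (n >= 0) buf[n] = '\0';   /* NUL-terminate for the caller */
    return n;
}
```

Keeping errors on their own descriptor lets the application poll it separately and react immediately, which is the point of the split the article describes.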
I also extended the set of internal functions to include a useful subset of
C functions; see Table 1.
In addition, I added the ability to initialize at load-up by using a function
with the same name as the file being loaded. Finally, I maintained backward
compatibility, so "make Bob" still builds an executable that works as
originally designed.
Figure 1 is a model of how the Bob macro processor library works. Although
the model is complicated, you don't need to understand it fully unless you
want to modify the Bob library. Luckily, my API hides all the complexity
behind a few functions.


Bob API 


Listing One is a sample application that demonstrates use of the Bob API. (The
API itself is fully documented in the bob.h source file.) Although this
particular example was written with the X Toolkit Intrinsics (Xt), you do not
need an X Window-based system to use it. All of Bob's processing is done
through the function BobCheckClient, which polls the Bob communications socket
for traffic and processes any packets that have come through. It returns True
if there is a packet to process, and False if not. In the Xt library, the
function XtAppAddInput "listens" on a socket. Bob has a similar function,
BobBlockWait, that can be passed a socket to listen on. BobBlockWait returns
when traffic arrives on either the Bob comms socket or the passed socket.
The power of Bob is in the external functions defined by the application. This
application defines two functions (message and error), but unlimited external
functions can be added. External function arguments are limited to three data
types: number, string, and NIL. This limitation exists because the
data-packet format can only transfer these types across the socket.
Because Bob is a typeless language, a variable is defined as a VALUE. It is up
to the function to test the type of the values being passed to it and either
coerce the value or reject it as an error.
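Here is a minimal sketch of that check-then-coerce pattern for an external function like message from Listing One. The VALUE layout and DT_* values below are simplified stand-ins (bob.h defines the real ones, along with helpers such as BobExtCheckType); only the pattern is the point.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Simplified stand-ins for the library's types; bob.h defines the real
 * VALUE and DT_* constants. */
#define DT_NIL     0
#define DT_INTEGER 1
#define DT_STRING  2

typedef struct { int type; int i; const char *s; } VALUE;

/* An external function must test each argument's type and either coerce
 * the value or reject it -- Bob is typeless, so nothing else guarantees
 * the caller passed what we expect. Returns 0 on success, -1 on error. */
int Message(int argc, VALUE *arg, VALUE *retval)
{
    char buf[128];
    if (argc != 1) return -1;                       /* wrong arg count  */
    if (arg[0].type == DT_STRING) {
        snprintf(buf, sizeof buf, "%s", arg[0].s);  /* use as-is        */
    } else if (arg[0].type == DT_INTEGER) {
        snprintf(buf, sizeof buf, "%d", arg[0].i);  /* coerce to string */
    } else {
        return -1;                                  /* reject the type  */
    }
    retval->type = DT_INTEGER;     /* report the message length to Bob */
    retval->i = (int)strlen(buf);
    return 0;
}
```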
In examining Listing One, note that:
int BobInitialize (void) initializes the Bob macro processor and returns
either the socket on which to listen or -1 on failure.
int BobLoadFile (char *fname) sends a request for Bob to load and compile a
file. The filename is passed in fname and needs neither a suffix of .bob nor a
path if it is in either the home directory or the path pointed to by the
environment variable BOBLIB. Errors in the compilation are reported to stderr,
and the processing stops. If the file loads successfully, Bob looks for the
function of the same name as the file (illegal characters are replaced by
underscores). If this function exists, it is executed and deleted upon
completion. This function is considered the initialization function. 
int BobAddFunction (char *fname,BobExternFunc func) adds an external function
to Bob. When that function is called by Bob or the application, the function
will execute. The definition of BobExternFunc is typedef int
(*BobExternFunc)(int, VALUE *args, VALUE *etval);. 
int BobExecute (char *name, int cnt, ...) executes the passed function name,
passing the arguments to the function. The argument list is in pairs,
indicating the type of each argument and its value. The exception is NIL,
which has a type only, no value. An example of running the print function
with a number of arguments would be BobExecute ("print", 4, DT_STRING,
"Hi ", DT_INTEGER, 20, DT_NIL, DT_STRING, " World\n"); and the result would
be Hi 20nil World. 
char *BobGetString (char *buf, int len, VALUE *val) is a utility function to
extract a string from a value. It is used in remote functions.
int BobGetInteger(VALUE *val) is a utility function to extract the integer
part of a VALUE. 
int BobCheckClient(void) polls the socket, checking if there is anything to
read, and processes one waiting packet. If there are more packets, they will
be ignored until the next call to BobCheckClient.
int BobExtArgs (int argc, int mn, int mx, VALUE *retval) checks the number of
passed arguments. mn is the minimum number of arguments the function needs,
and mx is the maximum. If there is no minimum, set mn to -1; if there is no
maximum, set mx to -1. If mn or mx is -1, the external function must verify
the argument count itself.
int BobExtCheckType (VALUE *arg, int type, VALUE *retval) checks that the
passed VALUE has the given type; use it to validate the arguments passed to
an external function.
int BobReturnValue (VALUE *retval, int type, ...) returns a value to the
function that invoked the Bob function call. Its use is similar to
BobExecute. 
int BobTalkTerm(void) terminates the Bob macro processor. It's sufficient to
simply exit, but using this function is a little cleaner. 
int BobBreak (char *fmt, ...) sends a break signal to Bob. If Bob encounters
a break, it stops any processing and returns to the EventLoop. This function
is useful for interrupting large, slow functions. Its arguments are the same
as those of sprintf-style functions.
void BobBlockWait(int fd) waits for activity on either the Bob sockets or on
the passed socket and returns when activity is detected. None of the packets
are processed. This function was originally used when the application was
written using Xlib, and it was difficult to add a socket to the application.
It was easier to have Bob keep an eye on the X socket and return if there was
activity on the passed socket or on the Bob socket.
When you execute the program in Listing One (its makefile is included with the
complete source code), you'll be presented with a dialog box similar to Figure
2. The Message dialog shows the output of the external function message,
defined in the application and registered with Bob using BobAddFunction. The
Error dialog shows the output of the external function error, also defined in
the application and registered with Bob using BobAddFunction. Every
application should define an error function so that Bob has a place to display
errors. The Load file dialog lets the user enter a filename, load the file,
and execute the initialization function. Finally, the Exec function lets the
user type in the function name to execute.

When you start the example application, it will give you a list of functions
to execute and files to load. Try these by typing them into the appropriate
dialog box, then selecting OK (but don't press Return; the application isn't
that smart).


Limitations and Future Enhancements


Bob is not without its limitations. For instance, it currently has limited
memory: the memory size is set by the macro SMAX (Stack Max Size) defined in
bob.h, but this could be changed to use virtual memory. Memory is currently
statically allocated because of garbage-collection requirements. 
Bob currently does not support floating-point arithmetic, although this could
be implemented via classes. The internal functions are only a subset of C-type
functions. It is possible to extend the number of internal functions, but it
would be better if the Bob macro processor library stayed free from
application-dependent code. The file bobfcn.c contains all the internal
functions; see Table 1. It may also be necessary in the future to extend the
data types that external functions can handle. 
You can now link Bob into your applications and get all the benefits that a
macro language provides.
Callable Bob
David Betz
David is a DDJ contributing editor and can be contacted through the DDJ
offices.
Last fall I presented a version of my Bob programming language that was
extended to support an object store designed for use as the basis for building
an online conferencing system (see "An Online Conferencing System Construction
Kit," Dr. Dobb's Information Highway Sourcebook, Winter 1994). While that Bob
interpreter was easy to extend with additional built-in functions, it was
essentially a stand-alone interpreter. Bob was the main program, and
extensions were called as subroutines. This works well in many applications
but falls short when trying to use Bob as an extension language for an
existing application.
To solve this problem, I recently designed a version of Bob that runs as a
Windows DLL. Along the way, I separated Bob into several modules so that each
could be used independently. The memory manager is now an independent module
that can be used as the basis for other languages that need a heap with
automatic garbage collection. I've also separated the Bob interpreter from the
Bob compiler, since some applications only need to run already-compiled code.
In fact, I've separated the run-time library from the rest of the interpreter
so it is possible to run programs that only need the intrinsic functions
without any library at all. To make things simpler, I've included all of these
modules in the Bob DLL even though they are logically separate. The complete
source code to this version of Bob is available electronically; see
"Availability," page 3.
Windows DLLs have only a single data segment, even though several applications
may be linked to them at the same time. This was a problem, since Bob had many
global variables, most having to do with the memory manager. My first step in
turning Bob into a callable library was to move all of the globals into
context structures and add a parameter to every function to explicitly pass in
the appropriate context. I created two context structures: the interpreter
context, which contains the bytecode interpreter variables as well as the
memory manager variables; and the compiler context, which contains the
compiler and scanner variables. The compiler context also points to an
interpreter context so that the compiler has access to the memory manager for
creating objects.
Passing the compiler and interpreter contexts into each function explicitly
makes it possible to create more than one context at a time. This allows a
multithreaded program to have multiple threads, all executing Bob programs
independently. It also means several programs linked to the same Bob DLL can
operate without interfering with each other.
Now I'll show how to invoke the Bob DLL to create a simple read/eval/print
loop for Bob expressions. First, it is necessary to create an interpreter
context, as in Example 1(a). The first parameter is the size of the heap, and
the second is the size of the stack. The second line sets up an error handler.
Bob will call this error handler whenever an error occurs, passing it an
error code and relevant data.
If you need access to the run-time library functions, that is arranged by
Example 1(b). The second line sets up a handler that the interpreter will call
to get the address of a function handler, given a function name. This is
necessary when the interpreter restores a saved workspace because the saved
workspace format on disk contains only the names of library functions, not
their addresses. This allows saved workspaces to work correctly even after the
DLL has been rebuilt, causing the function-handler addresses to change.
It's now necessary to initialize the compiler, as in Example 1(c). This
creates a compiler context with the specified interpreter context. The numeric
parameters are the sizes of the compiler-bytecode staging buffer and the
literal staging buffer.
The Bob memory manager is a compacting, stop-and-copy garbage collector and
can change the address of objects when a garbage collection occurs. Because of
this, the memory manager must know about all variables that could contain a
pointer to an object in the heap. The variables that the interpreter uses are
contained within the interpreter context structure and can therefore be
located by the memory manager. However, it is sometimes useful for a
memory-manager client to maintain its own pointers into the heap. The Bob
memory manager allows for this by providing the function ProtectPointer to
register an object pointer with the memory manager. This registers the
specified pointer with the memory manager and guarantees that its value is
updated whenever the garbage collector moves the object it points to.
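A toy model shows why this registration is necessary: when a stop-and-copy collector moves an object, every registered pointer must be rewritten in place, and any unregistered pointer silently goes stale. None of this is Bob's actual implementation; the names are illustrative.

```c
#include <assert.h>

/* Toy model of pointer registration under a moving collector. It only
 * demonstrates why each client pointer must be registered so the
 * collector can rewrite it after a move; it is not Bob's internals. */
#define MAX_ROOTS 16

typedef struct { void **roots[MAX_ROOTS]; int nroots; } Heap;

/* The moral equivalent of ProtectPointer: remember the ADDRESS of the
 * client's pointer variable, not its current value. */
void protect_pointer(Heap *h, void **p)
{
    if (h->nroots < MAX_ROOTS) h->roots[h->nroots++] = p;
}

/* When collection relocates an object from 'from' to 'to', every
 * registered pointer that referred to it is updated in place. */
void relocate(Heap *h, void *from, void *to)
{
    for (int i = 0; i < h->nroots; i++)
        if (*h->roots[i] == from) *h->roots[i] = to;
}
```

An unregistered pointer keeps its old value after a relocation, which is exactly the dangling-reference bug ProtectPointer exists to prevent.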
This leaves the read/eval/print loop itself; see Example 2. Bob does all of
its I/O through "streams." A stream is an object with some data and a pointer
to a dispatch table. The dispatch table has pointers to handlers to carry out
various stream operations. At the moment, there are handlers for getting and
putting characters and a handler for closing the stream. The call to
CreateStringStream creates a stream that allows the Bob compiler to read
characters from the string. The interpreter context structure contains
pointers to the standard I/O streams that must be set up by the client of the
Bob DLL. These streams should arrange for characters to be read and written to
the standard input and output of the application.
The call to CompileExpr compiles a single Bob expression and returns a
compiled function which, when called with no arguments, will cause the
expression to be evaluated.
CallFunction calls a function with arguments; see Example 3(a). The arguments
after argumentCount are passed to the specified Bob function. They are of type
ObjectPtr (a pointer to a Bob heap object), and argumentCount indicates their
number. You can also call a Bob function by name, using Example 3(b). 
Of course, the Bob DLL has many other functions. It contains a full complement
of object-creation and access functions for creating and manipulating objects
of type ObjectPtr, as well as functions to control the interpreter.
This is just my first step in making Bob easier to embed in applications. I
plan to extend the Bob language to support full function closures and optional
arguments. I'll also add a "fast load" format for storing precompiled Bob code
in disk files. This would make it possible to distribute Bob functions without
including the source code, a necessary feature for using Bob to build
commercial applications.
Example 1: (a) Bob interpreter context; (b) accessing the run-time library
functions; (c) initializing the compiler; (d) protecting a pointer into the
heap.
(a)
InterpreterContext *ic = NewInterpreterContext(16384,1024);
ic->errorHandler = ErrorHandler;
(b)
EnterLibrarySymbols(ic);
ic->findFunctionHandler = FindLibraryFunctionHandler;
(c)
 CompilerContext *c = InitCompiler(ic,4096,256);
(d)
ObjectPtr val;
ProtectPointer(ic,&val);
Example 2: The read/eval/print loop.
for (;;) {
    printf("\nExpr> ");
    if (gets(lineBuffer)) {
        Stream *s = CreateStringStream(lineBuffer, strlen(lineBuffer));
        if (s) {
            val = CompileExpr(c, s);
            val = CallFunction(ic, val, 0);
            printf("Value: ");
            PrintValue(ic, val, ic->standardOutput);
            CloseStream(s);
        }
    }
    else
        break;
}
Example 3: Calling a Bob function (a) by reference; (b) by name. 
(a)
 ObjectPtr CallFunction(InterpreterContext *ic, ObjectPtr function,
 int argumentCount,...);

(b)
 ObjectPtr CallFunctionByName(InterpreterContext *ic, char *functionName,
 int argumentCount,...);
Figure 1 How the Bob macro library works.
Figure 2 Sample dialog box.
Table 1: Bob internal functions.
Function Description 
char chr (ascii_value); Converts the ASCII value into a string.
string date_time (); Returns the current date and time in
 the format "Mon Nov 21 11:31:54 1983"
string downcase (string); Converts the passed string to lowercase.
string editcase (string); Converts the passed string to edit case.
 Edit case is where the first character
 after a space is uppercase and the rest
 are lowercase.
val exec_function (fname[,arg1 Executes the passed function name with
 [,...argn]]); the arguments. The number of arguments
 must be consistent with the function
 that is being called. This function returns
 the value that the function would have
 returned.
int fclose (file); Closes the passed file.
file fopen (fname,mode); Opens the passed file in the mode: r
 for read, w for write.
int gc (); Does a garbage collection.
int getc (file); Returns the next character from the file.
string getenv (envname); Returns the string associated with passed
 environment variable name.
bool keyboard_quit (); Stops the current processing. Returns NIL.
list list_functions (); Returns a list (vector) of function names.
bool load_bob (filename); Loads the Bob macro file. If the file
 is not available, then returns NIL.
 The file is not compiled into memory
 until the current function is processed.
string newstring (size); Returns a blank string of the passed size.
vector newvector (size); Returns a vector of the passed size.
bool nothing (); This function does nothing. It could be
 used for disabling key translations.
int print (val1 [,val2 Prints the passed values to stdout.
[,...valn]]);
int putc (file,char); Puts the passed character to the file.
int sizeof (value); Returns the number of elements in a
 vector or the length of a string,
 or 1 for any other type of value.
int str_to_num (string); Returns the passed string as a number.
int strchr (string,char); Returns the position of char in
 string. If < 0 is returned, then
 the character was not found.
int strlen (string); Returns the length of the passed
 string. This is an alias for sizeof.
int strstr (string1,string2); Returns the position of string2
 in string1. If < 0 is returned, then
 the string was not found.
string substring Returns the substring starting at
 (string,start-pos,[len]); position pos for the length len.
 If the length arg is omitted, then
 the rest of the string is returned.
 Pos of 0 is the beginning of the
 string.

int system (command); Sends a command to the operating
 system. Returns the OS exit code.
string typeof (value); Returns the type of the passed
 string, which is one of
the following: NIL, CLASS, OBJECT, VECTOR, INTEGER, STRING, BYTECODE, CODE,
DICTIONARY, VAR, FILE.
string upcase (string); Converts the passed string to all uppercase.
string val_to_string (value); Converts any value to the equivalent string
 and returns it.
string version (); Returns the current version string of Bob.

Listing One 

/* example.c: Exemplifies the use of the Bob macro processor library
 * Copyright (c) 1994 Brett Dutton
 * Revision History:
 *   13-Dec-1994  Dutton  Initial coding
 */

/* Description: */
/* includes */
#include <stdio.h>
#include <stdlib.h>
#include "bob/bob.h"
#include <X11/Intrinsic.h>
#include <X11/StringDefs.h>
#include <X11/Xaw/Label.h>
#include <X11/Xaw/Command.h>
#include <X11/Xaw/Box.h>
#include <X11/Xaw/Dialog.h>

/* macros */
#define APPNAME "example"
#define VERSION APPNAME " 1.0 By Brett Dutton"

/* typedefs */

/* prototypes */
void AppCheckBob ( XtPointer cl_data, int *fid, XtInputId *id );
void Quit ( Widget w, XtPointer cl_data, XtPointer call_data );
void Break ( Widget w, XtPointer cl_data, XtPointer call_data );
void LoadOk ( Widget w, XtPointer cl_data, XtPointer call_data );
void ExecuteOk ( Widget w, XtPointer cl_data, XtPointer call_data );
int Message ( int argc, VALUE *arg, VALUE *retval );
int Error ( int argc, VALUE *arg, VALUE *retval );
void ShowError ( char *msg );

/* variables */
/* these resources are usually external */
static String resources[] = {
 "*example.width: 300",
 "*example.height: 400",
 "*quit*label: Quit",
 "*break*label: Break",
 "*Command*background: green",
 "*message*label: Message:",
 "*message*width: 275",
 "*error*label: Error:",
 "*error*width: 275",
 "*value: ",
 "*load*label: Load file:",

 "*load*loadok*label: Ok",
 "*load*width: 275",
 "*execute*label: Execute function:",
 "*execute*executeok*label: Ok",
 "*execute*width: 275",
};

/* global widgets */
Widget message, errordia, load, loadok, executedia, executeok;

/* functions */
/* Function: main -- main function
 * Returns: Nothing
 */
void main ( int argc, char *argv[] ) 
{
 XtAppContext app_context;
 Widget topLevel;
 Widget box, quit, brkwid;
 int bobSock;
 /* create an application */
 topLevel = XtVaAppInitialize ( &app_context, APPNAME, NULL, 0, 
 &argc, argv, resources, NULL ); 
 /* create all the buttons and dialogs for the application */
 box = XtVaCreateManagedWidget ( "box", boxWidgetClass, topLevel, NULL ); 
 quit = XtVaCreateManagedWidget ( "quit", commandWidgetClass, box, NULL ); 
 brkwid = XtVaCreateManagedWidget ("break", commandWidgetClass, box, NULL); 
 message = XtVaCreateManagedWidget ( "message",dialogWidgetClass,box,NULL );
 errordia = XtVaCreateManagedWidget ("error", dialogWidgetClass,box, NULL );
 load = XtVaCreateManagedWidget ( "load", dialogWidgetClass, box, NULL ); 
 loadok = XtVaCreateManagedWidget ( "loadok",commandWidgetClass,load,NULL );
 executedia = XtVaCreateManagedWidget ( "execute", dialogWidgetClass, box, 
 NULL );
 executeok = XtVaCreateManagedWidget ( "executeok", commandWidgetClass, 
 executedia, NULL );
 /* set up all the callbacks for the buttons */
 XtAddCallback ( quit, XtNcallback, Quit, 0 );
 XtAddCallback ( brkwid, XtNcallback, Break, 0 );
 XtAddCallback ( loadok, XtNcallback, LoadOk, 0 );
 XtAddCallback ( executeok, XtNcallback, ExecuteOk, 0 );
 /* initialize the bob interface language */
 if ( ( bobSock = BobInitialize ( ) ) < 0 ) {
 fprintf (stderr,"Unable to initialize Bob\n" );
 exit(1);
 }
 /* add this socket to the event loop for monitoring */
 XtAppAddInput ( app_context, bobSock, (XtPointer)XtInputReadMask, 
 AppCheckBob, NULL );
 /* register the external functions with BOB */
 BobAddFunction ( "message", Message );
 BobAddFunction ( "error", Error );
 /* load up the application defaults macros */
 BobLoadFile ( "." APPNAME "rc" ); /* user definitions */
 /* this has just been put in to demonstrate calling BOB functions */
 BobExecute ( "print", 4, DT_STRING, "Hi ", DT_INTEGER, 20,
 DT_NIL, DT_STRING, " World\n" );
 /* create windows for widgets and map them */
 XtRealizeWidget ( topLevel );
 /* loop for events */

 XtAppMainLoop ( app_context );
}
/* Function: AppCheckBob -- This is the work proc called when there is input
 * from Bob
 * Returns: Nothing
 */
void AppCheckBob ( XtPointer cl_data, int *fid, XtInputId *id )
{
 /* Call bob to get the events of the Bob comms socket */
 BobCheckClient ( );
}
/* Function: Quit -- Exits from the windows system
 * Returns: Nothing
 */
void Quit ( Widget w, XtPointer cl_data, XtPointer call_data )
{
 BobTalkTerm (); /* shutdown comms with Bob */
 exit ( 0 );
}
/* Function: Break -- Interrupts the running Bob program
 * Returns: Nothing
 */
void Break ( Widget w, XtPointer cl_data, XtPointer call_data )
{
 BobBreak ( "BOB Interrupted" );
}
/* Function: LoadOk -- The load dialog is complete; load the file
 * Returns: Nothing
 */
void LoadOk ( Widget w, XtPointer cl_data, XtPointer call_data )
{
 String str; /* filename to load */
 char msg[500]; /* Error message */
 Arg xargs[1]; /* New value */
 /* get the string and try to load it */
 str = XawDialogGetValueString ( load );
 if ( BobLoadFile ( str ) ) {
 /* clear the box if no error */
 XtSetArg ( xargs[0], XtNvalue, (XtArgVal)"" );
 XtSetValues ( load, xargs, 1 );
 } else {
 /* send an error to the error box */
 sprintf ( msg, "Unable to load file: %s", str );
 ShowError ( msg );
 }
}
/* Function: ExecuteOk -- The execute dialog is complete
 * Returns: Nothing
 */
void ExecuteOk ( Widget w, XtPointer cl_data, XtPointer call_data )
{
 String str; /* function to execute */
 char msg[500]; /* error message */
 Arg xargs[1]; /* New value */
 /* get the string and try to execute it */
 str = XawDialogGetValueString ( executedia );
 if ( BobExecute ( str, 0 ) ) {
 /* clear the box if no error */
 XtSetArg ( xargs[0], XtNvalue, (XtArgVal)"" );
 XtSetValues ( executedia, xargs, 1 );

 } else {
 /* send an error to the error box */
 sprintf ( msg, "Unable to execute function: %s", str );
 ShowError ( msg );
 }
}
/* Function: Message -- Displays a message in the dialog box
 * Returns: Tells Bob if it is an error or not
 */
int Message ( int argc, VALUE *arg, VALUE *retval )
{
 char msg[500]; /* message to put in dialog */
 Arg xargs[1]; /* New value */
 /* make sure that there is exactly 1 arg */
 /* make sure that it is a string */
 if ( ! BobExtArgs ( argc, 1, 1, retval ) ) return ( FALSE );
 if ( ! BobExtCheckType ( &arg[0], DT_STRING, retval ) ) return ( FALSE );
 
 BobGetString ( msg, sizeof ( msg ), &arg[0] );
 XtSetArg ( xargs[0], XtNvalue, (XtArgVal)msg );
 XtSetValues ( message, xargs, 1 );
 return ( BobReturnValue ( retval, DT_INTEGER, TRUE ) );
}
/* Function: Error -- Displays an error in the dialog box
 * Returns: Tells Bob if there is an error or not
 */
int Error ( int argc, VALUE *arg, VALUE *retval )
{
 char msg[500]; /* error to put in dialog */
 Arg xargs[1]; /* New value */
 /* make sure that there is exactly 1 arg */
 /* make sure that it is a string */
 if ( ! BobExtArgs ( argc, 1, 1, retval ) ) return ( FALSE );
 if ( ! BobExtCheckType ( &arg[0], DT_STRING, retval ) ) return ( FALSE );
 BobGetString ( msg, sizeof ( msg ), &arg[0] );
 ShowError ( msg );
 return ( BobReturnValue ( retval, DT_INTEGER, TRUE ) );
}
/* Function: ShowError -- Displays the passed message in the error dialog box
 * Returns: Nothing
 */
void ShowError ( char *msg )
{
 Arg xargs[1]; /* New value */
 XtSetArg ( xargs[0], XtNvalue, (XtArgVal)msg );
 XtSetValues ( errordia, xargs, 1 );
}















Portable Screen Handling


ANSI escape sequences for portable screen operations




Matt Weisfeld


Matt is a senior system test engineer at Allen-Bradley Company and is
responsible for the development of test software on VAX/VMS, UNIX, DOS, and
other platforms. He can be contacted at 1938 Aldersgate Dr., Lyndhurst, OH
44124.


The term "portable screen interface" is, in most cases, an oxymoron. Of all
the areas relating to software portability, screen handling is quite possibly
the least portable. This is not surprising, since screen operations are very
hardware (terminal) dependent. Although standards such as X Window are
available, programming in any windowing environment is anything but trivial.
Even curses, a text-based screen-handling package portable to many platforms,
requires some overhead. However, in many applications, simple screen
operations (such as bolding, underlining, and the like) are all that is
required. 
In this article, I'll present methods for using ANSI escape sequences to make
screen operations easier while increasing portability in the process. On the
software side, I'll focus on C; for the hardware, I'll include Intel 386/486,
DEC VAX, Sun, and HP. Source code and programmer notes for Basic, Fortran, and
Pascal are available electronically; see "Availability," page 3.
The C code presented here operates on DOS, VAX/VMS, and UNIX (Sun SLC and
HP-UX). These platforms represent a wide variety of configurations: The VAX
and the Sun SLC use VT100/200/300 terminal emulation, the Sun SLC C compiler
is non-ANSI, and the HP-UX workstation (Apollo series 700) does not support
ANSI escape sequences. While ANSI mode applies to the VT100/200/300 terminals,
VT52 mode is not ANSI compatible. The Fortran, Basic, and Pascal examples were
compiled and executed in the VAX/VMS environment.


Using ANSI Escape Sequences


Escape sequences are at the heart of these screen-handling functions, and I
rely on the ANSI standard whenever possible. (See Table 1 for a list of ANSI
escape sequences.) For example, the ANSI escape sequence for reverse video is
ESC[7m. For C, ESC is \033. Thus the sequence for reverse video is "\033[7m".
The escape character is written as the octal value \033, so the sequence is
emitted from C by printf ("\033[7m");. All ANSI escape sequences in C begin with the escape character
(\033) and the left bracket. Following the bracket are the parameters specific
to the particular sequence with a semicolon (;) separating one parameter from
another. Finally, there is a single-letter, case-sensitive command name. In
the reverse-video sequence, the m sets graphics mode on. Graphics mode has the
text attributes listed in Table 1(d). Other parameters include foreground and
background colors. 
Creating header-file definitions makes the sequences more readable, as in
scrlibs.h (Listing One). For example, the definition for reverse video is
static char reverse[]={"\033[7m"}; so that the code to set reverse video now
reads printf (reverse);. Figure 1 illustrates graphics-mode definitions.
On platforms with a color monitor (in this case, only my DOS machine supports
color), various color options are available. To bold characters with the
foreground color cyan, for example, use static char bold[]={"\033[36;1m"};
where 36 represents the color cyan. Check the DOS manual for the color codes.
The flexibility of this approach is evident when commands such as printf
("Here %swe%s go again\n", bold, normal); are used to print the word we in
bold.


Non-ANSI Escape Sequences


Not all platforms support the ANSI standard--the HP-UX system, for example,
does not--and the header files and the code must use #ifdefs. HP-UX
definitions are listed in Figure 2. Though different, the HP-UX escape
sequences behave in a manner similar to the ANSI sequences. However, beware of
subtle but significant differences. Some HP-UX platforms do not provide a
bolding mechanism, so this implementation substitutes the half-bright
attribute. Blink-attribute support also depends on the platform. 


Designing the Screen Library


While using printf() to set escape sequences works in many applications, the
calls sometimes become cumbersome or even unclear. This is especially true when the
screen function requires more than one sequence to perform its task. For
example, suppose you need to set the bold attribute (on and off). In Example
1(a), a call to do_bold() sets the bold attribute for all output to the
screen. Likewise, do_normal() clears all attributes to normal. 
It is important to remember that output is not limited to stdout or even the
screen for that matter. By passing the stream as a parameter, you can use any
stream (such as stderr) for output. Be aware that printing these attribute
sequences to a file will not produce the same effects as printing to the
screen. Try using a filename for the stream and view the results with an
editor or on paper. Instead of getting the anticipated graphics attribute, the
sequences [1mbold video, [0m appear in the file. Example 1(b) is the code for
do_bold(). The file scrlibs.c (Listing Two) contains this and the other
functions that make up the C library described here. The other graphics
functions are almost identical to do_bold(). 
Since C buffers I/O, it is important to include fflush(). Without it, the
attributes may not appear as expected. Not having to execute an fflush() after
each printf() is yet another reason for using functions to set screen
attributes. 


Absolute Cursor Movement


Nongraphics attributes behave in a manner consistent with the graphics
attributes. For example, the ANSI definition to clear the physical screen
(stdout) is "\033[nJ", where n is: 0 (the default), to erase from the cursor
position to the end of the display; 1, to erase from the beginning of the
display to the cursor (inclusive); or 2, to erase the entire display and
return the cursor to home position. 
On some platforms (specifically, the non-ANSI ones), the screen is cleared from the
current cursor position. If the cursor is halfway down the screen, only the
bottom half of the screen is cleared. To clear the entire screen, the cursor
must be at screen location (0,0). The ANSI escape sequence, in printf format,
for moving the cursor to an absolute screen position is "\033[%d;%dH". Note
that the string includes two %d escape sequences, specifying the screen's line
and column numbers, respectively. For portability, the routines will all place
the cursor at the home position prior to clearing the screen. The ANSI
definitions to clear the screen are in Example 2(a), while the HP-UX
definitions are in Example 2(b).
Another difference between ANSI and HP-UX sequences is the order of the
parameters. In the ANSI sequence, the screen location is mapped by the
coordinates (row,col). However, the HP-UX sequence maps to (col,row).
Fortunately, this is addressed in the function called cursor_pos() and is
hidden from you; see Example 3(a). To place the cursor in position (0,0),
cursor_home() is built; see Example 3(b). The erase_display() function that
clears the screen must use either cursor_home() or cursor_pos(0,0) to place
the cursor in the proper position; see Example 3(c).


Relative Cursor Movement


The cursor_pos() function represents absolute addressing. This means that the
cursor will go directly to the x,y screen coordinates specified. Relative
addressing operates with respect to the current cursor position--either up,
down, left, or right. For example, if the cursor is at row 5, moving the
cursor up two relative positions places the cursor at row 3. Example 4(a)
lists the ANSI sequences for relative addressing, while Example 4(b) lists the
HP-UX sequences. 
The code for cursor_up is in Example 4(c). The parameter move is simply the
number of coordinates that the cursor moves in the up direction. The other
relative addressing functions are similar (see Listing Two).



The C Test Program


Listing Three is test.c, a simple test program that illustrates how the screen
functions operate. After clearing the screen, the program tests the reverse,
bold, and blink attributes. The do_normal() function clears each one prior to
calling the next. Finally, calls to the relative addressing functions move the
cursor around the screen. The C function sleep() provides time to view the
cursor movements.
When building the executable, you must specify the platform by defining either
VMS, HP-UX, SLC, or BCC in the build file (the DCL procedure, makefile, or
integrated environment). Some platforms, like VMS, define these by default. 
All definitions are found in scrlibs.h (Listing One). If a compiler is
ANSI compliant, then use the directive #define ANSI_COMPILER; otherwise, use
the directive #define K_R_COMPILER. Likewise, if the platform supports
ANSI escape sequences, you should use #define ANSI_SEQUENCES. For the HP-UX
platform, however, the directive is #define HPUX_SEQUENCES. Finally, if the
terminal used is a VT100, then use the directive #define VT100.
Keeping all this information in the header file ensures that no definitions
will be omitted or incorrect. It also provides the needed portability.


Conclusion


Although screen-handling operations are not very portable, building screen
libraries can overcome many portability problems. The libraries presented here
give you the flexibility to use the escape sequences in a more straightforward
manner. 
Each platform supports different levels of functionality. For example, the
VT100 terminals support double-height and double-width capabilities, while DOS
does not support character insertions and deletions. In this case, you would
have to write code to perform this task. As always, your specific needs, and
the capabilities of each platform, dictate what must be included in the
library.
Regardless of the implementation, building screen libraries makes basic,
text-based screen operations more portable, more readable, and much easier to
manage.
Table 1: Subset of ANSI escape sequences (p1=integer). (a) Cursor-control
escape sequences; (b) editing escape sequences; (c) page-control escape
sequences; (d) graphics escape sequences (ESC[p1m).
(a)
Cursor Forward (CUF) ESC[p1C
Cursor Backward (CUB) ESC[p1D
Cursor Down (CUD) ESC[p1B
Cursor UP (CUU) ESC[p1A
Cursor Position (CUP) ESC[p1;p2H
Horizontal & Vertical Position (HVP) ESC[p1;p2f
Save Cursor Position (SCP) ESC[s
Restore Cursor Position (RCP) ESC[u
Cursor Position Report (CPR) ESC[p1;p2R
Device Status Report (DSR) ESC[6n
Wrap at End-of-Line (wrap) ESC[7h
 (no wrap) ESC[7l
(b)
Insert Line (IL) ESC[p1L
Insert/Replace Mode (IRM) (insert) ESC[4h
 (replace) ESC[4l
Delete Character (DCH) ESC[p1P
Delete Line (DL) ESC[p1M
Erase Line (EL) ESC[p1K
Erase Display (ED) ESC[p1J
(c)
Next Page (NP) ESC[p1U
Previous Page (PP) ESC[p1V
(d)
0 All attributes off
1 Bold on
4 Underscore (monochrome display)
5 Blink on
7 Reverse video on
8 Concealed on
Figure 1: Graphics-mode definitions.
static char normal[] = {"\033[0m"};
static char bold[] = {"\033[1m"};
static char blink[] = {"\033[5m"};
Figure 2: HP-UX definitions.
static char bold[] = {"\033&dH"};
static char reverse[] = {"\033&dK"};
static char underline[] = {"\033&dD"};
static char blink[] = {"\033&dA"};
static char normal[] = {"\033&d@"};
Example 1: (a) Setting the bold attribute for screen output; (b) code for
do_bold().
(a)

printf ("this is not bold\n");
do_bold(stdout);
printf ("this is bold\n");
do_normal(stdout);
printf ("this is not bold again\n");
(b)
void do_bold(stream)
FILE *stream;
{
 fprintf (stream, bold);
 fflush(stream);
 return;
}
Example 2: (a) ANSI definitions to clear the screen; (b) HP-UX definitions.
(a)
static char curpos[] = {"\033[%d;%dH"};
static char erasedisp[] = {"\033[J"};
(b)
static char curpos[] = {"\033&a%dx%dY"};
static char erasedisp[] = {"\033J"};
Example 3: (a) The cursor_pos() function; (b) positioning the cursor to the
home position using cursor_home(); (c) erase_display() clears the screen.
(a)
void cursor_pos(row,col)
int row, col;
{
#ifdef HPUX
 printf (curpos, col,row);
#else
 printf (curpos, row,col);
#endif
 fflush(stdout);
 return;
}
(b)
void cursor_home()
{
 cursor_pos(0,0);
 fflush(stdout);
 return;
}
(c)
void erase_display()
{
 cursor_home();
 printf (erasedisp);
 fflush(stdout);
 return;
}
Example 4: (a) ANSI sequences for relative addressing; (b) the HP-UX
sequences; (c) code for cursor_up.
(a)
static char curright[] = {"\033[%dC"};
static char curleft[] = {"\033[%dD"};
static char curdown[] = {"\033[%dB"};
static char curup[] = {"\033[%dA"};
(b)
static char curfor[] = {"\033&a+%dC"};
static char curback[] = {"\033&a-%dC"};
static char curdown[] = {"\033&a+%dR"};
static char curup[] = {"\033&a-%dR"};
(c)

void cursor_up(move)
int move;
{
 int i;
 printf (curup, move);
 fflush(stdout);
 return;
}

Listing One 

/***************************************************
 FILE NAME : scrlibs.h 
 AUTHOR : Matt Weisfeld 
 DESCRIPTION : header file for scrlibs.c 
***************************************************/
#ifdef VMS
#define ANSI_COMPILER
#define ANSI_SEQUENCES
#define VT100
#endif

#ifdef SLC
#define UNIX
#define K_R_COMPILER
#define ANSI_SEQUENCES
#define VT100
#endif

#ifdef BCC
#define DOS
#define ANSI_COMPILER
#define ANSI_SEQUENCES
#endif

#ifdef HPUX
#define UNIX
#define ANSI_COMPILER
#define HPUX_SEQUENCES
#endif

#ifdef ANSI_COMPILER

void do_bold(FILE *);
void do_normal(FILE *);
void do_blink(FILE *);
void do_reverse(FILE *);

void cursor_right(int);
void cursor_left(int);
void cursor_down(int);
void cursor_up(int);
void cursor_pos(int, int);
void cursor_home(void);

void erase_display(void);

#else


void do_bold();
void do_normal();
void do_blink();
void do_reverse();

void cursor_right();
void cursor_left();
void cursor_down();
void cursor_up();
void cursor_pos();
void cursor_home();
void cursor_bottom();

void erase_display();

#endif

#ifdef ANSI_SEQUENCES

#ifdef DOS
static char bold[] = {"\033[36;1m"};
#else
static char bold[] = {"\033[1m"};
#endif
static char normal[] = {"\033[0m"};
static char blink[] = {"\033[5m"};
static char reverse[] = {"\033[7m"};

static char curright[] = {"\033[%dC"};
static char curleft[] = {"\033[%dD"};
static char curdown[] = {"\033[%dB"};
static char curup[] = {"\033[%dA"};
static char curpos[] = {"\033[%d;%dH"};
static char erasedisp[] = {"\033[2J"};

#endif

#ifdef HPUX_SEQUENCES

static char bold[] = {"\033&dH"};
static char normal[] = {"\033&d@"};
static char blink[] = {"\033&dA"};
static char reverse[] = {"\033&dK"};
static char underline[] = {"\033&dD"};

static char curfor[] = {"\033&a+%dC"};
static char curback[] = {"\033&a-%dC"};
static char curdown[] = {"\033&a+%dR"};
static char curup[] = {"\033&a-%dR"};
static char curpos[] = {"\033&a%dx%dY"};

static char erasedisp[] = {"\033J"};

#endif



Listing Two


/***************************************************
 FILE NAME : scrlibs.c 
 AUTHOR : Matt Weisfeld 
 DESCRIPTION : screen handling libraries 
***************************************************/
#include <stdio.h>
#include "scrlibs.h"

void do_bold(stream)
FILE *stream;
{
 fprintf (stream, bold);
 fflush(stream);
 return;
}
void do_normal(stream)
FILE *stream;
{
 fprintf (stream, normal);
 fflush(stream);
 return;
}
void do_blink(stream)
FILE *stream;
{
 fprintf (stream, blink);
 fflush(stdout);
 return;
}
void do_reverse(stream)
FILE *stream;
{
 fprintf (stream, reverse);
 fflush(stdout);
 return;
}
void cursor_right(move)
int move;
{
 int i;

 printf (curright, move);
 fflush(stdout);
 return;
}
void cursor_left(move)
int move;
{
 int i;

 printf (curleft, move);
 fflush(stdout);
 return;
}
void cursor_down(move)
int move;
{
 int i;


 printf (curdown, move);
 fflush(stdout);
 return;
}
void cursor_up(move)
int move;
{
 int i;

 printf (curup, move);

 fflush(stdout);
 return;
}
void cursor_pos(row,col)
int row,col;
{

#ifdef HPUX
 printf (curpos, col,row);
#else
 printf (curpos, row,col);
#endif
 fflush(stdout);
 return;
}
void cursor_home()
{
 cursor_pos(0,0);
 fflush(stdout);
 return;
}
void cursor_bottom()
{
 cursor_pos(23,0);
 fflush(stdout);
 return;
}
void erase_display()
{
 int i;

 cursor_home();
 printf (erasedisp);
 fflush(stdout);
 return;
}



Listing Three

/***************************************************
 FILE NAME : test.c 
 AUTHOR : Matt Weisfeld 
 DESCRIPTION : test file for screen libraries 
***************************************************/
#include <stdio.h>
#include "scrlibs.h"

#ifdef DOS
#include <dos.h>
#endif

int main()
{
 erase_display();
 do_bold(stdout);
 printf ("bold video\n");
 do_normal(stdout);
 do_reverse(stdout);
 printf ("reverse video\n");

 do_normal(stdout);
 do_blink(stdout);
 printf ("blink video\n");
 do_normal(stdout);
 printf ("normal video\n");

 printf ("\n");

 cursor_right(20);
 printf ("Right");
 cursor_left(10);
 printf ("Left");
 cursor_down(10);
 printf ("Down");
 cursor_up(5);
 printf ("Up");

 cursor_bottom();
 return 0;
}






























Efficient MC68HC08 Programming


Reducing cycle count and improving code density




Rand Gray and Deepak Mulchandani


Rand and Deepak develop embedded-system development tools for Motorola's
microcontroller technologies group in Austin, Texas. Deepak can be reached at
rzfj70@email.sps.mot.com. Rand can be reached at rtnm10@email.sps.mot.com.


Although high-level languages such as C have become increasingly popular in
embedded-application development, assembly-language implementations can still
improve code density and reduce cycle counts. The downside is that such
implementations are prone to a wider variety of development errors, more
difficult to read and maintain, essentially nonportable, and generally more
expensive in the overall development effort. 
Implementations in C, on the other hand, provide improved portability and are
easier to maintain and realize. In this article, we introduce some aspects of
embedded programming using the Motorola MC68HC08 microcontroller and present a
portion of the CPU08 instruction set. We also provide some sample programs
written in C to expose the weaknesses of code generated by a typical C
compiler. Although a compiler might support many optimizations, we'll examine
two basic ones: common-subexpression elimination and constant-value
propagation. Our examples will demonstrate how compilers that do not support
simple techniques can incur penalties in terms of code density and cycle
count.


The MC68HC08


The MC68HC08 series of microcontrollers is based around the CPU08 central
processing unit, which can be combined with a selection of peripherals, such
as a serial communications interface (SCI), serial peripheral interface (SPI),
timers, PWMs, A/D converters, RAMs, masked ROMs, EPROMs, and EEPROMs. Although
there are currently just a few MC68HC08 devices, Motorola plans to proliferate
them, as has been done with the MC68HC05 family, which currently has around
160 derivatives.
The CPU08 core allows the designed MCU to be used in a wide variety of
applications. The 68HC08 MCUs will be a good match to many embedded-systems
applications, including stepper-motor control, general-purpose industrial
controls, pagers, HVAC controls, computer peripherals (printers, disk drives,
keyboards, mice, and trackballs), electronic appliances (TVs, cameras,
camcorders, and radios), household appliances, and security systems. The
modularity of the CPU08 core along with the capability to integrate it with a
number of predesigned modules makes possible the design of a custom
microcontroller for your application.
The architecture of the CPU08 consists of an accumulator-based processor with
index registers available for data-access functions. Operands are loaded from
memory and operated upon, and results are written back to memory. The CPU08
instruction set and architecture provide fairly comprehensive support for an
ANSI C compiler implementation. The CPU08 also provides features such as
direct page-addressing instructions, which allow users to manipulate data
using fewer instruction bytes, thus saving program space. Table 1
provides a summary of the CPU08 programming registers.
The addressing modes of an instruction set determine how effectively data can
be manipulated using the operations provided. The 68HC08 provides 16
addressing modes for flexibility in data access. Table 2 outlines some of the
major addressing modes. Choosing the most appropriate addressing mode for each
access is important for maximizing data-access efficiency.
Figure 1 summarizes the memory map of the MC68HC08 XL36 MCU. The areas of the
memory map marked "unused" indicate sections of the MCU irrelevant to this
article. The memory map for each 68HC08 MCU is different: The data book
available for each 68HC08-based MCU describes the memory map for that
particular MCU. Extended-page addressing allows the user to access data
throughout the memory map of the MCU; that is, from $0000 to $FFFF, since the
MCU supports 16-bit addressing. 
The stack-pointer-relative addressing modes are useful for function calls and
temporary allocation of data. The CPU08 stack pointer can be relocated by the
user under program control to point anywhere in RAM (LDHX and TXS
instructions). If the stack pointer is set to point to a high section of RAM,
then frequently accessed variables can be stored in the direct page-addressing
mode. Instructions are provided to allow storage of the CPU Registers on the
stack (PSHA, PSHX, PSHH), and their complementary operations allow the
retrieval of their value from the stack (PULA, PULX, PULH). The AIS
instruction allows the program to allocate temporary storage on the stack.
Stack-pointer instructions and addressing modes provide extensive support for
implementation of C-style function calls.


Bit Addressing and Manipulation


Memory is a scarce resource on any microcontroller. Variables are often used
for flags, yet such variables have just two possible values, such as
On/Off or True/False. Rather than use an entire byte or word of memory to
store this value, it's more efficient to represent these values using a single
bit (0 or 1). Devices and peripherals also require extensive bit addressing
and manipulation. Table 3 lists the CPU08 instructions for bit manipulation.
Because the CPU08 uses memory-mapped I/O, all supported devices can be
accessed with ordinary CPU instructions by manipulating preassigned memory
locations, which are located in the first 256 bytes of the processor's memory
map. Therefore, the program can use the fast, direct-page addressing modes to
read and write data to and from the peripherals.
The bit-addressing modes of the CPU08 can be used to service devices quickly
and efficiently. As an example, consider manipulating the operation of the
PORT A I/O device on the XL36 MCU. PORT A is an 8-bit, general-purpose,
bidirectional I/O port; see Figure 2. It is assigned to address $0000 of the
memory map of the XL36 MCU. The data direction register A (DDRA) can be used
to determine whether each port A pin is an input or an output. 
Setting the DDRA bit to 1 enables the output buffer for the corresponding port
A pin. The BSET instruction can be used to set bit number 4 of the data
direction register A. The BRCLR or BRSET instructions can then be used to
check the bit and perform an operation based on the result. Without these
instructions, the normal flow would be: load value of memory location $0000 in
the accumulator; check the value of the specific value (requires multiple
instructions); and store the manipulated value back (if required).


Interrupt Handling


In many embedded applications, it is not necessary for the MCU to be running
at all times; it may be "put to sleep" in its STOP state, which conserves
power. An event in the system causes an interrupt to wake up the processor.
The MCU then processes the event and reenters the STOP state upon completion
of its task. In real-time systems, interrupts play a vital role, and it is
important to be able to process each interrupt event quickly enough that no
event is missed. "Interrupt latency," the amount of time that elapses between
the occurrence of the interrupting event and the execution of the first
instruction in the interrupt service routine (ISR), is important in all
applications that depend on the use of interrupts.
An interrupt causes program execution to be suspended and processing to be
transferred to the ISR. The MCU saves the CPU register state on the stack so
that upon returning from the ISR, the normal program flow continues with no
disturbance.


Code-Generation Issues


In embedded development, it is critical that a C compiler generate efficient
and optimal code for applications. Unfortunately, a C compiler rarely
generates code that is as efficient as hand-coded assembly language. Here we
examine how a typical C compiler optimizes and generates code for two common
situations: common-subexpression elimination and constant-value propagation.
Table 4 introduces CPU08 instructions that you'll find useful when examining
these examples. Note that many of these instructions support multiple
addressing modes. The CPU08 reference manual describes all the instructions
and their corresponding addressing modes.
Common-subexpression elimination is an optimization performed by the compiler
to get rid of redundant expressions calculating the same value. The compiler
avoids repeated calculation of the same expression by trying to keep the
result in a temporary variable. As shown in Example 1, the value of the
variable y can be determined to be 7. The compiler can then save the value of
y to its new value 7, calculate the value of the expression 2*y+1 (which
equals 15), move the value 15 into the variable x, and move the value 15 into
the variable z. Optimal assembly code for this program might look something
like Example 2 (other variations are possible). As shown, the byte count for
the hand-coded assembly is 15 bytes, and the cycle count for the instructions
is 21 cycles. The instructions in the compiler output use 46 CPU cycles; see
Example 3. The code density is 36 bytes, an increase of 21 bytes over the
hand-coded assembly language.
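The C fragment under discussion is not reproduced in the text, but from the values given it reduces to a sketch like the following (variable names are assumed):

```c
#include <assert.h>

/* after y is set to 7, both x and z receive 2*y + 1 = 15; a compiler
   performing common-subexpression elimination computes the expression
   once, keeps 15 in a register, and stores it to both variables */
void assign(int *x, int *y, int *z)
{
    *y = 7;
    *x = 2 * (*y) + 1;
    *z = 2 * (*y) + 1;   /* redundant recomputation without CSE */
}
```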
Constant-value propagation is a technique used by a compiler whereby a
reference to a variable with known contents is replaced by those contents.
Example 4 demonstrates how a compiler, by failing to perform a simple test,
generates useless instructions and consumes unnecessary bytes. This situation
results in dead code (code never executed) in ROM. As shown, the compiler can
determine that the value of variable x is 6 at line 3 of the program. When the
comparison "is x equal to 7" is made, the compiler can determine that the
value of this comparison is False. A compiler that fails to propagate the
constant nevertheless generates instructions for the comparison; see Example 6.
Example 5 shows the
equivalent function hand assembled. The useless instructions take up 30 bytes
of ROM and require a total of 40 CPU clock cycles to execute.
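Again the original C listing is not shown, but the situation reduces to a sketch like this (names assumed):

```c
#include <assert.h>

/* x is known to be 6 when the comparison is reached, so (x == 7) is
   statically false; a compiler doing constant-value propagation can
   delete the comparison and the never-taken branch as dead code */
int check(void)
{
    int x = 6;
    if (x == 7)
        return 100;   /* dead code: never executed */
    return x;
}
```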


Function Calls 


Compilers use the stack to provide temporary scratch storage for data while
the application executes. When translating function calls into assembly
language, the compiler will normally use the stack to pass parameters and to
allocate storage for function-defined local variables and the return value.
This approach is feasible on microprocessor systems such as the MC68000, where
the stack space is extensive. On those processors, the C compiler can generate
stack frames for multiple levels of function calls. However, when writing
applications for an MCU that doesn't provide much stack space, compilers can
get into a real bind if they do not properly utilize memory resources such as
the stack.

There are two standard ways to make function calls; which one is used
depends upon how the compiler decides to implement them. The first method
uses the PSHA, PSHX, and PSHH instructions to save the state of the CPU
registers, calls the function using JSR, allocates temporary space on the
stack using the AIS instruction, executes the function body, then returns
and cleans up the stack using AIS again. Notice
that this method of calling functions is extremely stack intensive. On systems
where the RAM resources (including stack) might be a few hundred bytes, this
is definitely not the way to go if the application is modular and flow of
control passes through many functions. 
The second method is to simply save the program counter and make the function
call. Code-generation techniques that do not depend on the state of the CPU
registers do not have to save them on the stack every time they make a
function call. Function calls are then reduced to a regular group of
instructions that require no special setup to execute. This method might be
preferred since the only instruction required to set up the flow of control
between functions is the BSR instruction, which only saves the value of the
program counter on the stack. 
As you can see, efficient and optimal assembly-language generation is not
always guaranteed when using high-level languages. In fact, the compiler must
have a tremendous grasp of the MCU architecture it supports to maximize the
instruction set's potential.
Quick benchmarking techniques allow developers to understand the toolset with
which they are developing. Understanding what your compiler does when it
encounters a specific construct can indicate what the compiler might do if you
use that construct frequently. Some other areas where quick benchmarking can
aid the developer are: logical expressions (check for short-circuit
evaluation), arithmetic expressions, flow control, and loops. In particular,
look for optimizations such as invariant expression removal and the use of
loop-aiding instructions such as CBEQ and DBNZ.
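For the short-circuit check in particular, a small test case with a side-effecting right-hand operand makes the compiler's behavior easy to observe in the generated listing. The function names below (noisy, times_evaluated) are illustrative only, not part of any toolset:

```cpp
#include <cassert>

// Counts how often the right-hand operand of && is actually evaluated.
static int calls = 0;

static int noisy(void)
{
    ++calls;        // visible side effect: easy to spot in the assembly, too
    return 1;
}

// Returns how many times noisy() ran for a given left operand. A conforming
// compiler must skip noisy() entirely when the left operand is already false.
int times_evaluated(int left)
{
    calls = 0;
    if (left != 0 && noisy()) {
        // body intentionally empty: we only observe evaluation counts
    }
    return calls;
}
```

Compiling this with your toolset and inspecting the listing for the `&&` branch shows immediately whether (and how cheaply) the compiler short-circuits.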
Certain addressing modes are usually provided on MCUs to enable the developer
to perform an operation efficiently. When generating code for
accumulator-based load-store MCUs, compilers sometimes get carried away with
generating load-store instructions and hence do not pay attention to certain
addressing modes. Checking the generated assembly language for load-store
instructions and MCU-provided addressing modes might also help reduce cycle
count and improve code density in your application.


Conclusion


High-level languages free you from such tasks as keeping track of variable
addresses and program partitioning. However, compilers do not always generate
optimal code for an MCU's instruction set. Minimizing code size is
important since useless and redundant segments of code will exhaust all
available ROM and RAM resources. Furthermore, reducing cycle count is
important since it reduces power consumption, especially in battery-powered
applications.
Figure 1 Memory map of a 68HC08 XL36 MCU.
Figure 2 PORT A I/O device on the XL36 MCU.
Table 1: CPU08 programming-registers summary.
Name Size Description 
 (Bits) 
A 8 Accumulator. Used to perform arithmetic
 operations upon data.
CCR 8 Condition code register. Maintains status
 bits on results of previously executed operations.
H 8 Extension index register. Can be used in
 conjunction with X for 16-bit addressing.
PC 16 Program counter. Always points to the instruction
 to be executed.
SP 16 Stack pointer. Can be used for function parameter
 passing and temporary space allocation for data.
X 8 Index register. Can be used to access data at specific
 locations or temporary storage for operands.
Table 2: 68HC08 major addressing modes.
Mnemonic Description 
DIRECT Quick data manipulation within the first 256
 bytes of the MCU memory map.
EXTENDED Data manipulation within the entire 64-Kbyte MCU memory map.
INDEX POINTER RELATIVE Data manipulation with an offset from the X index register.
INHERENT Single-byte instructions that have
 no associated operand fetch.
IMMEDIATE Data operand contained in the
 bytes following the instruction opcode.
STACK POINTER RELATIVE Data manipulation with an offset from the stack pointer.
Table 3: CPU08 instructions for bit manipulation.
 Mnemonic Description 
 BSET Bit set.
 BCLR Bit clear.
 BRSET Branch if specified bit is set (the bit is 1).
 BRCLR Branch if specified bit is clear (the bit is 0).
Table 4: Subset of CPU08 instructions.
 Mnemonic Addressing mode(s) Description 
 AIS Immediate Add immediate value to stack pointer.
 CLR Direct Clear memory location.
 CLRA Inherent Clear accumulator.
 CLRH Inherent Clear extension index register H.
 CLRX Inherent Clear index register X.
 LDA Direct, Immediate, Indexed Load accumulator from memory; A <- (M).
 ROLX Inherent Rotate index register X left through carry;
 least-significant bit <- carry bit, carry bit <- most-significant bit.
 STA Direct, Indexed Store accumulator in memory; (M) <- A.
 STX Direct, Indexed Store index register X in memory; (M) <- X.
Example 1: Code which has a common subexpression.
 1 main()
 2 {
 3 int x;
 4 int y;
 5 int z;
 6
 7 y = 7;
 8 x = 2*y+1;
 9 z = 2*y+1;
10
11 }
Example 2: Optimal assembly code for Example 1.
Bytes Cycles Instructions
3 4 mov #7, y+1
2 3 clr y
3 4 mov #15, x+1
2 3 clr x
3 4 mov #15, z+1
2 3 clr z
Example 3: Density and cycle-count analysis for compiler-generated code.
Bytes Cycles Compiler-generated output
3 4 mov #7, y+1
2 3 clr y
2 3 ldx y
2 3 lda y+1
1 1 lsla
1 1 rolx
2 2 add #1
2 3 sta x+1
1 1 txa
2 2 adc #0
2 3 sta x
2 3 ldx y
2 3 lda y+1
1 1 lsla
1 1 rolx
2 2 add #1
2 3 sta z+1
1 1 txa
2 2 adc #0
2 3 sta z
1 1 rts
Example 4: Program to demonstrate constant-value propagation.
 1 main()
 2 {
 3 int x = 6;
 4 int y;
 5
 6 if (x == 7)
 7 {
 8 y = 2*3+1;
 9 x = 2*y+1;
10 }
11 y = x;
12
13 }
Example 5: Optimal assembly code for Example 4.

Bytes Cycles Instructions
3 4 mov #6, x+1
2 3 clr x
3 5 mov x, y
3 5 mov x+1, y+1
1 1 rts
Example 6: Code density and cycle-count analysis of Example 4.
Bytes Cycles Compiler-generated output
3 4 mov #6, x+1
2 3 clr x
2 3 lda x+1
2 2 eor #7
2 3 bne L0005
2 3 lda x
2 3 bne L0005
3 4 mov #7, y+1
2 3 clr y
2 3 ldx y
2 3 lda y+1
1 1 lsla
1 1 rolx
2 2 add #1
2 3 sta x+1
1 1 txa
2 2 adc #0
2 3 sta x
L0005:
3 5 mov x, y
3 5 mov x+1,y+1
1 1 rts

































Role-Based Network Security


Network security at the operating-system level




William F. Jolitz and Lynne Greer Jolitz


Bill and Lynne, the authors of 386BSD, can be contacted at
lynne@bsdserver.ucsf.edu.


With the flood of PC-based systems running TCP/IP, the number of Internet
hosts--now at 3.8 million--has increased dramatically, with a projected 100
million hosts by 1999. But as the now-infamous Morris worm demonstrated,
connecting to the Net brings with it the problem of system security. Of
course, there are facilities and packages such as Kerberos that can mitigate
this risk, but they require overhead to implement and administer. Since most
PC environments are managed by a single person (often the user), system
management frequently falls to someone who has neither the time nor the
expertise to properly implement and maintain security measures. 
"Role-based security" is a mechanism orthogonal to the familiar
authentication, encryption, and threat-detection mechanisms. It is a minimal
mandatory access control (MAC) policy that restricts access with a low-level
abstraction mechanism that is hard, if not impossible, to bypass, yet requires
little knowledge or maintenance from the user. Role-based security derives its
name from the concept of roles, which simplify the description of the
allowable access characteristics of a host's users. Coupled with the concept
of access path, a degree of geographic classification (where the user is
located) determines each user's specific role. Consequently, roles determine
the scope of access a user has to files and privileged operations. 


Conventional Approaches to Securing Systems


It is always difficult to balance the need for a secure environment with
convenience and accessibility. Typically, the routes to access are mediated by
"locks" requiring a "combination" (password authentication) and
file-access-mode discretionary access control (DAC). Account passwords provide
a reasonable degree of security when machines are relatively isolated. With
network access and some work, however, an intruder can either "pick" the
combination of an account or intercept the use of the password as it transits
the network. An intruder can also easily "case" large numbers of network host
computers looking for vulnerabilities (stale/common passwords, alterable
system control files, and so forth). In short, machines can be compromised
even if the intruder has no idea where the machine resides or what it looks
like.
Many existing facilities and packages have been enhanced to cope with this
increased threat. Among the most basic packages are Kerberos and COPS, which
plug holes by beefing up existing authentication mechanisms, while attempting
to discover vulnerabilities before the "bad guy" does. Kerberos-extended
network utilities (rlogin, ftp, and telnet) also encrypt communications to
avoid passing information in clear text where it might be intercepted.
Unfortunately, these solutions require additional administrative overhead
because they are yet another facility which must be maintained. Moreover, this
overhead increases with the sophistication of the security mechanisms and the
security model they attempt to achieve. Failure to attend to the
system-management details required to keep such systems current may compromise
the integrity of the entire system, thereby defeating the purpose of these
facilities in the first place. 


Network-Level Security Elements


Like network-gateway firewalls, role-based security is simple to employ yet
hard to bypass. In effect, the firewall is integrated into the operating
system, so that it surrounds files, sockets, and system facilities. Unlike a
gateway firewall, it does not require administration and monitoring to review
new access requirements and intrusion attempts. The role-based security model
consists of: 
Roles and privileges.
Access path.
Transparency.
Mandatory access control. 
Each of these elements is crucial to creating a comprehensive yet simple
network-level security mechanism. 


Roles and Privileges


In many secure systems, privileges are recorded as bits set in a "list of
privileges" (a bit vector of all the privileges). Most privileges revolve
around system-management functions such as manipulating or adding new devices;
reformatting disks; changing the access, protection, or ownership of a file;
and so forth. As a system is enhanced, more privileges are added. You may even
create privileges merely due to the existence of other privileges as a way of
enabling groups of them or offering special treatment. At some point, the
management or control of these privileges becomes a formidable task in itself
(386BSD has more than 50 distinct privileges). 
Rather than relying on such a fine-grained, difficult-to-manage scheme, the
list-of-privileges model can be replaced by a "role" model. A role is akin to
an actor's assigned character. Unlike traditional UNIX systems (which have one
ultimate privilege, called the "superuser" or "root account"), systems using
role-based security identify the use of privileges inside the kernel on a
case-by-case basis. The role concept is essential to compressing the
complexity of the security model, much as the description of a character
at the beginning of a book provides an intrinsic understanding of that
character over the course of the story.
Roles compress the notion of privilege. If new privileges or file-access
rights are added to the system, they are added only to suit the needs of
roles--the roles themselves generally do not change. This means that although
security needs in the operating system become more elaborate, the model stays
"simple" from the user's perspective.
Unlike the UNIX setuserid/setgroupid features that allow another group or user
ID to be used as a proxy, roles are set once by the access path and never
allowed to change--they are independent of these user/group ID concepts. Roles
are then used by lower levels of the kernel to bound access. 


Access Path 


A user accesses information or services from the computer via an "access
path." This may be done through physical access to the computer console, an
attached serial line, or network communications. Roles are defined by the
access path to the computer; thus, the way in which the information is
accessible may differ, depending on the information's destination. Since the
mechanisms that determine path are extremely low level, the bad guy must find
ways to "imitate" access, by either gaining or simulating physical access. 
With a role-based model, the scope of access is limited without increasing
system-management overhead--no user-account profile need be maintained. For
example, to gain fundamental access to management functions (through which
most computers are subverted), you must not only know the passwords but also
arrive over the required access path. This geographic determination makes it
far more difficult for an intruder, who must now attempt to subvert all of the
mechanisms that determine knowledge of geographic location. Thus, access path
qualifies a user to access restricted privileges and files in the same way
that a password is used to authenticate a user. 


Transparency


Many attempts at improving operating-system security affect both the operating
system and the host on which it is installed. As such, it may be almost
impossible to remove the security mechanism because it is interwoven into the
system. This makes such security mechanisms less desirable. 

With a role-based model, a high degree of transparency is a requirement;
otherwise it would be too troublesome for a user to consider employing it. The
demand for transparency affects all areas of the role-based security design: 
The role-based security implementation is located entirely in the kernel
(except for a single utility program), requiring no changes to utility or
application programs.
There are no external interfaces for programmers to subvert or oversee, and no
conflicts with existing industry standards, either de facto or official.
It is entirely independent of other security facilities (encryption,
authentication, intrusion detection).
It requires minimal knowledge to manage and no change to system operation,
network management, or other procedures. 


Mandatory Access Control


Another requirement for a secure system is that it provide a mechanism that
can "guarantee" that certain sensitive files are accessed only by users
acting in the appropriate roles--even if a more trusted role is "tricked" by a
lesser one. A MAC policy is thus used to maintain the integrity of the
information by ensuring that any files that may contain elements of the
information be kept at the same restricted access as that of the original
file. Consequently, the system is mandatory in that the computer, not the
user, maintains the restrictions in all cases so that a user who may not be
aware of a program's scope of vulnerability does not inadvertently compromise
the information.
With role-based security, files can be marked so that contents cannot be sent
outside of a specified geographic zone (currently, host, local network, and
external network). Once marked, even if a privileged account is compromised,
the information of a file so marked cannot be obtained outside of its
restricted zone. Files created by programs using such restricted files are de
facto restricted as well, avoiding the accidental release of information
outside of the zone. This effect is achieved without altering the user program
environment or interfering with existing industry standards (such as POSIX).
Thus, it can be added (or removed) from a system without affecting any
existing programs other than the kernel.
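The marking rule can be sketched in a few lines. The zone names and the min() propagation rule below are an illustrative model of the behavior just described, not the actual 386BSD kernel code:

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical model: geographic zones a file's contents may reach,
// most restrictive first.
enum Zone {
    ZONE_HOST     = 0,  // must stay on this machine
    ZONE_LAN      = 1,  // may travel the immediate local network
    ZONE_EXTERNAL = 2   // unrestricted
};

// A file derived from restricted inputs inherits the most restrictive
// (smallest) zone among them, so the marking propagates automatically.
Zone derived_zone(Zone a, Zone b)
{
    return std::min(a, b);
}

// The mandatory check: may contents of a file in zone z reach destination
// dest? The kernel, not the user, applies this test on every transfer.
bool may_send(Zone z, Zone dest)
{
    return dest <= z;
}
```

Because derived files always inherit the tightest zone of their inputs, even a compromised privileged account cannot launder a host-only file out through an intermediate copy.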
MAC is complementary to discretionary access control (DAC), which in a UNIX
system corresponds to the read/write/execute permission attributes applied
per user, user group, or system-wide. (Many UNIX systems extend this
concept still further with the Multics-like concept of an Access Control List,
or "acl," which allows access rights to be specified on a per-user basis--this
mechanism also presents advantages and problems of an extreme sort.)
The virtue of MAC is that it is automatically managed--the user need not be
aware of it. Many UNIX systems, in contrast, are compromised simply because
the user does not manually set the access-permission bits properly so that
other users cannot read or write a file. With MAC, the operating system
automatically supplies the restriction.
By reducing management of restricted files to one simple operation that is
further managed by the system automatically in a fail-safe way, the user
obtains the benefits of a secure environment without having to intimately
manage such details. Since the mechanism is implemented at the lowest levels
of abstraction in the system, it is almost impossible to subvert directly.


Network-Level Security Using 386BSD


In the 386BSD 1.0 implementation of role-based security, there are four roles.
The primary role (purposely put into the background) controls the lion's share
of administrative privileges. The secondary role has access to most of the
privileged files, plus the ability to mark/unmark files as "privileged." The
third role can access and mark as privileged a much lesser grade of
files; it also has the ability to go between systems on the immediate LAN to
which the computer is attached. The last role is the least trusted. Since the
only thing we know for certain is that someone has come in from outside the
LAN, we don't trust him/her with anything privileged. Thus, any attempts to
invoke any substantive privilege or access to a privileged file will not be
successful. What is common among all of these roles is that we identify them
solely by the way in which they gain access to our system, each in varying
degrees of trusted path, with the last being trusted the least.
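These four roles and their case-by-case privilege checks might be modeled as follows. The enum values and predicates are hypothetical illustrations of the idea, not 386BSD's actual kernel definitions:

```cpp
#include <cassert>

// Hypothetical model of the four 386BSD 1.0 roles, most trusted first.
enum Role {
    ROLE_PRIMARY = 0,   // background role: lion's share of admin privileges
    ROLE_SECONDARY,     // most privileged files; may mark/unmark them
    ROLE_LAN,           // lesser-grade files; may roam the immediate LAN
    ROLE_EXTERNAL       // came in from outside the LAN: trusted with nothing
};

// The role is fixed once by the access path; the kernel then answers each
// privileged request from this single value rather than a privilege bit
// vector, case by case.
bool may_administer(Role r) { return r == ROLE_PRIMARY; }
bool may_mark_files(Role r) { return r == ROLE_SECONDARY || r == ROLE_LAN; }
bool may_cross_lan(Role r)  { return r != ROLE_EXTERNAL; }
```

Note that no predicate ever changes a role: access path sets it once, and every later check only reads it, which is what makes the scheme hard to subvert from above.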
With 386BSD role-based security, a user assigned either the second or third
role can indicate a "sensitive" file by setting a mode on a directory in which
that file lies. A sensitive file cannot be referenced outside of the scope of
access allowed to those two roles. Thus, a file may not have its contents sent
by any user across the network if it is marked in this manner. If the file is
copied to another file or incorporated into a portion of another file, the new
file by default will likewise be marked at the level of the most sensitive
part incorporated into it.
With this implementation of network-level, role-based security, files can be
restricted to remaining on the host computer (when information should be
restricted to a user's own PC) or within the immediate local area network
(when the LAN is considered to be adequately secure, yet the information
should not be accessible or distributed outside of the LAN organization). 


Advantages and Drawbacks to a Simple Role-Based Model


While role-based security can be extended into a more elaborate arrangement,
its primary advantage is simplicity: It fits easily into any extant modern
operating system, and it is easy to understand and administer, yet hard to
subvert because it remains so fundamental. This simplicity avoids the
possibility that, as complexity increases, holes develop that ultimately
defeat the entire point.
Another advantage of this simple arrangement is that, since you must be
physically present to install an operating system anyway, it is possible to
incorporate security at this point without requiring additional administrative
hurdles to gain access to simple, straightforward procedures. Easing the
decision-making burden for the user during installation in a "safe" way is
critical, since it is at this time that a user will know the least about what
items to secure. For example, a naive user wishing to prevent any possible
disclosure of personal files need only mark the home directory of his account
as being "not accessible" over a wide area network--the computer itself takes
care of sensitive, system-related files automatically.
The greatest strength of this approach is also its greatest weakness--its
inflexible binding of privilege and access rights with the path of access.
This mechanism is so low level that it is nearly impossible to bypass, even
when you want to read a privileged file via the network. 
Likewise, remote system management is impeded. However, since a
user/administrator generally does his own system administration on-site (there
really are no good network-based system-management tools intended for a single
user/administrator yet), this currently isn't a problem.


Removing Restrictions on Sensitive Files


To remove the restrictions on a particular file, a utility is needed to allow
a user with the appropriate role to remove restrictions. This utility,
however, requires authorization directly from the console through a known
secure path. If the authorization is not supplied, the restriction is not
removed. For example, suppose the bad guy somehow got a user running with a
privileged role to execute something useful for the bad guy--in this case, the
console would unexpectedly request authorization. The attempt at subversion
could be tracked down and the bad guy revealed, all without loss of integrity.
(Note that this is the only "violation" of our mandatory access-control policy
possible and it is again tightly controlled--it is a convenience for operation
only.)


Subverting Role-Based Security


Role-based security, like other security mechanisms, is not foolproof.
Because the roles that bound access to privileges and restricted files are
governed solely by access path, the easiest way to compromise the system is
to gain access to a trusted path. Thus, role-based
security is not intended to deal with "insider" related threats. The implicit
assumption is that physical access to the machine itself is trusted, and
access to the immediate LAN is trusted within limits.


Conclusion


For many people putting PCs on the Internet, security considerations often get
lost in the difficulty of just installing the software, connecting to the
network, and learning the ropes. Role-based security is not intended as a
complete answer for all security needs--but it will give Morris wanna-bes a
harder time and save you aggravation. 













Inside the OLE 2 SDK


Building OLE 2.0 containers 




Ira Rodens


Ira is a software developer specializing in Windows and Motif. He can be
contacted through CompuServe at 70711,2570 or at 301-924-0596.


Object Linking and Embedding (OLE) was originally devised by Microsoft to
allow the creation of compound documents. The idea behind OLE 1.0 was to allow
container documents to embed objects--pictures, graphs, sound clips and the
like--within the document, treating these objects in a black-box fashion. In
this manner, the container document need not know the specific file formats of
the embedded objects--the container merely interfaces with the OLE API, which
loads and passes requests to the object applications. Similarly, client
applications (embedded objects) could write to this API, and by doing so
instantly gain interoperability with a large number of OLE container
applications. 
The OLE 2 specification addresses some of the deficiencies of OLE 1 and
extends the OLE 1 specification into a more generalized methodology for
dealing with black-box objects. To accomplish this, OLE 2 treats the
interfaces between objects in a more uniform and general way. For instance,
OLE 1 used DDE for communication, creating synchronization problems between
the client and server tasks. OLE 2 uses lightweight RPCs that rely upon the
Windows message queue, eliminating such problems.
In this article, I'll present techniques for building an OLE client
application using the Microsoft OLE 2 Software Development Kit (SDK). The
application, Collage, allows embedded objects to be placed within the Collage
container, then moved, sized, and saved, illustrating the basic techniques for
incorporating OLE within your applications.


OLE 2 SDK


The OLE 2 SDK, which is delivered on CD-ROM, comes with both debug and
production copies of the OLE 2 DLLs, plus complete documentation in electronic
format. Printed documentation is not provided, although it may be ordered
separately. In addition, a wealth of sample code and applications is
provided, along with various helpful utilities.
The documentation includes Windows help files for OLE, which I found most
useful during development. These help files made it easy to find the correct
function or interface to call, and to plug in the correct parameters during
code development. The SDK additionally provides the complete OLE 2
specification, various useful pieces of background information, and a
PowerPoint viewer with slides covering many aspects of OLE 2. All in all,
the organization
and online viewing tools make it much easier to wade through the mountains of
documentation.
Microsoft also bundles a number of handy utilities in the SDK. The most useful
in the development of Collage were the IDataObject Viewer and Ole2View
applications. Ole2View lets you view information about all OLE servers
registered on your system, as well as the OLE 2 interfaces supported. The
IDataObject Viewer shows information on the IDataObjects passed during
clipboard transfers and drag-and-drop transfers. Clipboard transfers are a bit
trickier in OLE 2 than in standard Windows. The IDataObject Viewer allows the
content of the IDataObjects used for such transfers to be debugged more
easily.
Other tools include Docfile Viewer, which lists the elements within a docfile;
Automation Test, which tests applications implementing OLE Automation; Class
ID Generator, which generates a unique class ID (more on this in a moment);
Running Object Table Viewer, which shows information about objects in the
Running Object Table, displaying all monikers and "active" Automation objects;
LRPC Message Monitor, which displays the messages passed between two
applications; OLE 2 Free Module, a debugging tool that can be used for freeing
OLE 2.0 DLLs and the modules that loaded them; and Windows Process Status,
used for deleting Windows processes from memory. Finally, there's Windows
Executable, a tool allowing Windows programs to run from a DOS box. As the
documentation points out, this can be particularly useful when a Windows-based
tool needs to be launched from a DOS-based make process.


Interfaces


The most basic concept within OLE 2 is the notion of an interface. Almost all
operations require either obtaining an interface from the outside world or
exposing an interface from your application to the outside world. The standard
interfaces are defined within the API as C++ classes, with the member
functions defined as pure virtual functions. You create interfaces in the
application by deriving classes from the API base classes and implementing
the virtual functions. All of the interface classes in the API inherit from
the base interface IUnknown. 
The IUnknown interface consists of three member functions: AddRef, Release,
and QueryInterface. All of the interfaces within OLE 2 use reference counting.
When first obtaining a pointer to an interface, the user must call the AddRef
function to increment the object's reference count. When the user finishes
with the interface, the application calls the Release function, causing the
reference count to be decremented. When the reference count reaches zero, the
object is free to destroy the interface. QueryInterface allows you to obtain
an interface pointer for any other interface to the object. Each interface has
a unique interface identifier (IID), a 128-bit number assigned by Microsoft.
When you pass an IID to the QueryInterface function, a pointer to the desired
interface is returned. Therefore, once you find a single interface to an
object, all of the other interfaces can be easily obtained. The QueryInterface
function, like most of the interface member functions, returns an HRESULT
indicating the success or failure of the operation. It also provides a
more-detailed status code, giving interface-specific status information. The
standard macros FAILED and SUCCEEDED determine whether the operation succeeded
or failed; the function GetScode extracts the status code.
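The reference-counting contract at the heart of IUnknown can be sketched without any OLE plumbing. This simplified class omits QueryInterface, IIDs, and HRESULTs, and records destruction rather than calling delete:

```cpp
#include <cassert>

// Simplified, non-COM sketch of the IUnknown reference-counting contract.
class RefCounted {
public:
    RefCounted() : destroyed(false), refs(0) {}

    // Each new holder of an interface pointer increments the count.
    unsigned long AddRef() { return ++refs; }

    // Each holder releases when done; at zero the object is free to
    // destroy itself (we only record that fact here).
    unsigned long Release()
    {
        unsigned long remaining = --refs;
        if (remaining == 0)
            destroyed = true;
        return remaining;
    }

    bool destroyed;     // stands in for actual destruction

private:
    unsigned long refs;
};
```

The design choice this illustrates: lifetime is managed cooperatively by every pointer holder, so an object can be shared across interfaces and applications without any single owner.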
Once you have an interface, all of the other interfaces are easy to find. But,
how do you get the first interface? Just as each interface has a unique IID
number, each object class has its own unique class identifier, or CLSID.
Once the CLSID is known, an OLE API function creates the object and returns an
interface pointer. A registration database maintained by Windows lists the
CLSIDs for each server. When a server application is installed, it lists its
CLSID, the path to its executable, and a human-readable name in this
registration database. This database is accessible through the Windows API,
and the information in it can be viewed by running REGEDIT using the /V
parameter.


Structured Storage


Structured storage, a concept new to OLE 2, allows the creation of a
directory-like structure within a file. Under OLE 1, embedded objects were
stored within a standard DOS file. Since the container was responsible for
managing this file, it could not dynamically shrink or grow as required by the
object. To overcome this problem, OLE 1 resorted to storing the object
information within global memory. The container was then responsible for
saving this information in the appropriate place in its file. With structured
storage, each embedded object controls and manages its own storage, which can
grow and shrink as necessary, without intervention from the container.
In structured-storage terminology, a "storage" is analogous to a directory and
a "stream" represents a file. Structured storage is referenced through the use
of interface pointers, using the same scheme as all other OLE 2 interfaces.
The function StgCreateDocfile creates the file, yielding an IStorage interface
pointer. Streams and storages below this root storage are opened using
IStorage::OpenStorage or IStorage::OpenStream. These functions in turn yield
interface pointers to their respective storages and streams. The Release
function closes storages and streams.
The structured-storage model also provides transaction processing. Opening a
storage in transaction mode causes all writes into the storage to go to a
temporary file. The application must call IStorage's Commit function to write
the information permanently in the file. An undo function uses
IStorage::Revert, which removes all information written since the last commit.
This allows the embedded objects to use their storage for maintaining current
data; information is permanently saved only on an explicit Save command from
the user.


Implementing a Container


Collage is a minimal application designed to demonstrate container documents;
it does not include support for linking or in-place editing. Nevertheless, the
application requires approximately 3000 lines of code to implement. The
complete application with source code and executables is available
electronically; see "Availability" on page 3.
The initial steps in creating an OLE application involve increasing the size
of the message queue and initializing the OLE libraries. As OLE 2 utilizes the
Windows message queue for communication between objects, the normal queue size
of eight messages is generally insufficient for OLE applications. The first
step is to increase this size to 96 messages; otherwise, inexplicable crashes
may result while running your application. The next standard step is to call
OleInitialize to initialize the OLE libraries. These two steps are performed
in WinMain before entering the message loop; see Listing One, page 106.
Collage establishes a main window containing the menu bars and decorations and
a child window containing the document. This structure makes it easy to add
toolbars and/or change the document from a single-document interface to a
multiple-document interface. OLE object functionality is embedded in the
OleClientObj class. An OLE client object must have at least an IUnknown
interface, an IOleClientSite interface, and an IAdviseSink interface; Collage
implements these through the ifUnknown, ifClient, and ifAdvise members of
OleClientObj. IUnknown is the most basic OLE interface, consisting of only
three member functions. It is defined in the OLE SDK as a class having three
pure virtual functions. Collage implements the class by defining a derived
class, ImpIUnknown, using IUnknown as the base class. ImpIUnknown provides
functionality for the virtual-member functions. For the most part, the
functionality is provided by calling member functions of OleClientObj.
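The shape of that interface can be sketched without the SDK headers. This is a simplified stand-in (the names IUnknownLike and ImpUnknown are invented; the real IUnknown's QueryInterface takes an interface ID and returns an HRESULT), showing the three pure virtual functions and the rule that Release destroys the object when the reference count reaches zero.

```cpp
#include <cassert>

// Simplified model of the IUnknown pattern (illustration only).
struct IUnknownLike {
    virtual void *QueryInterface(const char *iid) = 0;
    virtual unsigned long AddRef() = 0;
    virtual unsigned long Release() = 0;
    virtual ~IUnknownLike() {}
};

class ImpUnknown : public IUnknownLike {
    unsigned long refs = 1;   // the creator holds the first reference
public:
    static int liveObjects;   // for demonstration only
    ImpUnknown()  { ++liveObjects; }
    ~ImpUnknown() { --liveObjects; }
    void *QueryInterface(const char *) override { AddRef(); return this; }
    unsigned long AddRef() override { return ++refs; }
    unsigned long Release() override {
        unsigned long r = --refs;
        if (r == 0) delete this;   // last release "closes" the object
        return r;
    }
};
int ImpUnknown::liveObjects = 0;
```

This mirrors how Release closes storages and streams in the structured-storage model: the caller never deletes an interface pointer directly.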
The IOleClientSite interface, implemented as the inherited class
ImpIOleClientSite, allows the embedded object to make various requests of the
container document. These include requests to scroll the object into view and
to show the object in its activated state. The IAdviseSink interface notifies
the client of events of interest happening to the embedded object. These include
notifications that the object has been saved, that its appearance or
underlying data has changed, or that it has been closed. 
There are three constructors for creating an object. The first creates a new
object; the second reads in an existing object from disk; and the third is
used in paste operations. When a completely new object is created from
scratch, the first thing Collage does is to allocate a substorage for it. It
then writes in basic information such as size, position, and object number.
This information is written in a stream located in the object's substorage.
Since this substorage belongs to the object and the server is free to use it
in any way it sees fit, the container's stream must be clearly marked as
belonging to the client. By convention, this is done by beginning the stream
name with a byte containing \3.
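A minimal sketch of the naming convention (the stream name "ObjInfo" is an invented example):

```cpp
#include <cassert>
#include <string>

// Build a stream name that is marked as belonging to the container
// by prefixing it with a \3 byte, per the OLE 2 convention.
std::string containerStreamName(const std::string &base) {
    return std::string(1, '\3') + base;
}
```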
Once these preliminaries are complete, Collage invokes the server to create
the object using the API function OleCreate. Given a CLSID, OleCreate returns
an interface pointer to the object. Microsoft never totally specifies why, but
OleSetContainedObject must then be called; it informs the object that it is
embedded in a container rather than linked. An advise loop is set up by
calling the advise
function on the OleObject interface. At this point, the object is merely an
initialized empty object of the proper type. The next step is to present it to
the user for editing. This is done by activating the object.
The IOleClientSite interface allows the container to know when the object is
open for edit. It also lets the container know when it wants to do an update
through the SaveObject function. The container saves the object using the
OleSave function and commits the object's storage. The first time an object is
updated, the container asks for the object's desired size. After that, only
the user can resize the object. 
The file can be saved simply by committing the root storage of the file, which
causes all changes to be permanently written to disk. Implementing the "Save
As" function is only a little more difficult. First, each object must be
notified to release its private storage using the
IPersistStorage::HandsOffStorage function. Then the root storage is copied
to the "Save As" root storage and
committed to capture all changes made in the document. Finally, each object's
storage is opened, and the object is notified of its new storage using the
SaveCompleted function. If you're accustomed to conventional file I/O, all of
this may at first seem a bit strange. However, it is both cleaner and easier.



Clipboard Transfers


Data transfer is performed under OLE 2 using IDataObject objects. This applies
both to clipboard operations and drag-and-drop operations. The IDataObject
provides a uniform method of transferring data between applications and
affords a richer, more flexible environment than the standard Windows
clipboard transfer.
The IDataObject uses a FORMATETC data structure to describe the particular
formats it is capable of rendering. The FORMATETC not only describes a
particular clipboard format but also identifies how the data will be passed.
IDataObjects may pass data by a variety of methods, including global memory,
structured storage objects, bitmaps, metafiles, and text. Thanks to the
IDataObject, embedded objects can be passed while stored on disk in a
structured file, without consuming large chunks of global memory.
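The FORMATETC idea can be sketched with stand-in typedefs. This is a paraphrase for illustration, not the SDK definition; the real structure also carries a target-device pointer (ptd), and the flag names below mirror the TYMED values in the OLE 2 headers.

```cpp
#include <cassert>

// Illustrative stand-ins for the SDK's types.
typedef unsigned short CLIPFORMAT;
typedef unsigned long  DWORD;

enum Tymed {                 // how the data will be passed
    TYMED_HGLOBAL  = 1,      // global memory
    TYMED_FILE     = 2,      // disk file
    TYMED_ISTREAM  = 4,      // structured-storage stream
    TYMED_ISTORAGE = 8,      // structured-storage object
    TYMED_GDI      = 16,     // GDI object such as a bitmap
    TYMED_MFPICT   = 32      // metafile picture
};

struct FormatEtcLike {       // simplified FORMATETC
    CLIPFORMAT cfFormat;     // registered clipboard format
    DWORD      dwAspect;     // view aspect (content, icon, ...)
    long       lindex;       // which piece of the data (-1 = all)
    DWORD      tymed;        // OR of acceptable Tymed flags
};
```

A source that can render an embedded object either on disk or in memory would advertise `TYMED_ISTORAGE | TYMED_HGLOBAL` in the tymed field.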
An object-descriptor format is also defined for the IDataObject that allows an
object's description to be passed to the application using the object. When
another application actually requests the data, the GetData or GetDataHere
members are called and the data is rendered. The OLE 2 libraries provide the
necessary interfaces so that non-OLE and OLE 1 applications can interact with
the clipboard in the usual way without using IDataObjects.
When the application closes, the IDataObject interface (supported by the
application code) can no longer be used. The function OleFlushClipboard is
provided to remove the IDataObject from the clipboard, leaving the
conventional clipboard formats such as bitmaps, metafiles, and so on.


Conclusion


OLE 2 is based on the notion of interfaces. The hardest part of OLE is
learning the various interfaces and how they interrelate. OLE 2 is a very
powerful tool but its API is large and complex. This article explains some of
the basics of developing OLE applications. Other aspects, such as OLE
automation and OLE controls, follow the same methodology as compound
documents. As OLE becomes more popular, it will become necessary for all
Windows applications to provide at least some OLE functionality.
The OLE 2 SDK possesses a wealth of information. However, it is almost as
large as the entire Windows API, so it is no small undertaking to master it.
Collage, a minimal application designed to demonstrate container documents,
does not include support for linking or in-place editing and requires 3000
lines of code. To provide this functionality in an application requires
considerable work; application frameworks and other program development tools
that can take much of the grunt work out of developing OLE 2 applications are
still sorely needed.


For More Information


OLE 2 Software Development Kit 2.01
Microsoft
One Microsoft Way
Redmond, WA 98052-6399
800-227-4679

Listing One 

#include <windows.h>
#include <stdio.h>
#include <ole2.h>
#include <initguid.h>
#include <oleguid.h>
#include <coguid.h>
#include "mainwin.h"
#include "wdres.h"
#include "gdiobj.h"
#include "debug.h"

int bDebug = 0;
FILE *debug = NULL;
char * MainWindowClass = "MainWindow";
char * TextWindowClass = "TextWindow";

HCURSOR hPlain;
HCURSOR hVertArrow;
HCURSOR hHorizArrow;
HCURSOR hSwArrow;
HCURSOR hNwArrow;
HCURSOR hWaitCursor;
HBITMAP hMarkFore;
HBITMAP hMarkBack;
UINT cfNative;
UINT cfEmbedSource;
UINT cfEmbeddedObject;
UINT cfObjectDescriptor;
int pascal WinMain (HINSTANCE hInstance,HINSTANCE hPrevInstance,
 LPSTR lpCmndLine,int nCmndShow)
{
 int nMsg;
 WNDCLASS wndClass;

 MainWindow *Main;
 HACCEL hAccel;
 MSG msg;
 nMsg = 96;
 while (!SetMessageQueue(nMsg) && (nMsg > 8)) nMsg -= 8;
 if (bDebug)
 debug = fopen("debug.txt","w+");
 FPRINTF (debug,"Starting Collage\n");
 if (hPrevInstance == NULL) {
 wndClass.style = 0;
 wndClass.lpfnWndProc = BasicWindow::Dispatch;
 wndClass.cbClsExtra = 0;
 wndClass.cbWndExtra = sizeof (long);
 wndClass.hInstance = hInstance;
 wndClass.hIcon = LoadIcon (hInstance,MAKEINTRESOURCE(ICON_1));
 wndClass.hCursor = LoadCursor (NULL,IDC_ARROW);
 wndClass.hbrBackground = (HBRUSH)GetStockObject (WHITE_BRUSH);
 wndClass.lpszMenuName = MAKEINTRESOURCE(MAIN_MENU);
 wndClass.lpszClassName = MainWindowClass;

 RegisterClass (&wndClass);

 wndClass.style = CS_DBLCLKS;
 wndClass.lpfnWndProc = BasicWindow::Dispatch;
 wndClass.cbClsExtra = 0;
 wndClass.cbWndExtra = sizeof(long);
 wndClass.hInstance = hInstance;
 wndClass.hIcon = NULL;
 wndClass.hCursor = (HCURSOR) LoadCursor (NULL, IDC_CROSS);
 wndClass.hbrBackground = (HBRUSH) GetStockObject (WHITE_BRUSH);
 wndClass.lpszMenuName = NULL;
 wndClass.lpszClassName = TextWindowClass;
 RegisterClass (&wndClass);
 }
 if (FAILED(OleInitialize(NULL))) {
 MessageBox (NULL,"Cannot initialize OLE 2 library","Error",
 MB_OK|MB_APPLMODAL|MB_ICONSTOP);
 return (0);
 }
 hAccel = LoadAccelerators (hInstance,MAKEINTRESOURCE(ACC_MAIN));
 hPlain = LoadCursor(NULL, IDC_CROSS);
 hVertArrow = LoadCursor(NULL, IDC_SIZENS);
 hHorizArrow = LoadCursor(NULL, IDC_SIZEWE);
 hNwArrow = LoadCursor(NULL, IDC_SIZENWSE);
 hSwArrow = LoadCursor(NULL, IDC_SIZENESW);
 hWaitCursor = LoadCursor(NULL, IDC_WAIT);
 hMarkFore = LoadBitmap(hInstance,MAKEINTRESOURCE(IBM_MARKFORE));
 hMarkBack = LoadBitmap(hInstance,MAKEINTRESOURCE(IBM_MARKBACK));
 cfNative = RegisterClipboardFormat("Native");
 cfEmbedSource = RegisterClipboardFormat("EmbedSource");
 cfObjectDescriptor = RegisterClipboardFormat("Object Descriptor");
 cfEmbeddedObject = RegisterClipboardFormat("Embedded Object");
 Main = new MainWindow (hInstance,"Collage",nCmndShow);
 Main -> Create();
 while (GetMessage (&msg, NULL, 0, 0)) {
 if (!TranslateAccelerator(Main->GetWindow(),hAccel,&msg)) {
 TranslateMessage (&msg);
 DispatchMessage (&msg);
 }

 }
 OleUninitialize();
 DeleteObject(hMarkFore);
 DeleteObject(hMarkBack);
 DestroyCursor(hPlain);
 DestroyCursor(hVertArrow);
 DestroyCursor(hHorizArrow);
 DestroyCursor(hNwArrow);
 DestroyCursor(hSwArrow);
 DestroyCursor(hWaitCursor);
 FPRINTF(debug,"Ending Collage\n");
 if (bDebug) fclose(debug);
 return 0;
} End Listing



Photon and QNX


Visual Basic-like development for a real-time operating system




Peter D. Varhol


Peter is chair of the graduate department of computer science and mathematics
at Rivier College in New Hampshire. He can be reached at
pvarhol@mighty.riv.edu.


Photon is a windowing system specifically designed for the QNX operating
system. Photon is unusual in that it is a GUI built around a graphical
microkernel. In fact, the Photon microkernel is primarily a resource
manager, creating a graphical event space and managing regions within the
event space and events as they occur between and within regions. QNX refers to
graphical events as "photons." The Photon microkernel is about 20 Kbytes of
code, plus an additional 40 Kbytes of data. Other key parts include shared
libraries, VGA (or other) graphics drivers, and a pen-input/touch-screen
driver. The total amount of memory needed for code and data is about 250
Kbytes.
Interestingly, Photon was designed on the same principles as the QNX
microkernel itself. QNX, originally developed for process-control
applications, is a POSIX-compliant, 32-bit real-time operating system (that
can also be installed as a 16-bit kernel for processors below the 386). Even
though QNX has a UNIX look and feel, it is definitely not UNIX--it uses no USL
source code and requires no UNIX license.
At the heart of QNX is an extremely small (about 10 Kbytes) microkernel that
supports four main functions: interprocess communication, network
communication, process scheduling, and interrupt dispatch. In contrast to
larger operating-system kernels, the QNX microkernel has only 14 system calls.
QNX is a message-passing operating system, utilizing blocking versions of
Send, Receive, and Reply function calls for message management. Messages don't
queue; rather, the message facility is a process-to-process copy, which QNX
claims provides performance comparable to traditional function calls.
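The send/receive/reply round trip can be modeled in-process. This is a toy sketch, not the real QNX API (which addresses messages to process IDs and copies raw buffers); it is meant only to show why unqueued, copy-through messaging behaves like an ordinary function call.

```cpp
#include <cassert>
#include <functional>
#include <string>

// A "receiver process" is modeled as a handler that consumes a request
// and produces a reply (an invented stand-in for a real QNX server).
using Handler = std::function<std::string(const std::string &)>;

std::string SendToProcess(const Handler &receiver, const std::string &msg) {
    // In real QNX the sender blocks here until the server calls Reply;
    // in this single-threaded model the handler runs synchronously, so
    // the call itself plays the role of the blocking wait. Nothing is
    // queued: the request goes straight to the receiver and the reply
    // comes straight back.
    return receiver(msg);
}
```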
Why would you want to do windowing programming on QNX? Primarily to develop
applications for resource-constrained environments--embedded systems,
hand-held computers, and the like. How do you do windowing programming on QNX?
With Photon and its development environment. Of course, Photon doesn't assume
that the resulting software is going to be a windowing system, and it doesn't
include the window manager within the base code. Since most embedded systems
don't include multiple windows, the memory and storage requirements can be
kept as low as possible. The Photon Motif-like window manager adds 30 Kbytes
of code and 64 Kbytes of data.
QNX also comes with an implementation of the X Window System that utilizes the
QNX message-passing scheme for communication between QNX kernels, and TCP/IP
for communication with other X servers. It is a full X11R5 implementation,
with the Motif window manager, scalable fonts, and font server. Like the X
Window System, Photon provides a rich widget library that operates much like
the X widget set. Photon also includes a code-generating, visual
application-development environment called "Application Builder."
What is the purpose of a graphical windowing system on a real-time operating
system like QNX? The answer is in process-control systems that require human
intervention. A manufacturing process, for example, can be monitored and
controlled by a touch screen at a supervisor's station. The X Window System
would form the basis for the touch-screen user interface. For this purpose,
QNX includes touch-screen drivers both for its X server implementation and for
Photon.


Running Photon and Application Builder


QNX runs all of its device drivers as user processes, which makes it possible
to dynamically load drivers when you need them. This is what I did when
loading both X and Photon for the CD-ROM (for X only) and the mouse. The
drivers for these devices were not in my default installation, so I simply
started them from the command line before installing the windowing systems.
The drivers communicate with their devices through the microkernel, the only
part of the operating system that runs in kernel space. You might think that
running device drivers in user space would cause unacceptable performance
penalties, but that doesn't seem to be the case.
For UNIX-like software, Photon is remarkably easy to install. I simply
installed the requisite drivers, called the install program on the floppy or
CD-ROM, and launched. Since QNX runs everything as a process, I only had to
rebuild the kernel when I upgraded the base operating system before starting.
A good example of this ease was when I first used Photon and the Application
Builder would not launch. A cursory glance at the Photon documentation
revealed that I had to run it in Super-VGA mode. All I had to do was return to
the QNX command line, run the Super-VGA driver (my hardware supported up to
1024x768), and return to Photon. This is hardly the exercise in masochism it
would have been under UNIX.
Photon itself is simply a graphical screen with a small Photon icon in the
upper-left corner. In all other respects, it looks like X, with pop-up menus
and a terminal window. You navigate through it much the same as you would in
X, using the right mouse button to bring up the menu and the left button to
make selections.
In many ways, the Application Builder development tool is similar to Visual
Basic. For controls ("objects" in Visual Basic, "widgets" in this
environment), it includes push buttons, bitmaps, toggle buttons, labels, text
boxes, on/off buttons, windows, scroll bars, and (interestingly) signature
boxes. There are also several controls that let you easily navigate the
environment and move controls around on the screen. Figure 1 shows a sample
Application Builder workspace.
The Application Builder works with Watcom's 32-bit C compiler for QNX, which
has long been available for text-based development in the operating system.
The compiler itself is unchanged: Application Builder generates a set of C
source files, which are then compiled with the Watcom compiler.


Building a Graphical Application


The project I developed with Photon was an online time clock, on which users
would use a mouse or touch screen to log a particular type of activity and the
time it takes to perform that activity. Workers who have to charge their
efforts to different contracts would use such an application. This app seemed
straightforward to implement and was a typical application for a handheld
device. I had done a similar application in Visual Basic and was interested in
comparing the two development environments.
Being used to Visual Basic, I wanted to use a drop box to choose a project,
but there was not one to be had in Application Builder. Instead, the
Programmer's Guide suggested using a text box mapped to a pair of up and down
arrows. When the user clicks on the arrows, the text box can be made to scroll
through a list of text items. The arrows are not an existing widget; rather,
you use a bitmap widget, make the bitmap selectable, and draw an arrow on it
using the drawing tool.
I did the same thing for choosing the amount of time spent on each project,
including in the text list times at 15-minute increments. The user would
simply scroll down the list until the correct time period appeared in the
window. Once the project name and the number of hours had been selected, the
user would click on a Record button. The Record button has an associated C
function which pairs the two values together and writes them to an ASCII file.
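The Record callback's job can be sketched in a few lines (the file name and record format here are assumptions, not taken from the article):

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Sketch of what the Record button's callback does: pair the selected
// project with the selected time and append them to an ASCII file.
bool recordEntry(const char *path, const std::string &project, double hours) {
    FILE *f = fopen(path, "a");   // append so earlier entries survive
    if (!f) return false;
    fprintf(f, "%s %.2f\n", project.c_str(), hours);
    fclose(f);
    return true;
}
```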
Last, I put in a Help button and Help dialog. First, I built the Help dialog
box. Dialog widgets are available in Photon by selecting a Dialogs item from
the Application menu. Once in the dialog window, I used a pane from the
widget-control tool box as a container for the text and placed a Close button
immediately beneath the pane.
Back at the main development screen, I used the Callbacks dialog box
associated with the Help button to set the link type to Dialog, then entered
the name of the ASCII file containing my Help text. This will call that file
and display it in the pane in the Help dialog when the user clicks on the Help
button. This means I can change my Help text without having to change and
recompile the entire application. You could make the Help context specific by
associating a function with the Help button, but I left my Help generic.


Compiling the Application


Application Builder uses C, compiled by the Watcom compiler, to write
event-handling code. While many of the events were the same between my Visual
Basic implementation and the Application Builder version, it was rarely
possible to do a one-to-one translation between the two.
Application Builder also uses the concept of the callback function, rather
than the event handler, in invoking event code. Example 1(a) is sample code
associated with quitting a Visual Basic application. An analogous quit_process
callback function for an Application Builder application might look like
Example 1(b).
The process is the same. Most Windows programmers are familiar with the
callback function. For example, when writing DLLs, the DLL must have a WEP
(Windows Exit Procedure), which is, in effect, a callback. However, most
Windows development languages generate default callbacks, which work unless
you want to do something other than simply returning to the calling routine.
I also had to develop functions that manipulated the text lists based on the
user clicking on the arrows, and provide a function with the Record button to
take the data from the text boxes and write them to a file. The event-handling
(that is, callback) code for the arrow buttons is shown in Example 2.
A completed Application Builder application consists of a set of object files,
a set of callback functions, and a number of associated source and header
files necessary to make the application behave properly. Many of the necessary
files are generated by Application Builder for the compile process. These
include a listing of the sources, headers, and object files (similar to a MAK
file in Visual Basic), a make file, function prototypes, and widget
descriptions and links to the appropriate widget libraries.
The compiled application is stand-alone; you don't need any other files on the
QNX system to run the application. Unlike Visual Basic, no bulky run-time DLL
is necessary. This comes in handy for developing Photon applications for
handheld devices, where memory and storage are at a premium. The executable
file for my time recorder was about 130 Kbytes; this, plus a minimum QNX and
Photon installation, could easily run in 512 Kbytes of memory.


Taking Stock of Photon and Application Builder



While the programming models for Application Builder and Visual Basic are
similar, the terminology is substantially different. Without exposure to X
Window programming concepts, the transition would have been more difficult.
Anyone expecting to work with the Application Builder in the same manner as
Visual Basic will quickly become frustrated.
I am not a fan of the X look and feel, but then, there are very few graphical
desktop managers that I take to naturally. Even with the Application Builder,
programming for Photon seems much like programming for X, right down to
widgets and callback functions.
This doesn't mean that it's difficult, however. I kept working with the fear
that I would come up against an incomprehensible X-like structure, and it
never happened. Application Builder takes over many of the programming chores
you might have to do in X itself. I do a lot of work in Visual Basic, which
clearly has more object classes, more properties, and more events to respond
to, but then, the Windows and Visual Basic environments consume at least 30
Mbytes on my hard disk. As long as you don't set your expectations by these
much bulkier products, you'll appreciate QNX, Photon, and the Application
Builder for what they are--small, elegant tools for working in and developing
applications for process control and real-time systems.


For More Information


Photon Window System
QNX Operating System
QNX Software Systems
175 Terence Matthews Crescent
Kanata, ON
Canada K2M 1W8
613-591-0931
Figure 1 The Application Builder workspace.
Example 1: (a) Code typically used to quit from a Visual Basic app; (b) in
Application Builder, a callback function is used to end the application.
(a)
Sub DoneButton_Click ()
 End
End Sub

(b)
int quit_process(PtWidget_t *widget, void *data, PtCallbackInfo_t *cbinfo)
{
 exit(EXIT_SUCCESS);
}
Example 2: Sample event handler for the bitmap arrow buttons that scroll the
text field.
int label = 0;
int label_change(PtWidget_t *widget, void *data, PtCallbackInfo_t *cbinfo)
{
 char *txt_label, buffer[10];
 PtArg_t args[2];
 /* Read the current label text and convert it to a number */
 PtSetArg(&args[0], Pt_ARG_TEXT_STRING, &txt_label, 0);
 PtGetResources(ABW_label, 1, args);
 label = atoi(txt_label);
 if (label < 0) label = 0;
 if (label > 254) label = 254;
 /* Determine which arrow was clicked */
 switch (ApName(widget)) {
  case ABN_prev_label:
   if (--label < 0) label = 0;
   break;
  case ABN_next_label:
   if (++label > 254) label = 254;
   break;
 }
 /* Select the appropriate bitmap and label for the new value */
 PtSetArg(&args[0], Pt_ARG_FILL_LABEL, label | Pg_INDEX_LABEL, 0);
 PtSetArg(&args[1], Pt_ARG_LABEL, label | Pg_INDEX_LABEL, 0);
 PtSetResources(ABW_label_rect, 2, args);
 /* Write the new value back into the text field */
 sprintf(buffer, "%d", label);
 PtSetArg(&args[0], Pt_ARG_TEXT_STRING, buffer, 0);
 PtSetResources(ABW_label, 1, args);
 return (Pt_CONTINUE);
}







PROGRAMMING PARADIGMS


Visual Programming and Eastern Medicine




Michael Swaine


January's thesis that programming is unique among professions brought some
thoughtful mail. Several readers agreed with my (obvious) observation that
programmers are both tool users and tool builders, but challenged the
(not-so-obvious) conclusion that I drew from it. They pointed out that this
observation may have more to do with the comparative youth of programming as a
profession than with any unique qualities of the software art, citing examples
to show that the phenomenon of the workers building their own tools is common
in the early stages of most crafts or professions. Only as the profession
matures does the sort of vertical specialization I described creep in, and a
clear distinction grow between the tool makers and the tool users.
And all of this is, my correspondents argued, a Good Thing. It is, in fact,
the main reason that fields like medicine and physics have been able to
advance as impressively as they have.
The argument is compelling, and I accept it. Not only is it compelling, it
clearly applies very nicely to programming. If everyone were to code at all
levels, from the bare metal up to the GUI: 1. Only a few would be able to do
it; 2. it would take a long time to write a program; and 3. software
development wouldn't progress very fast, since everyone would be endlessly
reinventing the wheel. Vertical specialization, with one group of programmers
developing tools that another group uses to develop more powerful tools (that,
perhaps, another group uses to develop even more high-level tools, and so on)
is crucial to progress in the software art. Vertical specialization in
software development is a Good Thing. Point taken.
But the unmixed blessing is as rare as the free lunch. Good Things have their
dark sides, their hidden costs. The skeptic in me wants to know, what's the
hidden cost of vertical specialization?


A Pair o' Docs


In the case of Western medicine, the hidden cost of specialization seems to be
decades of blindness to the virtues of Eastern and traditional medicine. The
Western model of treating the disease differs in a deep way from the Eastern
model of treating the person.
The difference is a classic example of a paradigm clash.
A paradigm is a very broad thing, conceptually. It is a fundamentally
different sort of thing from an algorithm or a heuristic or a plan or a
method. Algorithms, heuristics, plans, and methods are all ways of solving
problems. A paradigm includes ways of solving problems, but it also includes
ways of formulating problems. More than that, it must include (generally very
informal or implicit) decision processes for deciding that something is in
fact a problem. What appears to be a difficult problem in one paradigm may not
be seen as a problem at all in another.
In Western medicine, when the patient has been diagnosed as healthy, the
problem space is empty. There is no disease, there is no problem, there is no
action to be taken. Eastern medicine, with its emphasis on health maintenance,
sees plenty of actions to be taken with a healthy patient. For the Eastern
practitioner, the problem space becomes empty only when the patient is dead.
Not only are the methods of Western and Eastern medicine different, their
goals and their perceived problems are also different.
This characterization of Western and Eastern medicine is a gross
oversimplification, of course. Preventive health measures have always been a
part of Western medicine, too, and Western medicine has now begun to adopt
some approaches of Eastern and traditional medicine. As penicillin and the
wonder drugs of the mid-20th century begin to lose their wondrous properties
due to the adaptations of viruses, Western medical researchers are turning
their attention to certain lichens, like Usnea, long believed to have
health-promoting properties. Even politicians get it: Last year's tragically
fruitless debate over health-care reform included a surprisingly enlightened
emphasis on preventive health care.
But from everything I've read on the subject, the difference between the two
systems of medicine is essentially as I have painted it, even if I have used a
broad brush, and even if Western doctors are now beginning to learn from
Eastern and traditional approaches.
And the paradigm clash was real. Western doctors were not slow to see the
benefits of Chinese and folk medicine simply because they were intellectual
snobs, convinced that these old remedies couldn't be useful. That would be
culture clash, and that was undoubtedly a factor. But the paradigm clash was
something deeper. The Western doctor looked at the practices and writings of
his or her Eastern counterpart and couldn't see that the Eastern doctor was
addressing the problems the Western doctor was facing. The Western doctor
looked for cures for diseases, treatments to relieve annoying symptoms, and
didn't find them, and so dismissed Eastern medicine as ineffective, useless.
All methods are useless when applied to the wrong problems, and when
practitioners are operating in different paradigms, they have different ideas
about what constitutes a problem. As a result, they have no basis on which to
evaluate each others' methods. They have no way to compare results, because
their results are incommensurable, apples and oranges.


A Monkey Wrench in Every Toolkit


And that, I claim, is where the danger in vertical specialization lies.
Western medicine is a mature, vertically specialized profession, with a highly
developed central paradigm of disease treatment.
Or rather, that's the paradigm of the Western doctor. Western drug and
medical-equipment manufacturers have their own paradigms, and the various
specialties of Western medicine, like plastic surgery and dermatology, have
their own variations on the central paradigm. One's paradigms, and
consequently the problems one sees, are defined by one's specialization.
What is true of medicine today could be true of software development in the
future. Different vertical strata of programming, supported by academic
programs that train programmers for a particular stratum, could eventually
lead to a situation in which two programmers from different strata are as
unable to communicate as Western and Eastern medical practitioners recently
were. In fact, the danger could be much greater, because the fact that
programming tools are typically programs themselves makes vertical
specialization much easier in programming than in other professions. There is
no obvious limit to the sequence of programs for building programs for
building programs for...
That's the dark side: two programmers, possibly working on the same project,
but unable to communicate effectively because they do not see the same
problems.
But doctors are resolving similar problems, and Western medicine is learning
from Eastern. How do such problems get resolved? In theory, by deconstructing
the paradigms, going back to common goals or a common technological basis. You
tear (some of) it down and build it up again.
So I guess I'm not so concerned that every programmer should be able to build
a complex system from assembly-language scratch. But I think that some, if not
all, programmers should be able to tear down the edifices we construct and
start the process of rebuilding from scratch.
I forget who it was who recommended that, with respect to operating systems,
every generation or so we should tear it all down and start over again. In
that spirit, I suggest that every programmer's toolkit ought to include a
monkey wrench to throw in the works. I'm not advocating doing it; it may be
enough to know that we can.


GIGO Gotham and the Batch Cave


I'm leading a double life here in Stately Swaine Manor. At some time every
morning I attire myself like an upstanding citizen and go into town. By "town"
I mean GIGO Gotham. You know, the virtual village, Baudville, the Web, the
Grid, Gopherspace, Cyberspace, the Pleasant Land of Gif, the Internet. I
cruise its dark alleys, thread its broad thoroughfares choked with teeming
humanity. Crime prowls these streets, too, as do strange creatures who hide
behind masks.
But then, we all wear masks.
Time is an ever-present nemesis in Gotham. It expands and contracts and is far
too real and comes in different varieties. Connect time is parceled out in
slices, and it is beyond your control when your next slice is coming.
Meanwhile clock time ticks away in the corner of the screen, scarcely
connected to connect time but joined at the hip to your wallet. Hurry up and
wait is the anthem of Gotham life.


Everything is Simpler in the Batch Cave


Most afternoons I descend into the Manor's lower chambers. Here, beneath the
towers and battlements is the Batch Cave, where I go into batch mode,
descending when the sun is shining and emerging to a dark and chilly Manor.
The Batch Cave is a place where time, in the Gotham sense, does not exist,
where nothing exists but the current project.
If there were only one project, this monomania might accomplish something of
value, but every day I seem to find myself sucked into a different one.
Last month it was Prograph, an intensely visual language for Mac, Windows, and
UNIX systems. This month it was VIP-C.



C You Can See


Visual is in.
There's Visual Basic, of course, but there's also a visual anything else you
care to name. Visual Smalltalk, no less. Digitalk (Santa Ana, CA) has fixed on
the Visual Smalltalk nomenclature because (1) its Smalltalk is very visual;
(2) visual tools are the hot tool category in the client-server market that
Digitalk addresses; and (3) Digitalk wants to sell a lot of boxes. Attaching
"visual" to the box is a good way to sell more boxes. Attaching "visual" to
the product is harder, but I'm not suggesting that Digitalk hasn't done it.
That may be a subject for a later column.
This visual programming language (VPL) phenomenon is understandable. It's hard
not to think that programming could be more visual when you work with
object-oriented systems. VPL and OOP have an apparent rightness for one
another, at least if you don't look too deeply. OOP development makes it
easier to think of programs as collections of interacting things, things that
might be visually representable. VPL development suggests strongly that the
functionality of the program is or ought to be encapsulated in those little
pictures, or in objects represented by them. OOP seems to cry out for VPL, and
vice versa.
Of course, every advance in software development has to be considered from the
standpoint of the dominant programming paradigm, C. And C was never intended
to be an OOPL or a VPL. But C continues to evolve.
One direction of that evolution is VIP-C. Visual Interactive Programming C
Development System for Macintosh from Mainstay (Camarillo, CA), that is.
VIP-C has at its heart an interpreter. Working with the interpreter, you can
rapidly knock out applications, and VIP-C will wrap them up with the support
code necessary to produce standalones. The interpreter environment includes a
decent source-level debugger, and it's easy to imagine using the product for
in-house corporate development.
The interpreter is overlaid by a visual-development interface (hence the name)
that lets you grab objects like text boxes, dialog boxes, and buttons and
associate behaviors and properties with them, all without writing a line of
code. As you build, VIP-C produces a flowchart showing what your program looks
like. But the flowchart is not the visual metaphor of VIP-C. The visuality is
all in picking those objects and their properties. The central view that VIP-C
gives you is, surprisingly, not visual at all. It's a collection of windows,
each containing C code. There's a Main window for the Main routine and so
forth. When you pick objects and properties, VIP-C interprets your actions and
produces the lines of code. These lines of code are just as editable as if you
had typed them in; in fact, you can type them in. It would be possible
(although not particularly smart) to use VIP-C as a text editor for writing C
code, ignoring all the visual frills.


But Is It C?


I developed several simple applications with VIP-C. It's easy to work with.
You really can put together an app in a couple of hours. Once you've worked
through the tutorials, you have a reasonable grasp of the tools and how they
work, and of the design cycle. I admit that I have only developed a few simple
programs that spin off from examples supplied as tutorials, but "simple" in
this context means a bare-bones text editor or paint program.
If that were all there were to VIP-C, it might be useful to in-house corporate
developers and hobbyists, but it would be of little interest to commercial
developers. But there are some other features that make VIP-C worth a look.
These all revolve around exporting to other C development environments.
The ability to export code seems like the sort of extra step that
preprocessors always add. VIP-C makes it a little nicer by providing the hooks
so that your traditional C development environment can be tightly integrated
with VIP-C. In effect, a third-party compiler can be hooked in, so what you
have is one big development environment.
And you can move code back into VIP-C for further massaging after you've
tweaked it elsewhere. Specifically, you can import any ANSI-C source code into
a VIP-C project, with a few exceptions; inline assembly code, for example, is
not allowed.


The Soft Machine


PhonePro is a VPL of another color. This Mac-only product from Cypress
Research (Sunnyvale, CA) is a tool for developing telephony applications:
voice-mail systems, software-only answering machines that walk the caller
through a menu of messages, and software-based fax machines.
I did some PhonePro development a few months ago when my answering machine had
gone south. I'd had enough of malfunctioning hardware and decided to replace
the machine with a piece of software. If I were not so deeply mired in the
programming paradigm I'd probably see some humor in that, but I'm way too far
gone.
PhonePro is unlike VIP-C in that it is very much a flowchart-oriented system.
To create PhonePro applications, you drag and drop icons from a palette onto a
big sheet of virtual paper, and that's your program. You don't have the option
of looking at a textual representation. As with VIP-C, you control the
behavior and properties of these objects by making choices from lists or
clicking buttons in pop-up dialog boxes.
Because the flowchart is the program, PhonePro gives you full flexibility in
moving the icons around to make something comprehensible. You can modularize
the code with goto icons and you can add comments to sections of code or to
individual icons. The links are real visual objects, too, and when you click
on a link (the line connecting one icon with another) you can edit its
properties.


But Is It Programming? 


PhonePro is definitely a rapid-development environment; I was able to put
together a simple answering-machine application in a couple of hours, and
customize it over the next few days into something genuinely useful. Because
the time-critical elements are all prewritten code, the application had all
the speed it required.
But is it programming? I'd have to say that PhonePro is only charitably a
language or development system. It is in no way a universal programming
language; it's more a scripting system for creating phone software, presented
in a visual programming model.
It does provide all the tools you need for that domain (although you could
think up deviant applications that just wouldn't be possible using PhonePro).
And that makes me wonder, is this the proper use for VPL systems? Particular
domains like telephony applications? It seems that having a clear domain makes
it easier to iconize the elements, because there's more agreement about what
those elements are. You get around one problem of visual systems: that it's
hard to iconize an unlimited domain of objects. Limited domains like telephony
are not as open ended.
Postscript: Sadly, the soft answering machine is not presently in service. It
fell victim to conflict with a piece of hardware. Last week the fax machine
that shares its line went south.





C PROGRAMMING


DFWrap: The D-Flat C++ Wrapper Classes




Al Stevens


This month's column tolls the final bell for D-Flat subjects. Several years
ago I launched the D-Flat project, a C function library that implements in DOS
text mode the IBM Common User Access (CUA) interface, the standard under which
Windows 3.1 and OS/2 operate. At the time, there were few such libraries, and
the project demonstrated not only the construction of a user-interface
library, but also the underlying message-based, event-driven architecture.
Later, I developed D-Flat++, a C++ version with limited features. D-Flat++ is
not a C++ wrapper, but a rewrite using features of the C++ language to
implement the interface. It lacks several D-Flat features that, at the time, I
thought I would never need. Consequently, every time I started another DOS,
text-mode project that needed menus, dialog boxes, and the mouse, I wrote it
in C and used D-Flat rather than D-Flat++. A case in point is Quincy, the C
interpreter.
A couple of months ago, I got my first Internet account on ddj.com. The host
provides a UNIX text-mode interface, and I decided to gradually shift my
e-mail from CompuServe to that account. The Internet system uses the standard
UNIX mail program, a text-mode operation that you access from a PC with a
communications program. My thoughts turned to a "C Programming" column project
from 1989, where I implemented Smallcom, a DOS communications program with a
C-like script language. I resurrected Smallcom, ran tests to see if it would
compile with contemporary compilers, installed an ancient copy of Turbo C 2.0
when it would not, and wrote a script to get my Internet mail. The program
works, but the interface is not as intuitive as it used to be. I decided to
rewrite the program and once again chose D-Flat as the operating platform and
C as the language.
The development of the new program, called "IMail," proceeded normally except
that the C++ paradigm kept nagging at me from the background. Every time I hit
a design or coding decision that involved anything complex, I would be
reminded that the job would be ever so much easier in C++. Changing to
D-Flat++ was not the answer. I wanted several D-Flat features that I left out
of D-Flat++. Ah, the precious wisdom of unerring hindsight. As the program
neared completion, I revealed to two programmer friends that I had actually
started this recent project in C. They shook their heads and looked solemnly
and knowingly, first at one another and then at the ground. They commented in
unison and in a low whisper, "I couldn't live like that."
Their remarks and their shared pity for me and my plight made me go away and
think. They are right. I am wrong. This masochistic regression must be put in
check. To action.
The next day saw a change of course. I put a C++ wrapper around D-Flat and
rewrote IMail in C++ (not all in the same day, mind you). The C++ wrapper,
which I called "DFWrap" for want of a better name, is the subject of this
column. The project called for some minor modifications to D-Flat to support
the wrapper, and I used that opportunity to install other small enhancements
and bug fixes. DFWrap requires the release of one more, and, I hope, final
version of D-Flat. 
The wrapper has several advantages over D-Flat++. First, D-Flat uses a
technique for describing menus and dialog boxes that resembles the resource
languages of Windows programs. Instead of a resource compiler, D-Flat uses
preprocessor macros to convert the resource language into initialized arrays
of C structures. D-Flat++ uses class definition and inheritance as the
resource language. I discovered early on that the earlier technique is more
intuitive, at least to my eye, even though the approach uses the dreaded,
scorned, and hated preprocessor. I felt better about doing it when I saw that
Microsoft used the same technique to implement message maps in Visual C++
Windows programs. A second advantage is that D-Flat supports its own version
of the multiple-document interface. Third, because I have been using D-Flat
exclusively for several years, it is better tested, and I trust it more.
Fourth, a reader named Ed Diener ported D-Flat to the 32-bit, protected-mode
Watcom compiler, a preferred platform for DOS programs. Finally, I weighed the
effort required to write the wrapper against that required to incorporate all
those features into D-Flat++, and the wrapper effort won hands down.


DFWrap Applications


You write a DFWrap application by designing your menus and dialog boxes the
old D-Flat way and deriving an applications class from the dfApplication
class; see Example 1(a).
The MAPDEF statement defines the presence of the application's message map.
Other classes derived from other DFWrap base classes can have their own
message maps. The message map itself is defined in the CPP source-code file
for the class, as in Example 1(b).
It will be obvious to Windows programmers that the MSGMAP implementation
resembles that of MFC. The MSG entries may be COMMAND message-identification
codes or message codes themselves such as CLOSE_WINDOW (as the example shows),
LEFT_BUTTON, KEYBOARD, and so on. The function names are member functions of
the class. Message-connected functions are defined in Example 1(c).
The message functions return true if they fully process the message and want
to prevent it from being passed up the window hierarchy; otherwise they return
false. A message function can cause the message to be processed by higher
windows by calling the DispatchBaseDFMessage function and passing the message
identification as a parameter.
The application's main function declares an object of the derived
dfApplication class and calls its dfApplication::Run() member function; see
Example 2(a). The application defines dialog boxes by deriving classes from
the dfDialog class and providing message-map entries for the dialog's
controls, as in Example 2(b).
The definition of the dialog box itself is the same as in D-Flat, with the
preprocessor resource language. The definition is accompanied by the message
map and member functions that process the dialog box; see Example 3(a). The
application launches modal and modeless dialog boxes as in Example 3(b). 


DFWrap Source Code


Listing One is dfwrap.h, the header file that an application includes to use
the wrapper classes. It defines the ubiquitous bool type and includes dflat.h
as an extern "C" header file. Then it includes the other header files that
declare the wrapper classes.
Listing Two is dfrect.h, a utility window rectangle class that provides an
object-oriented interface to the rectangles that D-Flat manages.
Listings Three and Four are dfwnd.h and dfwnd.cpp, which declare and define
the dfWindow class, the highest class in the DFWrap hierarchy. The header file
defines the MAPDEF, MSGMAP, MSG, and ENDMAP macros, which expand into inline
and member functions that implement the message map. This expansion causes the
named member functions of the derived class to be called when the associated
messages are sent by the system. The header file declares the base dfWindow
class, which is little more than a wrapper around D-Flat window processing.
The notable exception is the static mf_WndProc function, which intercepts and
dispatches messages to the derived window. When the application instantiates
an object derived from dfWindow, its constructor calls the D-Flat CreateWindow
function, passing the mf_WndProc function address as the window-processing
module. That function receives messages sent to the window and calls the
class's DispatchDFMessage member function, which is declared by the MAPDEF
macro and defined by the MSGMAP, MSG, and ENDMAP macros. That function tests
the message in a switch statement with cases generated from the MSG macros.
Those cases call the derived-class member function assigned to the matching
message.


The dfApplication and dfDialog Classes


Listings Five and Six are dfappl.h and dfappl.cpp, respectively, the
source-code files that declare and define the dfApplication class. This is a
small wrapper class that associates the menu with the application and launches
the application. Similarly, Listings Seven and Eight are dfdial.h and
dfdial.cpp, respectively. They put a wrapper around the D-Flat dialog-box
mechanism.
These two classes encapsulate as much of D-Flat as I needed for the mail
application. If I use DFWrap for other applications, it might become necessary
to add to the interface.


Common Dialog Boxes


The wrapper provides access to the common D-Flat dialog boxes--File Open, File
Save, Message Box, Error Box, YesNo Box, and so on--through the same function
calls that D-Flat C programs use. There seemed to be no point in changing that
interface.


The IMail Program


You can download D-Flat v20, DFWrap, and the IMail program from CompuServe,
ftp.mv.com, and DDJ Online. The IMail program is a far more comprehensive
example of the wrapper classes in use than I can fit into a single month's
column. It uses a multiple-document interface to display the lists and text
of incoming and outgoing mail messages. There are several modal and
modeless dialog boxes that implement logon options, communications options, an
address book, and a file cabinet for storing messages. The program also
includes modem and serial-port logic encapsulated into C++ classes.
I don't know if I will use D-Flat much more. The day of the DOS, text-mode
application is bygone, I suspect. Now if someone would only write a graphics
version of D-Flat that provides the look and feel of a Windows 95
application....
Example 1: (a) Deriving an applications class; (b) message map; (c)
message-connected functions.

(a)
class MailAppl : public dfApplication {
 bool OnFile(); // menu command
 bool OnClose(); // "
protected:
 MAPDEF(dfApplication)
public:
 MailAppl();
 virtual ~MailAppl();
};

(b)
MSGMAP(MailAppl)
 MSG(ID_FILE, OnFile)
 MSG(CLOSE_WINDOW, OnClose)
ENDMAP

(c)
bool MailAppl::OnFile()
{
 // ...
 return false;
}
Example 2: (a) Declaring an object of the derived dfApplication class; (b)
deriving classes from the dfDialog class.
(a)
MailAppl *mailappl;
int main()
{
 mailappl = new MailAppl;
 mailappl->Run();
 delete mailappl;
 return 0;
}

(b)
extern DBOX AddrBookDB;
class AddrBook : public dfDialog {
 bool OnOK();
protected:
 MAPDEF(dfDialog)
public:
 AddrBook() : dfDialog(AddrBookDB)
 { }
};
Example 3: (a) Definition of the dialog box; (b) launching modal and modeless
dialog boxes.
(a)
DIALOGBOX( AddrBookDB )
 DB_TITLE( "Address Book", -1, -1, 15, 60 )
 CONTROL(TEXT, "~Addresses:", 1, 0, 1,10, ID_ADDRESS)
 CONTROL(LISTBOX, NULL, 1, 1,12,45, ID_ADDRESS)
 CONTROL(BUTTON, " ~Select ", 48, 1, 1, 8, ID_OK)
ENDDB
MSGMAP(AddrBook)
 MSG(ID_ADDRESS, OnAddress)
 MSG(ID_OK, OnOK)
ENDMAP
bool AddrBook::OnAddress()
{
 // ...

 return true;
}
bool AddrBook::OnOK()
{
 // ...
 return true;
}

(b)
AddrBook *p_addrbook = new AddrBook;
p_addrbook->doModal(); // or doModeless()
delete p_addrbook;

Listing One 

// ------ dfwrap.h

#ifndef DFWRAP_H
#define DFWRAP_H

typedef int bool;
const int true = 1;
const int false = 0;

#include <assert.h>

extern "C" {
#include "dflat.h"
}
#include "dfrect.h"
#include "dfwnd.h"
#include "dfappl.h"
#include "dfdial.h"

#endif



Listing Two 

// -------- dfrect.h
#ifndef DFRECT_H
#define DFRECT_H

class Rect {
 RECT m_rect;
public:
 Rect(RECT rc) : m_rect(rc)
 { /* ... */ }
 Rect(int x=0, int y=0, int ht=1, int wd=1)
 {
 m_rect.lf = x;
 m_rect.tp = y;
 m_rect.rt = x+wd-1;
 m_rect.bt = y+ht-1;
 }
 operator RECT()
 { return m_rect; }
 int Left() const

 { return m_rect.lf; }
 int Top() const
 { return m_rect.tp; }
 int Height() const
 { return m_rect.bt - m_rect.tp + 1; }
 int Width() const
 { return m_rect.rt - m_rect.lf + 1; }
 int Right() const
 { return m_rect.rt; }
 int Bottom() const
 { return m_rect.bt; }
 int Area() const
 { return Height() * Width(); }
};
#endif



Listing Three

// ------- dfwnd.h
#ifndef DFWND_H
#define DFWND_H

#include <cstring.h>

// -------------------------------------------------------------
#define MAPDEF(base) \
virtual bool DispatchDFMessage(MESSAGE msg); \
bool DispatchBaseDFMessage(MESSAGE msg) \
{ \
 MESSAGE sm=m_msg;PARAM s1=p1,s2=p2; \
 bool rtn = base::DispatchDFMessage(msg); \
 if (rtn&&msg==CLOSE_WINDOW)m_Wnd=0; \
 m_msg=sm;p1=s1;p2=s2; \
 return rtn; \
}
// -------------------------------------------------------------
#define MSGMAP(ap) \
bool ap::DispatchDFMessage(MESSAGE msg) \
{MESSAGE sm=m_msg;PARAM s1=p1,s2=p2;bool rtn=true;switch(msg){
// -------------------------------------------------------------
#define MSG(id,fn) case id:rtn=fn();break;
// -------------------------------------------------------------
#define ENDMAP default:break;} \
 m_msg=sm;p1=s1;p2=s2; \
 return rtn?DispatchBaseDFMessage(msg):false; \
}
// -------------------------------------------------------------
class dfWindow {
protected:
 static dfWindow *mp_cwindow;
 WINDOW m_Wnd;
 MESSAGE m_msg;
 PARAM p1, p2;
 dfWindow();
 dfWindow(CLASS cl, const string& ttl, const Rect& rc = Rect(0,0,-1,-1),
 void *ext = 0, int attrib = 0);
 void CommonConstructor();

 virtual ~dfWindow()
 { }
 static bool mf_WndProc(WINDOW wnd, MESSAGE msg, PARAM p1, PARAM p2);
 virtual bool DispatchDFMessage(MESSAGE)
 { return DefaultWndProc(m_Wnd, m_msg, p1, p2); }
public:
 void Show()
 { SendMessage(SHOW_WINDOW);}
 void Hide()
 { SendMessage(HIDE_WINDOW); }
 void CloseWindow()
 { SendMessage(CLOSE_WINDOW); }
 void Paint()
 { SendMessage(PAINT); }
 void Move(int x, int y)
 { SendMessage(MOVE, x, y); }
 void Size(int h, int w)
 { SendMessage(SIZE,
 GetTop(m_Wnd)+h-1, GetRight(m_Wnd)+w-1); }
 bool isAttributeSet(int attrib)
 { return TestAttribute(m_Wnd, attrib); }
 void SetAttribute(int attrib)
 { AddAttribute(m_Wnd, attrib); }
 void ResetAttribute(int attrib)
 { ClearAttribute(m_Wnd, attrib); }
 int SendMessage(MESSAGE msg, PARAM p1 = 0, PARAM p2 = 0)
 { return ::SendMessage(m_Wnd, msg, p1, p2); }
 void PostMessage(MESSAGE msg, PARAM p1 = 0, PARAM p2 = 0)
 { ::PostMessage(m_Wnd, msg, p1, p2); }
 int SendCommand(PARAM cmd, PARAM p2 = 0)
 { return ::SendMessage(m_Wnd, COMMAND, cmd, p2); }
 void PostCommand(PARAM cmd, PARAM p2 = 0)
 { ::PostMessage(m_Wnd, COMMAND, cmd, p2); }
 int Top()
 { return GetTop(m_Wnd); }
 int Bottom()
 { return GetBottom(m_Wnd); }
 int Left()
 { return GetLeft(m_Wnd); }
 int Right()
 { return GetRight(m_Wnd); }
 int Height()
 { return WindowHeight(m_Wnd); }
 int Width()
 { return WindowWidth(m_Wnd); }
 int ClientTop()
 { return GetClientTop(m_Wnd); }
 int ClientBottom()
 { return GetClientBottom(m_Wnd); }
 int ClientLeft()
 { return GetClientLeft(m_Wnd); }
 int ClientRight()
 { return GetClientRight(m_Wnd); }
 int ClHeight()
 { return ClientHeight(m_Wnd); }
 int ClWidth()
 { return ClientWidth(m_Wnd); }
#ifdef INCLUDE_MINIMIZE
 void Minimize()

 { SendMessage(MINIMIZE); }
 bool isMinimized()
 { return m_Wnd->condition == ISMINIMIZED; }
#endif
#ifdef INCLUDE_MAXIMIZE
 void Maximize()
 { SendMessage(MAXIMIZE); }
 bool isMaximized()
 { return m_Wnd->condition == ISMAXIMIZED; }
#endif
#ifdef INCLUDE_RESTORE
 void Restore()
 { SendMessage(RESTORE); }
 bool isRestored()
 { return m_Wnd->condition == ISRESTORED; }
 Rect RestoredRect()
 { return m_Wnd->RestoredRC; }
 void SetRestoredRect(Rect rc)
 { m_Wnd->RestoredRC = rc; }
#endif
 Rect WndRectangle()
 { return WindowRect(m_Wnd); }
 void SetWndRectangle(Rect rc);
 void SetWndCondition(Condition cnd)
 { m_Wnd->condition = cnd; }
 Rect ClientRectangle()
 { return Rect(ClientLeft(), ClientTop(), ClHeight(), ClWidth()); }
 void SetForeground(char fg)
 { WndForeground(m_Wnd) = fg; }
 void SetBackground(char bg)
 { WndBackground(m_Wnd) = bg; }
 void ShowCursor()
 { ::SendMessage(0, SHOW_CURSOR, 0, 0); }
 void HideCursor()
 { ::SendMessage(0, HIDE_CURSOR, 0, 0); }
 void SetCursor()
 { cursor(ClientLeft()+Column(), ClientTop()+Row()); }
 void CursorHome()
 { m_Wnd->CurrCol = 0; SetCursor(); }
 void CursorUp(int ct = 1);
 void CursorDown(int ct = 1);
 void CursorRight(int ct = 1);
 void CursorLeft(int ct = 1);
 void WriteChar(int ch);
 int Row()
 { return m_Wnd->WndRow; }
 int Column()
 { return m_Wnd->CurrCol; }
 void SetRow(int y)
 { m_Wnd->WndRow = y; }
 void SetColumn(int x)
 { m_Wnd->CurrCol = x; }
 void PositionCursor(int x, int y);
 void ScrollUp()
 { scroll_window(m_Wnd, ClientRectangle(), TRUE); }
 void ClearScreen();
 void ClearToEOL();
 void SetFocus()
 { SendMessage(SETFOCUS, true); }

 void SetRestoredAttrib(int attr)
 { m_Wnd->restored_attrib = attr; }
 int GetRestoredAttrib() const
 { return m_Wnd->restored_attrib; }
};
#endif



Listing Four

// ---------- dfwnd.cpp
#include "dfwrap.h"

dfWindow *dfWindow::mp_cwindow;

void dfWindow::CommonConstructor()
{
 m_Wnd = 0;
 p1 = 0;
 p2 = 0;
}
dfWindow::dfWindow()
{
 CommonConstructor();
}
dfWindow::dfWindow(CLASS cl, const string& ttl, const Rect& rc,
 void *ext, int attrib)
{
 CommonConstructor();
 if (cl == APPLICATION)
 init_messages();
 mp_cwindow = this;
 CreateWindow(cl,ttl.c_str(),
 rc.Left(),rc.Top(),rc.Height(),rc.Width(),
 ext,0,&dfWindow::mf_WndProc,attrib);
 assert(m_Wnd != 0);
}
bool dfWindow::mf_WndProc(WINDOW wnd, MESSAGE msg, PARAM pr1, PARAM pr2)
{
 bool rtn = false;
 dfWindow *This;
 if (wnd->wrapper == 0) {
 assert(mp_cwindow != 0);
 wnd->wrapper = mp_cwindow;
 mp_cwindow->m_Wnd = wnd;
 mp_cwindow = 0;
 }
 This = (dfWindow*)(wnd->wrapper);
 This->p1 = pr1;
 This->p2 = pr2;
 This->m_msg = msg;
 if (msg == COMMAND && (pr2 == 0 || pr2 == LB_CHOOSE || pr2 == LB_SELECTION))
 rtn = This->DispatchDFMessage((MESSAGE)pr1);
 else
 rtn = This->DispatchDFMessage(msg);
 // --- done again because of reentrant calls
 This->p1 = pr1;
 This->p2 = pr2;

 This->m_msg = msg;
 if (rtn && msg == CLOSE_WINDOW)
 This->m_Wnd = 0;
 return rtn;
}
void dfWindow::SetWndRectangle(Rect rc)
{
 m_Wnd->rc = rc;
 m_Wnd->ht = rc.Height();
 m_Wnd->wd = rc.Width();
}
void dfWindow::WriteChar(int ch)
{
 SetStandardColor(m_Wnd);
 PutWindowChar(m_Wnd, ch, m_Wnd->CurrCol, m_Wnd->WndRow);
}
void dfWindow::PositionCursor(int x, int y)
{
 SetColumn(x);
 SetRow(y);
 SetCursor();
}
void dfWindow::CursorUp(int ct)
{
 while (ct-- && m_Wnd->WndRow) {
 m_Wnd->WndRow--;
 SetCursor();
 }
}
void dfWindow::CursorDown(int ct)
{
 while (ct-- && m_Wnd->WndRow < ClHeight()) {
 m_Wnd->WndRow++;
 SetCursor();
 }
}
void dfWindow::CursorRight(int ct)
{
 while (ct-- && m_Wnd->CurrCol < ClWidth()) {
 m_Wnd->CurrCol++;
 SetCursor();
 }
}
void dfWindow::CursorLeft(int ct)
{
 while (ct-- && m_Wnd->CurrCol) {
 m_Wnd->CurrCol--;
 SetCursor();
 }
}
void dfWindow::ClearScreen()
{
 SendMessage(CLEARTEXT);
 Paint();
 PositionCursor(0, 0);
}
void dfWindow::ClearToEOL()
{
 int col = m_Wnd->CurrCol;

 SetStandardColor(m_Wnd);
 while (col < ClWidth())
 PutWindowChar(m_Wnd, ' ', col++, m_Wnd->WndRow);
}



Listing Five

// --------- dfappl.h
#ifndef DFAPPL_H
#define DFAPPL_H

#include <fstream.h>

class dfApplication : public dfWindow {
 int m_currx, m_curry;
 MBAR& menu;
protected:
 MAPDEF(dfWindow)
 virtual bool LoadConfig()
 { return ::LoadConfig(); }
 virtual void SaveConfig()
 { ::SaveConfig(); }
 bool OnOpen();
 bool OnClose();
public:
 dfApplication(const string& ttl, MBAR& mnu,
 const Rect& rc = Rect(0,0,-1,-1));
 virtual ~dfApplication();
 void Run();
 void WriteStatus(const string& tx);
 void ClearStatus();
 bool GetCommandToggle(commands id)
 { return (bool) ::GetCommandToggle(&menu, id); }
 void SetCommandToggle(commands id)
 { ::SetCommandToggle(&menu, id); }
 void ClearCommandToggle(commands id)
 { ::ClearCommandToggle(&menu, id); }
};
#endif



Listing Six

// ------- dfappl.cpp
#include "dfwrap.h"

MSGMAP(dfApplication)
 MSG(OPEN_WINDOW, OnOpen)
 MSG(CLOSE_WINDOW, OnClose)
ENDMAP

dfApplication::dfApplication(const string& ttl, MBAR& mnu, const Rect& rc) :
 dfWindow(APPLICATION, ttl, rc, &mnu, HASSTATUSBAR), menu(mnu)
{
 curr_cursor(&m_currx, &m_curry);
 LoadHelpFile(DFlatApplication);

 ResetAttribute(CONTROLBOX);
 m_Wnd->condition = ISMAXIMIZED;
}
dfApplication::~dfApplication()
{
 cursor(m_currx, m_curry);
}
bool dfApplication::OnOpen()
{
 if (!LoadConfig())
 cfg.ScreenLines = SCREENHEIGHT;
 return true;
}
bool dfApplication::OnClose()
{
 DispatchBaseDFMessage(CLOSE_WINDOW);
 SaveConfig();
 return false;
}
void dfApplication::Run()
{
 SendMessage(SETFOCUS, TRUE, 0);
 while (dispatch_message())
 ;
}
void dfApplication::WriteStatus(const string& tx)
{
 ::SendMessage(m_Wnd->StatusBar, SETTEXT, (PARAM) tx.c_str(), 0);
 ::SendMessage(m_Wnd->StatusBar, PAINT, 0, 0);
}
void dfApplication::ClearStatus()
{
 ::SendMessage(m_Wnd->StatusBar, CLEARTEXT, 0, 0);
 ::SendMessage(m_Wnd->StatusBar, PAINT, 0, 0);
}






Listing Seven

// ---------- dfdial.h
#ifndef DFDIAL_H
#define DFDIAL_H

#include <cstring.h>
#include "icmds.h"

class dfDialog : public dfWindow {
 DBOX m_dbox;
 bool OnClose();
 bool OnSize();
 bool OnMove();
 bool OnLBChoose();
protected:
 MAPDEF(dfWindow)
 bool KeepLBSelectionInView(cmds id);

 bool m_isrunning;
public:
 dfDialog(DBOX& db) : m_dbox(db), m_isrunning(false)
 { }
 virtual ~dfDialog()
 { }
 bool doModal();
 void doModeless();
 bool isRunning() const
 { return m_isrunning; }
 void SetTitle(const char *title);
 void SetControlFocus(cmds id);
 void SetControlText(cmds id, const char *t, CLASS cl = (CLASS)-1);
 const char *GetControlText(cmds id, CLASS cl = (CLASS)-1);
 int SendCtlMessage(cmds id, MESSAGE msg, PARAM pr1=0, PARAM pr2=0);
 void SetWindowText(cmds id, const char *t)
 { SendCtlMessage(id, SETTEXT, (PARAM) t); }
 char *GetWindowText(cmds id) const;
 void AddWindowText(cmds id, const char *t)
 { SendCtlMessage(id, ADDTEXT, (PARAM) t); }
 bool TextChanged(cmds id);
 void ClearWindowText(cmds id)
 { SendCtlMessage(id, CLEARTEXT); }
 int GetLineCount(cmds id);
 int GetLBSelection(cmds id)
 { return SendCtlMessage(id, LB_CURRENTSELECTION); }
 void SetLBSelection(cmds id, int sel)
 { SendCtlMessage(id, LB_SETSELECTION, sel); }
 void GetLBTextLine(cmds id, int lno, char *t)
 { SendCtlMessage(id, LB_GETTEXT, (PARAM) t, lno); }
 void GetSelectedLBText(cmds id, char *t);
 int DeleteSelectedLBLine(cmds id, char *prompt = 0);
 void InsertTextLine(cmds id, int lno, char *t)
 { SendCtlMessage(id, INSERTTEXT, (PARAM) t, lno); }
 void DeleteTextLine(cmds id, int lno)
 { SendCtlMessage(id, DELETETEXT, lno); }
 void PaintControl(cmds id)
 { SendCtlMessage(id, PAINT); }
 void MoveControl(cmds id, int x, int y);
 void SizeControl(cmds id, int h, int w);
 void SetControlOn(cmds id);
 void SetControlOff(cmds id);
 bool GetControlSetting(cmds id);
 void EnableCommandButton(cmds id)
 { EnableButton(&m_dbox, (commands) id); }
 void DisableCommandButton(cmds id)
 { DisableButton(&m_dbox, (commands) id); }
 void SetProtection(cmds id);
 void SetControlWindowAttribute(cmds id, int attrib);
 void ResetControlWindowAttribute(cmds id, int attrib);
 bool isEmpty(cmds id) const;
 const DBOX& DBox() const
 { return m_dbox; }
 void DialogTextCopy(char *s1, cmds id, int len);
};
class DialogConfig {
 bool max; // true if DB was maximized
 Rect rc; // DB position, size
 Rect rrc; // DB restored configuration

 Rect ctls[MAXCONTROLS]; // control positions, sizes
 int restattrib; // restored attribute
public:
 DialogConfig() : max(false)
 { }
 void SaveDialog(dfDialog& db);
 void RestoreDialog(dfDialog& db);
};
#endif



Listing Eight

// ----- dfdial.cpp
#include "dfwrap.h"

MSGMAP(dfDialog)
 MSG(LB_CHOOSE, OnLBChoose)
 MSG(SIZE, OnSize)
 MSG(MOVE, OnMove)
 MSG(CLOSE_WINDOW, OnClose)
ENDMAP

bool dfDialog::OnLBChoose()
{
 PostCommand(ID_OK);
 return true;
}
bool dfDialog::OnClose()
{
 m_isrunning = false;
 return true;
}
bool dfDialog::doModal()
{
 mp_cwindow = this;
 m_isrunning = true;
 return DialogBox(0, &m_dbox, TRUE, mf_WndProc);
}
void dfDialog::doModeless()
{
 mp_cwindow = this;
 m_isrunning = true;
 DialogBox(0, &m_dbox, FALSE, mf_WndProc);
}
void dfDialog::SetTitle(const char *title)
{
 m_dbox.dwnd.title = (char *) title;
 if (m_isrunning)
 SendMessage(BORDER);
}
bool dfDialog::OnMove()
{
 m_dbox.dwnd.x = (int) p1;
 m_dbox.dwnd.y = (int) p2;
 return true;
}
bool dfDialog::OnSize()
{
 m_dbox.dwnd.x = Left();
 m_dbox.dwnd.y = Top();
 m_dbox.dwnd.w = p1-Left()+1;
 m_dbox.dwnd.h = p2-Top()+1;
 return true;
}
void dfDialog::SetControlText(cmds id, const char *t, CLASS cl)
{
 SetDlgTextString(&m_dbox, (commands) id, (char*) t, cl);
}
const char* dfDialog::GetControlText(cmds id, CLASS cl)
{
 return GetDlgTextString(&m_dbox, (commands) id, cl);
}
void dfDialog::SetControlFocus(cmds id)
{
 WINDOW cwnd = ControlWindow(&m_dbox, (commands) id);
 if (cwnd != 0)
 ::SendMessage(cwnd, SETFOCUS, TRUE, 0);
}
int dfDialog::SendCtlMessage(cmds id, MESSAGE msg, PARAM pr1, PARAM pr2)
{
 int rtn = -1;
 WINDOW cwnd = ControlWindow(&m_dbox, (commands) id);
 if (cwnd != 0) {
 MESSAGE svmsg = m_msg;
 PARAM svp1 = p1;
 PARAM svp2 = p2; 
 rtn = ::SendMessage(cwnd, msg, pr1, pr2);
 p1 = svp1;
 p2 = svp2;
 m_msg = svmsg;
 }
 return rtn;
}
char* dfDialog::GetWindowText(cmds id) const
{
 WINDOW cwnd = ControlWindow(&m_dbox, (commands) id);
 if (cwnd != 0)
 return GetText(cwnd);
 return 0;
}
bool dfDialog::TextChanged(cmds id)
{
 WINDOW cwnd = ControlWindow(&m_dbox, (commands) id);
 if (cwnd != 0)
 return (bool) (cwnd->TextChanged);
 return false;
}
int dfDialog::GetLineCount(cmds id)
{
 int ct = 0;
 WINDOW cwnd = ControlWindow(&m_dbox, (commands) id);
 if (cwnd != 0)
 ct = GetTextLines(cwnd);
 return ct;
}
void dfDialog::GetSelectedLBText(cmds id, char *t)
{
 int sel = GetLBSelection(id);
 if (sel != -1)
 GetLBTextLine(id, sel, t);
 else
 *t = '\0';
}
int dfDialog::DeleteSelectedLBLine(cmds id, char *prompt)
{
 int count = GetLineCount(id);
 if (count > 0) {
 int sel = GetLBSelection(id);
 if (sel != -1) {
 if (prompt == 0 || YesNoBox(prompt)) {
 DeleteTextLine(id, sel);
 if (sel == count-1)
 SetLBSelection(id, sel-1);
 KeepLBSelectionInView(id);
 PaintControl(id);
 SetControlFocus(id);
 return sel;
 }
 }
 }
 return -1;
}
void dfDialog::MoveControl(cmds id, int x, int y)
{
 WINDOW cwnd = ControlWindow(&m_dbox, (commands) id);
 if (cwnd != 0) {
 int oldx = FindCommand(&m_dbox, (commands) id, -1)->dwnd.x;
 int oldy = FindCommand(&m_dbox, (commands) id, -1)->dwnd.y;
 x = x == -1 ? oldx : x;
 y = y == -1 ? oldy : y;
 int difx = x - oldx;
 int dify = y - oldy;
 if (difx || dify) {
 FindCommand(&m_dbox, (commands) id, -1)->dwnd.x = x;
 FindCommand(&m_dbox, (commands) id, -1)->dwnd.y = y;
 SendCtlMessage(id, MOVE, GetLeft(cwnd)+difx, GetTop(cwnd)+dify);
 }
 }
}
void dfDialog::SizeControl(cmds id, int h, int w)
{
 WINDOW cwnd = ControlWindow(&m_dbox, (commands) id);
 if (cwnd != 0) {
 FindCommand(&m_dbox, (commands) id, -1)->dwnd.h = h;
 FindCommand(&m_dbox, (commands) id, -1)->dwnd.w = w;
 SendCtlMessage(id, SIZE, GetLeft(cwnd)+w-1, GetTop(cwnd)+h-1);
 }
}
void dfDialog::SetControlOn(cmds id)
{
 CTLWINDOW *ct = FindCommand(&m_dbox, (commands) id, RADIOBUTTON);
 if (ct != 0)
 PushRadioButton(&m_dbox, (commands) id);
 else
 SetCheckBox(&m_dbox, (commands) id);

 PaintControl(id);
}
void dfDialog::SetControlOff(cmds id)
{
 ClearCheckBox(&m_dbox, (commands) id);
 PaintControl(id);
}
bool dfDialog::GetControlSetting(cmds id)
{
 CTLWINDOW *ct = FindCommand(&m_dbox, (commands) id, RADIOBUTTON);
 if (ct != 0)
 return RadioButtonSetting(&m_dbox, (commands) id);
 return CheckBoxSetting(&m_dbox, (commands) id);
}
void dfDialog::SetProtection(cmds id)
{
 WINDOW cwnd = ControlWindow(&m_dbox, (commands) id);
 if (cwnd != 0)
 SetProtected(cwnd);
}
bool dfDialog::KeepLBSelectionInView(cmds id)
{
 bool rtn = false;
 int sel = SendCtlMessage(id, LB_CURRENTSELECTION);
 if (sel != -1) {
 WINDOW cwnd = ControlWindow(&m_dbox, (commands) id);
 int top = cwnd->wtop;
 int ht = ClientHeight(cwnd) - 1;
 int bottom = top + ht;
 if ((rtn = (sel < top)) == true)
 cwnd->wtop = sel;
 else if ((rtn = (sel > bottom)) == true)
 cwnd->wtop = sel - ht;
 }
 return rtn;
}
void dfDialog::SetControlWindowAttribute(cmds id, int attrib)
{
 WINDOW cwnd = ControlWindow(&m_dbox, (commands) id);
 if (cwnd != 0)
 AddAttribute(cwnd, attrib);
}
void dfDialog::ResetControlWindowAttribute(cmds id, int attrib)
{
 WINDOW cwnd = ControlWindow(&m_dbox, (commands) id);
 if (cwnd != 0)
 ClearAttribute(cwnd, attrib);
}
bool dfDialog::isEmpty(cmds id) const
{
 const char *p_txt = GetWindowText(id);
 if (p_txt)
 while (*p_txt) {
 if (*p_txt != ' ' && *p_txt != '\n')
 return false;
 p_txt++;
 }
 return true;
}

void dfDialog::DialogTextCopy(char *s1, cmds id, int len)
{
 if (s1 != 0) {
 const char *s2 = GetWindowText(id);
 if (s2 != 0)
 while (*s2 && *s2 != '\n' && --len)
 *s1++ = *s2++;
 *s1 = '\0'; // always terminate the destination
 }
}
void DialogConfig::RestoreDialog(dfDialog& db)
{
 if (rc.Width() > 1) {
 db.SetWndRectangle(rc);
 db.SetRestoredRect(rrc);
 db.SetRestoredAttrib(restattrib);
 DBOX& dbox = const_cast<DBOX&>(db.DBox());
 dbox.dwnd.x = rc.Left();
 dbox.dwnd.y = rc.Top();
 dbox.dwnd.h = rc.Height();
 dbox.dwnd.w = rc.Width();
 if (max)
 // --- DB was maximized when last used
 db.SetWndCondition(ISMAXIMIZED);
 for (int i = 0; i < MAXCONTROLS; i++) {
 dbox.ctl[i].dwnd.x = ctls[i].Left();
 dbox.ctl[i].dwnd.y = ctls[i].Top();
 dbox.ctl[i].dwnd.h = ctls[i].Height();
 dbox.ctl[i].dwnd.w = ctls[i].Width();
 }
 }
}
void DialogConfig::SaveDialog(dfDialog& db)
{
 rc = db.WndRectangle();
 rrc = db.RestoredRect();
 max = db.isMaximized();
 restattrib = db.GetRestoredAttrib();
 for (int i = 0; i < MAXCONTROLS; i++) {
 const DIALOGWINDOW& dwnd = db.DBox().ctl[i].dwnd;
 ctls[i] = Rect(dwnd.x, dwnd.y, dwnd.h, dwnd.w);
 }
}






















ALGORITHM ALLEY


Constrained Optimization




Rainer Storn


Rainer, who studied at the University of Stuttgart, received his PhD for his
dissertation, "Algorithms and Architectures for the Discrete Fourier Transform
for the Fast Convolution of Real-Valued Signals." Currently, he is a
postdoctoral fellow at the International Computer Science Institute in
Berkeley, CA, and can be contacted at storn@icsi.berkeley.edu or
rainer.storn@zfe.siemens.de.


Optimization problems appear in virtually every field of analysis, design, and
production. For example, an electronic circuit contains numerous components
(resistors, capacitors, and the like), the properties of which dither around a
certain nominal value. This dithering is a direct result of the production
process of these components and cannot be kept below a certain bound. Due to
these imperfections, a certain percentage of electronic circuits don't meet
the specifications. Here, the goal of optimization is to determine the nominal
value of the critical components such that the production yield is maximized. 
Another example might involve a city wanting to set up a data network that
connects the computer centers of many companies and universities. The budget
to build such a data network may not exceed a certain amount and depends,
among other factors, upon the topology of the network, the data load
introduced by the different computer sites, and the capacity of the connecting
lines. The goal in this case might be to maximize the throughput in the
network and minimize the average delay of a message, given the budget
constraints. 
Yet another example is an electric motor subjected to geometric constraints in
order to fulfill certain company standards. There are some degrees of freedom
to alter the shape of the ring magnet. Here, the goal is to alter the
ring-magnet geometry in such a way that the efficiency of the motor is
maximized.
In all these examples, an optimum must be found while operating within certain
constraints. These kinds of problems fall under the heading of "constrained
optimization." To optimize such problems using the computer, you must first
formulate them mathematically. Many algorithms are capable of tackling
constrained-optimization problems. Most require the problem to be formulated
by means of an objective (or "cost") function that has to be minimized.
Objective functions are a playground for deterministic algorithms, which often
make use of gradient computations. Such algorithms, however, run into problems
if the objective function is nondifferentiable. Furthermore, they usually look
for an improvement after each step they perform and hence are liable to get
stuck in a local optimum. Apart from that, devising a good objective function
can cause a lot of headaches, especially if you have more than one objective. 
For example, consider an integrated circuit for which both the power
consumption and the signal delay have to be minimized. Minimizing the power
consumption, however, often means minimizing the current, which in turn may
increase the signal delay. In such a case, it is difficult to weight the
different objectives properly. 
Other optimization problems don't have anything to minimize or to maximize.
They merely look for a set of parameters that fulfill certain constraints.
With such problems, an artificial objective function must be formulated;
minimizing this artificial function fulfills the constraints. An example is
the magnitude of the transfer function of an electronic filter, which has to
fit inside a tolerance scheme but can adopt any shape inside the tolerance
scheme. Contriving a good objective function is far from trivial and more
often than not requires considerable expertise in the details of the
optimization problem. But what if you are no expert and your boss still wants
you to do the optimization? The algorithm I propose here doesn't require the
formulation of an objective function. You just formulate the constraints that
must not be violated (must-constraints) and the properties that have to be
maximized or minimized (may-constraints). 
The algorithm is based on some new, as-yet-unpublished ideas. It belongs to
the so-called Monte Carlo methods, which lavishly utilize random numbers. Due
to the statistical nature of these methods, the chances are quite good for
finding the global optimum, as opposed to getting stuck in a local one. The
price is an enlarged computational effort in comparison to deterministic
optimization procedures. In light of ever-increasing computing power, however,
this disadvantage is vanishing for many practical problems. If large-scale
optimization problems have to be dealt with, it is advantageous to distribute
the calculations among many computers; an example is a high-speed computer
network, to which the proposed optimization algorithm lends itself perfectly.


The Optimization Principle


Consider a system with n parameters, p1, p2, ..., pn, which influence whether
the system meets its imposed requirements. The parameters pi can be
interpreted as the elements of an n-dimensional parameter vector P. If the
parameters are chosen appropriately, the system will meet the requirements,
which can be visualized as the parameter vector P pointing inside a "region of
acceptability" (ROA) in the parameter space. This scenario is illustrated in
Figures 1 and 2 for an example with two parameters.
In Figure 1, the shaded areas constitute the must-constraints. The function
f(x,p1,p2) is not allowed to run through, let alone touch, those areas.
Choosing the parameters p1=p11 and p2=p21 makes the function fulfill the
constraints. If you choose p1=p10 and p2=p20, the function violates the
constraints. A may-constraint could be the requirement of f(x,p1,p2) to be as
linear as possible for x in the range from 0 to 15000. For the sake of
simplicity, I'll disregard may-constraints for the moment. The situation in
Figure 1 can also be viewed in the parameter space defined by the coordinates
p1 and p2 in Figure 2. The ROA in Figure 2 corresponds to the blank area in
Figure 1. The parameter vector (p10,p20) causes the function in Figure 1 to
violate the imposed constraints (the ROA is not hit). The parameter vector
(p11,p21) causes the function in Figure 1 to satisfy the imposed constraints
(ROA is hit). If f(x,p1,p2) fulfills the constraints, the parameter vector
P=(p1,p2) lies inside the ROA. If f(x,p1,p2) violates the constraints, the
parameter vector is located outside the ROA.
The ROA of a system can be quite complicated and usually isn't known in
advance. If it were, it would be trivial to choose the right parameter values
for the dimensioning of the system. The optimization process can be
interpreted as a way of finding a parameter vector, P, which lies inside the
ROA. The task now is to estimate (at least parts of) the ROA's shape by
producing a cloud of statistically distributed parameter vectors in the
neighborhood of your starting (or "nominal") vector. Vectors which hit the ROA
provide the desired estimate of the ROA's shape (at least partly); see Figure
3.
Initially, the vector may be so far from pointing anywhere near the ROA that
even hours of computing time will not produce a hit. To escape this trap, you
initially relax the constraints just enough to allow your nominal vector to
meet them. Now, Gaussian-distributed random deviations from the nominal
vector are generated. They produce a cloud of new parameter vectors that
gather around the nominal vector, as in Figure 3.
The algorithm I'm proposing tries to generate 3*dim hits when dim parameters
make up a parameter vector (in our example, dim=2). If this can't be achieved,
the current iteration will stop after mxvecs tries. Setting mxvecs=20*dim is a
reasonable starting point. If the optimization gets stuck in a local optimum
or no hits are found, increase mxvecs. At the end of an iteration the mean
value of the hit vectors is calculated and defined to be the new nominal
vector for the next iteration. Before the next iteration is started, the
constraints are tightened, but only to the point that the new nominal vector
lies on the rim of the new ROA. The optimization algorithm continues with this
mechanism until the original constraints are met. The entire optimization
process will be accompanied by a successive shrinking of the ROA; see Figure
4. Note that the ROA need not maintain its general shape, as might be
suggested by Figure 4. In fact, the shape can change significantly, and this
sometimes hinders the algorithm in finding the desired optimum.
Although the idea behind the proposed optimization procedure is fairly simple,
you need a bag of tricks to make it actually work. A central problem is
finding the proper standard deviations to drive your multivariate Gaussian
random-number generator. It is good to start the program with large standard
deviations; for example, three times the corresponding parameter value itself
(unless the value is 0, of course). During an iteration of the optimization
process, the maximum distances in each coordinate direction between the
current nominal vector and the hit vectors have yielded good values for the
standard deviations of the next iteration. In Monte Carlo optimization, a
major issue is preventing the standard deviations from becoming too small.
Otherwise, the program will run into the next-best local optimum as quickly as
it can. Therefore, my proposed algorithm tries to increase the maximum
deviations by reusing the difference vector of the nominal vector and a hit
vector. Such a "successful" difference vector is applied as often as
possible--as long as new hits can be generated. At the first occurrence of a
no-hit, the mechanism stops and the maximum deviation in each coordinate
direction is evaluated. The reuse strategy shown in Figure 5 is implemented
via a linear search (an input file that illustrates this is available
electronically; see "Availability," page 3). Of course, the search strategy
could be improved to get closer to the rim of the ROA faster, but this has
been left out of the current implementation for the sake of simplicity.
Although the standard deviations should remain large enough to increase the
chance of leaving local optima, the algorithm will produce no hit at all if
the deviations are too large. Hence if no hit occurs, the standard deviations
are reduced by a factor of 0.7 for the next iteration. The factor 0.7 is not
mandatory; you can experiment with it. 
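The two-sided control of the standard deviations can be condensed into a small helper. control_sigma() and the hitdist[] array (largest per-coordinate distance between the nominal vector and any hit vector) are hypothetical names, not part of mco.c's interface.

```c
/* Shrink sigma[] by reduct (0.7 by default) when the iteration
   produced no hits; otherwise adopt the largest per-coordinate
   distance between the nominal vector and the hit vectors. */
void control_sigma(int dim, double sigma[], double reduct,
                   int hits, const double hitdist[])
{
    for (int i = 0; i < dim; i++)
        sigma[i] = hits > 0 ? hitdist[i] : sigma[i] * reduct;
}
```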
On one hand, the control strategy just described helps the program reduce its
standard deviations when it has to ooze through a narrow valley of the ROA. On
the other hand, the strategy expands them again when the valley has been
passed. It is this control strategy that is new, at least compared to existing
Monte Carlo optimization methods; control is crucial for the robustness of the
algorithm.
There are other pitfalls to be considered. The larger the problem size (that
is, the more parameters you want to include in your optimization), the more
difficult it becomes to generate hits because there are so many different
vectors to choose from. To alleviate this general problem, it helps to keep a
record of successful search directions from the current iteration. These
directions are the first ones to be tested in a new iteration before the
random-number generator takes over. This assumes that the shape of the ROA
doesn't change too much from one iteration to the next; hence, the desired
number of hits can be generated faster by "looking into the right direction."
In fact, this assumption has turned out to be true in many cases.
Now let's look at must-constraints and may-constraints. Try to meet your
must-constraints first. After that you can start to work on the
may-constraints while maintaining the must-constraints. There is a little
problem concerning the criterion for making the whole optimization come to an
end. If the nominal vector doesn't change much over several iterations, you
could stop the program. This, however, can result in aborting the optimization
during a stagnation phase. A sure-fire criterion is to stop the program after
a prescribed maximum number of iterations and see if you are satisfied with
the results. But then you might be satisfied with results that could have been
surpassed significantly had you just waited long enough. As a third variant,
you could set up a fixed goal for the may-constraints and stop when this goal
is reached. However, you often don't know whether this goal can be reached at
all. Maybe it's impossible to satisfy even the must-constraints. From the
previous discussion, it is clear that each criterion has its drawbacks. As a
compromise, the last two criteria are implemented in Listing One , leaving the
decision up to you whether you go for one or more restarts to improve your
results.


Examples


I'll now examine some practical examples, starting with one taken from
electronic-filter theory. Consider a polynomial p(x) of degree 4 that must fit
the tolerance scheme in Figure 6 such that the polynomial does not run through
any parts of the shaded area. The polynomial itself can take on arbitrary
shapes inside the tolerance scheme, which makes the whole problem exhibit only
must-constraints but no may-constraints. An arbitrary starting solution for
p(x) is p0(x)=1.0+1.0*x+0.2*x^2+0.0*x^3+0.0*x^4 (which, by the way, is not a
"solution" because it violates the tolerance scheme). After several
iterations, the statistical-optimization program arrives at the true solution,
which can also be seen in Figure 6.
Now consider an example from operations research, which uses the variables x
and y with the constraints: maximize x+y while keeping (x-3)^2+(y-2)^2 <= 16
and x*y <= 14, as well as x,y > 0. The maximization constitutes a may-constraint,
while the others are must-constraints. The optimization should render the
result x=7 and y=2.
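This example maps onto an accept()-style hit test as sketched below (constraint relaxation is omitted for brevity; or_hit() and the best_sum threshold are my notation, not mco.c's):

```c
/* Hit test for the operations-research example: x = par[0],
   y = par[1]. The circle, the product bound, and positivity are
   must-constraints; x+y is the may-constraint to be maximized,
   here required to beat the best sum found so far. */
int or_hit(const double par[2], double best_sum)
{
    double x = par[0], y = par[1];
    if ((x - 3.0)*(x - 3.0) + (y - 2.0)*(y - 2.0) > 16.0) return 0;
    if (x * y > 14.0) return 0;
    if (x <= 0.0 || y <= 0.0) return 0;
    return x + y > best_sum;
}
```

Note that the optimum x=7, y=2 sits exactly on both must-constraints: (7-3)^2+(2-2)^2=16 and 7*2=14.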
In the next example (from computer science), the task is to approximate the
function f(x)=sin(x) by a polynomial in the range x from 0 to pi/2; see
Figure 7. The approximation should be equally good over the entire range of x.
Because of speed requirements, the degree of the polynomial is restricted to
3. Most mathematicians will tell you to perform a Chebyshev instead of a
Taylor approximation. In fact, you don't need to because the optimization
finds the pertinent polynomial coefficients for you. The mean squared error
(mse) taken at, say, 100 discrete points in the range x from 0 to pi/2,
provides a sensible measure for the quality of the curve fit. Actually, the
mse is nothing but an objective function that one tries to minimize. Although
our optimization algorithm doesn't need objective functions, it can deal with
them by regarding them simply as a may-constraint. Since you usually don't
know which value the may-constraint will take on when the global optimum has
been found, it makes sense to choose a "goal value" for the may-constraint
that can never be reached and to control the termination of the optimization
via the number of allowed iterations.
The last example is a capacity-assignment problem taken from computer
networking. In Figure 8, programmable concentrators in each of the five
boroughs of New York City are to be connected to a large computer in lower
Manhattan. Teletype terminals are in turn connected to the concentrators. Each
terminal transmits at a rate of 100 bits per second (bps). Messages are, on
the average, 1000 bits long and exhibit exponential length distribution. A
typical terminal transmits a message on the average of once a minute according
to a Poisson process. The number of terminals in each borough is: Bronx, 10;
Brooklyn, 15; Manhattan, 20; Queens, 10; and Richmond, 5. Given that the
minimum average time delay should not exceed 4 seconds, the task is to find
the optimum capacity assignment for each link that yields minimum cost. The
underlying cost function, K, is assumed to be the equation in Example 1(a),
where Ci is the line capacity that connects site i with the central computer
and li is the corresponding line length. The partial cost, Ki, is defined as
Example 1(b) with Example 1(c) denoting the step function.
The step functions in Ki(Ci,li) model fixed costs, repeater costs that appear
after 12 km of line length and a cost increase due to a change of the line
type for line capacities greater than 400 bps. (Queuing is the basic theory
required for solving this problem; see Schwartz's Computer Communication
Network Design and Analysis.)
There are many constraints in this nonlinear optimization problem--the Ci and
the queuing delays must all be positive, and the minimum average time delay
must not exceed four seconds--which makes my proposed algorithm suitable. The
result after rounding off to integer values is:
C1=792 bps
C2=397 bps
C3=576 bps
C4=400 bps
C5=248 bps


The MCO.C Program 



Listing One is an excerpt from mco.c, the program that implements the Monte
Carlo optimization. While the complete program includes the functions
rnd_uni(), rnd_gauss(), accept(), and main(), this excerpt illustrates only
the accept() function, which describes your optimization problem and must be
rewritten each time a new problem has to be solved. rnd_uni() is a basic
random-number generator that yields uniformly distributed random numbers in the
range between 0 and 1. rnd_gauss() makes use of rnd_uni() to construct a
vector of dim elements with each element being a random number stemming from a
Gaussian distribution with zero mean and unit standard deviation. Finally, the
function main(), which performs the general optimization strategy, reads the
input file and writes the results to the output file. Listing Two is the input
file for the filter design in Figure 6, while Listing Three is the output
file. The complete source code, executables, and sample data files are
available electronically.
The input file reads the variable itermax, which defines the maximum number of
iterations. It also contains the variable mxvecs, which defines the maximum
allowable number of parameter vectors to be generated in one iteration. As
mentioned earlier, it is reasonable to choose mxvecs=20*dim as a default value
with dim being the dimension of the parameter vector--the number of parameters
that you want to consider in your optimization problem. dim is the third item
in the input file, followed by the variable reduct, the factor by which all
standard deviations are multiplied if no hit is achieved during the current
iteration. The standard value of reduct is 0.7. The next items are the dim
starting values of your initial parameter vector, followed by the dim
corresponding standard deviations. There is no generic requirement for
choosing those values except that the standard deviations shouldn't be set to
0. If the optimization problem is not too difficult, the program will find the
solution from virtually any starting point. Nevertheless it helps to decrease
computing time if the starting vector is as close as possible to the expected
vector and if the standard deviations are not too small. Choose standard
deviations in the range of the corresponding parameter values for sound
initial operating conditions. To call the program, type: mco <input-file>
<output-file>.


Conclusion


The optimization program introduced here is suited for many constrained
optimization problems. You just have to provide a mathematical formulation of
your constraints, but you don't need to worry about devising an appropriate
objective function. You can then let the program run and do the work for you.
Of course, there are problems for which virtually no optimization strategy
will find an easy answer; when the ROA consists of many disjoint islands, for
instance. The only chance then is to try different starting vectors to find
the optimum.


References


Lueder, E. "Optimization of Circuits with a Large Number of Parameters." AEÜ,
Band 44, Heft 2, 1990.
Kjellström, G. and L. Taxén. "Stochastic Optimization in System Design." IEEE
Trans. CAS (July 1981).
Kreutzer, H. "Entwurfszentrierung zur Erhöhung der Ausbeute und zur
Verbesserung der Eigenschaften von Schaltungen." Dissertation, University of
Stuttgart, 1984.
Eckstein, T. "Statistische Optimierung von Systemen mit einer hohen Zahl von
Parametern." Dissertation, University of Stuttgart, 1989.
Moebus, D. "Algorithmen zur Optimierung von Schaltungen und zur Lösung
nichtlinearer Differentialgleichungen." Dissertation, University of Stuttgart,
1989.
Press, W.H. et al. Numerical Recipes in C. Cambridge: Cambridge University
Press, 1992.
Schwartz, M. Computer Communication Network Design and Analysis. Englewood
Cliffs, NJ: Prentice Hall, 1977.
Figure 1 The goal for function f(x,p1,p2) is to stay apart from the
blue-shaded areas that constitute the must-constraints.
Figure 2 The situation in Figure 1 can also be viewed in the parameter space,
the coordinates of which are p1 and p2.
Figure 3 Estimation of the ROA by a cloud of random parameter vectors which
gather around the nominal vector.
Figure 4 With each iteration of the optimization process, the ROA shrinks
until the final constraints are met.
Figure 5 The reuse of successful directions helps investigate the ROA more
thoroughly and better estimate the standard deviations of your random-number
generator.
Figure 6 Polynomial of degree 4 before and after optimization. 
Figure 7 Best-fit approximation of f(x)=sin(x) in the interval [0, pi/2] by a
polynomial of the third degree.
Figure 8 Topology of the data network in New York.
Example 1 (a) Underlying cost of function K; (b) partial cost of Ki; (c) step
function of 1(b).

Listing One 

int accept(int dim,float par_vec[],int adapt,int *finish_ptr,
 float constraint[],int *nocont)
/**C*F************************************************************************
** SRC-FUNCTION :accept() **
** LONG_NAME :accept **
** AUTHOR :Dr. Rainer Storn **
** DESCRIPTION :accept() tests whether a parameter vector par_vec[] **
** falls into the region of acceptability (ROA). If it does **
** hit=1 is returned, otherwise hit=0. **
** FUNCTIONS :none **
** GLOBALS :none **
** PARAMETERS :dim number of vector elements. **
** par_vec[] contains the vector with dim **
** gaussian distributed variables. **
** adapt control variable. **
** 0 : no adaptation of ROA **
** 1 : adaptation of ROA permitted (shrinkage) **
** 2 : adaptation of ROA permitted **
** (relaxation, expansion) **
** finish_ptr indicates meeting of requirements. **
** 0 : goals not yet reached. **
** 1 : goal has been reached. **
** constraint[] contains current constraints of ROA. **
** *nocont contains number of constraints. **
** PRECONDITIONS :par_vec[] must contain valid parameter vector. **
** POSTCONDITIONS :Elements of constraint[] will probably be altered. **
** nocont returns number of constraints to assist **

** printing routine of main(). **
***C*F*E**********************************************************************/
{
 int hit, i, n;
 float goal[NARY], z0, z1, x, y;
/*------Set up constraints----------------------------------------------*/
/*------goal[0] : Maximum absolute value of polynomial for x in [-1,1]----
--------goal[1] : Minimum ordinate value of polynomial at x = +/- 1.2---*/
 goal[0] = 1.001; /* must-constraint */
 goal[1] = 5.9; /* must-constraint */
 *nocont = 2; /* two constraints */
/*------Calculate function values and initializations-------------------*/
/*------Passband. Compute maximum magnitude of ordinate value.----------*/
 z0 = 0.;
 for (i=0; i<=100; i++)
 {
 y = 0.0;
 x = -1.0 + (float)i/50;
 for (n=dim-1; n>0; n=n-1)
 {
 y = (y + par_vec[n])*x;
 }
 y = y + par_vec[0];
 if (fabs(y) > z0) z0 = fabs(y); /* z0 contains maximum magnitude */
 }
/*--------Stopband. Compute ordinate value at the edges x = (+/- 1.2).--*/
/*--------Save the lowest ordinate value.-------------------------------*/
 y = 0.0;
 x = 1.2;
 for (n=dim-1; n>0; n=n-1)
 {
 y = (y + par_vec[n])*x;
 }
 y = y + par_vec[0];
 z1 = y;
 y = 0.0;
 x = -1.2;
 for (n=dim-1; n>0; n=n-1)
 {
 y = (y + par_vec[n])*x;
 }
 y = y + par_vec[0];
 if (y < z1) z1 = y;
 hit = 1; /* Preset hit-flag to "hit" */
/*------Relax constraints if adapt equals 2.------------------------------*/
 if (adapt == 2)
 {
 for (i=0; i<=1; i++) /* Initialize constraints to goal values */
 {
 constraint[i] = goal[i];
 }
/*--Relax must-constraints as required------------------------------------*/
 if (z0 > constraint[0]) constraint[0] = z0;
 if (z1 < constraint[1]) constraint[1] = z1;
 }
 else if (adapt == 1) /*--------adapt must-constraints (shrinkage)-------*/
 {
 if (z0 <= constraint[0])
 {

 if (z0 > goal[0]) constraint[0] = z0; /* adapt must-constraint only if */
 else constraint[0] = goal[0]; /* goal is not reached yet. */
 }
 else
 {
 hit = 0;
 }
 if (z1 >= constraint[1]) /* adapt must-constraint only if */
 { /* goal is not reached yet. */
 if (z1 < goal[1]) constraint[1] = z1;
 else constraint[1] = goal[1];
 }
 else
 {
 hit = 0;
 }
 }
 else
 {
 if (z0 > constraint[0]) hit = 0;
 if (z1 < constraint[1]) hit = 0;
 }
 *finish_ptr = 0;
 if ((constraint[0] <= goal[0]) && (constraint[1] >= goal[1])) *finish_ptr=1;
 return(hit);
}



Listing Two

200
100
5
0.7
10.00000
10.00000
-6.00000
10.00000
80.00000
3.00100
3.00100
3.00100
3.00100
3.00100



Listing Three

Iteration: 134
---------------
xnominal[0] = 0.993562
xnominal[1] = -0.043474
xnominal[2] = -7.870154
xnominal[3] = 0.050184
xnominal[4] = 7.859358
sigma[0] = 0.012818
sigma[1] = 0.009166

sigma[2] = 0.035221
sigma[3] = 0.014853
sigma[4] = 0.040397
constraint[0] = 1.001000
constraint[1] = 5.900000
Number of random points = 100
Number of hits = 3
Yield in percent = 3.000000























































PROGRAMMER'S BOOKSHELF


Why is Route 128 in a Recession?




Peter D. Varhol


Peter is chair of the graduate computer science and mathematics department at
Rivier College in New Hampshire. He can be contacted at
pvarhol@mighty.riv.edu.


When I started reading Regional Advantage, I was inclined to go along with
Annalee Saxenian's view that the all-too-obvious differences in culture and
attitude between the people inhabiting Route 128 in Massachusetts and
California's Silicon Valley have resulted in different business models and
different reactions to changing technologies. Further, she claims, the Silicon
Valley culture and approach to high technology have made it more adaptable to
change and less prone to economic downturns.
As with most books dealing with the history of high technology in the U.S.,
Saxenian (who, as a professor of city and regional planning at the University
of California at Berkeley, has close ties to the Silicon Valley) traces the
development of both Route 128 and Silicon Valley to defense dollars from as
far back as World War II. The difference was that Route 128 (which as a
highway did not even exist at the time) was the R&D Establishment, accounting
for one-third of all defense R&D dollars spent. As a result, Route 128
companies came to depend upon bureaucracies such as the Federal Government,
while Silicon Valley firms had to depend on one another. This dependence on
Federal money made Route 128 firms big and slow-moving. Winning lucrative
government contracts, therefore, became a part of the prevailing mindset,
instead of scraping together a few thousand to work in an old mill with a few
friends to develop a new product.
Dependence upon government grants and contracts clearly prevents companies
from developing the infrastructure and attitudes necessary to effectively
compete in commercial markets. Doing business with the Federal government
requires a certain mindset, and a certain set of business systems that are not
easily transferable to commercial activities, especially in high technology.
For example, "marketing" in the world of government contracting more often
than not means hiring an insider to make contacts in the maze of government
agencies so that your company becomes a known quantity. This is so unlike
commercial marketing as to be laughable.
Then there is the regional culture. Leaving your employer in Route 128 is a
traumatic event, according to Saxenian, and don't expect to ever be invited
back. I've known people in new jobs who were told they were not permitted to
talk to friends at a former employer. In Silicon Valley, however, would-be
entrepreneurs often maintain close relationships with old employers and may
even have a standing offer to return if things don't work out. In Route 128,
such a person quickly becomes a nonentity.
Most people I encounter in New England are intensely loyal to their employers,
and the way to become successful in New England is to rise to the top of an
established company. Northern Californians, in contrast, appear to be far more
likely to pursue innovations outside of their existing companies. New
Englanders attempt to sell innovations within the system, if at all. This
leads to less stable companies in California, but arguably more vitality.
It is difficult to pin down reasons for these cultural differences. For
instance, one person I consulted thought that the climate might influence
regional culture. New England winters, he claimed, will make anyone
conservative, because if you lose your shirt, you can't heat your home next
winter. From more of a business perspective, the culture of the small,
high-tech startup-made-good also leads to the greater availability of venture
capital, since successful entrepreneurs are likely to risk funding those like
themselves. In New England, venture capital for computer startups is virtually
nonexistent, because the old money in the region values tradition, which is
primarily geared to capital preservation.
Saxenian goes on to make some interesting observations regarding corporate
organization in the two regions. Route 128 companies, she claims, tend to be
vertically integrated, performing virtually all engineering and support
functions in-house. Silicon Valley companies tend to be smaller and more
specialized, and need one another to produce a product. The relationships
between supplier and customer during product design and development are
intimate, even though the same firms may be competitors on other product
lines, or on similar products in the future.
Her obvious example in New England is Digital Equipment Corp., which at one
time made virtually every component for virtually all of their systems. To
many, it was heresy when the company adopted, however briefly, the MIPS
processors for its first RISC systems. Because of DEC's early success, other
startups had to emulate its business model to look respectable, and the
capital investment needed to do so probably discouraged many would-be
entrepreneurs.
This is like beating a dead horse, however. For better or worse, DEC, the
region's largest employer, had a profound influence on the area's attitudes
towards high technology. Virtually everyone I knew (including myself, in my
younger days) wanted to work for DEC, because the company had the best of
everything. Now, the firm is simply irrelevant. Most of the bright people I
know don't even consider DEC to be a part of the computer industry today.
I finished reading Regional Advantage with a sense of disappointment. Social
scientists are good at explaining what has already happened, but their
analyses rarely have predictive power. I couldn't get over the feeling that,
were circumstances different, the argument could be made in the other
direction just as convincingly. Consider this fictional version: 
Vertical integration was the key to success in Route 128. Silicon Valley, with
small, specialized companies, became too narrowly focused, and failed to see
the overall technological landscape. The too-easy flow of information reduced
competitive advantage....
Furthermore, Saxenian is an unabashed cheerleader of Silicon Valley. She
ignores or glosses over some of the very real disadvantages of doing business
in Silicon Valley, such as crowded highways, high taxes, and cost of living
(not to mention the certainty of earthquakes). Because of issues such as
these, high-technology enterprises are moving out of Silicon Valley, and even
out of the state, to emerging technology meccas such as Boise, Salt Lake City,
and Phoenix. The litany of praise for the Silicon Valley ways of doing
business gets tiresome after a while.
There is also an underlying spirit of entrepreneurism in New England that
Saxenian does not even seem to be aware of. It is more of an underground
culture, less out in the open, and it perhaps does not get the same respect as
the analogous culture in Silicon Valley. Within a stone's throw of my
backyard, the success of once-startup companies such as Cabletron, Softdesk,
and Serif offers proof that New England entrepreneurs can and do succeed.
Mostly, these people are looking for ways to communicate with one another,
because in one way Saxenian is right. There is no well-known pub, or
restaurant, or informal professional organization in the Route 128 area, where
techies can gather to exchange ideas or enthusiasm. 
Nevertheless, Saxenian does say some important things concerning the
relationship between regional culture and economic vitality. Does the regional
culture play a role in the development of business models, and do these
business models influence the economic vitality of the region? Probably. Does
the Route 128 region have cultural characteristics that inhibit
high-technology development? More than likely. Can we do anything about it? I
suspect not. Indeed, her conclusion, that government institutions cannot
produce or manage, but can contribute to technological vitality, is mild and
reasoned.
I suspect we can do even less than Saxenian thinks. As individuals, we can't
look in the mirror in the morning and say, "Today I'm going to become an
entrepreneur." Either we naturally feel comfortable in that role or we don't.
The same holds true at the institutional level; it takes an upheaval of
monumental proportions for an established company (or government bureaucracy)
to accept constant change as the norm. More often they don't, and simply fail.
But the ambitious goal of a book such as Regional Advantage, as stated by
Saxenian herself, is to define and explain a formula for continued regional
excellence in high technology. In other words, the positive lessons of Silicon
Valley, and the negative lessons of Route 128, can be applied to any region. I
remain unconvinced that Regional Advantage accomplishes this goal, or indeed
that the goal is even attainable.
Regional Advantage: Culture and Competition in Silicon Valley and Route 128
Annalee Saxenian
Harvard University Press, 1994
$24.95
ISBN 0-674-75339-9

SWAINE'S FLAMES


The Human Interface


Last month, as you may or may not recall, I proved, with geometric logic, that
the PARC/Mac/Windows icon/window/menubar graphical user interface was the best
user interface there ever could be.
This month I'll prove that it's not, and hint at what the best user interface
there ever could be might be.
I'll start by claiming that this is not a problem to solve and that if you
approach it as a problem to solve, you will fail.
Problem solving is never the road to the future. Problem solving is often
useful, never inventive. It's throwing gravel under the wheels to get unstuck
from the mud. If you're concentrating on the mud under the wheels, you're not
going to grow wings and fly.
It's play that gets you flying, not problem solving; and the best play is the
kind that, as Michael Corleone said of his family business, keeps pulling you
back in. The most compelling computer programs are games, and the best games
are can't-quit experiences that grab you like those fright flix of your
childhood where at the end you still had a full box of Milk Duds but no
fingernails.
The trick, as any heroin dealer or corset manufacturer can tell you, is to get
'em hooked.
A lot of people have already figured this out: Chris Crawford, Guru of Games;
Trip Hawkins, High Priest of Hotness; the writers for the "Max Headroom"
television series; William Shatner.
Yes, William Shatner. Shatner has given us Tek, the electronic drug, the
perfect metaphor for getting 'em hooked electronically, which is what you, the
future billionaire mastermind, need to do, more or less metaphorically, if you
want to invent the Interface of the Future, as I'm sure you do.
That, with an exhortation to Just Do It, could be the end of this month's
column, except that it leaves the central question open: Hooked on what?
The Internet experience, the original Internet experience, that is, suggests
an answer:
Other people.
People are the great, untapped resource. The richest possible experience you
can give the user comes from putting another person on the other side of that
interface. The most powerful filter for information is someone who knows that
information well. The most interesting interaction you can have with a
computer is one in which the computer merely mediates, facilitates, or enables
conversation with another person. "Merely" is disingenuous, of course. From
the right perspective, you can view all productive uses of computers as
mediating or facilitating or enabling conversations between people.
And that is the right perspective.
The Internet experience shows that we have the ability to invent new modes of
human interaction. What new modes are there to be invented? Ah, that's what
you, the inventor, must discover, but one promising angle is the use of agents
that represent people online.
Ultimately, I argue, the user interface should cease to be a human/computer
interface and become a human/human interface.
And that's the real end of this column, except for this observation: Doing
what I've suggested here would be a lot harder than slapping a happy face on a
file structure and calling it "Bob," but it would also be a lot more
meaningful and a lot more fun.
And that's the really, really, true end of this column. That's my theory, and
if you don't like it....
But no. I would never say to you, as they're saying these days around my water
cooler, "If I want an opinion I'll call Judge Sporkin."
And that's the end.
Michael Swaine, editor-at-large
MikeSwaine@eworld.com



OF INTEREST
Visual Parse++, a visual parsing generator from SandStone Technology, is
designed to provide developers with parsing technology for user commands,
protocols, data-file structures, edit-field validators, user parameter files,
scripting, tagged data, formatted files, and data filters. Using regular
expression and BNF-like notation, the lexing and parsing engines let you
define multiple expression lists and push and pop them as regular expressions
are matched. The tool also provides advanced error recovery as well as error
tokens and supports Unicode, DBCS, and any 32-bit quantity.
Visual Parse++ works by creating tables and a skeleton application for a
developer from a specification. Support for C, C++, and REXX is currently
included. The current version supports OS/2, with a Windows implementation
forthcoming. Visual Parse++ for OS/2 sells for $699.00.
SandStone Technology
70 Tidwell Lane
Henderson, NV 89014
702-896-7623
Strategic Mapping has extended its AtlasView SDK 4.1 for creating mapping
software to support Powersoft's PowerBuilder. The AtlasView SDK lets you
create apps that enable users to connect their own information to digital maps
through relational database-management systems. Information can be linked by
street address, postal code, city, county, state, and region, depending on the
type of information and geographic data available. With AtlasView support,
PowerBuilder developers can now access Strategic Mapping's 1.5 gigabytes of
digital maps. 
Strategic Mapping 
3135 Kifer Road
Santa Clara, CA 95051 
800-472-6277
CheckSig, a signature-verification engine that verifies both scanned and
online signature samples, has been released by Fficiency Software. CheckSig, a
DLL designed to be used with Windows 3.1 or higher, lets you retrieve both
online signature formats and the scanned formats directly from disk files. As
a C library, the CheckSig engine can be adapted to virtually any operating
environment.
The SDK includes programming and testing documentation. Also included are
sample code for implementing program calls and for collecting custom stroke
(mouse) data for use in nonpen operating environments.
Fficiency Software
1776 North State
Orem, UT 84057
801-225-9900 
Novell has begun shipping the Novell Embedded Systems Technology (NEST) 1.0
SDK to OEMs who want to build network technology and services directly into
electronic devices such as printers and copiers, security systems, building
controls, settops, and home appliances. Early NEST licensees include Ricoh,
Fujitsu, Lexmark, Digital Products, QMS, Canon, Andover Controls, Xerox, IC
Card, Securicor Telecoms, and Castelle.
NEST devices can plug directly into Novell's networks and take advantage of
NetWare services, security, and management facilities. Written in C, NEST is
hardware, processor, and operating-system independent; it is open and
extensible by both Novell and others. NEST is designed for embedded-system
developers who need to port software from one platform to
another. The NEST architecture employs a building-block approach, so
developers need only include those functions and software modules required by
their embedded systems. The NEST SDK provides software tools for testing SPX
and IPX protocols as well as printer applications. The NEST 1.0 SDK includes
source code, documentation, training, test tools, and support. 
Novell
122 E. 1700 South
Provo, UT 84606
800-895-6378
Apple has begun shipping Release 16 of its MPW Pro and E.T.O.:
Essentials*Tools*Objects software, which includes the first beta release of
MrC and MrCpp, new C and C++ compilers for developing native Power Macintosh
applications. These compilers operate up to four times faster than the native
version of PPCC, Apple's first C/C++ compiler for Power Macintosh. The release
also includes an alpha release of PPCLink, the MPW-based linker for
development of native Power Macintosh applications. This version is a native
tool and generates executable applications directly. 
Other parts of the release include PPCAsm 1.1, the PowerPC assembler, which
supports symbolic debugging, a beta release of MPW p2c, an Object Pascal to
C/C++ source-code translation system, MacApp 3.1.2, Ad Lib 2.0.1 (a
user-interface editing tool for building views for the MacApp 3.x application
framework), and an alpha release of the Code Fragment Manager (CFM) run-time
software for 680x0-based Macintosh systems. 
MPW Pro, which sells for $495.00, is distributed on CD-ROM and comes with a
development environment, compilers (C, C++, assembler), debuggers, and testing
tools for development of 680x0- and PowerPC-based Macintosh systems. E.T.O.
sells for $1095.00 on a three-issue-per-year, CD-ROM-based subscription. 
APDA
Apple Computer
P.O. Box 319
Buffalo, NY 14207-0319
716-871-6555 
Distinct has released its Distinct NFS for Windows, a 32-bit VxD which runs
over Microsoft's TCP/IP-32 stack and Novell's LAN WorkPlace TCP/IP stack, as
well as its own TCP/IP stack. The software allows system administrators
running mixed-vendor TCP/IP environments to standardize on a single
file-sharing system. The VxD provides support for file locking and sharing,
DOS drives through Windows, and PCNFSD. NFS for Windows sells for $195.00.
Distinct 
12901 Saratoga Ave., Ste. 4
Saratoga, CA 95070
408-366-8933
Symantec has announced Version 7.0 of its C++ compiler. The new release
includes a new ResourceStudio for editing Windows resources (including Windows
95 extensions), NetBuild for automatically distributing builds over a LAN,
the Multiscope Debugger 3.0, Optlink 6.0 (a multithreaded 32-bit linker), an
object-oriented browser and editor, and agents for generating OLE 2.0 servers
and containers. Symantec C++ 7.0 sells for $199.00; upgrades cost $99.95. 
Symantec
10201 Torre Ave.
Cupertino, CA 95014
408-253-9600
AccuSoft has announced a family of OCX libraries designed to integrate into
Microsoft Access, FoxPro, Visual Basic 4.0, and other OLE-based applications.
OCX libraries are OLE compliant and will likely replace VBX custom controls. 
The AccuSoft OCX16 and OCX Pro Gold tools read 36 image formats and offer more
than 100 functions for scanning, compressing, reading, displaying, printing,
and exporting images. OCX Pro Gold also offers anti-aliasing functions,
scanning control, and higher performance. OCX16 sells for $495.00, while OCX
Pro Gold sells for $1995.00.
AccuSoft
Two Westborough Business Park
Westborough, MA 01581
508-898-2770
ProtoView Development has begun shipping ProtoView Visual Help Builder, a
help-authoring tool that supports C, C++, Pascal, Visual Basic, PowerBuilder,
and a variety of 4GLs. The tool, which works with Microsoft Word for Windows,
supports AVI, WAV, SHG, and 256-color bitmaps. It includes the Microsoft Help
Compiler and Intersolv's PVCS Version Manager. The help builder sells for
$395.00.
ProtoView Development
353 Georges Road
Dayton, NJ 08810
908-329-8588
Faircom has released an ODBC Driver Kit for its c-tree Plus and Faircom Server
software. The initial ODBC Driver is a single-tier driver that interfaces
directly to the c-tree Plus API. The term "single-tier" indicates that all of
the program logic necessary to handle a request from a front-end application
is contained within the driver itself, including an SQL interpreter. The ODBC
Driver is a DLL which supports single-user, multiuser nonserver, and
client/server modes. The ODBC Driver Kit starts at $395.00.
FairCom
4006 West Broadway
Columbia, MO 65203
800-234-8180
Premia has demonstrated a 32-bit version of its Codewright Fusion programmer's
editor which replaces the editor built into Microsoft Visual C++ 2.0. With
Codewright Fusion, you have full access to the VC++ IDE along with enhanced
editing capabilities.
Premia 
1075 NW Murray Blvd.
Portland, OR 97229
503-641-6000
A Windows-hosted development environment for embedded-systems and real-time
programmers has been announced by Microtec. The VRTXsa Developer's Kit
includes Microtec's C compiler, the XRAY Debugger for cross-development and
debugging of VRTXsa applications, the VRTXsa real-time operating-system
kernel, and a choice of board-support packages. The developer's kit is
available for 68000/68300 or 386/486/Pentium application development. The
VRTXsa Developer's Kit for Windows sells for $5800.00. 
Microtec Research
2350 Mission College Blvd.
Santa Clara, CA 95054
408-980-1300
Motorola has announced its PowerPC World Wide Web server. The WWW server will
include news on PowerPC events, fact sheets, white papers, and tech support.
In the future, links will be set up with Apple, IBM, Microsoft, and other
Motorola groups. You can access it at http://www.mot.com/PowerPC.
Motorola
RISC Microprocessor Division
P.O. Box 202558
Austin, TX 78720
512-343-8940
Phar Lap Software has started shipping its Phar Lap TNT Embedded ToolSuite, a
suite of development tools for 32-bit 386/486-based embedded-systems
development. Supporting 32-bit C/C++ compilers from Borland, Microsoft, and
MetaWare, the suite includes the TNT Embedded Kernel, Visual System Builder,
the LinkLoc linker, CVEMB and TDEMB shells for embedded cross-debugging, an
MS-DOS-compatible file system, a floating-point-emulation library, and a
remote file system. The tool suite sells for $2995.00. 
Phar Lap Software
60 Aberdeen Ave.
Cambridge, MA 02138
617-661-1510



EDITORIAL


Call Me Irresponsible


From time to time, programmers are accused of acting irresponsibly. It's no
surprise that, more often than not, there's some truth to these accusations. I
still have flashbacks, for instance, of a gaggle of Borland programmers
dressed in togas, led by (then) Boss Man Kahn, who was tootin' his saxophone
into the wee hours. And then there was the trapeze. It was...well, take my word
for it--Philippe and his merry pranksters went too far that night.
Considerably more serious charges have been leveled at Dan Farmer who, along
with Wietse Venema, has written and released a program called "Security
Administrator Tool for Analyzing Networks," or "Satan" for short. Satan
collects information about machines, nets, and remote hosts by examining a
variety of Internet and UNIX services (specifically SunOS and Irix), thereby
spotting potential problems and security holes. Using a Mosaic-like interface,
Satan queries the host, identifies the system type and available network
services, and probes the host to determine if critical access controls are in
place. The capability to actually break into systems hasn't been implemented.
Farmer and Venema know what they're talking about when it comes to system
security. Until recently, Farmer was network-security manager, first at Sun
Microsystems, then at Silicon Graphics. Farmer is also author of the widely
used security program called "COPS" and a former member of the Internet
security force. For his part, Venema is a noted security expert at the
University of Eindhoven in the Netherlands. 
When word of the freely distributed program got out, Farmer came under fire
from all sides as his critics went ballistic, at least metaphorically. Mike
Higgins, chief of the US Defense Department's computer-security team, said
that "the analogy we use is that Satan is like a gun, and this is like handing
a gun to a 12-year-old." SRI computer-security consultant Donn Parker
concurred, stating that "[Satan] is an extremely dangerous tool. It's like
distributing high-powered rocket launchers throughout the world, free of
charge, available at your local library or school, and inviting people to try
them out by shooting at somebody." As for Silicon Graphics, the company simply
fired him.
Farmer acknowledges that Satan can be a dangerous tool. "Unfortunately this is
going to cause some serious damage to some people," he says. "I'm certainly
advocating responsible use, but I'm not so naive [as] to think it won't be
abused." However, Farmer justifies releasing the program by insisting that
Satan will make network administrators more diligent when it comes to
security. Ironically, infamous network cracker Kevin Mitnick grabbed an early
version of Satan when he broke into Farmer's system. 
It should be underscored that, by itself, Satan does not attack network
systems--the program simply collects data and identifies problems. Still, like
the old story about your mother-in-law going over the cliff in your new Lexus,
I'm having trouble with Satan. While I believe in the free flow of information
as much as the next person, I'd be pretty mad if someone used Satan to wreck
or steal from a system I'd built. What it all comes down to is that network
administrators need to keep secure firewalls in place, and users should keep
sensitive data in safe places. If Satan encourages these practices, then the
program is successful, no matter what anyone says.
While I remain ambivalent about the rights and consequences surrounding Satan,
there's little doubt in my mind that Vincent Yost, a Philadelphia
embedded-system developer, has gone one toke over the line. Anyone who has
lived in an automobile-congested urban area knows that one of life's little
pleasures is finding a parking meter with time left on it. Now, thanks to an
overzealous programmer, such pleasures may end up going the way of buggy whips
and nickel candy bars.
Yost has developed a prototype parking meter that uses infrared sensors and
microcontrollers to detect when you back your car out of a parking place. It
then resets the meter to zero, requiring the person driving into the vacated
spot to put more money into the meter. As if that weren't heinous enough,
Yost's meter also foils people who don't move their cars--"meter feeders" who
run out and pump money into the meter throughout the day. If time expires and
your car hasn't been moved, the meter will continue to take your money but
without giving you additional time.
Cash-starved, blood-sucking municipalities that have tried the "Yostmeter"
love it. In some tests, the average weekly take per meter rose from $12.45
without the intelligent meter to $44.00 with it. I guess the best you can hope
for in this brave new world is justice with a little irony. Security expert
Dan Farmer got his system broken into by Kevin Mitnick. Hopefully, Vincent
Yost will someday end up with a basketful of parking tickets under his
windshield wiper. 
Jonathan Erickson, editor-in-chief



LETTERS


Day of the Week


Dear DDJ,
Kim Larsen's article "Computing the Day of the Week" (DDJ, April 1995) is a
good example of how a simple equation can sometimes replace tedious
programming logic. This technique could be used in other applications to
reduce code size and increase speed whenever a mathematical formula fits the
problem better than a step-by-step algorithm. I especially appreciated
Larsen's description of how this particular formula was developed. 
Larsen made no mention of Zeller's Congruence. Zeller's Congruence is another
formula for calculating the day of the week. It was published in 1887 by the
German mathematician Zeller. Jeff Duntemann discussed Zeller's Congruence in
issue #169 of DDJ, with some follow-up information in issues 170 and 171.
Like Larsen, Zeller reckoned January and February as the 13th and 14th months
of the previous year. However, he then divided the year into two components:
the century (C=Y/100) and the year within the century (Y mod 100). He probably
did this to keep all the numbers small (they didn't have ten-digit calculators
back then). After calculating C as Y/100 and adjusting Y to Y%100, Zeller's
Congruence can be expressed as: A=(D+(M+1)*26/10+Y+Y/4-2*C+C/4)%7.
As with Larsen's formula, all divisions are integer divisions (rounding down).
Note that the fraction 26/10 can be reduced to the equivalent 13/5. Zeller
probably used 26/10 because it's very easy to divide by 10 when doing
arithmetic by hand. One potential pitfall when implementing Zeller's
Congruence is that the sum of the individual terms can sometimes be negative.
For example, this occurs with the date 1 March 2000 (D=1, M=3, C=20, Y=0).
The formula should work even if the sum is negative, but some compilers give
an unexpected result when a negative number is used with the modulo operator.
This situation can be easily avoided by replacing the term -2*C with +5*C (to
make all the terms positive). Since -2*C and +5*C differ by an exact multiple
of 7, this change won't affect the final result modulo 7. These adjustments
change the formula to: A=(D+(M+1)*13/5+Y+Y/4+5*C+C/4)%7.
Larsen's formula is virtually identical to Zeller's. Note that the value of Y
in Larsen's formula is actually equal to 100*C+Y in Zeller's nomenclature.
Making this substitution in Larsen's Y+Y/4-Y/100+Y/400 yields
100*C+Y+25*C+Y/4-C+C/4. A little massaging will show that this differs from
Zeller's Y+Y/4-2*C+C/4 by 126*C. Since 126*C is a multiple of 7, this
difference has no effect on the final result. Likewise, one can show that
Larsen's 2*M+3*(M+1)/5 differs from Zeller's (M+1)*26/10 by a constant value
of -2. This means that Zeller's Congruence will return 0 for Saturday and 6
for Friday, whereas Larsen's formula uses 0 for Monday and 6 for Sunday.
Either formula can be adjusted to start on any day of the week by adding an
appropriate constant offset before calculating the result modulo 7.
Larsen's detailed article gives valuable insight into the logic that Zeller
must have used when devising his congruence. The description of the inner
workings of Larsen's formula is much clearer than the explanations of Zeller's
Congruence that I've seen elsewhere. Keep up the good work.
Rod Spade
Lancaster, Pennsylvania
Dear DDJ,
I enjoyed Kim S. Larsen's article because I have been playing with the subject
too. I managed to stuff all computation into a single line of code
(dayofweek1), which was then improved even further (dayofweek2) by a friend
(Hendrik Jan Veenstra). The formulas presented in Example 1 use a table
different from the one in the article for adjustment of the correct day.
Frank Bemelman 
KNARF@NETLAND.NL
Dear DDJ,
After reading Kim Larsen's "Computing the Day of the Week," I thought you
might be interested in the day-of-the-week C macro in Example 2. It returns
values from 0 (=Monday) to 6 (=Sunday). For reasonable input values, it works
with 16-bit ints with no risk of overflow. And the y, m, d are real year,
month, and day values, not "adjusted" ones: January is m=1, February m=2,
March m=3, and so on.
Paul Schlyter
Stockholm, Sweden
pausch@saaf.se
Dear DDJ, 
The article "Computing the Day of the Week," by Kim S. Larsen, was a fine one.
However, I'd like to add a few related observations.
Larsen's statement that "We switched to the current calendar system on
Thursday, September 14, 1752" is true in a sense. But the switch was made in
European Catholic countries much earlier (in 1582), and in Russia and Greece
much later (around 1917)! The "October Revolution" was in November by
Gregorian reckoning! 
In UNIX, if you enter cal 9 1752, you get a hybrid calendar; the first two
days are Julian, and the rest of the month is Gregorian. The first week of
that UNIX display looks like this: 1 2 14 15 16. A clever bit of programming,
but be aware of what it means! One thing it means is that UNIX would be wrong
in France, Russia, and many other places around the world!
When talking about any particular calendar (Julian, Gregorian, or other), it
is helpful to think of it simply as an algorithm in its own right, independent
of any concept of time--like a recipe for brownies. In this sense, a calendar
does not change, nor does it have a beginning or an end. Only our application
of it changes as time goes by. 
In this way we should have no difficulty extrapolating any particular calendar
algorithm to any year we please. Pope Gregory XIII insisted that his calendar
(the one we use now) be phased relative to the Old Style Julian to match up on
March 21, ad 325, so it makes sense to use it for reporting historical as well
as future dates from ad 1 until ad 3000 (when it loses accuracy), even if the
people in the past didn't. (We can be pretty sure that Julius Caesar never had
a calendar on his wall that said "50 bc:" Even so, we can quite reasonably say
that he lived from 100--44 bc!) 
Thus, even though the Gregorian calendar was not used in America prior to
1752, George Washington wisely reckoned his birthday in terms of it, as we
continue to do today. He was born 20 years earlier in Westmoreland County,
Virginia. Another thing: Columbus's discovery of America was October 21,
1492, by Gregorian reckoning, so its 500th anniversary falls on the
21st--not the 12th. That is, if we can agree that "500 years" means 500
complete revolutions of the Earth about the Sun, without regard for what
calendar Columbus might have been using at the time! At least the day of the
week is the same; that original Columbus day was a Friday whether you talk
Julian or Gregorian.
And when the astronomer says that his "Julian Day" (a different Julius!)
relates back to January 1, 4713 BC, he doesn't tell us what calendar he's
talking about! What's Latin for "Confusion reigns," anyway?
I also found it interesting to compare Larsen's algorithm with my own. In the
October 1989 issue of PC Resource magazine, my short Basic program appeared
containing this "Gregorian algorithm" for the day o'week:
(D+Y+Y\4-Y\100+Y\400+2.6*M+1.2) Mod 7. For January or February, use M=13 or
14 of the previous year. In my algorithm, I chose day 0 to be Sunday, not
Monday, for a couple of reasons: First, that makes "2'sday" become "Tuesday,"
an extremely useful mnemonic. Second, the astronomical symbol for the Sun is
0 (a circle).
With Microsoft GWBASIC 3.22 or another redirectable Basic on the path, the
following command, when invoked at the DOS prompt, will show the complete
calendar for any month and AD year you type in the M=7:Y=1776:N=31 preamble.
This command squawks if you try to use M=1 or M=2. Don't use any spaces not
shown or it won't fit on the command line: ECHO M=7:Y=1776:N=31:Q=SQR(M-3):FOR
J=1TO N:D=(Y+Y\4-Y\100+Y\400+2.6*M+1.2)MOD 7+J:LOCATE D\7+6,3*(D MOD
7)+3:?J:NEXT|GWBASIC.
Homer B. Tilton
Tucson, Arizona


Fortran Tools


Dear DDJ,
As a leader in the numerical computing industry, Cray Research appreciates
the focus that your magazine brought to this important topic in the January
1995 DDJ "Tools that Count" issue. In the "Examining Room," Steven Baker
reviewed several Fortran 90 compilers and test suites. As the producer of one
of the test suites and compilers, we would like to clarify a few points as
they relate to our products.
Baker writes that "the Cray (and others) compilers failed to recognize and
report the obsolete features as required by the Fortran standard." There is a
command-line option that enables the generation of messages noting all
nonstandard usage, including obsolete features. This option is off by default
and must be specifically enabled to generate the report.
In Baker's "Other Considerations," it appears that the CraySoft compiler was
dropped from consideration, leaving the implication that it does not come with
a complete programming environment or supported extensions. In fact, the
CraySoft Fortran 90 environment automatically optimizes for parallel
processing and includes a parallel debugger, a source-code browser, a parallel
performance analyzer, and other development tools; it supports Cray and VAX
extensions.
In his Table 1, Baker publishes only the pass rates from selected portions of
the various suites, which does not yield an accurate measure of the robustness
and quality of the compilers being "evaluated." There was also apparently no
analysis of specific failing tests. It is very likely that a single bug or a
differing interpretation of the standard could cause multiple failures in a
given test suite. Unfortunately, Mr. Baker's own caution about his analysis,
that the "results from these validation suites should be taken with a grain
of salt," is not nearly as prominent.
Judy Smith
CraySoft Product Development
Eagan, Minnesota 
Net Stuff
Dear DDJ,
I think Jonathan Erickson's editorial "The Green, Green Cash of Gnomes" (DDJ,
April 1995) was pretty cool, and I just wanted to let you guys know. I've
subscribed to DDJ for a couple years now, I guess, and it's one of the few
trade magazines I've kept. I've kept it because of honest editorials about the
crap that goes on in the industry, as well as the useful technical
information. Anyway, like most software engineers, I'm using online resources,
particularly the Internet, more and more as part of my job. I cringe whenever
I hear someone mention the words "regulation" and "Internet" in the same
sentence. Erickson's editorial reminded me that if we sit back and let the
government and the marketing bozos overrun the Net, we've only got ourselves
to blame through our inaction. So anyway, keep telling it like it is, guys,
and we'll keep reading.
Dale M. Davis
Sunnyvale, California 
daldavis@spectrace.com



Mechanical Models


Dear DDJ,
Please allow me to poke two-cents' worth at Michael Swaine's "mechanical
model" in his "Programming Paradigms" (DDJ, October 1994). Michael asserts:
"There's precious little that we might consider the mind capable of doing that
we can't convince ourselves that software can also do, in principle."
I'm afraid this is false. My counter-example is this: the invention of natural
language. Programs that speak English do exist, but most could be improved.
Programs that can learn to speak English are, so far as I know, still glued to
the drawing board. Programs that can invent new languages as powerful as
English do not exist.
Further, if such a program did exist, what would be the guarantee that any
human
could learn the languages that it "spoke," or even recognize them as
languages? If these things could be guaranteed, we would be inclined to
ascribe the invention to the programmer(s), not the program.
I could push this argument a good deal further, but that will do for two
cents. I hope I have said enough to convince a few people that the equation
"mind=software" is a vague analogy which dissolves into large and unsolved
problems if you look closely.
Thanks for the stimulus of a magazine that still manages to include technical,
philosophical, and commercial information all at once.
I.K. Sayer 
Smithton, Tasmania
Australia
Example 1: Day of the week.
#include <stdio.h>
/* This table starts on Sunday !!! */
char *name[] = { "Sunday",
 "Monday",
 "Tuesday",
 "Wednesday",
 "Thursday",
 "Friday",
 "Saturday",
 };
int dayofweek1(int d, int m, int y);
int dayofweek2(int d, int m, int y);
int main(void)
{ int D,M,Y;
 D=3; M=3; Y=1995;
 printf("formula1, day=%s\n",name[dayofweek1(D, M, Y)]);
 printf("formula2, day=%s\n",name[dayofweek2(D, M, Y)]);
 return 0;
}
int dayofweek1(int d, int m, int y)
{ return(((d+((26*((m<3)?m+13:m+1))/10)+((125*(long)((m<3)?y-1:y))/100)
 -(((m<3)?y-1:y)/100)+(((m<3)?y-1:y)/400))-1)%7);
}
int dayofweek2(int d, int m, int y)
{ return((d+(int)((1040*(long)((m<3)?m+13:m+1))+
 (597*(long)((m<3)?y-1:y))/400))%7);
}
Example 2: Day-of-the-week C macro.
 /* Day-Of-Week macro for international Monday-Sunday calendars */
#define dow(y,m,d) \
 ( ( ( 3*(y) - (7*((y)+((m)+9)/12))/4 + (23*(m))/9 + (d) + 2 \
 + (((y)-((m)<3))/100+1) * 3 / 4 - 16 ) % 7 ) )

















Constructing Operational Specifications


A straightforward approach that complements formal design methodologies




Mark Coats and Terry Mellon


Mark is a senior software engineer for Motorola and can be contacted at
mark_coats@email.mot.com. Terry is president of Software Engineering
Excellence and can be reached at mellon@seex.com.


There are a host of successful structured and object-oriented
system-development methodologies in use, each with its own strengths and
weaknesses. Many weaknesses center around the discovery process in the early
stages of development. In this article, we'll present a simple method for
initiating and nurturing the discovery process. This method captures system
behavior from a user's viewpoint, producing an operational specification that
can be translated into most existing system-development methodologies.
This method complements rather than competes with existing methods and is
designed to be a "front end" to methodologies such as Shlaer's object-oriented
analysis, Rumbaugh's object modeling technique (OMT), and Hatley's structured
analysis. Our method allows analysts and designers to more fully understand a
system early in the development life cycle, revealing potential problems
before analysis or design models are created. The output of this method is a
set of diagrams called the "operational specification," intended to be a
complete description of a system's desired behavioral operation from a user's
point of view. The diagrams in an operational specification contain user
events and system responses. 
The method was designed to be useful in a "paper and pencil" environment. Our
goal was to produce an inexpensive way to capture important operational
information quickly and effectively without all the hoopla surrounding CASE.
It is intended to be a back-to-basics approach, independent of CASE tools
(although tools to support this method should be fairly easy to implement).
For the sake of convenience (and in the same spirit of humility as Booch,
Shlaer, and others), we'll refer to this as the "Coats-Mellon Operational
Specification" (CMOS) methodology.


The Operational Specification


The operational specification is a set of diagrams that specify incoming
stimuli and a system's response to these stimuli. The operational
specification addresses analysis-phase, system-level behavior only; it does
not specify data or functional requirements, design, or implementation
information. At the heart of the operational specification is the incoming
event. An event is an occurrence at a point in time. The operational
specification consists of two types of diagrams that divide behavior into
events and system responses. The models that result from a
translation of an operational specification (using methods like OMT and
Hatley-Pirbhai) are pure analysis models. To illustrate
system-development-method concepts and the production of an operational
specification using the CMOS method, we'll use the familiar example of an
automated teller machine (ATM) system.
The CMOS method can be summed up in six steps: 
1. Create an actor diagram.
2. Create an actor-inheritance diagram.
3. Create an event-category diagram.
4. Create an actor-event diagram.
5. Create a system-response diagram.
6. Validate the behavior.
The actor and actor-inheritance diagrams help build the actor-event and
system-response diagrams that comprise the operational specification.
Validation can occur once the operational specification is complete. 


The Actor Diagram


The actor diagram is an analysis anchor that crystallizes the operational
environment in which the product system will exist. In Figure 1, the circle
represents the system. Environment entities, or "actors," that interact with
the system are
represented outside the circle. Descriptions of the human actors appear just
below each stick figure. Nonhuman actors are represented by various boxes
containing their descriptions. Each arrow is a summary of all events flowing
in the direction indicated by the arrow; thus, no labels are used. The actor
diagram shows actor-to-actor and system-to-actor interactions. It is important
to show actor-to-actor interactions because they often dictate the order of
system-to-actor interactions. Actors that can initiate event sequences (event
sequences that begin with events that are not caused by another modeled event)
are marked with an asterisk. Figure 1 is an actor diagram for an ATM system.
The actor diagram records the results of analyzing the environment (or
"domain"). It is
similar to other environment-level diagrams such as Jacobson's actor diagram
and Hatley-Pirbhai's context diagram. Creating the actor diagram is a logical
first step in any method. The CMOS actor diagram differs from the Jacobson and
Hatley-Pirbhai methods in that it shows flows among external entities. This is
because the actor diagram is a summary of the sequences of events that flow
among actors and between the actors and the system.


The Actor-Inheritance Diagram


The actor-inheritance diagram shows the sharing of event categories among
actors. In Figure 1, the Service Tech actor should be able to do anything the
Card User actor does. In other words, she inherits all of the Card User's
event categories. The Service Tech also has transactions distinct from those of
the Card User and will therefore include event categories like "replenish
money supply" or "run diagnostic." Figure 2(a) is the actor-inheritance
diagram for the ATM system. 
Figure 2(b) shows a more elaborate actor-inheritance diagram, for a system
that implements a metrics-tracking database. Readers can perform the basic
functions to read metrics data in the database. Only metric and formula
writers are allowed to input metrics or change metric formulas. Area
administrators have these abilities, plus additional area administrative
functions. Global administrators have complete capability.
Actor-inheritance diagrams may have multiple root actors. Once the actor
diagram and the actor-inheritance diagrams are complete, the event-category
diagram is constructed.


The Event-Category Diagram


An event-category diagram is a grouping of related events for an associated
actor. This diagram records all possible event categories and their associated
events for each actor. Discovery takes place here. At this time, responses to
these events are unimportant; instead, think only in terms of each actor's
roles and respective responsibilities. Other methods consider actor events and
system responses (or "use cases") simultaneously, but CMOS postpones
system-response considerations until all actor stimuli have been identified. 
Each event is categorized and associated with an actor, resulting in one
event-category diagram per actor. Figure 3 is the set of event-category
diagrams for the ATM; it contains event categories, subcategories, and the
categories' respective events. An actor inherits event categories from another
actor according to the actor-inheritance diagram. Event-category diagrams look
like OMT's object-class diagrams, and they are indeed similar. We chose not to
invent new notation unnecessarily. Each box represents an event category that
contains events. An event category inherits the categories pointed to by the
triangle. There are association lines between the actor and the event
categories for that actor. (An actor can be associated with multiple
categories.) Each actor's event categories should be addressed individually,
thus the separate diagrams for each actor.


The Actor-Event Diagram



The actor-event diagram is one of two types that comprise the operational
specification for the system. It will be referenced constantly throughout the
development life cycle, so its representation should be easy to read and
understand. In the event-category diagram, categories, subcategories, and
events are discovered by playing the role of the actor. Once a substantial set
of events is captured, the events need to be glued together in sequences. This
is done via actor-event diagrams. There will be a separate actor-event diagram
for every actor-originated event. Systems can have sequential or concurrent
relationships between events. 
For example, a Card Reader actor cannot possibly read a card until a Card User
actor inserts the card into the reader; the relationship between these two
events is sequential. Also, for the Bank to validate a PIN, the events Card
User Inserts Card and Card Reader Reads Card must have occurred. Analyzing
these events could become extremely complex; thus a diagram is needed. An
actor-event diagram captures this information.
Figure 4 is a partial actor-event diagram for the ATM system that describes
event flows among actors. (A complete ATM actor-event diagram is shown on page
19. Additional diagrams are provided electronically; see "Availability," page
3.) It shows when each actor-initiated event occurs in relationship to other
actor-initiated events. It does not show system responses (this is the role of
the system-response diagram). The events are placed on sequence lines. A
sequence line is used to identify which events could possibly concur in the
sequence. Sequence lines advance time from left to right and show only
sequences, not actual time. A sequence line can also be used to show a return
point. This is shown in Figure 5 following the event Card Reader Ejects Card.
The next event is one of the set of events located on sequence line 1. The
sequence return symbol contains the label of the sequence line to which it
returns. A completed actor-event diagram will have sequence return symbols at
every leaf; this minimizes the number of possible threads (paths) through the
diagram. Once the event flow has reached a sequence return symbol, a thread is
complete. It is not important to consider the repeating sequences of a thread
since that behavior has already been defined. 
Sequence relationships are represented by arrows. A sequence relationship may
have a condition attached to it so that the sequence flow can advance to the
next event only if the condition has been satisfied. This feature encourages
the analyst to consider all possible conditions affecting the flow from one
event to the next. A successor event is possible when there is either no
condition or the condition is true. At least one successor event must be
possible; a plus sign (+) indicates more than one. 
A pentagonal link symbol indicates a link to an actor-event subdiagram; see
the link symbol "Ask for Help" in Figure 5. At this sequence line, a Card User
may ask for help as an alternative to entering his PIN. Figure 6 shows the
actor-event subdiagram for Card User Asks for Help. (Subdiagrams are
identified by the first event name.) The return symbol in the subdiagram means
to return to the link symbol and proceed to the next event.
To allow the Ask for Help choice on every sequence line, a box with a link
symbol can be attached to the edge of that actor-event diagram. This avoids
putting the link symbol on every sequence line. This return symbol indicates a
return to the sequence line where the event was invoked. When an event can
occur during some but not all sequences (as in Cancel Transaction), the same
construct is used with an additional list of the sequence lines upon which the
link could occur; see Figure 7. These link events attached to the edge of the
actor-event diagram represent implicit, additional event choices for each
sequence line specified. Link symbols to actor-event subdiagrams must be used
when a selection can create a path longer than one event. The link symbol may
also be used to keep diagrams to a manageable size (one page, for instance).
An ampersand indicates concurrent time relationships between events. The APU
startup routine and the IMU startup routine in Figure 8 occur simultaneously,
sometime after the payload status is displayed. When used, this construct
mandates that the succeeding threads begin concurrently. The succeeding events
can meet up using either a "+" or an "&". The "+" means that the first
preceding event to complete triggers the next event; the remaining preceding
events, upon completing, do not trigger it again. The "&" means that the
completion of both preceding events triggers the next event. Note that the
diagram is actor based; there are no objects, only actor events, on the
actor-event diagram.


The System-Response Diagram


The system-response diagram, along with the actor-event diagram, completes the
operational specification by showing the system's responses to each actor
event. An actor event usually has a system response, but not always. In Figure
4, a Card User inserts a card into the Card Reader. There is no system
response to this event, since the Card Reader is not part of the system. This
is shown on the actor diagram where Card User interacts with Card Reader.
There is a system response to the event Card Reader Reads Card; see the
response diagram in Figure 9(a).
Response diagrams begin with the actor event to which the system is
responding. They can be trivial or complex, but to manage complexity, they
should represent the system response for only one actor event. Response
diagrams use circles to describe the system events. (This is consistent with
the actor diagram's representation of the system.) The system-response text
should be at the level of "responsibilities," as suggested by Wirfs-Brock. The
arrows represent sequences with optional conditions, as in the actor-event
diagram. The diamond marks the end of the system response, after which only
another actor event can occur; this is specified in the actor-event diagram.
Figure 9(b) represents the system response to the actor event Card User Enters
Transaction Type. (A set of response diagrams for the ATM system is provided
electronically.) 


Behavior Validation 


The set of system-response and actor-event diagrams represents the total
operational specification. Once complete, these diagrams can be validated.
Each event from an event category can be traced through the actor-event and
system-response diagrams until a sequence-return symbol is reached. Each trace
to a sequence return symbol is an actor scenario. Traditionally,
scenario-based validation and testing produce an overwhelming number of
scenarios to validate. The operational specification mitigates this problem by
making it easy to see trace patterns, thus allowing better equivalence
partitioning. Because of the CMOS's simplicity, customers can sit in on the
validation process (tracing events in the specification), so the customer and
analysts reach agreement before more-costly development work begins. The
operational specification is an ideal peer-review instrument.


Translating to OMT


Entities from CMOS diagrams can be translated to OMT object and dynamic
models. The first candidate set of classes for the object model should consist
of an interface-object class for each actor (that is, a type of user). Each
event sentence in the actor-event diagram and the system-response diagram can
suggest classes, attributes, operations, and associations. Noun phrases are
potential objects or attributes, verb phrases are potential operations, and
associations are identified by the sentence structure. For example, the event
Ask Display to Display a Request for Amount would produce the portion of the
object model shown in Figure 10(a); the portion for Bank Validates PIN is
shown in Figure 10(b).
The dynamic model (a collection of class-level state-transition diagrams) can
also be created from the actor-event and system-response diagrams. Events on
each diagram are used to directly construct a state-transition model. Figure
11 is a state-transition diagram for the Bank Interface class. Each Bank event
becomes an event on the Bank state diagram. States are inserted to receive
each event.


Translating to Hatley-Pirbhai Structured Analysis


A Hatley-Pirbhai requirements model includes data-context, data-flow
(including process specifications, or PSpecs), control-context, and
control-flow diagrams (including control specifications, CSpecs). Context
diagrams for a system can be derived directly from a CMOS actor diagram. The
context diagrams consist of the system and all actors who directly exchange
events with the system, along with those exchanged events represented either
as data flows or control flows. Guidelines for distinguishing between data and
control flows are provided by Hatley-Pirbhai and would be applied here. (The
primary guideline is that a flow whose only purpose is to activate or
deactivate processes should be modeled as a control flow.) Figure 12 is a
context diagram for the ATM.
For identifying the top-level data processes, Hatley-Pirbhai recommends event
partitioning, whereby each data flow coming into the system should have its
own top-level data process to handle the system's total response to that flow.
Hatley-Pirbhai extends this by stating that every control flow coming into the
system should flow into a single top-level control process (called "CSpec 0"),
which activates or deactivates the top-level data processes. (Recall that data
processes also can be activated solely by the arrival of data flows, if
appropriate.) In many cases, control flows also flow to CSpecs at lower levels
to provide them with finer-grained control.
Event partitioning can be achieved directly from CMOS system-response
diagrams. Because each system-response diagram describes the effect of one
event flowing into the system, the circles on a system-response diagram
identify the system's total response to that flow--collectively, they are a
top-level data process. Most real systems respond to so many events that not
all the system-response circles can be shown at the top level. In such cases,
the circles should be grouped into new, higher-level circles, as was done for
the ATM. 
An event can be a stimulus with or without associated data. In Hatley-Pirbhai
terms, a stimulus without data is modeled as a control flow; a stimulus with
data, as a control flow plus a data flow, or as a data flow alone; see
Figures 13 and 14.
CSpec 0 (represented by the bar symbol in Figure 14) would be a
state-transition diagram that activated or deactivated the processes on that
diagram. 


Conclusions


Creating a rich operational specification for a system early in the
development cycle reduces the overall cost of system development and
maintenance. Systems developed in this manner experience less change,
especially later in the development cycle when the cost of change is high.
Several successful projects have been developed using the CMOS method. In each
case, the goals of the method were realized because a great deal of rich,
stable information was captured early in the development process. The
specifications allowed a better understanding of the system among customers,
analysts, testers, designers, coders, and management. This understanding was
realized early in projects before the most costly development work began; and
because the completed operational specification is user based, it is easily
comprehended by new development personnel and users of the system. 
In one experiment, we gave 20 designers the same operational specification.
The resulting designs for the specification had only slight differences. This
demonstrated that CMOS specifications can communicate a large amount of
user-based system knowledge.
Operational specifications can assist in maintenance as well as initial
development. System maintenance begins with changes to the operational
specification and continues through analysis, design, and code. New
capabilities are added in the form of new events or event categories. These,
in turn, ripple through the analysis and design models. New capabilities are
well understood early in the maintenance life cycle. Operational
specifications can be used as a test specification. They directly support
"black-box" testing because they are based upon external stimuli.
The information in the operational specification can provide valuable metric
data for the purpose of development-cost estimates. These metrics could
include the number of actors, event categories, events, threads in the
actor-event diagram, system-response events, and so on. By combining this data
in different ways and with different formulas, a viable cost-estimation model
can also be produced. 
CMOS is repeatable and can be incorporated into any development process.
Management must be aware, however, that this method may at first appear to
decrease productivity. Managers might mistake this for typical unstructured
brainstorming. But if management allows the process to unfold, a good deal of
time will be reclaimed during the remaining analysis, design, implementation,
and especially maintenance phases.


References


Booch, G. "The Booch Method: Notation, Part I and Part II." Computer Language
(September/October 1992).
Hatley, D.J. and I.A. Pirbhai. Strategies for Real-Time System Specifications.
New York, NY: Dorset House, 1988.
Hsia, P. et al. "Formal Approach to Scenario Analysis." IEEE Software (March
1994).
Jacobson, I. et al. Object-Oriented Software Engineering: A Use Case Driven
Approach. Reading, MA: Addison-Wesley, 1992.
Krell, B.E. Developing with Ada: Life-Cycle Methods. New York, NY: Bantam
Books, 1992.
Linger, R.C. "Cleanroom Process Model." IEEE Software (March 1994).

McMenamin, S.M. and J.F. Palmer. Essential Systems Analysis. Englewood Cliffs,
NJ: Prentice-Hall, 1984.
Musa, J. "Operational Profiles in Software-Reliability Engineering." IEEE
Software (March 1993).
Rumbaugh, J. "Getting Started: Using Use Cases to Capture Requirements." JOOP
(September 1994).
Rumbaugh, J. et al. Object-Oriented Modeling and Design. Englewood Cliffs, NJ:
Prentice-Hall, 1991.
Shlaer, S. and S. Mellor. "The Shlaer-Mellor Method." Project Technology,
Inc., Technical Report pr.pb.S075, Version 2.0, 1993.
Wirfs-Brock, R. "Designing Scenarios: Making the Case for a Use Case
Framework." The Smalltalk Report (March 1993).
Figure: Complete actor-event diagram for ATM system.
Figure 1 Actor diagram.
Figure 2 (a) Actor-inheritance diagram for the ATM system; (b)
actor-inheritance diagram for metrics collection.
Figure 3 Event-category diagram.
Figure 4 Actor-event diagram for a portion of the ATM system.
Figure 5 Actor-event diagram showing the link symbol.
Figure 6 Actor-event subdiagram for Card User Asks for Help.
Figure 7 Actor-event diagram showing the return symbol.
Figure 8 Concurrent example of an actor-event diagram using (a) a plus sign;
and (b) an ampersand.
Figure 9 (a) Response diagram for Card Reader Reads Card; (b) response diagram
for Card User Enters Transaction Type.
Figure 10 (a) Object model for Ask Display to Display a Request for Amount;
(b) object model for Bank Validates PIN.
Figure 11 State transition diagram for Bank Interface Class.
Figure 12 Control-context diagram for the ATM.
Figure 13 Data-flow diagram 0 for the ATM.
Figure 14 Control-flow diagram 0 for the ATM.










































A Practical Strategy for OO Design


A hybrid methodology for C++ development




Kanchan Kumar


Kanchan is a software developer for Vedika Software in Calcutta, India. He can
be contacted at kanchan@vedika.ernet.in.


One goal of object-oriented languages and methods is to let solutions mirror
problems more naturally than procedural languages and conventional design
methods do. Most object-oriented design methodologies, however, are largely
theoretical, and those that are practical often aren't comprehensive enough
to help a designer/programmer from start to finish. As a
result, much is left to the imagination of the programmers actually
implementing the project. 
In this article, I'll present a methodology that is practical, easy to
understand, and can serve as a reference in solving problems. This methodology
consists of two phases: The first presents an abstract model of the problem,
and the second prepares an implementation model. 
While the abstract model is language independent, the implementation model is
based on C++. Essentially, the abstract model is a collation of practical
ideas from different methodologies. The implementation model is a rule-based
approach to class design for the objects specified in the abstract model.


Statement of Problem


Before the design process begins, it is important to clearly define your
goals. This is done during the analysis phase and presented in a document
called the "statement of problem." The statement should be complete enough
that it doesn't leave room for further assumptions about system functionality
during design.
The statement of problem starts with an introduction to the problem, its
context, and the skills of the eventual users of the system being designed.
This includes:
Major components of the problem. 
Desired behavior/features of the system. 
Undesirable behavior.
Kind of interface.
Volume of transactions.
Performance expectations.
Sequence of tasks.
Naturally, the format of the documentation and issues it covers vary from one
problem to another. 


Abstract Model


The most important part of the abstract model is identification of objects.
The abstract model assumes that the problem was defined well during analysis,
but it can also point out flaws in the statement of problem, and thereby
improve it. (The abstract model draws heavily on the first chapter of Robert
Murray's C++ Strategies and Tactics, "Abstraction.") 
The goals of the abstract model are to:
Identify objects.
Establish object attributes.
Define relationships between different objects.
Identify each object's interface.
In almost every real-life situation, you start from the top-most problem
object, then begin breaking it down. This is the same process you should
follow during the abstract-model phase of software design. The steps for
preparing the abstract model of any problem are: 
Step 1. Identify the single, top-most problem object in a system. In most
cases, this is easy: The top-most object is found in the first line of the
statement of problem. Likely candidates for such an object include the
system/application itself. 
Step 2. Determine the functions performed by this object. This statement of
functionality is the "executive summary." Minute details aren't necessary,
just a broad description of the functionality. The basic goal of this step is
to establish the behavior of the object. 
In a point of sale (POS) system, for instance, the functionality might be to
let customers enter purchase instructions for a product. The executive summary
could address such issues as obtaining credit-card details and invoice
printing, as well as anything the statement of problem may have overlooked:
inventory maintenance or credit-card validation, for instance.
Executive summaries must be crisp and clear, and thus require revision and
refinement. Anything vague is best left out. The summary is a description: You
need to think about how the object fits in and interacts with its context.
Since what you write for the first time may not be the most suitable executive
summary, be prepared to revise it.
If you are not able to find a role for an object, categorize it as a
placeholder (Product quantity, Currency, and the like) and call it a
"primitive object."
Step 3. Find the component objects and determine which combined objects they
include. Refer to the statement of problem and the executive summary of a
given object for the terms used. 
If the executive summary is well written, finding objects isn't difficult. For
instance, if the executive summary of a POS system focuses on customers,
products, invoices, credit cards, and the like, then these elements are likely
object candidates.
The process of identifying objects is called "discovering." At times, however,
discovery is not sufficient and objects must be invented. Such inventions are
based solely on your judgment and appreciation of the problem.
For example, for a POS system, you might invent product-table objects. If the
problem needs to track customer history, then you will also need a
customer-table object. However, you should avoid inventing a process as an
object. In such cases, you should find the object on (or by) which this
function is performed and mark it as another component object.
The components in this step, along with the behavior in Step 2, comprise the
"attributes" of the object.
Step 4. Establish the relationship between the problem object and the
component objects (not vice versa). Evaluate the relations according to the
following criteria: 
Classify the relation as either Is-A, Has-A, or Uses-A.
Is the relation one-to-one or one-to-many? With one-to-many, you need to
define "many" as a constant number, range, or infinite number.

Is the relation established at the time the problem object is created or
anytime during the lifetime of the problem object?
Is the relation one- or two-way? If the problem object can be reached through
the component object, it is a two-way relation; in all other cases, it is
one-way.
Is the relation required? If the relation could not be established, what
functionality will be affected? 
Step 5. Apply Steps 2 through 4 to each component object until no more component
objects can be found, or until a primitive object is encountered. 
Step 6. Finalize the interface of each object. Questions will arise, such as
"How can this object be used?", "What is required to create the object?", or
"What is the functionality expected of this object?" Answering them gives you
a list of parameters without which the object is not useful. The answers must
be derived from the context of the statement of problem.
Putting yourself in the shoes of someone using your objects and asking, "How
do I use this?" will make it easier to establish the interface. This holds
true, regardless of whether the user is a fellow team member, someone who buys
your library, or you yourself.
Go through the abstract model repeatedly until you are confident that it
represents the problem correctly and completely. It's important to document
each step. Make sure that you list all the "discovered" objects and then those
that were "invented." Since the invention of objects is based on certain
inferences, these inferences should also be prominently noted. This will help
if anyone other than you goes through the design later. You can represent the
model in terms familiar to you and your team. You may want to use a CASE tool
to represent it visually. In any case, it should be easy to refer to.


Implementation Model


The implementation model takes the abstract model and turns it into a C++
class declaration. Many methodologies leave this to programmers who, if
inexperienced, can compromise an elegant design. With this in mind, I'll now
describe a set of rules which takes the guesswork out of class design. These
rules may not always produce the best class designs, but they can save you
from costly mistakes.
For the implementation model, the relationship among the objects is critical.
The rules will help you in converting the relations table into a class
hierarchy (assuming that ObjectA is the problem object, and ObjectB, the
component object).
Rule 1. Is-A. ObjectA is derived from ObjectB.
Has-A. ObjectA keeps a complete instance of ObjectB. It can hold it as a
pointer or as an embedded object, but not as a reference.
Uses-A. ObjectA keeps a pointer or reference to ObjectB, but never a complete
data member. The choice between pointer and reference is governed by Rule 3.
Rule 2. One:One. ObjectA keeps only one instance of ObjectB. Whether the data
member is pointer, reference, or complete object is determined by Rule 1.
One:Many. ObjectA keeps a list of instances of ObjectB. Rule 1 determines
whether an individual instance of the list is a pointer, reference, or
complete object.
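Rules 1 and 2 translate a relations table mechanically into data members. The
following sketch illustrates each combination; the class and member names
(Vehicle, Car, Engine, Logger) are illustrative, not from the article:

```cpp
#include <vector>

class Engine { };        // a component object (ObjectB)
class Logger { };        // a shared service object, merely "used"

class Vehicle { };
// Is-A (Rule 1): Car is derived from Vehicle.
class Car : public Vehicle {
    // Has-A, One:One (Rules 1 and 2): one complete embedded object.
    Engine engine;

    // Has-A, One:Many (Rules 1 and 2): a list of complete objects.
    std::vector<Engine> spares;

    // Uses-A (Rule 1): a pointer (or reference), never a complete member.
    Logger* logger;

public:
    explicit Car(Logger* log) : logger(log) { }
    bool hasLogger() const { return logger != nullptr; }
};
```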
Rule 3. CreationTime. ObjectA should instantiate ObjectB at construction time.

In the case of ObjectA Uses-A ObjectB (Rule 1), ObjectA should take the
reference/pointer to a valid ObjectB as the constructor param and maintain an
instance of ObjectB as a reference, rather than a pointer.
In the case of ObjectA Has-A ObjectB (Rule 1), the instance of ObjectB is
maintained as a complete object rather than a pointer.
AnyTime. ObjectA must have a member function which can be invoked as a command
to instantiate ObjectB. If ObjectA Uses-A ObjectB (Rule 1), then ObjectB can
only be kept as a pointer. In such cases, the function should accept a valid
pointer to ObjectB as param. Furthermore, ObjectA can maintain the instance of
ObjectB only as a pointer, not as a reference.
If ObjectA Has-A ObjectB (Rule 1), then ObjectA should maintain the instance
of ObjectB as a pointer that should be newed during this function call. If
ObjectB is kept as a complete object in ObjectA, then ObjectB should have
methods (function, insertion operator, assignment operators, and the like) to
initialize itself.
Rule 4. One Way. Nothing special.
Two Way. ObjectB must have the reference/pointer to ObjectA as a data member.
It can be a reference only if the relation is established during creation time
(Rule 3) of ObjectA; otherwise it should be a pointer. In the previous case,
the constructor should take the reference/pointer to a valid ObjectA as a
constructor param. If ObjectB is being Used-by ObjectA (Rule 1) and many such
objects are using it simultaneously, then it would need to maintain a list of
all such instances (a situation referred to as Many:One or Many:Many
relationship).
Rule 5. Mandatory. If a relation is established between ObjectA and ObjectB
during creation of ObjectA (Rule 3), then ObjectA must throw an exception (or
invalidate itself) if the relation could not be established. If the relation
is not established at creation time (Rule 3), there may be a flaw in the
design. If it is found acceptable/valid, there must be a clause attached which
says the functionality of ObjectA is affected/unavailable until the
relationship is established.
Optional. Provisions to initialize the data member of ObjectB must not be in
the constructor--these should be through a member function. A clause must be
attached which determines that all functionality is affected/unavailable until
the relation is established.
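Rules 3 through 5 can be sketched for the Uses-A case as follows. The names
(Printer, Invoice, Report) are illustrative; the exception type is one
reasonable choice for the "invalidate itself" clause of Rule 5:

```cpp
#include <stdexcept>

class Printer { };

// CreationTime + Uses-A (Rules 1 and 3): take a reference to a valid
// ObjectB in the constructor and maintain it as a reference.
class Invoice {
    Printer& printer;   // valid for the lifetime of the Invoice
public:
    explicit Invoice(Printer& p) : printer(p) { }
};

// AnyTime + Uses-A (Rules 1 and 3): the relation is established through a
// member function, so ObjectB can only be kept as a pointer.  Per Rule 5
// (Optional), functionality is unavailable until the relation exists.
class Report {
    Printer* printer = nullptr;
public:
    void attachPrinter(Printer* p) {
        if (p == nullptr)
            throw std::invalid_argument("need a valid Printer");
        printer = p;
    }
    bool canPrint() const { return printer != nullptr; }
};
```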


Other Guidelines


If ObjectA Uses-A ObjectB (Rule 1), then some other object in the system must
Has-A ObjectB. Assuming that ObjectC is such an object, there should be only
one such object (which Has-A ObjectB) in the system at any point in time. It
is the responsibility of that object (ObjectC) and no other to create and
delete ObjectB. The instance of ObjectB must be created before it is "used" by
ObjectA. This leads to the following subrules:
The constructor and destructor of ObjectB should be private, and ObjectC
should be a friend class.
ObjectC should have a function to return a valid instance of ObjectB. If you
use data attribute notation (DAN), do so by overloading the subscript []
operator. This function should return a NULL pointer if the function fails. If
there is no possibility of failure, then it should return a reference.
ObjectC should maintain a list of all the instances of ObjectB that it
returned so that whenever anyone asks for that instance, the same
pointer/reference is returned, thereby maintaining referential integrity and
concurrency.
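These subrules can be sketched as a pair of classes. Account and AccountTable
are hypothetical names standing in for ObjectB and ObjectC; the subscript
operator follows the data attribute notation suggestion:

```cpp
#include <vector>

// ObjectB: only its owner may create or destroy it, so the constructor
// and destructor are private and the owner is declared a friend.
class Account {
    friend class AccountTable;
    explicit Account(int id) : id(id) { }
    ~Account() { }
    int id;
public:
    int getId() const { return id; }
};

// ObjectC: the single Has-A owner.  It hands out instances through the
// subscript operator and remembers every instance it has returned, so
// repeated requests yield the same pointer (referential integrity).
class AccountTable {
    std::vector<Account*> accounts;
public:
    ~AccountTable() { for (Account* a : accounts) delete a; }

    Account* operator[](int id) {
        if (id < 0) return nullptr;          // failure: null pointer
        for (Account* a : accounts)
            if (a->getId() == id) return a;  // same instance as before
        Account* a = new Account(id);
        accounts.push_back(a);
        return a;
    }
};
```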
Finally, use the conclusions of Step 6 of the abstract model as public member
functions of the classes to which the objects described in Step 6 belong. You
might need to distribute these functions among various classes in the
hierarchy.


Adopting this Methodology 


In any nontrivial project, it's impossible to design the complete system in
one sitting. Consequently, you will take the top-most object and break it down
to the first level. You should then assign functionality to each identified
object and establish the relationship among them. This will yield the
framework of the system you are developing. Each object you identified in this
stage will typically turn out to be a complete module in itself, which you can
treat as a separate system for design purposes. 


Conclusion


Considering the complexity of today's software projects, it is unrealistic to
expect this methodology to solve all problems. However, this approach will
clear some of the major stumbling blocks from the design phase. Keep in mind
that the methodology is still evolving and will become more concrete in the
coming years.


References


Booch, Grady. Object-Oriented Analysis and Design with Applications, Second
Edition. Menlo Park, CA: Benjamin/Cummings, 1994.
Charney, Reginald B. "Data Attribute Notation." Dr. Dobb's Journal (August
1994).
Murray, Robert B. C++ Strategies and Tactics. Reading, MA: Addison-Wesley,
1993.

































































Interactive Design Methodology


Designing a client/server hypertext help system




Phil Herold and Carla Merrill


Phil, a project manager at SAS Institute, can be contacted at
saspjh@unx.sas.com. Carla, an owner of Internationalization and Translation
Services, can be reached on CompuServe at 75221,3536.


Helplus is a hypertext help application modeled on the Microsoft Windows help
system. Helplus is unusual because it is a server program that manages
multiple help files concurrently, each in its own window. These help files can
be attached to and invoked from applications or invoked as stand-alone
programs through Helplus. This means, in effect, that one instance of Helplus
can run multiple help files for applications, with or without the applications
themselves running; see Figure 1.
Helplus can also be used to create and run a help file (or a hypertext file
for some other purpose) as an independent application. For example, with only
one instance of Helplus running, users can have concurrent access to Helplus
windows containing tutorials, orientation information, and other online
information that may not be associated with a particular application.
Our application's interface is similar to Windows help and uses a similar
external file system, yet takes a different approach in compiler and viewer
construction. This may provide some interesting comparisons for developers of
Windows help. 


Background


Emulus is SAS Institute's enhanced 3270 emulator for the X Window System.
Emulus started life as "phlem" (short for "Phil's Emulator"). When SAS
Institute moved to networked Hewlett-Packard 9000 Series 700 workstations,
phlem was used for 3270 emulation. Later, phlem was renamed "Emulus" and
targeted as commercial software, but it did not have help available for the
Help buttons in its dialog windows or for the Help item on its menu bar.
Furthermore, SAS Institute didn't own a hypertext help system (compiler and
viewer) that could support online-help development for Emulus. Helplus was
created to fill this void and eventually became a separate application.
Clearly, building a powerful help system such as Helplus involves much more
than we can adequately cover in a single article. Issues we faced included the
user interface, file and tag structures, compiler and viewer design and
implementation, server implementation, and interface design. In this article,
we will focus on three areas: the help compiler, help viewer, and server
program. We'll also touch on the design methodology that enabled us to achieve
our goals in a timely manner. 


Designing the Compiler


Initially, we used Windows help as the model for the user interface and the
structure of the help files for the compiler's input. We chose public-domain
code for a hypertext widget as a starting point for the viewer. 
Given our desire to emulate Windows help and the capabilities of the
public-domain widget, we had to decide whether to assign a particular task to
the compiler or viewer, the format of their input/output, and how particular
tasks (such as searching) should be implemented. These decisions and their
implementations influenced the evolution of the system into a separate
software package.
The compiler was originally a separate C application; it became an X Window
application when integrated into the same executable as the viewer even though
it used only one X Window routine.
Using the input to Windows help as a model, we knew the compiler needed to
process a help project file. This, in turn, required processing the RTF topic
files it listed and producing a compiled help file in a format that would be
read by the viewer and displayed. From the Windows help documentation, we
learned about the content and structure of project files and RTF files. We
decided to support the ROOT, TITLE, CONTENTS, and REPORT options. The [FILES]
section of the project file, which lists the RTF files to compile, was
supported in a format identical to Windows help.
Next, we decided which RTF tags the compiler would minimally need to support.
The system would have to show text with highlighted links and allow users to
select links to navigate through the help information. The public-domain
hypertext widget supported both. We also wanted to emulate other Windows
help-system features such as keyword search, history, bookmarks, and
annotations. Therefore, the first tags we supported were the \footnote, topic,
and pop-up link tags.
Next we defined the content and form of the compiler output, keeping in mind
that the viewer needed to read the help file efficiently and make sense of the
information. Designing the data for the compiler meant creating C-language
data definitions (including C structures) that would map help-file topics and
keyword information. For example, information for a particular topic (the
topic context string, the topic title, the text for the topic, and the links
within the text) had to be associated with that topic. In addition to data
definitions, a mechanism for resolving links had to be provided. Early on, we
assigned that task to the compiler, which affected its internal design and
data structures. One effect was increased speed. In our design, the compiled
link information is the text of the link, plus a number, which is an index
into the list of topics that identifies the topic to be linked to. The
compiled link information gives the viewer direct access to a topic, making
the link-selection process fast.
Of course, the viewer could have accomplished this task quickly via a binary
search on the topic context strings that are ordered alphabetically by the
compiler. Then the compiler would not have had to resolve topic links.
However, using the compiler to resolve links provided a second advantage: The
compiler reported information about unresolved links that was extremely
valuable while we were making changes and improvements to the interface. The
frequent retagging and compiling required to verify and evaluate changes would
not have been possible without the compiler's link-checking information. It
was also easy to create a link map for maintaining help files later on.
We also knew that the viewer would have to perform a direct topic lookup in
cases where Emulus would simply specify the help topic to display using the
context string. This would occur if users selected the Help button on an
Emulus dialog window. To make this process efficient at viewer time, we used a
binary search of the context string information.
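The direct-lookup path can be sketched as a binary search over the topic
entries, which the compiler has already ordered alphabetically by context
string. The structure and field names below are illustrative, not the actual
Helplus data definitions:

```cpp
#include <string>
#include <vector>

// One entry per topic, sorted alphabetically by context string
// (the compiler emits them in this order).
struct TopicEntry {
    std::string contextString;
    long fileOffset;   // where the topic's compiled text begins
};

// Return the index of the topic whose context string matches, or -1.
int lookupTopic(const std::vector<TopicEntry>& topics, const std::string& ctx) {
    int lo = 0, hi = static_cast<int>(topics.size()) - 1;
    while (lo <= hi) {
        int mid = (lo + hi) / 2;
        int cmp = topics[mid].contextString.compare(ctx);
        if (cmp == 0) return mid;
        if (cmp < 0) lo = mid + 1;
        else hi = mid - 1;
    }
    return -1;
}
```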
Another factor that affected the internal design of the compiler and its data
structures was keyword management. The same keyword can be used for multiple
topics, and topics typically have multiple keywords. This means that the
compiler has to maintain keyword information independently of the help-topic
information. Additionally, keywords have to be sorted to display
alphabetically in the Search window, and duplicate keywords (the same keyword
for multiple topics) have to be removed from the sorted list. To accomplish
this, our compiler maintains a list of topics (again, using a number for topic
indexes) for each keyword.
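This bookkeeping (keywords kept apart from topics, sorted for display,
duplicates collapsed, a list of topic indexes per keyword) maps naturally onto
a sorted associative container. A minimal sketch, not the article's actual
data structures:

```cpp
#include <map>
#include <set>
#include <string>

// Each keyword maps to the set of topic indexes it appears in.
// std::map keeps keywords sorted for the Search window, and inserting
// an existing keyword collapses the duplicate; std::set does the same
// for repeated (keyword, topic) pairs.
using KeywordIndex = std::map<std::string, std::set<int>>;

void addKeyword(KeywordIndex& idx, const std::string& kw, int topic) {
    idx[kw].insert(topic);
}
```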
Once these processing mechanisms were designed, we designed the help-file
format of the compiled output. UNIX standard I/O was the quickest format to
implement. The Emulus code already contained routines that read a file into a
buffer and present each line of text to a caller. The text could easily be
parsed using the C sscanf function. Since the help viewer was initially a part
of Emulus, which had these routines available and working, the format was
attractive. The help-file format was essentially text, so we decided that
writing the help file to stdout would be sufficient and, perhaps, useful.
Precedent certainly exists for this approach in other UNIX applications. In
any case, the output could always be redirected from the UNIX prompt to a real
file on disk. Writing to stdout also eliminated the need for a compiler option
to specify the destination of the compiler output.
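Parsing such a line-oriented text format with sscanf is straightforward. The
record layout below (a "TOPIC" keyword, an index, and a context string) is
hypothetical; the article does not give the actual Helplus file format:

```cpp
#include <cstdio>
#include <cstring>
#include <cstddef>

// Parse one line of a text-format record such as "TOPIC 12 edit_menu".
// Returns true and fills in the index and context string on success.
bool parseTopicLine(const char* line, int* index,
                    char* context, std::size_t ctxLen) {
    char keyword[16];
    char ctx[128];
    if (std::sscanf(line, "%15s %d %127s", keyword, index, ctx) != 3)
        return false;                         // wrong number of fields
    if (std::strcmp(keyword, "TOPIC") != 0)
        return false;                         // not a topic record
    std::strncpy(context, ctx, ctxLen - 1);
    context[ctxLen - 1] = '\0';
    return true;
}
```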
With the design in place, we started writing and testing code, building parts
of the system from the bottom up. First came the code that would read a help
project file and compile the list of RTF files. Once it was written, we used a
full-screen debugger to test the code, verifying the internal data. Then we wrote
the code that would read each RTF topic file and parse it. This code would
look for the RTF tags we needed to support and compile the topic information.
Topic links were resolved in a traditional second pass of the compiler after
all of the topic information was read from all of the RTF files listed in the
project file. This pass through the data also processed keyword information
(associated topics were resolved to topic indexes). The last code developed
was a simple function that wrote the help file to stdout. With this function
in place, verification of the output simply consisted of eyeballing the data
spit out by the compiler in the terminal window. 
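The second pass can be sketched as replacing each link's target context string
with an index into the topic list, collecting any targets that fail to
resolve for the compiler's report. The structure names are illustrative:

```cpp
#include <string>
#include <vector>

struct Link {
    std::string targetContext;  // as written in the RTF source
    int topicIndex = -1;        // filled in by the second pass
};

struct Topic {
    std::string contextString;
    std::vector<Link> links;
};

// Second pass: resolve every link to a topic index; return the context
// strings that did not resolve (the report that proved invaluable when
// retagging and recompiling help files).
std::vector<std::string> resolveLinks(std::vector<Topic>& topics) {
    std::vector<std::string> unresolved;
    for (Topic& t : topics)
        for (Link& l : t.links) {
            l.topicIndex = -1;
            for (int i = 0; i < static_cast<int>(topics.size()); ++i)
                if (topics[i].contextString == l.targetContext) {
                    l.topicIndex = i;
                    break;
                }
            if (l.topicIndex < 0)
                unresolved.push_back(l.targetContext);
        }
    return unresolved;
}
```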


Designing the Viewer


We started by making the help viewer an integrated part of Emulus. The
original public-domain hypertext widget defined functions to create a
hypertext widget, set the text in a hypertext widget (SetText), and retrieve
the text from a hypertext widget (GetText). To load the hypertext widget with
text, our help application called the SetText routine, passing it the text.
Within the text, tag characters instructed the hypertext widget to display an
item as a link. Link text was displayed in a different color and underlined.
The color was controllable through an X resource. The tag characters were also
controllable through X resources, although they defaulted to braces ({}),
which conformed to our help application.
We added more link types (for example, pop-up link text that used a dotted
underline and pop-up link text that had normal text color and no underline).
To accommodate these new links, more tag characters were added to the text to
tell the hypertext widget how to visualize the link. (This approach is
cumbersome, however, so the next release of Helplus uses link segments.)
We added two more parameters to the SetText routine: a title parameter and an
annotate parameter.
The text for the title was set with the addition of the titleFont resource.
However, this was changed again when we put the title outside of the
scrolled-window hypertext widget and into a hypertext widget of its own, which
was not part of the scrollable area. This allowed the topic title to remain at
the top of the window even if the user had to scroll to see additional topic
text. To accomplish this, a different hypertext widget was passed with the
title parameter set to a value. SetText was then called with the rest of the
text for the scrollable hypertext, with a NULL value for the title.
We did titles completely differently from Windows help, partly because we were
not yet familiar enough with Windows help syntax. Our implementation of the
viewer always used the topic-title text from the ${\footnote statement as the
first line of the topic. We added the titleFont resource to distinguish the
font of the topic-title text from the rest of the topic text. Of course, we
realized later that Windows does not do this. The text that is part of the
${\footnote statement is used for the History, Search, and Bookmark dialog
windows. To display a topic title as part of the topic text, the Windows
help-file author must also type the title (and any appropriate font change) in
the topic text.
An annotate parameter was added, indicating if the help topic had associated
annotation text (the hypertext widget itself was not bothered with the details
of the annotation). This parameter was simply a Boolean value. If True, the
hypertext widget displayed a paper clip in front of the title.
The annotations and bookmarks are managed in a text file, like the help file.
Many of the routines used to read the help file are also used to read the
annotation/bookmark file. This file is kept in the user's $HOME/.helplus
directory, which Helplus creates automatically. The filename consists of the
help filename appended with "ab"; for example, emulus.hlpab for the Emulus
help file. The viewer rewrites the entire annotation/bookmark file every time
an annotation or bookmark is added or deleted.
GetText retrieves the text from the hypertext widget, usually with the special
tag characters removed. Our help application used GetText to get the text for
the Copy function. GetText was eventually changed to accommodate the different
types of links and to remove picture tags from the returned text.
The original public-domain code also displayed the hypertext widget,
presumably parented to a scrolled-window widget. Action (translation) routines
were already defined to track the mouse-pointer movement and calculate when
the mouse was over link text. In this case, the appearance of the mouse
pointer changed to a hand. If the user selected link text, the hypertext
widget would call a callback function (defined by the hypertext widget's
owner) and pass it the link text and the length of the link text. The caller
was then responsible for determining what to do with the link, presumably
reloading the hypertext widget with other text.
The hypertext widget had to be modified to return more information in the
callback structure. These modifications implemented features in the interface
that made our hypertext linking more flexible and extendible. For example, our
first pop-up topics were spring-loaded, appearing only as long as the user
held down the mouse button. This process was handled entirely by the hypertext
widget. However, we wanted to put both topic and pop-up links within a popup.
This required the popup to stay on screen after the user released the mouse
button and to remain until it was clicked on. Enabling this behavior required
making the callback structure reflect the behavior to the hypertext widget's
owner--our help application. The information had to include whether or not
users had clicked to remove the popup or selected a link within the popup.
We also had to modify the callback structure to support the Annotate feature.
If the annotation paper clip is selected, the callback structure needs to
indicate this and allow the owner to display the annotation dialog. All of
this additional information was specified through separate reason codes, which
are a part of every X Window callback structure.



Developing the Server Program


Six weeks into our project, the help system had bitmap (bitonal) support; a
highly readable, attractive, proportional font; and menu and button features
in place. We achieved these milestones because of the design methodology we
used, although we assessed our results differently. The "writer" judged the
application in terms of its ability to present information efficiently and
attractively. The "developer" judged it by its functionality and performance.
Despite our different perspectives, we both realized that our help system was
developing an identity of its own. The idea of making it a distinct
application appealed to both of us and seemed the logical next step.
Making the help system a separate software package would insulate Emulus from
the newly developed help-system code, which greatly increased the executable
size of Emulus and made it more volatile late in its development cycle.
However, making the help system an independent application was a problem.
Even after we decided to externalize the help viewer from Emulus, running the
help system as a server was not an obvious solution. Our original inclination
was to exorcise the viewer code from Emulus and make it a stand-alone program
for viewing help files. Then users would get a new copy of the viewer program
each time it was invoked (either through an application or from the UNIX shell
prompt). 
A colleague suggested using a server program that runs in the background,
waiting for other processes (clients) to ask for service. In the case of
Helplus, the client could be an application or Helplus itself (if it were
invoked from the UNIX prompt when it was already up and running).
A server implementation would insulate Emulus from the help system's code and
provide additional benefits:
It allows sharing of resources (help-file data and picture data) when multiple
instances of Helplus are invoked. Help-file data is the memory devoted
internally to data structures that map the help-file information. In Helplus,
this can be considerable, since the entire help file is brought into memory
and left there. (In the next release of Helplus, the help file is kept on disk
and only brought in one page at a time as the user selects topics.)
A Helplus client can start up faster, since the server is already running and
in memory. Additionally, if the help file is already being viewed, it is not
reread from disk. 
A server allows changes in bookmarks or annotations to be easily broadcast to
other instances of the same help-file view. For example, if a user is viewing
the same topic in multiple Helplus viewer windows and adds or deletes an
annotation or bookmark, all of the Helplus viewer windows are updated
simultaneously.
In our Helplus code, the server is started and makes itself known by interning
an atom, a property on the root window of the display. Other applications
(like Emulus) can check to see if the Helplus server is running by obtaining
this property and seeing if it has an owner. The server is started by
StartHelpServer, which checks the root-window property. If it is not owned,
StartHelpServer starts the server using the UNIX fork and execv system calls.
This sequence illustrates how the functions StartHelpServer and
SendMessageToServer work:
1. The application obtains the value of this special property set by the help
server. This value is the window ID created by XCreateSimpleWindow when the
server is started. (This window is never realized by the help server.)
2. With XChangeProperty, the application can change two properties on this
window: the help filename to display and the topic name within that help file.
3. With XSendEvent, the application sends the help server a message. The data
in this message consists of a window ID for the client, a generic help handle,
and the interned atoms for the two properties (help filename and topic name).
For the first request by an application to the server, the help handle passed
is NULL.
4. The server sends a message back to the client indicating the success or
failure of the request. The indication is the existence of a message, which
can be displayed by the client. The returned message also contains the value
of the help handle. This value is always saved by the client and used in
subsequent requests to the help server.
Listing One (listings begin on page 98) shows the StartHelpServer and
SendMessageToServer functions. When Helplus is invoked from the UNIX prompt,
it goes through the same procedure as any other application. It checks to see
if a Helplus server is already running. If it is, Helplus uses
SendMessageToServer to process the help request; if not, this process makes
itself the server and then displays the requested help file.
It is relatively easy to insert these function calls in an application to
display a help topic. Listing Two shows how these calls are used. 
The server does not run forever but periodically wakes up to check for any
opened help files (viewer windows). If none are open, it shuts down. This
wake-up interval is set by XtAppAddTimeOut and is customizable through an X
resource. Resources are also freed during this wake-up period, even if help
file views are still active. For example, if a user brings up a viewer on two
help files and closes one of them, at the next wake-up interval the server
cleans up resources associated with the closed help file (namely, the memory
containing the help-file information).


Our Methodology


The key factor in our design methodology was that the writer actually used the
interface while the developer was developing it. The writer's motivation for
frequent compilation and viewing was to edit small amounts of text; online
editing is more reliable when done in small increments. The writer could also
verify the appearance of the text frequently. Changes could be made to
existing text and incorporated into any new text immediately. As the amount of
compiled help information grew, more attractive and efficient ways of
presenting it became clear through repeated viewing and use of the compiled
help file. If this editing phase had been postponed, improvements to the
interface would have been less apparent and there would have been little time
to make and test improvements. 
Another factor in our methodology was quick response time. The writer compiled
and displayed the current help file an average of three times per day,
allowing new elements to be tested as soon as they were available. It also
revealed bugs early on. The developer's fast response to fixing bugs and
making other changes was critical to the evolution of the help system.
Our daily cycle consisted of these steps:
1. The developer made the latest code (compiler and viewer) available to the
writer.
2. The writer compiled, edited, and displayed the most-recent version of the
help file. Editing included checking format, grammar/spelling, style, and
links. 
3. The writer told the developer about any bugs in compilation and display of
the information. 
4. Any deficiencies or enhancements to the interface were discussed, based on
the most recent display.
5. The developer changed the code to fix bugs and implement enhancements,
including the support of new tags.
6. The writer used the new tags for new text and retagged old text to take
advantage of any new enhancements.


Conclusion


Helplus has become a strategic internal tool that serves as the native help
system for the SAS System running on UNIX hosts. The most recent version
includes support of a richer RTF tag set, Windows help macros, and text
flowing. In addition, help files created for Windows can now be compiled by
Helplus. This allows online help development for the SAS System to proceed
across UNIX and Windows with a single set of RTF files.
One important design change is that the compiler now parses RTF and produces
segments. This implements text flowing and insulates the hypertext widget from
RTF parsing. Our segments are encoded pieces of data consisting of a segment
identifier and some associated information. For example, the link segment in
Listing Three contains all of the information necessary for the hypertext
widget to display the link text and makes it easy for the viewer to interpret
the action that should occur if the user selects the link. Having the data
already encoded in an easily interpreted form is much faster and more reliable
than having the hypertext widget (and viewer) parse the RTF tags. 
In addition to performance gain, segments allowed us to keep the hypertext
widget more general. If the hypertext widget is not RTF dependent, it can
support viewing help files from any tag language as long as the compiler used
for that language produces the segments. (The segments, of course, are not
language specific.) Theoretically, such a hypertext widget could be used in
other applications, including other help viewers. Code reusability gives
Helplus the flexibility to evolve to meet the multiplatform needs of the SAS
System.
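The segment idea can be sketched in a few lines: each segment is a tag byte plus a length-prefixed payload, so the viewer switches on the identifier instead of re-parsing RTF. The layout below is illustrative only; the actual Helplus encoding is internal and not published.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical segment identifiers (the real set is Helplus internal).
enum SegId : uint8_t { SEG_TEXT = 1, SEG_LINK = 2 };

struct Segment {
    SegId id;
    std::string payload;   // text, or an already-resolved link target
};

// Encode: id byte, 2-byte little-endian length, then the payload bytes.
std::vector<uint8_t> Encode(const std::vector<Segment> &segs) {
    std::vector<uint8_t> out;
    for (const Segment &s : segs) {
        out.push_back(s.id);
        out.push_back(uint8_t(s.payload.size() & 0xff));
        out.push_back(uint8_t(s.payload.size() >> 8));
        out.insert(out.end(), s.payload.begin(), s.payload.end());
    }
    return out;
}

// Decode is a flat scan: the widget never sees a single RTF tag.
std::vector<Segment> Decode(const std::vector<uint8_t> &buf) {
    std::vector<Segment> segs;
    for (size_t i = 0; i + 3 <= buf.size();) {
        Segment s;
        s.id = SegId(buf[i]);
        size_t len = buf[i + 1] | (size_t(buf[i + 2]) << 8);
        s.payload.assign(buf.begin() + i + 3, buf.begin() + i + 3 + len);
        segs.push_back(s);
        i += 3 + len;
    }
    return segs;
}
```

Any tag language that compiles down to this form can feed the same hypertext widget, which is exactly the reusability argument made above.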
The use of segments results in binary data, which changed the Helplus help
file from textual to binary format. In addition, massive code changes were
made to add new features, including those new to Windows 95 (hypergraphics,
linking across multiple help files, shortcut keys, MAPFONTSIZE option, and the
\button tag for push buttons).
The compiler retains its link-resolving capabilities within a help file but
does not resolve links to external help files. This allows the help author to
insert links to help files unavailable to the compiler without affecting the
success of the compilation, yet it still provides the help author with error
reporting for failed links within the base help file.
With its Snapshot button and the ClientMessage macro, which lets Helplus send
a message back to the client application when a link is selected or a help
file is opened, Helplus now goes beyond Windows help. This capability enhances
the client/server relationship between an application and Helplus by allowing
communication between the two.
Figure 1 The Helplus hypertext help system.

Listing One

char *StartHelpServer(display, server_wdw, path, started)
 Display *display;
 Window *server_wdw;
 char *path;
 int *started;
/*
 * display - (I) display connection
 * server_wdw - (U) window handle returned from the server;
 *              the return value is passed to the
 *              SendMessageToServer routine
 * path - (I) path to executable; can be NULL, implying normal
 *            search mechanism
 * started - (U) returned 1 if this call started the help server;
 *               0 if help server was already running
 * StartHelpServer returns a pointer to a message if unsuccessful;
 * otherwise it returns NULL.
 */

char *SendMessageToServer(app_con, display, server_wdw, client_wdw,
 helpfilename, topicname, help_handle)
 XtAppContext app_con;
 Display *display;
 Window server_wdw, client_wdw;
 char *helpfilename, *topicname, **help_handle;
/*
 * app_con - (I) application context
 * display - (I) display connection
 * server_wdw - (I) window handle returned from StartHelpServer
 * client_wdw - (I) your window
 * helpfilename - (I) name of help file
 * topicname - (I) topic (context string) to display
 * help_handle - (U) generic pointer returned; this should be passed
 *                   to subsequent calls to this routine
 * SendMessageToServer returns a pointer to a message if unsuccessful;
 * otherwise it returns NULL.
 */



Listing Two

void MyHelpRoutine(app_con, w, topic)
XtAppContext app_con;
Widget w;
char *topic;
{
static char *help_handle;
static Window initial_server_wdw = NULL;
char *msg;
int we_started_server;
/*---- Start the help server; if message returned, print it. ----*/
Window server_wdw;
if (msg = StartHelpServer(XtDisplay(w), &server_wdw, NULL,
 &we_started_server))
 fprintf(stderr, "%s", msg);
/*---------------------------------------------------------------------------*/
/* If server was indeed started on the above call, check to see if server */
/* is the same X-Window (server could have crashed or been killed). If not */
/* the same window, then ensure our help_handle is NULL. Then tell help */
/* server to display the given topic in our help file. */
/*---------------------------------------------------------------------------*/
else
{ 
 if ( (we_started_server) ||
      (initial_server_wdw != server_wdw) )
 help_handle = NULL;
 initial_server_wdw = server_wdw;

 if (msg = SendMessageToServer(app_con, XtDisplay(w),
 server_wdw, XtWindow(w),"myappl.hlp", topic, &help_handle))
 fprintf(stderr, "%s", msg);
 }

}



Listing Three

struct HELP_LINK
{
 int topic_index, /* linked topic (same help file) */
 link_num, /* link number within page */
 help_file_index; /* >= 0 if link across help file */
 long hash_value; /* hash value of context string */
 unsigned char is_picture, /* link is a picture */
 has_color, /* Link has link "color" */
 has_underline, /* Link has "underline" */
 popup; /* TRUE if link is popup */
};














































Event-Driven Threads in C++


An object-oriented infrastructure for multithreaded apps




Dan Ford


Dan is a software engineer in Hewlett-Packard's medical-products group. He can
be contacted at ford@mcm.hp.com.


For some years now, the superior performance and responsiveness of
multithreaded applications have been anticipated by users and developers alike.
However, as developers are finding, great care must be taken in designing
multithreaded applications, or many expected benefits are not realized.
Furthermore, multiple threads can greatly increase the complexity of an
application, thereby increasing development and testing costs.
The designer of a high-caliber multithreaded application is faced with two
challenges that go beyond traditional, single-threaded application
development: First, inherent parallelisms in the system must be identified and
translated into program segments that can execute independently; and second,
effective interthread communication and synchronization strategies must be
designed. Failure in the first area will result in a program that doesn't
deliver on the promise of multithreading (by using CPU resources
inefficiently); failure in the second will mire development in unnecessary
complexity and overhead. In this article, I'll focus on the second issue and
present a powerful, multithreaded architecture that can be used by almost any
application--once the basic building blocks are available. Since the concepts
presented are particularly useful to object-oriented programs, I'll also
describe and implement a set of C++ classes. 
Almost all threads must at some point coordinate their actions with events
occurring in other threads. In this article, any occasion in which a thread
must wait for an event to occur in another thread will be referred to as a
"synchronization point." Two threads can implement a synchronization point
with a semaphore. If one thread must wait until an event in another thread has
occurred, then it simply blocks on a semaphore and waits for it to be cleared
by the other thread. The two threads might look something like Example 1.
Although this example is a bit simplistic (for example, it ignores some
semaphore creation and initialization tasks), this common mechanism is easy
and works well when the number of distinct events or synchronization points is
small. However, it doesn't scale up well to applications with complex
synchronization needs or those that contain a large number of threads
communicating in different ways. Each synchronization point will need its own
dedicated semaphore; as the number of threads increases, the potential number
of dedicated semaphores increases dramatically. Other types of synchronization
needs are more complicated, such as when a thread must wait for one of many
events. Furthermore, threads using this type of synchronization are difficult
to test and debug because they cannot easily be tested in isolation.
The mechanism in Example 1 doesn't scale up well because it is not structured
on an event-driven model, despite the event-driven nature of the system.
Thread synchronization is inherently event driven. An alternative structure
for the thread is a message-passing architecture where, instead of threads
that block on dedicated semaphores, message-driven threads depend on a single
queue of messages. A skeleton message-driven thread is shown in Example 2.
While a thread is waiting for a message to arrive on its queue, it is blocked
and consumes little or no CPU resources. As soon as the message arrives, the
thread wakes up, retrieves the message, and carries out some action in
response to it. Finally, it returns to wait for another message on the queue.
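The skeleton of Example 2 maps directly onto portable primitives. A minimal sketch, assuming C++11 `std::thread` and a condition variable in place of the OS/2 semaphores used later in the article, looks like this:

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

struct Msg { int id; };                  // message: identifier only, for brevity
const int MSG_QUIT = 0;                  // reserved ID that ends the loop

std::mutex mtx;
std::condition_variable cv;
std::queue<Msg> q;
int handled = 0;                         // count of ordinary messages processed

void PostMsg(Msg m) {
    { std::lock_guard<std::mutex> lk(mtx); q.push(m); }
    cv.notify_one();                     // wake the blocked thread
}

void MsgThread() {
    for (;;) {
        std::unique_lock<std::mutex> lk(mtx);
        cv.wait(lk, [] { return !q.empty(); });   // blocked: no CPU burned
        Msg m = q.front(); q.pop();
        lk.unlock();
        if (m.id == MSG_QUIT) return;    // GetMessage-returns-FALSE analogue
        ++handled;                       // "do something with msg"
    }
}

int RunDemo() {
    std::thread t(MsgThread);
    PostMsg({1}); PostMsg({2}); PostMsg({MSG_QUIT});
    t.join();
    return handled;
}
```

While the queue is empty the thread sleeps inside the wait; each post wakes it exactly as described above.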
Message-driven threads have many advantages over the simpler type of threads
described earlier:
They have a simple and predictable behavior, yet communication protocols of
arbitrary complexity can be implemented with other threads.
All threads communicate with each other using the same interface. This frees
up the threads to concentrate on their respective tasks. Valuable resources
are saved since allocating semaphores for specific synchronization points is
not necessary. Defining an interface to a thread consists of defining the set
of messages the thread is prepared to respond to, the arguments that must
accompany these messages, and what responses are expected.
Testing is easier because standard test beds can be developed. A
message-driven thread can often be unit tested in isolation by sending it
messages and trapping the responses. Similarly, if all threads use the same
messaging mechanism, debugging can be facilitated by message-logging and
tracking tools. 
Message passing between threads of the same process can be implemented more
efficiently than interprocess-communication mechanisms.


The Class Hierarchy


In Figure 1, an overview of the class-inheritance hierarchy, each box shows a
class with the class name at the top, followed by the public and protected
methods in the next two sections of the box. (Private methods are not shown.)
The base class of the hierarchy is a simple thread class. Although it is a
no-frills class, it is not abstract; that is, you may create instances of this
class. When you instantiate a simple thread, you must provide a thread
function. A stack size can be supplied as well, but it is optional. The Start
method must be invoked to cause the thread to begin. Start accepts an optional
32-bit argument that is passed to the thread. Once instantiated and started,
the thread can be suspended, resumed, and stopped with the methods provided;
see Listings One and Two; listings begin on page 98. There is also a method to
get the OS/2 thread ID, which can be used for OS/2 API calls that require a
thread ID.
The next class in the hierarchy, QThread, is abstract and defines the behavior
and interface for the message-driven threads described earlier; see Listings
Three and Four. The interface provides a method, SendMsg, which is used to
send a message to the thread. Several of the methods are pure virtual, meaning
that we must provide an implementation for them in the derived classes.
QThread is abstract because we will want to implement at least two different
types of message-driven threads. Therefore, this class serves only to
formalize the behavior and interface of all message-driven threads, regardless
of implementation.
At first glance, it might appear that the QThread class constructor is missing
an argument--no thread procedure is required. How can you instantiate a thread
object without providing a thread procedure? QThread contains a private
static-thread procedure, threadProc, that is passed to the base thread class.
This procedure enforces the behavior of all message threads derived from
QThread. The behavior of threadProc consists of a three-step process:
1. It calls the Startup method, which will be implemented in the derived
class. The derived class can perform any necessary initialization. This
initialization is distinct from that which might be performed by the
constructor for the derived class, because this startup method is called in
the execution context of the new thread (that is, after the thread has been
started). The constructor, on the other hand, is called in the execution
context of the thread that instantiated the new thread. When the startup
method is called, it is passed the initial argument supplied when the thread
is started.
2. Next the MsgLoop method is called. Most of a typical, message-driven
thread's life is spent in this simple loop, which looks something like Example
3. The methods GetMessage and DispatchMsg are pure virtual, so they will be
implemented in the derived class. The GetMessage method returns FALSE to cause
MsgLoop to terminate.
3. When MsgLoop terminates, the Shutdown method is called. This method is also
pure virtual. The derived class can then perform any cleanup or notification
tasks. For instance, when a message thread ends, it may wish to send messages
to other threads to inform them that it will no longer be available. Just like
the Startup method, this call is made in the same execution context as the
rest of the thread.
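The three-step contract enforced by threadProc is the classic template-method pattern. The single-threaded sketch below (portable C++, no OS/2 types; all names illustrative) shows how a derived class fills in the pure virtuals while the base class fixes the Startup/MsgLoop/Shutdown order:

```cpp
#include <cassert>
#include <string>

class QThreadLike {
public:
    // Mirrors QThread::threadProc: the order is fixed by the base class.
    void Run(unsigned long arg) {
        Startup(arg);
        while (GetMessage(msg_)) DispatchMsg(msg_);
        Shutdown(arg);
    }
protected:
    virtual void Startup(unsigned long) = 0;
    virtual bool GetMessage(int &msg) = 0;   // returning false ends MsgLoop
    virtual void DispatchMsg(int msg) = 0;
    virtual void Shutdown(unsigned long) = 0;
private:
    int msg_ = 0;
};

// Derived class: feeds itself three messages, then stops, logging each step.
class Recorder : public QThreadLike {
public:
    std::string log;
protected:
    void Startup(unsigned long) override { log += "S"; }
    bool GetMessage(int &msg) override { msg = next_++; return msg < 3; }
    void DispatchMsg(int msg) override { log += char('0' + msg); }
    void Shutdown(unsigned long) override { log += "E"; }
private:
    int next_ = 0;
};

std::string Demo() { Recorder r; r.Run(0); return r.log; }
```

The log shows Startup first, then one dispatch per message, then Shutdown, which is exactly the lifetime threadProc imposes on every QThread descendant.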
An important consideration when designing the QThread class is the format of
the messages. Messages can be as simple as a single short integer, or
arbitrarily complex. However, three characteristics of a message format render
it sufficiently flexible to meet the needs of a wide variety of applications:
It should uniquely identify a specific event or command. This is the message
identifier, and usually consists of an unsigned integer.
It should allow arguments to be passed along with the message. The space
allowed for arguments can vary, but at a minimum, enough room for a pointer
should be allocated, since that allows passing pointers to arbitrarily complex
data.
For some threads it is useful if the message carries with it additional
addressing information. The thread can then use this addressing information to
dispatch the message to a particular object or subsystem. Such a field is not
mandatory, since that information could be passed in the argument list;
however, if such a field is provided in the message structure, the overhead of
packaging up a message can often be reduced.
For the design of the QThread class, I chose the same format OS/2 Presentation
Manager (PM) uses for its user-interface messages--the QMSG structure. This
allows the creation of two kinds of message-driven threads--OS/2 PM threads
and non-OS/2 PM threads. The QMSG structure, as defined by OS/2, is shown in
Example 4.
The first field is used by PM threads to hold the window handle for the window
which is to receive the message (since a single PM thread may service many
windows). Non-PM threads derived from QThread can use this field to contain
any 32-bit value, since it is up to the thread to interpret the contents of
the message; however, to follow the convention set by PM, this field could be
used as a pointer to an object to which the message is addressed or on which
the thread will operate. The second field of the QMSG structure is an unsigned
32-bit integer used for the message identifier. A message thread will use this
value to decide what action to take. The third and fourth fields are
general-purpose fields to hold arguments that accompany the message. They are
32-bit values, and their contents will depend on the message ID. In this
implementation, the remaining fields will not be used.
The class hierarchy in Figure 1 shows two classes derived from QThread:
MsgThread and PMThread. Although the source code for PMThread (an OS/2 PM
thread class) is included (see Listings Nine and Ten), I won't discuss it in
detail here. The MsgThread (Listings Five and Six) class encapsulates a
message-driven thread useful for OS/2 background threads. MsgThread provides
implementations for all the pure virtual methods defined by QThread, and is
therefore not an abstract class.
When a MsgThread is instantiated, a message procedure must be provided as a
parameter to the constructor. This user-supplied thread procedure is called
each time a new message arrives on the message queue. It is only called to
handle messages when they are available--the real thread procedure is the
static method belonging to the QThread class, which drives the message loop.
This message procedure resembles a PM window procedure, its counterpart in PM
threads.
The message procedure expects five arguments: The first four are the four
fields of the message (from the QMSG structure); the last argument is the
initial argument passed to the MsgThread when it was started (using the Start
method in the base Thread class). This argument is usually used to point to
the message procedure's instance data. Looking at the source listings, you can
see that the QThread class saved the Start argument as a protected member, so
it is accessible to the MsgThread class, which adds it as the fifth argument
to each message-procedure call.
MsgThread also provides implementations for the Startup and Shutdown methods
(which were pure virtual in QThread). Startup simply calls the message
procedure with a special message ID (MSG_THRD_STARTUP, defined in
MsgThread.h). Shutdown calls the message procedure with another special
message ID (MSG_THRD_SHUTDOWN, also defined in MsgThread.h). These two
messages allow the message procedure to perform any necessary initialization
and cleanup tasks. The Startup message also passes the this pointer so the
message procedure can get access to the message-thread object. (This might be
necessary, for example, if the message procedure needed to post messages to
its own message queue.)
The final component of the MsgThread class is the actual implementation for
the message queue, provided in a separate class; see Listings Seven and Eight.
This allows the implementation for the message queues to change without
impacting the MsgThread class. When designing the message-queue class,
remember that many threads might attempt to post a message to the same queue
simultaneously. Access to the queue is serialized (with a mutual exclusion
semaphore) to prevent the queue from being corrupted. Another (event)
semaphore is used to unblock, or wake up, the waiting thread when a message
arrives.
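The same two-primitive split, a mutex for queue integrity and an event for wake-up, can be tried out portably. In the sketch below `std::condition_variable` plays the role of the event semaphore, and several producer threads hammer one queue to confirm that serialization holds (the class and method names are illustrative, not the MsgQueue API):

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class TinyQueue {
public:
    void Post(int msg) {                         // MsgQueue::PostMsg analogue
        { std::lock_guard<std::mutex> lk(mtx_);  // serialize writers
          q_.push(msg); }
        ready_.notify_one();                     // the DosPostEventSem role
    }
    int Wait() {                                 // MsgQueue::WaitMsg analogue
        std::unique_lock<std::mutex> lk(mtx_);
        ready_.wait(lk, [this] { return !q_.empty(); });
        int m = q_.front(); q_.pop();
        return m;
    }
private:
    std::mutex mtx_;                 // mutual-exclusion semaphore analogue
    std::condition_variable ready_;  // event semaphore analogue
    std::queue<int> q_;
};

// Four producers post 1000 messages each; the consumer must see all 4000.
int StressDemo() {
    TinyQueue q;
    std::vector<std::thread> producers;
    for (int p = 0; p < 4; ++p)
        producers.emplace_back([&q] {
            for (int i = 0; i < 1000; ++i) q.Post(1);
        });
    int sum = 0;
    for (int i = 0; i < 4000; ++i) sum += q.Wait();
    for (auto &t : producers) t.join();
    return sum;
}
```

If the lock were dropped, concurrent posts could corrupt the queue, which is precisely the hazard the mutex in MsgQueue guards against.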


Using the Classes


The structure of a message-driven application usually consists of a collection
of communicating, message-driven threads, each with a particular
responsibility. Usually, the main thread creates the other threads and then
blocks until it's time to shut down. Sample.cpp and PMApp.cpp are two OS/2
programs that illustrate this; see "Availability," page 3. All of the sample
code has been compiled with the Borland C++ compiler for OS/2 2.x. Sample.cpp
creates several message threads, each of which forwards its messages to the
next thread. Each time a thread receives a message, a bit in one of the
message parameters is turned on to mark the message as having visited that
thread. When the message makes a complete circle, visiting each thread, the
message is posted to a display thread which prints a string indicating that
the message has arrived. The display thread keeps a count of how many messages
have arrived. When all messages have been accounted for, it clears a semaphore
and terminates. The main thread then resumes execution and terminates the
program. PMApp.cpp is a PM "Hello World" program in which the Hello World
window is implemented in its own thread. In addition to the OS/2 version, a
Windows NT implementation is provided electronically.
Messages are useful for command and control purposes. They are an efficient
way to send requests, signal events, and synchronize activities. However, they
are not usually a good choice for serializing access to shared data. A better
strategy is to encapsulate shared data into objects that provide methods to
access the data. These objects can contain their own (private) instances of
semaphores that are used by their public methods to serialize access to the
data. This way, the serialization is guaranteed without building in specific
knowledge about any particular thread.
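This "share objects, not raw data" strategy amounts to a small monitor class whose private mutex makes every public method atomic; callers never see the lock. A minimal sketch in portable C++ (names hypothetical):

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Shared data wrapped with its own private semaphore -- no thread needs any
// knowledge of which other threads touch the data.
class SharedCounter {
public:
    void Add(int n) {
        std::lock_guard<std::mutex> lk(mtx_);   // serialize inside the method
        value_ += n;
    }
    int Get() {
        std::lock_guard<std::mutex> lk(mtx_);
        return value_;
    }
private:
    std::mutex mtx_;     // private: serialization is the object's business
    int value_ = 0;
};

int MonitorDemo() {
    SharedCounter c;
    std::vector<std::thread> threads;
    for (int t = 0; t < 8; ++t)
        threads.emplace_back([&c] { for (int i = 0; i < 1000; ++i) c.Add(1); });
    for (auto &t : threads) t.join();
    return c.Get();
}
```

Eight unsynchronized increments per iteration would race; because the lock lives inside the object, the guarantee holds no matter who calls Add.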


Enhancing the Message-Thread Concept



The framework presented here is a no-frills infrastructure for multithreaded
applications. One obvious enhancement is to add functionality to the message
queues. The queues described here do not prioritize messages. Also, some
applications might wish to filter the message queue, or to peek at the
messages on the queue to see if a particular message is waiting. Other
enhancements could include methods to broadcast or forward messages.
Message-driven threads create exciting opportunities in the area of
object-oriented design. Message-driven threads make it easy to encapsulate a
thread into an otherwise ordinary class, making it possible to build a library
of truly asynchronous classes. Objects in the real world tend to communicate
asynchronously and perform their functions in parallel, so why not do the same
with our software objects? One of the goals of object-oriented design is to
minimize the conceptual leap that must be made when we map real-world objects
into programming structures. By wrapping message threads into application
classes and mapping events into messages, we approximate the real world we are
trying to model. Both sending and receiving objects perform their duties in
parallel.
Of course, mapping a problem into a system of objects operating in parallel
requires rethinking interface design. Interfaces in such a system become less
a collection of functions and more a protocol of messages and responses. 


Conclusion


Threads are an exciting, powerful tool. However, like most other powerful new
technologies, it is easy to use them inappropriately or carelessly. The class
library presented here can provide ways to quickly create and control threads.
In addition, message-driven threads facilitate the design and development of
event-driven, multithreaded applications. Finally, I hope that these classes
can aid in the development of new asynchronous classes, which can be used to
better model the complexities of the real world.
Figure 1 The inheritance hierarchy using OMT notation.
Example 1: Two threads that implement a synchronization point with a
semaphore.
ThreadA()
{
 // do some stuff
 block on semaphore(a)
// wait for a specific event
 // do some more stuff
}
ThreadB()
{
 // do some stuff
 clear semaphore(a)
// signals ThreadA
 // do more stuff
}
Example 2: Skeleton message-driven thread.
MsgThreadA()
{
 while (TRUE) {
 msg = getMsgFromQueue();
 // do something with msg
 }
}
Example 3: Typical message-driven thread.
while (this->GetMessage(qmsg))
 this->DispatchMsg(qmsg);
Example 4: QMSG structure as defined by OS/2.
struct QMSG {
 HWND hwnd;
 ULONG msg;
 MPARAM mp1;
 MPARAM mp2;
 ULONG time;
 POINTL ptl;
 ULONG reserved;
};

Listing One

// Thread.h
#if !defined(THREADS_INC)
#define THREADS_INC

//---------------------- Constants and Types -----------------------
const int THRDS_DEF_STACK = 8192; // default stack size

typedef void FNTHREADPROC (VOID * ulArg); // thread procedure type

typedef FNTHREADPROC* PFNTHREADPROC;

//------------------------------ Class -----------------------------
class Thread {
public:
 Thread( PFNTHREADPROC pfnThread, // Constructor
 ULONG ulStack=THRDS_DEF_STACK);
 virtual ~Thread(); // Destructor
 virtual VOID Start (ULONG arg=0L);
 VOID Stop() { DosKillThread(idThread); }
 VOID Resume() { DosResumeThread(idThread); }
 VOID Suspend() { DosSuspendThread(idThread); }
 TID GetTID() { return idThread; }
private:
 ULONG ulStackSize;
 TID idThread;
 PFNTHREADPROC pfnThreadProc;
};
#endif



Listing Two

// Thread.cpp
//------------------------- Includes ------------------------------
#define INCL_DOS
#include <os2.h>
#include <Process.h>
#include "Thread.h"

//--------------------------- code --------------------------------
Thread::Thread(PFNTHREADPROC pfnThread, ULONG ulStack)
{
 ulStackSize = ulStack;
 pfnThreadProc = pfnThread;
}
Thread::~Thread() {} // empty implementation
VOID Thread::Start (ULONG arg)
{
 idThread = _beginthread(pfnThreadProc, ulStackSize, (void*)arg);
}



Listing Three

// QThread.h
#if !defined(QTHREAD_INC)
#define QTHREAD_INC

//----------------------------- Includes ---------------------------
#include "Thread.h"
//----------------------------- defines ----------------------------
const int QTHRD_DEF_QSIZE = 0L;
//------------------------------ Types -----------------------------
class QThread; // forward declaration
typedef VOID FNQTHPROC (QThread *, ULONG); // QThread Procedure type
typedef FNQTHPROC * PFNQTHPROC;


//------------------------------ Class -----------------------------
class QThread : public Thread {
public:
 QThread ( ULONG ulQueueSize=QTHRD_DEF_QSIZE, // Constructor
 ULONG ulStackSize=THRDS_DEF_STACK);
 ~QThread(); // Destructor
 VOID Start(ULONG ulArg=0L);
 virtual VOID SendMsg(ULONG objAddr, ULONG msg,
 MPARAM mp1, MPARAM mp2) = 0;
protected:
 virtual VOID MsgLoop();
 virtual BOOL GetMessage(QMSG & qmsg) = 0; // pure virtual
 virtual VOID DispatchMsg (QMSG & qmsg) = 0; // pure virtual
 virtual VOID Startup(ULONG ulArg) = 0; // pure virtual
 virtual BOOL Shutdown(ULONG ulArg) = 0; // pure virtual

 ULONG ulQSize;
 ULONG ulParam; // initial argument passed in when thread is started
private:
 static VOID threadProc(QThread*); // static thread procedure
};
#endif



Listing Four

// QThread.cpp
//------------------------- Includes ------------------------------
#define INCL_WIN
#define INCL_DOS

#include <os2.h>
#include "QThread.h"
//--------------------------- code --------------------------------
QThread::QThread ( ULONG ulQueueSize, ULONG ulStack): 
 Thread((PFNTHREADPROC)QThread::threadProc, ulStack), ulQSize(ulQueueSize)
{}
QThread::~QThread() 
{}
VOID QThread::Start(ULONG ulArg)
{
 ulParam = ulArg;
 this -> Thread::Start((ULONG)this);
}
VOID QThread::MsgLoop()
{
 QMSG qmsg;
 while (this -> GetMessage(qmsg))
 this -> DispatchMsg(qmsg);
}
VOID QThread::threadProc (QThread* pQThrd)
{
 pQThrd->Startup(pQThrd->ulParam);
 pQThrd->MsgLoop();
 pQThrd->Shutdown(pQThrd->ulParam);
}




Listing Five
// MsgThrd.h
#if !defined(MSGTHREAD_INC)
#define MSGTHREAD_INC

//----------------------------- Includes ---------------------------
#include "QThread.h"
#include "MsgQ.h"
//----------------------------- defines ----------------------------
const USHORT MSG_DEF_QSIZE = 10;

//----------------------------------------------------
// The following two values are reserved message IDs. All MsgThreads must be
// prepared to receive them. All other message IDs are user defined.
const ULONG MSG_THRD_SHUTDOWN = 0; // Received during shutdown
const ULONG MSG_THRD_STARTUP = 1; // Received at startup
const ULONG MSG_THRD_USER = 2; // First user defined msg ID
//------------------------------ Types -----------------------------
typedef VOID FNMSGTHRDPROC (ULONG objAddr, ULONG msgID, MPARAM mp1, 
 MPARAM mp2, ULONG ulParam);
typedef FNMSGTHRDPROC* PFNMSGTHRDPROC;
//------------------------------ Class -----------------------------
class MsgThread : public QThread {
public:
 MsgThread ( PFNMSGTHRDPROC pfn, USHORT usQSize=MSG_DEF_QSIZE,
 ULONG ulStack=THRDS_DEF_STACK);
 ~MsgThread ();
 VOID SendMsg (ULONG objAddr, ULONG msgID, MPARAM mp1, MPARAM mp2)
 { pMsgQ->PostMsg(objAddr, msgID, mp1, mp2); }
protected:
 BOOL GetMessage (QMSG & qmsg) 
 { return pMsgQ->WaitMsg(qmsg); }
 VOID DispatchMsg (QMSG & qmsg) 
 { pfnMsg((ULONG)qmsg.hwnd,qmsg.msg,qmsg.mp1,qmsg.mp2,ulParam); }
 VOID Startup (ULONG ulArg)
 { pfnMsg((ULONG)this, MSG_THRD_STARTUP, 
 (MPARAM)ulArg, (MPARAM)NULL,ulArg); }
 BOOL Shutdown(ULONG ulArg);
private:
 MsgQueue* pMsgQ; // pointer to msg queue
 PFNMSGTHRDPROC pfnMsg; // pointer to client thread proc
};
#endif



Listing Six

// MsgThrd.cpp
//------------------------- Includes ------------------------------
#define INCL_WIN
#define INCL_DOS

#include <os2.h>
#include "MsgThrd.h"

//--------------------------- code --------------------------------
MsgThread::MsgThread ( PFNMSGTHRDPROC pfn, USHORT usQSize, ULONG ulStack) :
 QThread(usQSize,ulStack), pfnMsg(pfn)
{
 pMsgQ = new MsgQueue(usQSize);
}
MsgThread::~MsgThread ()
{
 delete(pMsgQ);
}
BOOL MsgThread::Shutdown(ULONG ulArg)
{
 pfnMsg((ULONG)NULL, MSG_THRD_SHUTDOWN, 0L, 0L, ulArg);
 return TRUE;
}



Listing Seven

// Msgq.h
#if !defined(MSGQUEUE_INC)
#define MSGQUEUE_INC

//-------------------------- defines -------------------------------
const USHORT MQ_DEF_QSIZE = 10;

//------------------------------ Class -----------------------------
class MsgQueue {
public:
 MsgQueue (USHORT usQSz=MQ_DEF_QSIZE);
 ~MsgQueue ();
 //--------------------------------------------------------------
 // This method blocks until it acquires the mutual exclusion 
 // semaphore for the queue. It then calls the private 
 // method QPut to add the message to the queue.
 VOID PostMsg (ULONG hobj, ULONG msg, MPARAM mp1, MPARAM mp2);
 //--------------------------------------------------------------
 // This method blocks until a message is available on the queue.
 // It then obtains the necessary mutual exclusion semaphores
 // before calling the private method QGet.
 BOOL WaitMsg(QMSG & qmsg);
private:
 BOOL QEmpty(); // returns TRUE if queue is empty
 //--------------------------------------------------------------------
 // This function puts a message in the queue. This function is private
 // because it assumes that the proper mutual exclusion semaphores have
 // already been acquired. If the queue is full it will automatically 
 // grow, so it cannot overflow until memory is exhausted.
 VOID QPut( ULONG hobj, // hwnd or object handle
 ULONG msg, // msg ID
 MPARAM mp1, // parameter 1
 MPARAM mp2); // parameter 2
 //--------------------------------------------------------------------
 // This function extracts a waiting message from the queue and fills 
 // the QMSG structure. This is a private function because it does 
 // no mutual exclusion and assumes a msg is indeed waiting at the 
 // front of the queue (it returns whatever is there, valid or not).
 // This function does not block.
 VOID QGet (QMSG & pqmsg);
 HEV hevItmRdy; // Semaphore to indicate item ready

 HMTX hmtx; // Mutual exclusion semaphore
 USHORT Front, Rear; // Queue pointers
 USHORT usQSize; // Maximum number of queue entries
 QMSG *msgs; // Array of QMSG structures
};
#endif



Listing Eight

// MsgQ.cpp
//------------------------- Includes ------------------------------
#define INCL_WIN
#define INCL_DOS

#include <os2.h>
#include "MsgQ.h"

//-------------------------- defines ------------------------------
const USHORT MQ_INCREMENT = 5;

MsgQueue::MsgQueue (USHORT usQSz) : usQSize(usQSz), Front(0), Rear(0)
{
 msgs = new QMSG[usQSize];
 DosCreateMutexSem (NULL, &hmtx, DC_SEM_SHARED, FALSE);
 DosCreateEventSem (NULL, &hevItmRdy, DC_SEM_SHARED, FALSE);
}
MsgQueue::~MsgQueue()
{
 DosCloseEventSem (hevItmRdy);
 DosCloseMutexSem (hmtx);
 delete [] msgs; // msgs was allocated with new[]
}
VOID MsgQueue::PostMsg (ULONG hobj, ULONG msg, MPARAM mp1, MPARAM mp2)
{
 DosRequestMutexSem (hmtx, SEM_INDEFINITE_WAIT);
 QPut(hobj, msg, mp1, mp2);
 DosReleaseMutexSem (hmtx);
 DosPostEventSem (hevItmRdy); // wake up whoever is waiting for msgs
}
BOOL MsgQueue::WaitMsg(QMSG & qmsg)
{
 ULONG ulNPosts;
 
 DosWaitEventSem (hevItmRdy, SEM_INDEFINITE_WAIT);
 DosRequestMutexSem (hmtx, SEM_INDEFINITE_WAIT);
 QGet (qmsg);
 if (QEmpty())
 DosResetEventSem (hevItmRdy, &ulNPosts);
 DosReleaseMutexSem (hmtx);
 return (qmsg.msg);
}
BOOL MsgQueue::QEmpty()
{
 return (Front == Rear);
}
VOID MsgQueue::QPut(ULONG hobj, ULONG msg, MPARAM mp1,MPARAM mp2)
{

 USHORT usNxtR, usNQSize, idxF, i;
 QMSG *p;
 msgs[Rear].hwnd = (HWND)hobj;
 msgs[Rear].msg = msg;
 msgs[Rear].mp1 = mp1;
 msgs[Rear].mp2 = mp2;
 // If queue has filled up, then reallocate a larger queue
 // and transfer the contents to the new queue
 usNxtR = (Rear+1) % usQSize;
 if (usNxtR == Front) {
 usNQSize = usQSize + MQ_INCREMENT;
 p = new QMSG[usNQSize];
 idxF = Front;
 for (i=0; i < usQSize; i++) {
 p[i] = msgs[idxF++];
 if (idxF == usQSize)
 idxF = 0;
 }
 Front = 0;
 Rear = usQSize;
 delete [] msgs; // msgs was allocated with new[]
 usQSize = usNQSize;
 msgs = p;
 } else 
 Rear = usNxtR;
}
VOID MsgQueue::QGet (QMSG & qmsg)
{
 qmsg.hwnd = msgs[Front].hwnd;
 qmsg.msg = msgs[Front].msg;
 qmsg.mp1 = msgs[Front].mp1;
 qmsg.mp2 = msgs[Front].mp2;
 Front = (Front + 1) % usQSize;
}



Listing Nine

// PMThread.h

#if !defined(PMTHREAD_INC)
#define PMTHREAD_INC

//----------------------------- Includes ---------------------------
#include "QThread.h"

//----------------------------- defines ----------------------------
const ULONG PMTHRD_DEF_STACKSIZE = 8192;

//--------------------------- Public Types -------------------------
// Type for the procedure that is supplied to perform initialization and 
// shutdown for the PM thread. Usually this proc registers user classes
// and/or creates the main window or windows.
class PMThread; // forward declaration
typedef VOID FNPROC (BOOL start, ULONG ulArg, PMThread* pmThrd);
typedef FNPROC* PFNPROC;
//------------------------------ Class -----------------------------
class PMThread : public QThread {

public:
 PMThread ( PFNPROC pfn, USHORT usQSize=0,
 ULONG ulStackSize=PMTHRD_DEF_STACKSIZE);
 ~PMThread ();
 VOID Startup (ULONG ulArg);
 BOOL Shutdown(ULONG ulArg);
 VOID SendMsg( ULONG objAddr, ULONG msg, MPARAM mp1, MPARAM mp2);
 BOOL GetMessage(QMSG & qmsg)
 { return WinGetMsg(hab, &qmsg, NULLHANDLE, 0,0); }
 VOID DispatchMsg (QMSG & qmsg)
 { WinDispatchMsg (hab, &qmsg); }
 HAB QueryHAB() { return hab; }
 HMQ QueryHMQ() { return hmq; }
private:
 HAB hab; // PM Anchor block handle
 HMQ hmq; // Message Queue handle
 PFNPROC pfnProc;
};
#endif



Listing Ten

// PMThread.cpp

//------------------------- Includes ------------------------------
#define INCL_WIN
#define INCL_DOS

#include <os2.h>
#include "PMThread.h"

//--------------------------- code --------------------------------
PMThread::PMThread (PFNPROC pfn, USHORT usQSize, ULONG ulStackSize) : 
 QThread (usQSize, ulStackSize), pfnProc(pfn)
{}
PMThread::~PMThread()
{}
VOID PMThread::Startup(ULONG ulArg)
{
 hab = WinInitialize(0);
 hmq = WinCreateMsgQueue (hab, ulQSize);
 pfnProc(TRUE, ulArg, this);
}
BOOL PMThread::Shutdown(ULONG ulArg)
{
 pfnProc(FALSE, ulArg, this);
 WinDestroyMsgQueue(hmq);
 WinTerminate(hab);
 return TRUE;
}
VOID PMThread::SendMsg( ULONG objAddr, ULONG msg, MPARAM mp1, MPARAM mp2)
{
 if (objAddr)
 WinPostMsg ((HWND)objAddr, msg, mp1, mp2);
 else
 WinPostQueueMsg (hmq, msg, mp1, mp2);
}
































































Thread Programming in UnixWare 2.0


Just say "no" to fork()




John Rodley


John is an independent consultant in Cambridge, MA. He can be contacted at
john.rodley@channel1.com.


With the advent of UnixWare 2.0, threads have made their way to the UNIX
desktop. UnixWare's threads, a superset of the thread specification in the
POSIX Portable Operating System Standard (draft standard P1003.1c), have the
potential to liberate UnixWare developers from the limitations of the age-old
fork() model. Furthermore, threads let you exploit the capabilities of
multiprocessing hardware. 
Before Version 2.0 (POSIX 1003.1c and SVR4.2 MP), UnixWare provided two ways
to create new processes: fork and fork-exec. The fork system call creates an
exact copy of the calling process and sets it running at the return from the
fork call. The new process is a child of the old; it gets a copy of the
parent's data space and valid file descriptors for all files opened by the
parent. To start a different process, the child process calls exec right after
the return from fork.
With fork(), creating a new process takes just a few lines of code, such as
those in Example 1. To start another process, a process had to clone itself,
then ask the operating system which of the two copies it was. Until recently,
fork/exec was the only avenue for concurrent programming. 


Lightweight Processes


Pre-2.0 UnixWare kernels had only one type of process, which I call a
"heavyweight process" (HWP); it is the object of such calls as ps, kill(),
and getpid(). HWPs still exist, but only as collections of lightweight
processes (LWPs), which are the only schedulable entity in UW 2.0. An HWP
consists of from one to MAXULWP LWPs. If you run a nonthreaded application in
UW 2.0, in memory you will get an HWP that consists of a single LWP. In
effect, instead of being a pointer to a piece of executable code, the HWP is
now a pointer to a list of pieces of executable code. 
In multiprocessor systems, separate LWPs from a single HWP can run on
different processors, allowing them to achieve true concurrency. The best
example of the need for this is a print function. You want to hit the print
button, then move on--not sit watching a dialog box that says, "Now formatting
page n. Please wait." 
Since the HWP concept is still supported, old process-specific calls such as
getpid, kill, and nice work much as they did before. You therefore need
analogues to those calls to control threads and their LWPs the way you've
always controlled HWPs. Table 1 lists some process-control calls and their
thread-library analogues.


Threads


Threads are not LWPs. The kernel itself knows nothing about threads; it only
schedules LWPs. Each running LWP makes calls to the dynamic-threads library,
which schedules threads to run on LWPs. So you now have two levels of
scheduling: kernel scheduling of LWPs on processors and thread-library
scheduling of threads on LWPs. A single instance of a thread can run, at
different points in its life, on different processors and different LWPs. To
really get this, you have to view the scheduled process as something
completely independent from the lines of code that will run when that process
gets scheduled. Think of the processor as a field, of each LWP as someone who
has signed up to use the field, and of each thread as a particular activity
such as baseball, soccer, or football. Now, when the kernel schedules someone
to use the field, that person can play football for the entire time (a bound
thread) or football for five minutes and baseball for ten. The kernel doesn't
care. The person using the field (the threads library, through any of its
LWPs) has to keep track of the games being played (thread instances) within
the time that person uses the field. Thus, a thread is simply a series of
logical statements, independent of the process or processor upon which it
might be executed.
There are two basic kinds of threads: bound and multiplexed. A bound thread
gets its own dedicated LWP. A multiplexed thread (muxthread), by contrast,
can be run on any of the LWPs in its HWP's pool at any given point.
The major consideration in choosing between bound and multiplexed threads is
the trade-off between performance and concurrency. On a uniprocessor, bound
threads can have up to five times the context-switching overhead of
muxthreads. Bound threads, though, enjoy the most concurrency. Five bound
threads on a five-processor system could be running physically concurrently,
one thread to a processor, while five muxthreads on the same system might end
up running on a single processor.


Concurrency


Concurrency is easiest to understand in the multiprocessor model. In a
multiprocessing machine, an LWP can be farmed out to another processor. Two
LWPs, or two threads bound to different LWPs, running on different processors
at the same time are running truly concurrently. If two muxthreads run on the
same LWP, they can never run on separate physical processors, and can thus
never be truly concurrent.
Thus you can see the two extremes of concurrency: the maximum being one LWP
per thread and the minimum being one LWP for all threads. In reality, the
threads library will not let you pile a large number of threads onto a single
LWP. UW 2.0 allows you to set the concurrency level through the
thr_setconcurrency call. Listing One (beginning on page 102) is a program that
creates six additional multiplexed threads, each of which only prints out its
process ID, LWP ID, and thread ID. Figure 1 shows the output from a run of the
program with concurrency set to 1 (the minimum). Even at that setting, the
threads library created two new LWPs (2 and 4) to run our spawned threads
(2--7), proof that the concurrency level we set in thr_setconcurrency is a
hint, not an order. Figure 2 shows the output when we increase the concurrency
level by 1. A new LWP (5) appears. Notice also that from one iteration of the
thread's main loop to the next, the thread can run on different LWPs.
Another anomaly that leaps out when running Listing One is that the first run
creates three LWPs: the one running thread 1 (LWP1) and those running threads
2--7 (LWP2 and 4). Logically, there must have been an LWP3. In this case, the
thread library's wrapper for the sleep function created its own bound thread
and thus a new LWP, so you don't have absolute control over the number of LWPs
in a process.
Down in the details, scheduling and concurrency are even more complicated, but
the bottom line is that two bound threads have the maximum probability of
achieving true concurrency, while two muxthreads with concurrency level 1 have
the minimum.


What's in a Thread?


The first decision in threading an application is which lumps of code should
get their own threads. Table 2 lists categories of code granularity for
threading. You need to mark medium- and coarse-grain functions for possible
threading. A good example of a medium-grain function is a signal handler.
Typically, a signal handler is a single function that does all its work within
that function, or with calls to one or two other small functions. A typical
coarse-grain function would be the serial I/O handler of a communications
package. While it contains a huge amount of functionality, and correspondingly
huge amounts of code, it needs user input to make a complete program.
You also have to decide whether making two threads of execution concurrent
yields any real-time gain to the user. If you have three functions--A, B, and
C--where B can't start until A is done and C can't start until B is done, then
making A and B concurrent gets you nowhere. If, however, B can start without A
being done, then putting B in a separate thread could be a real-time win.


Creation


Creating a thread is as simple as making a call to thr_create with the address
of the function that will be the "main" for that thread. Creating the new
thread in a suspended state (THR_SUSPENDED) lets you specify exactly when the
new thread begins to run. By calling thr_continue, the new thread begins
processing at the first line of the function passed to thr_create. You can
call thr_suspend at any time to pause your new thread.
Note that with thr_create you can no longer rely on your stack to autogrow.
The kernel supports autogrowth of a stack when you run out of stack space, but
since the kernel isn't handling threads, it doesn't know anything about the
threads' stacks. Thus, you have to allocate a big-enough stack right from the
thr_create call.



Threads and Signals


You can set up separate signal masks for each thread in a process. A signal
sent to a UNIX process from another UNIX process via
kill(process_id,signal_id), however, will only go to a thread enabled to catch
that signal. If more than one thread is accepting a particular signal, the
signal may be delivered to any accepting thread.
For this and other reasons, Novell recommends that instead of dealing with
signals on a thread-by-thread basis, applications mask all signals in all
threads and dedicate a single thread to wait on incoming signals via sigwait.
An "object thread" program that adds a signal handler thread to Listing One is
available electronically (see "Availability," page 3). As always, it pays to
build an appropriately limited signal set. Two new signals have been defined
in UW 2.0 to support the threads library: SIGWAITING and SIGLWP. SIGWAITING
happens when all LWPs in the process's LWP pool are blocked interruptibly. In
thread8, this occurs when thread 1 is in gets(), thread 2 is sitting in a
sigwait(), and all the other threads are either suspended or sleeping. If you
add SIGWAITING to an object-thread program's signal set, the process will stop
accepting user input.


Shared Data


To share data among HWPs, you have to use the System V shared-memory IPC.
Threads, on the other hand, automatically share all global and static data.
You can see this in Listing One, where the variable ulIterations is a static
in the thread-start function. Each thread increments ulIterations each time
through the loop, and you get output like that in Figures 1 and 2.
If you made ulIterations an automatic variable, it would go on the stack,
which is separate for each thread; each thread would then get its own
private copy, giving you output such as this:
Thread1 Iteration 1
Thread2 Iteration 1
Thread1 Iteration 2
Thread1 Iteration 3
Thread2 Iteration 2


Interthread Coordination


UW 2.0 supports a number of mechanisms for coordinating the activity of
threads within a single HWP: locks, semaphores, and conditions. 
Mutual-exclusion locks restrict resource access to a single thread. Lock the
resource by calling mutex_lock. Any other thread calling mutex_lock for that
mutex blocks until you call mutex_unlock. All the mutex calls take a pointer
to a mutex_t structure as their first argument in order to identify the
mutex. Under the rules of shared data, this mutex_t struct must be either
global or static in order to be available to all threads.
Reader-writer locks are a variation of mutex locks. They allow the application
to place two different types of lock on the same resource. When performing a
nondestructive operation on the resource (read), the app calls rw_rdlock to
put a read lock on. Any number of threads can put read locks on a resource. If
a thread attempts a write lock on the resource, it will block until all the
readers unlock. When a thread acquires a write lock, all other readers and
writers block until the single writer unlocks. In file-system terms, putting a
read-lock on a resource is the equivalent of doing a chmod 444 on a file
(everyone can read, none can write), while putting a write lock is more like a
chmod 600 (one can read/write, no others can read or write).
Conditions provide a way for threads to wait on specific conditions without
having to "acquire" a semaphore or a mutex. The pseudocode in Example 2
demonstrates this. cond_wait blocks until some thread validates the condition
(sets bLineIn True) and calls cond_signal or cond_broadcast. We only sit in
this loop retesting the condition because the condition could have been
invalidated again by another thread that was also blocked on this condition
and got scheduled before us.


Thread Termination


Terminating a thread is very similar to terminating a UNIX process. From
inside the thread, you call thr_exit (which is called implicitly if the start
function returns). From outside the thread, you have to send the thread a
SIGTERM signal. To clean up, you can catch the signal, then call thr_exit.
Suspended threads do not terminate until they are restarted.
A process terminates when all non-daemon threads have terminated. A call to
exit() or a return from main() (which implies exit) forces termination of all
threads. A program that lets you interactively create and control a
command-line-specifiable number of threads is available electronically. In the
program, I precede the call return(0) at the end of main() with a call to
thr_exit(). If you run this program and start up the threads, they'll start
printing their output. While they're running, hit q to exit the main
user-input loop. The spawned threads keep running, but thread0 exits (the
return(0) never gets executed). Run thread9 again with the -d flag so that
all threads are daemon threads and you see that the process (and all daemon
threads) terminates when all nondaemon threads (thread 0, for instance)
terminate.
If, as Novell suggests, you create a separate signal-handling thread, either
make it a daemon thread or make sure you have some way of killing it so that
your process doesn't hang waiting for that endless thread to die.


Threads and Libraries


Those of us who suffered through the combination of OS/2 1.0 and Microsoft C
5.1 know all about the misery of non-reentrant libraries in a multithreaded
environment--traps, mysterious hangs, crazy values. According to Novell, all
the libraries delivered with 2.0 and the new SDK are thread safe. Third-party
libraries are another story altogether. As usual, there's only one way to know
for sure.


File I/O


Sharing open-file descriptors introduces an atomicity problem that is almost
certain to blow up any pre-SVR4.2 MP third-party library that does file I/O.
Consider two threads, X and Y, which share an open-file descriptor. X wants to
do a simple seek/read on that file, but seek() and read() are separate
instructions, so X could be preempted between the two. During that preemption,
Y could also call seek against that file descriptor, putting the descriptor's
internal pointer someplace other than where X wanted it. When X regains
control, it will read at the offset Y sought to, not the one X wanted. You
could get around this by locking the file or surrounding all file ops with a
semaphore, but those are pretty big hammers to use on such a small problem. 
To deal with this, UW 2.0 introduces pread() and pwrite(), which are atomic
combinations of lseek/read and lseek/write. The calls are identical to read
and write except that they take an extra argument--the offset from beginning
of file to seek to. These calls do not change the file descriptor's internal
file pointer as lseek would. 


Other Considerations


Now that you're free of fork/exec, the temptation is to go out and write a new
thread for everything (16 million threads!), but you should check that impulse
just a little. There is a kernel-enforced limit on the number of LWPs that one
user id can have. This is a kernel tunable called "MAXULWP." It has a range of
1--65000 and defaults to 200, which should suffice for all but the most
esoteric programs. Listing One uses a kludgy method for obtaining MAXULWP.
According to Novell, there is no supported way for a nonroot user to obtain
MAXULWP.


The Bottom Line



UW 2.0 threads are easy to get running, and once you get used to them, they're
a much more natural way of viewing problems than the old sequential model.
Keeping in mind the concepts and caveats I've discussed here should put you
well on your way to writing maximally multithreaded programs.
Example 1: Creating a new process.
if(( child_pid = fork()) == 0 )
 // this is the child: overlay this clone with a new executable
 exec( "new_program" );
else
 // this is the parent: continue doing parent-process stuff
Example 2: Pseudocode for conditions.
cond_t MyCondition; // All threads agree that this global condition
 // indicates that a line has arrived from the user.
mutex_t MyConditionsMutex; // All threads agree that this mutex is
 // associated with MyCondition.
// this is thread0
BOOL bLineIn = FALSE;
cond_init( &MyCondition ...)
// spawn thread1
gets();
mutex_lock( &MyConditionsMutex );
bLineIn = TRUE;
cond_signal( &MyCondition );
mutex_unlock( &MyConditionsMutex );
// this is thread1
mutex_lock( &MyConditionsMutex );
do {
 iRet = cond_wait( &MyCondition, &MyConditionsMutex );
} while ( bLineIn == FALSE );
mutex_unlock( &MyConditionsMutex );
Figure 1: Listing One output at concurrency level 1.
P1688 LWP2 - Thread 2 iteration 0
P1688 LWP2 - Thread 3 iteration 1
P1688 LWP2 - Thread 4 iteration 2
P1688 LWP4 - Thread 5 iteration 3
P1688 LWP4 - Thread 6 iteration 4
P1688 LWP4 - Thread 7 iteration 5
P1688 LWP2 - Thread 2 iteration 6
P1688 LWP2 - Thread 3 iteration 7
P1688 LWP2 - Thread 4 iteration 8
P1688 LWP2 - Thread 5 iteration 9
P1688 LWP4 - Thread 6 iteration 10
P1688 LWP4 - Thread 7 iteration 11
 ....
Figure 2: Listing One output at concurrency level 2.
P1688 LWP2 - Thread 2 iteration 0
P1688 LWP2 - Thread 3 iteration 1
P1688 LWP2 - Thread 4 iteration 2
P1688 LWP4 - Thread 5 iteration 3
P1688 LWP4 - Thread 6 iteration 4
P1688 LWP4 - Thread 7 iteration 5
P1688 LWP5 - Thread 2 iteration 6
P1688 LWP2 - Thread 3 iteration 7
P1688 LWP2 - Thread 4 iteration 8
P1688 LWP5 - Thread 5 iteration 9
P1688 LWP5 - Thread 6 iteration 10
P1688 LWP4 - Thread 7 iteration 11
 ....
Table 1: Thread-specific calls and their process-specific analogues.
Thread-Specific Call    Process-Specific Analogue
thr_create              fork/exec
thr_exit                exit
thr_join                wait
thr_kill                kill
thr_setprio             nice
thr_sigsetmask          sigsetmask (BSD)
pread                   lseek/read
pwrite                  lseek/write
thr_self                getpid
Table 2: Code granularity.
Granularity Level     Code Item                    Comments
Fine                  Loop                         May be threaded by a parallelizing compiler
Medium                Standard one-page function   Thread
Coarse                Background serial I/O        Thread
                      communications handler
Super-coarse/gross    Program                      Separate heavyweight process

Listing One 

// A program to create and control a command line specifiable
// number of threads interactively.
// command line arguments:
// -b Create BOUND threads, defaults to multiplexed
// -nthreads <number> Create number threads
// see code for explanation of interactive commands.

#include "defines.h"
#include <sys/types.h>
#include <ctype.h>
#include <stdio.h>
#include <unistd.h>
#include <mt.h>
#include <sys/signal.h>
#include <thread.h>
#include <stdlib.h>
#include <sys/lwp.h>
#include "listing1.hpp"

#define MAX_THREADOBJECTS 65000
#define IDTUNE_CMD "grep MAXULWP /etc/conf/cf.d/mtune | awk '{print $2}'"
#define BUFSIZE 80

bool bBound; // are we using bound or multiplexed threads??
pid_t getpid(), child1_pid, child2_pid;

int GetMAXULWP();
int flagset( int argc, char *argv[], char *flag );

// Main - create the number of threads specified on the command line, then sit
// in a loop accepting and executing interactive commands from the user.
int main( int argc, char *argv[] )
{
int i, k, iNumThreads;
Thread *pt[MAX_THREADOBJECTS];
int iMaxThreads;
int maxulwp;
int iNumRequestedThreads;
int thread_index;

int thread_id;
char kar;
char buffer[80];
int iConcurrencyLevel = 1;
int iRet;

maxulwp = GetMAXULWP();
if(( k = flagset( argc, argv, "-nthreads" )) > 0 )
 iNumRequestedThreads = atoi( argv[k+1] );
else
 iNumRequestedThreads = MAX_THREADOBJECTS;
if( flagset( argc, argv, "-b" ) > 0 )
 {
 bBound = TRUE;
 iMaxThreads = maxulwp + 1;
 }
else
 {
 bBound = FALSE;
 iMaxThreads = (maxulwp*4) + 1;
 if(( iRet = thr_setconcurrency( iConcurrencyLevel )) != 0 )
 printf( "Error: thr_setconcurrency(%d) = %d\n", iConcurrencyLevel, iRet );
 }
printf( "P%d LWP%d - Creating %d %s threads\n", getpid(), _lwp_self(),
iNumRequestedThreads, 
 bBound?"bound":"multiplexed" );
for( i = 0, iNumThreads = 0; i < MAX_THREADOBJECTS && i < iNumRequestedThreads
; i++ )
 {
 if( bBound )
 pt[i] = new BoundThread();
 else
 pt[i] = new MultiplexedThread();
 if( !pt[i] )
 break;
 if( pt[i]->iCreateError != 0 )
 {
 printf( "P%d - Thread create error %d\n", getpid(), pt[i]->iCreateError );
 delete pt[i];
 break;
 }
 iNumThreads++;
 if( iNumThreads == iMaxThreads)
 printf( "P%d -\tNext thread will exceed MAXULWP (%d)\n", getpid(), maxulwp );
 }
printf( "Following thread commands are available:\n" );
printf( "\ti - shows the status of all the threads\n" );
printf( "\ta - increments concurrency level\n" );
printf( "\tc - continues all the threads\n" );
printf( "\tc <thread#> - continues the specified thread\n" );
printf( "\ts - suspends all the threads\n" );
printf( "\ts <thread#> - suspends the specified thread\n" );
printf( "\tk <thread#> - sends SIGTERM to the specified thread\n" );
printf( "\tv - turns iteration printing on/off\n" );
printf( "\tq - ends the program\n" );

sigignore( SIGTERM );
bool bKeepRunning = TRUE;
while( bKeepRunning )
 { 
 thread_id = -1;

 thread_index = -1;
 gets( buffer );
 kar = toupper( buffer[0] );
 for( i = 1; buffer[i] != '\0'; i++ )
 {
 if( !isspace( buffer[i] ))
 {
 thread_id = atoi( &buffer[i] );
 for( i = 0; i < iNumThreads; i++ )
 {
 if( thread_id == pt[i]->tid )
 {
 thread_index = i;
 break;
 }
 }
 break;
 } 
 }
 switch( kar )
 {
 case 'A':
 iConcurrencyLevel++;
 // iConcurr... is 1 based, while iNumThreads is 0 based
 if( iConcurrencyLevel > (iNumThreads+1))
 printf( "Error: would have more LWPs than threads!\n" );
 else
 {
 if(( iRet = thr_setconcurrency( iConcurrencyLevel )) != 0 )
 printf( "Error: thr_setconcurrency(%d) = %d\n", iConcurrencyLevel, iRet );
 }
 break;
 case 'I':
 for( i = 0; i < iNumThreads; i++ )
 printf( "\tThread id %d - %s\n", 
 pt[i]->tid, pt[i]->Ended?"GONE":"STILL RUNNING" );
 break;
 case 'C':
 if( thread_id >= 0 )
 pt[thread_index]->Continue();
 else
 for( i = 0; i < iNumThreads; i++ )
 pt[i]->Continue();
 break;
 case 'S':
 if( thread_id >= 0 )
 pt[thread_index]->Suspend();
 else
 for( i = 0; i < iNumThreads; i++ )
 pt[i]->Suspend();
 break;
 case 'K':
 if( thread_id >= 0 )
 pt[thread_index]->Kill( SIGTERM );
 else
 for( i = 0; i < iNumThreads; i++ )
 pt[i]->Kill(SIGTERM);
 break;
 case 'V':

 for( i = 0; i < iNumThreads; i++ )
 pt[i]->bVerbose = pt[i]->bVerbose^0x1;
 break;
 case 'Q':
 bKeepRunning = FALSE;
 break;
 default:
 printf( "Unknown command (%c) (%s)\n", kar, buffer );
 break;
 }
 }
// We really don't have to call End, because the return kills the 
// threads anyway, but cleanliness counts.
for( i = 0; i < iNumThreads; i++ )
 pt[i]->End();
printf( "P%d - Ending thread 0\n", getpid() );
return( 0 );
}
// flagset - tells whether a command-line flag was set. returns an index into
// argv where flag was detected. Use return val+1 to get arg following a flag
int flagset( int argc, char *argv[], char *flag )
{
for( int i = 1; i < argc; i++ )
 {
 if( strcmp( argv[i], flag ) == 0 )
 return( i );
 }
return( -1 );
}
// This function greps MAXULWP out of mtune so that we can tell when
// we're about to exceed the maximum allowable number of LWPs per user id.
int GetMAXULWP()
{
int maxulwp = 0, i;
FILE *fp;
char buf[BUFSIZE];

if(( fp = popen( IDTUNE_CMD, "r" )) == NULL )
 printf( "P%d - Couldn't exec %s - skipping MAXULWP check\n", getpid(),
IDTUNE_CMD );
else
 {
 i = 0;
 while (fgets(buf, BUFSIZE, fp ) != NULL)
 {
 maxulwp = atoi( buf );
 printf( "P%d - Got MAXULWP value of %d\n", getpid(), maxulwp );
 i++;
 }
 if( i > 1 )
 printf( "P%d - ambiguous value for MAXULWP, skipping check\n", getpid() );
 pclose( fp );
 }
return( maxulwp );
}










Visually Designing Embedded-Systems Applications


State diagram inheritance in C++




Doron Drusinsky


Doron holds several patents in the areas of state-chart synthesis and finite
state machine optimization. He is president of R-Active Concepts and Co-Active
Concepts, Ltd. Doron can be contacted at doron@ractive.com.


Because it enables code reuse and eases maintenance, inheritance is one of
the more important properties of object-oriented programming. In a C++
implementation of firmware for a product such as a digital answering machine,
for instance, one base (parent) program can perform operating-system-like
routines (storing/deleting messages), while another performs push-button and
keypad control. With C++, different answering machines can inherit and enhance
these "classes," thereby leading to specific versions of answering-machine
firmware. Also, the parent designs can be reused in many other designs,
without requiring intimate familiarity with the original parent design. 
In my article "Extended State Diagrams and Reactive Systems" (DDJ, October
1994), I introduced the concept of extended state diagrams and illustrated
their applicability for the design and development of "reactive" systems,
which must react to inputs that are not necessarily ready at any given point
in time. Extended
state diagrams (ESDs) are conventional finite state diagrams (FSDs) augmented
with hierarchy (the ability to draw states inside states and do top-down
design), concurrency (the ability to describe independent and conceptually
concurrent threads of control inside any state), and visual synchronization
(the ability to visually describe dependencies between these threads). In
short, ESDs are the visual counterpart of an enhancement of FSMs.
ESDs address the limitations of traditional state diagrams, while retaining
their visual and intuitive appeal. For instance:
FSMs are flat. They do not cater to top-down design and information hiding. 
FSMs are purely sequential, whereas applications are not. Modern controllers
must react to signals to and from many entities in the environment. Take, for
instance, an answering-machine controller specified to cater to a
"second-call-waiting" situation in addition to the "first call." A
conventional FSM needs to account for all possible combinations of states
catering to the first and second callers; this leads to the state-blowup
phenomenon. 
Text-based entry methods, which are by definition sequential, cannot
effectively capture concurrent behavior. Drawing state diagrams on paper and
entering them textually is no longer effective. 
Top-down design concepts require interactive software to enable the user to
manipulate and browse through complex designs. 


Traffic Light Controller Revisited


To illustrate these concepts, consider the following specification for a
traffic light controller (TLC):
There are two directions, Main and Sec, with alternating lights.
Lights alternate based on a Timeout signal read from the Timeout variable.
Initially, all lights flash yellow. Upon Reset going low (0), the ongoing
operation starts. When Reset goes high (1), the system must reset into the
initial state.
A counter counts cars waiting in the Main direction. A simple
signal-processing procedure detects the energy level transmitted from sensors
under the road and compares it against two threshold levels, THRESH and
HIGH_THRESH, which detect a car or a truck. A sequence of three threshold
crossings is interpreted as a car, whereas one or more threshold crossings
followed by a high-level threshold crossing is interpreted as a truck.
If four cars or one or more cars followed by a truck are waiting in the Main
direction, a hidden camera takes shots of the junction whenever the Main
direction has the Red lights.
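The car-versus-truck rule above can be sketched as a small hand-written detector. This is a hypothetical flat FSM for illustration only, not the ESD of Figure 2, and the threshold values passed in are illustrative:

```cpp
#include <cassert>

// Hypothetical flat-FSM sketch of the car/truck detector described above.
// Three threshold crossings mean a car; one or more crossings followed by
// a high-threshold crossing mean a truck.
enum Detect { NONE, CAR, TRUCK };

class Detector {
    int peaks = 0;  // threshold crossings seen so far
public:
    // Feed one energy sample; returns CAR, TRUCK, or NONE.
    Detect sample(double energy, double thresh, double high_thresh) {
        if (peaks >= 1 && energy > high_thresh) { peaks = 0; return TRUCK; }
        if (energy > thresh && ++peaks == 3)    { peaks = 0; return CAR; }
        return NONE;
    }
};
```

Even this tiny example shows the state-blowup pressure: combining it with the light-toggling behavior in one flat FSM would multiply the state counts together, which is exactly what the concurrent threads of an ESD avoid.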
Using the tools and techniques introduced in the October 1994 article, you can
easily design an ESD for this specification set. Such a design might have a
TLC that toggles between Green and Red, thereby implementing the three initial
requirement items for our controller; see Figure 1. Further, a Car and Truck
detection controller can implement the hidden-camera requirement for the
controller, as in Figure 2.
C programmers might reuse the diagrams in Figures 1 and 2 by copying them into
a new diagram, then refining them to meet all specification items. One
drawback to this approach is that a change made to any parent diagram (due to
a bug, new release, or whatever) has to be manually copied into all models of
the controller. The situation worsens if some of the TLC models are reused by
other designs. In other cases, you may not want the source diagram of any of
the parents to be available to the programmers who are reusing it.
Consequently, in this article I'll describe how inheritance is incorporated in
the ESD framework, allowing diagrams to inherit (multiple) parent diagrams
without actually copying the parent diagrams into the child diagrams. To the
best of my knowledge, the only software that currently enables ESD inheritance
is BetterState Pro, a graphical state-machine design tool my company has
developed. 


Implementing ESD Inheritance


Using two existing ESDs as parent (or base) designs, Figure 3 illustrates a
diagram which captures all specification items for the TLC example.
Inheritance is specified during code generation. The BetterState code
generator provides a list of diagrams which will be inherited by the current
diagram. In the case of Figure 3, these will be base1 (Listing One; listings
begin on page 104) of Figure 1, and base2 (Listing Two) of Figure 2. Each ESD
is realized as a C++ class that inherits the classes of the parent
controllers. Listing Three is the generated code for the child chart in Figure
3.
Each class has private data, representing its internal state configuration (we
use more than one state variable to realize an ESD). It also has a public
function, BS_Fire(), which executes a transition from one state to the
following state within that class, much like the C function call described in
my previous article. However, before the child's BS_Fire() function fires, it
calls the parent's BS_Fire(), thereby allowing the parent controllers to
traverse their own internal transitions. 
Similarly, a controller starts up in its default state (for example, the Idle
state in Figure 3) using its class constructor to set up the initial states.
Again, the child's constructor calls its parent's constructors before it
actually executes its own constructor. A special set of public functions,
denoted as BS_in_St_xxx_Py() (where xxx is a state name and y is a page
number) is available for sensing the state of a parent class from its
children. In Figure 3, for example, the controller actually starts its
activity upon sensing that base1 (Figure 1) has moved to the Red2Main state.
It aborts its activities and moves back to the Idle state when it senses that
base1 has moved out of Red2Main. Similarly, it senses whether base2 (Figure 2)
has detected a Car or a Truck. (Visual priorities were discussed in the
previous article.) The two transitions in Figure 3--one from Car2 to Car3, and
the other from Cars to Camera_Shoot--might fire simultaneously, thereby
creating a race situation. The arrowhead colors in Figure 3 resolve this
conflict by giving the purple transition (to Camera_Shoot) a higher priority
than any of the white transitions. 
Visual priorities are even more important in the context of ESD inheritance.
The transition labeled !BS_in_St_Red2Main_P1() leads to the Idle state. This
transition might fire at the same time as another transition in Figure 3.
Conflicts such as these are commonly generated when the designer of the child
diagram is not aware of the design details of a parent diagram. In the
example, you resolve the race situation by assigning a Black arrowhead to the
transition leading to the Idle state, thereby giving it a higher priority than
the purple or white transitions in Figure 3. Such priority assignments are
effective for eliminating races between child and parent controllers. For
example, a parent controller can specify that the three highest-priority
levels are reserved for its class, one of which may in turn be claimed by its
own parent. This way, conflicts are resolved without disclosing or assuming
knowledge about the contents or behavior of the parent design.
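One way to picture this conflict-resolution rule is a transition table in which, of all simultaneously enabled transitions, only the highest-priority one fires. This is an illustrative sketch, not BetterState's generated code:

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Illustrative sketch: each transition carries a priority (the arrowhead
// color in the diagram); when several transitions are enabled at once,
// only the highest-priority one fires.
struct Transition {
    int priority;                 // higher value = higher priority
    std::function<bool()> guard;  // enabling condition
    std::function<void()> action; // state change to perform
};

// Fire the single highest-priority enabled transition, if any.
bool fire_one(std::vector<Transition>& ts) {
    Transition* best = nullptr;
    for (auto& t : ts)
        if (t.guard() && (!best || t.priority > best->priority))
            best = &t;
    if (!best) return false;
    best->action();
    return true;
}
```

Reserving a band of priority values for the parent class corresponds to the parent's transitions always carrying larger priority numbers than any the child is allowed to assign.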
BetterState Pro 2.0 allows an ESD to inherit (or be inherited by) other types
of diagrams, such as Petri-Nets (PNs). A PN looks much like a state diagram,
but is by definition fully concurrent and requires special synchronization to
limit concurrency and introduce explicit, sequential operation. This contrasts
with an ESD, which is by definition sequential and requires concurrency to be
explicitly defined. The code generator generates the same class structure for
ESDs and PNs; therefore, an ESD can inherit and sense the state of a PN, and
vice versa.
In my previous article, I showed how state hierarchy acts as an important
mechanism for top-down design and teamwork design of complex controllers. In
the context of this article, hierarchy is even more important because events
from parent controllers often have a global effect on the child controller.
For example, when base1 moves out of the Red2Main state in Figure 1, it must
cause the child controller of Figure 3 to quit all activities and to return to
the Idle state. Using hierarchy, we need no more than one transition to
describe such a behavior.
Animated playback is a tool for simulating and debugging a design using
animation. In Figure 1, for example, when the program being debugged moves
from Green2Main to Red2Main, Green2Main regains its original color and
Red2Main turns red, thereby showing that it is the current state. Such visual
simulation and debug capabilities are crucial for correctly designing complex
diagrams, especially when concurrency is involved. When inheritance is
incorporated, the animation tool ripples through the class hierarchy showing
the state changes in each class, according to the actual order of execution.
My approach to code generation (based on the realization of ESD and PN
controllers as classes) lets you instantiate various composite objects. For
example, you can use arrays or dynamic arrays of ESDs. Many modern
applications need such capabilities, including cellular automata and
fine-grain, parallel applications.
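Since each generated controller is an ordinary class, composing them is plain C++. The sketch below uses a stand-in controller (the real CHRT_* classes come from the code generator) to show an array of controllers driven from a main loop until every one reports a terminal state via its BS_Fire() return value:

```cpp
#include <cassert>

// Stand-in for a generated controller class: counts to 3, then reports
// a terminal state by returning 0 from BS_Fire(), like the generated code.
class Counter {
    int n = 0;
public:
    int BS_Fire() { if (n < 3) ++n; return n < 3; }
    int value() const { return n; }
};

// Drive an array of controllers; each pass fires every controller once,
// and the loop stops when all of them have reached a terminal state.
int run_all(Counter* ctrl, int count) {
    int live = 1, steps = 0;
    while (live) {
        live = 0;
        for (int i = 0; i < count; i++)
            live |= ctrl[i].BS_Fire();
        ++steps;
    }
    return steps;
}
```

The same pattern extends to dynamically allocated arrays of controllers, which is what cellular-automata and fine-grain parallel applications need.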
Figure 1 Base1 is the top-level processing of traffic light controller.
Figure 2 Base2 is a subcomponent of traffic light controller that controls a
camera.
Figure 3 Complete traffic controller inherits from base1 and base2.

Listing One

//==============================================================
// C++ Code-Generation Output File #1 of 1. Generated for Chart base1 
// by the BetterState Code Generator v2.x
// R-Active Concepts, Cupertino CA, 95014, doron@ractive.com


//----- State ID dictionary for base1
// Use this dictionary to symbolically reference your states (for those
// States that you gave names in the chart editor). 
//------------------------------------------------------------
#ifndef DUMMY
#define DUMMY -7
#endif
#ifndef DONT_CARE
#define DONT_CARE 0
#endif
const int St_Yellow_All_P1 = 9; // mapped to PS[0] 
const int St_Green2Main_P1 = 61; // mapped to PS[0] 
const int St_Red2Main_P1 = 72; // mapped to PS[0] 
const int St_On_going_P1 = 60; // mapped to PS[1] 
 
class CHRT_base1 {
private: 
int PS[2];
int NS[2];
int BS_i;
// This flag, being 1, indicates that the controller (or one of its parents)
// reached a Terminal state. 
int BS_Terminal_Reached;
 //Local variables you defined in your drawing:
public:
// Use Macro to test across threads whether St_Yellow_All_P1 is present state
int BS_in_St_Yellow_All_P1(){return (PS[0]==St_Yellow_All_P1);};
// Use Macro to test across threads whether St_Green2Main_P1 is present state
int BS_in_St_Green2Main_P1(){return (PS[0]==St_Green2Main_P1);};
// Use Macro to test across threads whether St_Red2Main_P1 is present state
int BS_in_St_Red2Main_P1(){return (PS[0]==St_Red2Main_P1);};
// Use Macro to test across threads whether St_On_going_P1 is present state
int BS_in_St_On_going_P1(){return (PS[1]==St_On_going_P1);};
 CHRT_base1()
 {
// Reset state assignments 
 PS[0]=St_Yellow_All_P1;
 NS[0]=DONT_CARE; 
 PS[1]=DUMMY;
 NS[1]=DONT_CARE; 
// On_entry actions for reset states 
 Color_Main=YELLOW;Color_Sec=YELLOW;
 BS_Terminal_Reached=0; 
 }
int BS_Fire();
 }; //end of BS controller class
int CHRT_base1::BS_Fire()
 {
 //-------
 if (PS[0]==St_Yellow_All_P1) 
 if ( (NS[0] == DONT_CARE) && (NS[1] == DONT_CARE) 
 ) 
 if (!Reset)
 {
 NS[0] = St_Green2Main_P1 ;
 NS[1] = St_On_going_P1 ;
 Color_Main=GREEN; Color_Sec=RED;
 }

 //-------
 if (PS[1]==St_On_going_P1) 
 if ( (NS[0] == DONT_CARE) && (NS[1] == DONT_CARE) 
 ) 
 if (Reset)
 {
 NS[0] = St_Yellow_All_P1 ;
 NS[1] = DUMMY ;
 Color_Main=YELLOW;Color_Sec=YELLOW;
 }
 //-------
 if (PS[0]==St_Green2Main_P1) 
 if ( (NS[0] == DONT_CARE)
 ) 
 if (TIMEOUT)
 {
 NS[0] = St_Red2Main_P1 ;
 Color_Main=RED; Color_Sec=GREEN;
 }
 //-------
 if (PS[0]==St_Red2Main_P1) 
 if ( (NS[0] == DONT_CARE)
 ) 
 if (TIMEOUT)
 {
 NS[0] = St_Green2Main_P1 ;
 }
 // Assigning next state to present-state 
 for (BS_i=0;BS_i < 2;BS_i++)
 if (NS[BS_i] != DONT_CARE) 
 {PS[BS_i]=NS[BS_i]; NS[BS_i]=DONT_CARE;}
 // This return statement returns 0 if and only if a terminal state was reached.
 // Use the return value to break out of a loop in which the controller 
 // is called, if you want to suspend the controller in a terminal state.
 return (!BS_Terminal_Reached);
 };
// Bye



Listing Two

//==============================================================
// C++ Code-Generation Output File #1 of 1. Generated for Chart base2
// by the BetterState Code Generator v2.x
// R-Active Concepts, Cupertino CA, 95014, doron@ractive.com
//----- State ID dictionary for base2
// Use this dictionary to symbolically reference your states (for those
// States that you gave names in the chart editor). 
//------------------------------------------------------------
#ifndef DUMMY
#define DUMMY -7
#endif
#ifndef DONT_CARE
#define DONT_CARE 0
#endif
const int St_s0_P2 = 140; // mapped to PS[0] 
const int St_Peek1_P2 = 144; // mapped to PS[0] 
const int St_Peek2_P2 = 145; // mapped to PS[0] 

const int St_Car_detected_P2 = 146; // mapped to PS[0] 
const int St_Truck_detected_P2 = 147; // mapped to PS[0] 
const int St_Peeks_P2 = 143; // mapped to PS[1] 
 
class CHRT_base2 {
private: 
int PS[2];
int NS[2];
int BS_i;
// This flag, being 1, indicates that the controller (or one of its parents)
// reached a Terminal state. 
int BS_Terminal_Reached;
 //Local variables you defined in your drawing:
 
public:
// Use this Macro to test across threads whether St_s0_P2 is the present state
int BS_in_St_s0_P2(){return (PS[0]==St_s0_P2);};
// Use this Macro to test across threads whether St_Peek1_P2 is present state
int BS_in_St_Peek1_P2(){return (PS[0]==St_Peek1_P2);};
// Use this Macro to test across threads whether St_Peek2_P2 is present state
int BS_in_St_Peek2_P2(){return (PS[0]==St_Peek2_P2);};
// Use Macro to test across threads whether St_Car_detected_P2 is present state
int BS_in_St_Car_detected_P2(){return (PS[0]==St_Car_detected_P2);};
// Use Macro to test whether St_Truck_detected_P2 is present state
int BS_in_St_Truck_detected_P2(){return (PS[0]==St_Truck_detected_P2);};
// Use Macro to test across threads whether St_Peeks_P2 is the present state
int BS_in_St_Peeks_P2(){return (PS[1]==St_Peeks_P2);};
 
 CHRT_base2()
 {
// Reset state assignments 
 PS[0]=St_s0_P2;
 NS[0]=DONT_CARE; 
 PS[1]=DUMMY;
 NS[1]=DONT_CARE; 
 BS_Terminal_Reached=0; 
 }
int BS_Fire();
 }; //end of BS controller class
int CHRT_base2::BS_Fire()
 {
 //-------
 if (PS[0]==St_s0_P2) 
 if ( (NS[0] == DONT_CARE) && (NS[1] == DONT_CARE) 
 ) 
 if (Energy > THRESH)
 {
 NS[0] = St_Peek1_P2 ;
 NS[1] = St_Peeks_P2 ;
 }
 //-------
 if (PS[0]==St_Peek1_P2) 
 if ( (NS[0] == DONT_CARE)
 ) 
 if (Energy > THRESH)
 {
 NS[0] = St_Peek2_P2 ;
 }
 //-------

 if (PS[0]==St_Peek2_P2) 
 if ( (NS[0] == DONT_CARE) && (NS[1] == DONT_CARE) 
 ) 
 if (Energy > THRESH)
 {
 NS[0] = St_Car_detected_P2 ;
 NS[1] = DUMMY ;
 }
 //-------
 if (PS[1]==St_Peeks_P2) 
 if ( (NS[0] == DONT_CARE) && (NS[1] == DONT_CARE) 
 ) 
 if (Energy > HIGH_THRESH)
 {
 NS[0] = St_Truck_detected_P2 ;
 NS[1] = DUMMY ;
 }
 
 // Assigning next state to present-state 
 for (BS_i=0;BS_i < 2;BS_i++)
 if (NS[BS_i] != DONT_CARE) 
 {PS[BS_i]=NS[BS_i]; NS[BS_i]=DONT_CARE;}
 // Return statement returns 0 if and only if a terminal state was reached. 
 // Use the return value to break out of a loop in which the controller 
 // is called, if you want to suspend the controller in a terminal state.
 //
 return (!BS_Terminal_Reached);
 };
// Bye



Listing Three

//=========================================================================
// C++ Code-Generation Output File #1 of 1. Generated for Child Chart 
// by the BetterState Code Generator v2.x
// R-Active Concepts, Cupertino CA, 95014, doron@ractive.com

//----- State ID dictionary for Child
// Use this dictionary to symbolically reference your states (for those
// States that you gave names in the chart editor). 
//-------------------------------------------------------------------------

#ifndef DUMMY
#define DUMMY -7
#endif
#ifndef DONT_CARE
#define DONT_CARE 0
#endif
const int St_Wait_P3 = 164; // mapped to PS[0] 
const int St_Car_1_P3 = 177; // mapped to PS[0] 
const int St_Car_2_P3 = 178; // mapped to PS[0] 
const int St_Car_3_P3 = 179; // mapped to PS[0] 
const int St_Camera_Shoot_P3 = 181; // mapped to PS[0] 
const int St_Idle_P3 = 182; // mapped to PS[0] 
const int St_Cars_P3 = 180; // mapped to PS[1] 
const int St_Active_P3 = 185; // mapped to PS[2] 
 
class CHRT_Child: public CHRT_base1, public CHRT_base2 {

private: 
int PS[3];
int NS[3];
int BS_i;
// This flag, being 1, indicates that the controller (or one of its parents)
// reached a Terminal state. 
int BS_Terminal_Reached;
 //Local variables you defined in your drawing:
 
public:
// Use this Macro to test across threads whether St_Wait_P3 is present state
int BS_in_St_Wait_P3(){return (PS[0]==St_Wait_P3);};
// Use this Macro to test across threads whether St_Car_1_P3 is present state
int BS_in_St_Car_1_P3(){return (PS[0]==St_Car_1_P3);};
// Use this Macro to test across threads whether St_Car_2_P3 is present state
int BS_in_St_Car_2_P3(){return (PS[0]==St_Car_2_P3);};
// Use this Macro to test across threads whether St_Car_3_P3 is present state
int BS_in_St_Car_3_P3(){return (PS[0]==St_Car_3_P3);};
// Use Macro to test across threads whether St_Camera_Shoot_P3 is present state
int BS_in_St_Camera_Shoot_P3(){return (PS[0]==St_Camera_Shoot_P3);};
// Use this Macro to test across threads whether St_Idle_P3 is present state
int BS_in_St_Idle_P3(){return (PS[0]==St_Idle_P3);};
// Use this Macro to test across threads whether St_Cars_P3 is present state
int BS_in_St_Cars_P3(){return (PS[1]==St_Cars_P3);};
// Use this Macro to test across threads whether St_Active_P3 is present state
int BS_in_St_Active_P3(){return (PS[2]==St_Active_P3);};
 
 CHRT_Child():CHRT_base1(),CHRT_base2()
 {
// Reset state assignments 
 PS[0]=St_Idle_P3;
 NS[0]=DONT_CARE; 
 PS[1]=DUMMY;
 NS[1]=DONT_CARE; 
 PS[2]=DUMMY;
 NS[2]=DONT_CARE; 
 BS_Terminal_Reached=0; 
 }
int BS_Fire();
 }; //end of BS controller class
int CHRT_Child::BS_Fire()
 {
//Fire parent controller(s), and ripple down if Terminal state(s) was reached.
BS_Terminal_Reached = (!CHRT_base1::BS_Fire());
BS_Terminal_Reached |= (!CHRT_base2::BS_Fire());
 //-------
 if (PS[0]==St_Idle_P3) 
 if ( (NS[0] == DONT_CARE) && (NS[1] == DONT_CARE) && (NS[2] == DONT_CARE)
 ) 
 if (BS_in_St_Red2Main_P1())
 {
 NS[0] = St_Wait_P3 ;
 NS[1] = DUMMY ;
 NS[2] = St_Active_P3 ;
 }
 //-------
 if (PS[2]==St_Active_P3) 
 if ( (NS[0] == DONT_CARE) && (NS[1] == DONT_CARE) && (NS[2] == DONT_CARE)
 ) 
 if (!BS_in_St_Red2Main_P1())

 {
 NS[0] = St_Idle_P3 ;
 NS[1] = DUMMY ;
 NS[2] = DUMMY ;
 }
 //-------
 if (PS[1]==St_Cars_P3) 
 if ( (NS[0] == DONT_CARE) && (NS[1] == DONT_CARE) 
 ) 
 if (BS_in_St_Truck_detected_P2())
 {
 NS[0] = St_Camera_Shoot_P3 ;
 NS[1] = DUMMY ;
 }
 //-------
 if (PS[0]==St_Wait_P3) 
 if ( (NS[0] == DONT_CARE) && (NS[1] == DONT_CARE) 
 ) 
 if (BS_in_St_Car_detected_P2())
 {
 NS[0] = St_Car_1_P3 ;
 NS[1] = St_Cars_P3 ;
 }
 //-------
 if (PS[0]==St_Car_1_P3) 
 if ( (NS[0] == DONT_CARE)
 ) 
 if (BS_in_St_Car_detected_P2())
 {
 NS[0] = St_Car_2_P3 ;
 }
 //-------
 if (PS[0]==St_Car_2_P3) 
 if ( (NS[0] == DONT_CARE)
 ) 
 if (BS_in_St_Car_detected_P2())
 {
 NS[0] = St_Car_3_P3 ;
 }
 //-------
 if (PS[0]==St_Car_3_P3) 
 if ( (NS[0] == DONT_CARE) && (NS[1] == DONT_CARE) 
 ) 
 if (BS_in_St_Car_detected_P2())
 {
 NS[0] = St_Camera_Shoot_P3 ;
 NS[1] = DUMMY ;
 }
 //-------
 if (PS[0]==St_Camera_Shoot_P3) 
 if ( (NS[0] == DONT_CARE) && (NS[1] == DONT_CARE) 
 ) 
 {
 NS[0] = St_Wait_P3 ;
 NS[1] = DUMMY ;
 }
 // Assigning next state to present-state 
 for (BS_i=0;BS_i < 3;BS_i++)
 if (NS[BS_i] != DONT_CARE) 

 {PS[BS_i]=NS[BS_i]; NS[BS_i]=DONT_CARE;}
 // Return statement returns 0 if and only if a terminal state was reached. 
 // Use the return value to break out of a loop in which the controller 
 // is called, if you want to suspend the controller in a terminal state.
 //
 return (!BS_Terminal_Reached);
 };
// Bye


Coding with HTML Forms


HTML goes interactive




Andrew Davison


Andrew is a lecturer in the department of computer science at the University
of Melbourne, Australia. He can be reached at ad@cs.mu.oz.au.


The World Wide Web (WWW) is a hypertext-based system that allows users to
"surf" the Internet, accessing information on topics as diverse as astronomy,
the Marx brothers, and kite making. The most common WWW browser is Mosaic, a
graphical tool from the National Center for Supercomputing Applications (NCSA)
that's been ported to most operating systems. In addition, a plethora of new
browsers are currently being released, offering similar (usually extended)
capabilities for Windows, Macintosh, and the X Window system. (There are even
text-based browsers such as Lynx.)
Underpinning the WWW is a page-description language called the "hypertext
markup language" (HTML), derived from the "standard generalized markup
language" (SGML). Essentially, HTML offers a small set of commands which, when
embedded in a text file, allow a browser to display the text, replete with
fancy fonts, graphics and, most importantly, hypertext links to other
hypertext documents.
One feature of HTML is its simplicity: An HTML document can be produced in a
few minutes, as Douglas McArthur demonstrated in his article, "World Wide Web
and HTML" (DDJ, December 1994). However, HTML lacks support for writing
documents which interact with the user. Interaction in most documents consists
of the user deciding which hypertext link to follow next. 
Fortunately, HTML is still evolving. The current specification of the language
is Document Type Definition (DTD) level 2, which includes "forms" that allow a
document to include text-entry fields, radio boxes, selection lists, check
boxes, and buttons. These can be used to gather information for an application
"behind" the document, to guide what is offered to the user next. Some typical
forms documents include a movie database (see
http://www.cm.cf.ac.uk/Movies/moviequery.html), weather-map order form
(http://rs560.cl.msu.edu/weather/getmegif.html), questionnaires, surveys, and
Pizza Hut's famous PizzaNet (http://www.pizzahut.com/). A problem with forms
is that many older WWW browsers (such as the most common Macintosh version of
Mosaic) do not support them, although this problem is rapidly disappearing as
browsers are updated.
In this article, I'll detail the steps in writing forms-based applications.
Although I'll use NCSA Mosaic for X Windows 2.0, the approach is applicable to
all WWW browsers with forms capabilities. 


HTML Forms 


There are three basic stages in creating a forms-based document:
1. Design the input form and write the corresponding HTML document.
2. Write the application program that interprets the data from the input form.
3. Design the document generated by the program as the reply to the user.
Usually, this document is written in HTML, but this is not mandatory.
Before describing how to do a forms application, let's review the HTML
features for defining forms. A form begins with <FORM ACTION="URL address of
application" METHOD="POST"> and ends with </FORM>. The METHOD attribute
specifies how the data entered in the various fields of the form is
transmitted to the application. It is best to use the POST method, since the
data is then sent to the standard input of the application, as a string of the
form name=value&name=value&..., where name is the name of the form's data-entry
field, and value is its associated data. 
The other method for sending data is GET. This causes the string to arrive at
the server in the environment variable QUERY_STRING, which may result in the
string being truncated if it exceeds the shell's command-line length. For that
reason, GET should be avoided.
A form can contain six types of data-entry fields: single-line text-entry
fields, as in Figure 1(a); check boxes, radio boxes, and selection lists, as
in Figure 1(b); multiline text-entry fields; and submit and reset buttons, as
in Figure 1(c). Single-line text-entry fields, check boxes, and radio boxes
are specified using the same basic HTML syntax: <INPUT TYPE="field-type"
NAME="name of field" VALUE="default value">, where field-type is one of: text,
checkbox, radio, hidden, or password.
For a check box or radio box, the VALUE field specifies the value of the field
when it is checked; unchecked check boxes are disregarded when name=value
substrings are posted to the application.
If several radio boxes have the same name, they act as a one-of-many
selection: Only one of them can be switched "on" and have its value paired
with the name. A hidden text field does not appear on the form, but it can
have a default value which will be sent to the application. A password text
field will echo asterisks (*) when a value is typed into it. A selection list
is specified using the code in Example 1(a). The option chosen will become the
value associated with the selection list's name. It is also possible to
include the attribute MULTIPLE after the NAME string to allow multiple
selections. This maps to multiple name=value substrings, each with the same
name. A multiline text-entry field has the form of Example 1(b).
The submit button causes the document to collect the data from the various
form fields, pair it with the names of the fields, and post it to the
application. The reset button resets the fields to their default values.
Example 1(c) illustrates button syntax. All of these form constructs are
illustrated on Douglas McArthur's form at
http://www.biodata.com/douglas/form.html.
The form's HTML code is shown in Listing One. Figure 1 shows how part of the
form looks on an X-terminal running Mosaic for X Windows. Thirteen other form
examples are accessible through
http://www.ncsa.uiuc.edu/SDG/Software/Mosaic/Docs/fill-out-forms/overview.html
(overview.html also contains more details on the syntax of form fields).
Stylists recommend that a form be separated from the rest of a document by a
horizontal rule (<HR>). Rules are also useful for subdividing logical
subcomponents within a form. The submit button should always be placed at the
end of the form.


Document and Application Communication


When the submit button is clicked, the POST method causes a string to be sent
to the application. The string consists of a series of name=value substrings,
separated by ampersands (&). An added complication is that name and value are
encoded so that spaces are changed into plus signs (+) and some characters are
encoded as hexadecimals. Fortunately, form-application programmers have
written routines for handling these coded strings.
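The decoding step itself is small. The following is a hedged sketch, modeled on (but not identical to) the NCSA utility routines plustospace() and unescape_url():

```cpp
#include <cassert>
#include <string>

// Sketch of form-data decoding: '+' becomes a space, and %XX hexadecimal
// escapes become the characters they encode. Illustrative only; the NCSA
// utilities split the same work across plustospace() and unescape_url().
int hexval(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return 0;
}

std::string url_decode(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); i++) {
        if (in[i] == '+') out += ' ';
        else if (in[i] == '%' && i + 2 < in.size()) {
            out += (char)(hexval(in[i + 1]) * 16 + hexval(in[i + 2]));
            i += 2;  // skip the two hex digits just consumed
        } else out += in[i];
    }
    return out;
}
```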
The POST method means that the form application will receive the string on its
standard input. This protocol is defined by the Common Gateway Interface (CGI)
specification, which also states that an application can respond by generating
suitable code on its standard output. Details on the CGI specification can be
found at http://hoohoo.ncsa.uiuc.edu/cgi/interface.html.
The CGI specification permits an application to output many different types of
documents (for example, an image, audio code, plain text, HTML, or references
to other documents). The application determines the output type by writing a
header string to standard output, of the form: Content-type: type/subtype,
where type and subtype must be MIME (Multipurpose Internet Mail Extensions)
types; two common types are text/html for HTML output and text/plain for ASCII
text. The header must be followed by a blank line, and then the data can
begin. For instance, an application (coded in C) could output Example 2. More
details on the CGI output protocol can be found at
http://hoohoo.ncsa.uiuc.edu/cgi/primer.html, while documentation on MIME
begins at http://info.cern.ch/hypertext/WWW/Protocols/rfc1341/0_Abstract.html.
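A minimal reply could be assembled like the following sketch; a real CGI application would simply write the resulting string to standard output:

```cpp
#include <cassert>
#include <string>

// Sketch of a CGI reply: the content-type header, a blank line terminating
// the header block, then the document body.
std::string cgi_reply(const std::string& mime, const std::string& body) {
    return "Content-type: " + mime + "\n\n" + body;
}
```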


The Echoer Example Program


I'll now turn to a complete example--an "echoer" application. This example's
input document consists of a form with five single-line text-entry fields. The
application processes it by outputting an HTML document containing the text
entered in the fields. In other words, the user's input is echoed.
Figure 2 shows the input document. The form is quite simple: five text-entry
fields, plus submit and reset buttons labeled Start, Search, and Clear,
respectively. Listing Two is the HTML code for the document (it is also
available at http://www.cs.mu.oz.au/~ad/code/form-gp.html). 
The text-field constructs include extra attributes to limit the size of both
the input and the boxes drawn on the screen. The fields are named pat1 through
pat5, although these are not displayed as part of the input document. When the
terms "John" and "uk" are input and the Start Search button is clicked, the
application returns Figure 3.
In form-gp.html, the name of the application is given in the FORM ACTION
attribute as http://www.cs.mu.oz.au/cgi-bin/qgp, where qgp's actual location
on the server depends on the configuration file for the httpd daemon (called
httpd.conf). The relevant line in that file is Exec /cgi-bin/*
/local/dept/wwwd/scripts/*. In other words, qgp must be placed in
/local/dept/wwwd/scripts for the form to invoke it. This step in linking the
input HTML document to the application varies from system to system. Listing
Three, qgp.c (which can also be found at
http://www.cs.mu.oz.au/~ad/code/qgp.c), consists mostly of utility functions
for processing name=value substrings; consequently, these appear in almost
every form application. The functions were written by Rob McCool and can be
accessed at http://hoohoo.ncsa.uiuc.edu/cgi/forms.html. Also available from
that page are similar utilities for writing applications in the Bourne Shell,
Perl, and Tcl, along with several excellent small programs showing how the
utilities can be used.
The qgp.c program uses five utility functions: makeword(), fmakeword(),
unescape_url(), x2c(), and plustospace(). makeword() builds a word by
extracting characters from a larger string up to a stopping character (or the
end of the longer string). fmakeword() performs a similar operation but reads
from a file and is also supplied with the length of the string left unread in
the file. unescape_url() converts hexadecimal characters in a string into
ordinary characters, by calling x2c(). plustospace() converts the plus signs
(+) in a string into spaces.
main() begins by outputting the header line for the reply document, an HTML
document in this case. The If tests perform standard error checking: The first
determines whether the delivery METHOD is something other than POST; the
second checks the encoding strategy for the name=value substrings. In fact,
the only encoding supported by Mosaic for X Windows 2.0 is
x-www-form-urlencoded, but this may not be the case for other browsers.
The If tests and the use of CONTENT_LENGTH illustrate the importance of
environment variables for conveying information from the input document to the
application. A complete list of environment variables supported by the CGI
specification can be found at http://hoohoo.ncsa.uiuc.edu/cgi/env.html.
CONTENT_LENGTH contains the length of the string sent to the application and
is used by the For loop to build the entries array. Each name=value substring
is extracted by a call to fmakeword(). The pluses (+) and hexadecimal URL
encodings are replaced, and then the name part of the substring is removed,
leaving only the value. More output to the HTML reply document follows, then
the final For loop cycles through the entries array and prints the name and
value strings.



The File-Searcher Example Program


The next example, a file-searcher application, uses the input document in
Figure 2, but the application now searches through a text file holding a
membership list. It looks for lines containing the strings entered in the
text-entry fields of the form. A maximum of ten matching lines are printed,
together with the total number of matching lines.
Querying again for "John" and "uk" results in the HTML document in Figure 4
being generated by the application. It can also produce an error document if
no strings are entered before a search is initiated.
Listing Four is the C code for qdir.c (it can also be found at
http://www.cs.mu.oz.au/~ad/code/qdir.c). The compiled version, qdir, is in
/local/dept/wwwd/scripts, and form-gp.html now uses
http://www.cs.mu.oz.au/cgi-bin/qdir as the URL in the FORM ACTION attribute.
main() begins with the same preliminaries as qgp.c and uses the same utility
functions. The call to record_details() logs information about the user in a
file and has no effect on the subsequent code. get_pats() searches through the
entries array and copies the nonempty strings into a patterns array. If there
are no strings in entries, then get_pats() outputs an HTML error document.
build_re() translates the strings in the patterns array into part of a UNIX
command. The idea is to translate a single search string (such as "John") into
the command fgrep 'John' search-file > temp-file. Multiple search strings like
"John," "uk," and "LPA" would be combined into the command fgrep 'John'
search-file | fgrep 'uk' | fgrep 'LPA' > temp-file. The trick is to pipe the
matching lines of one call to fgrep into another call which further filters
the selection. 
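The command construction might look like the following sketch; the function and file names here are illustrative, not the actual build_re() code from Listing Four:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Sketch of building the fgrep pipeline: one fgrep per search string,
// piped together, with the surviving lines redirected to a temp file.
std::string build_cmd(const std::vector<std::string>& pats,
                      const std::string& infile, const std::string& outfile) {
    std::string cmd;
    for (std::size_t i = 0; i < pats.size(); i++) {
        if (i == 0) cmd = "fgrep '" + pats[i] + "' " + infile;
        else        cmd += " | fgrep '" + pats[i] + "'";
    }
    return cmd + " > " + outfile;
}
```

The resulting string would then be handed to system() or a shell, which is how the application leans on UNIX to do the heavy filtering.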
The matching lines are printed by a While loop which reads at most ten lines
from the temporary file. The total number of lines in the temporary file is
counted by the UNIX wc command (wc -l temp-file > second-temp-file). The value
is read in from the second temporary file and printed to the reply document.
This approach demonstrates how to utilize UNIX as part of an application. UNIX
features are preferable in this case because of the size of the file being
searched and the potentially large number of matching lines that need to be
manipulated. UNIX can also be employed to create forms that edit files, send
mail, read news, or monitor the network, for example.


A Note on Testing


You'll find it useful in the early stages of form design to test the form
without having to write the accompanying application. For instance, early form
testing involves checking what default values are posted if the user
immediately presses the "submit" button.
One possibility is to set the FORM ACTION to point to qgp (or a program like
it), which returns name/value pairs. An alternative is to set ACTION to
http://hoohoo.ncsa.uiuc.edu/htbin-post/post-query. This program does much the
same thing as qgp. The drawbacks are longer network-access time and the
inability to modify the application to test specific form features.
Testing is also a problem with form applications, since it is not possible to
run their user interface (for example, the input form and a browser) inside a
source-level debugger. In normal circumstances, if the application fails, the
browser returns a cryptic message. The easiest way to avoid this problem is to
test a modified version of the application that reads name and value pairs
from the keyboard. Example 3, a fragment of code that illustrates this for
qdir.c, reads strings straight into the val fields of the entries array. Since
the name fields are not used in this application, they are not assigned
values. Output can be sent to the screen and is quite readable even when mixed
with HTML formatting instructions.


Logging


Remember that data may already be available in an access-log file that records
all browser accesses to the server and is set up through the httpd
configuration file. For many applications, however, such general-purpose
logging may not capture all the information required. For instance, a common
reason for recording accesses is to have the application offer different
facilities to different users. Thus, in a video-ordering service, it might be
useful to record the types of films that a user likes, so that similar films
can be pointed out when that user next makes an order. For such specific
information, it is better to have the application carry out the necessary
logging.
In the logging version of qdir.c, access details are collected from four
environment variables: REMOTE_USER, REMOTE_IDENT, REMOTE_HOST, and
REMOTE_ADDR. In addition, the local time on the server is recorded.
The extra code is wrapped up inside the record_details() function called early
in qdir.c. The code for record_details() is included in Listing Four. One
drawback with the first three environment variables is that they are not
guaranteed to have values. REMOTE_USER is only bound if the client and server
support user authentication, and REMOTE_IDENT relies on support for RFC 931
identification. REMOTE_HOST may not be bound, but the IP address equivalent
will be assigned to REMOTE_ADDR.


Conclusion


Forms are an extremely useful mechanism, since they transform HTML from a
hypertext page-description language into a tool for creating interactive
documents. Forms and their associated programs are straightforward to write,
due to the availability of examples, utilities, and documentation accessible
through the WWW.
Figure 1: (a) Form showing text fields; (b) form with check boxes and radio
buttons; (c) form with submit and reset buttons.
Figure 2: Typical input document.
Figure 3: HTML document generated by the echoing example application.
Figure 4: HTML document generated by the file-search application.
Example 1: (a) Code that specifies a selection list; (b) multiline text-entry
field; (c) button syntax.
(a)
<SELECT NAME="list title">
 <OPTION>first option
 <OPTION>second option
 :
</SELECT>

(b)
<TEXTAREA NAME="text area name" ROWS=no-of-rows COLS=no-of-columns >
Default text goes here
</TEXTAREA>

(c)
<INPUT TYPE="submit" VALUE="text on button" >
<INPUT TYPE="reset" VALUE="text on button" >
Example 2: C code that generates the first two lines of Figure 2.
printf("content-type: text/html%c%c",10,10); /* 10 is a linefeed */
printf("<H1>Search String Error!</H1>");
printf("<BR>Must specify at least 1 pattern<p>");
Example 3: Code that reads strings straight into the val fields of the entries
array.

char line[LINELEN];
etnum = 0;
while (etnum < PATNO) {
 printf("Enter pattern %d:",etnum+1);
 if (fgets(line, LINELEN, stdin) == NULL) /* input terminated? */
 break;
 line[strcspn(line, "\n")] = '\0'; /* strip the trailing newline */
 entries[etnum].val = (char *) malloc(sizeof(char)*(strlen(line)+1));
 strcpy(entries[etnum].val, line);
 etnum++;
}

Listing One
<HTML>
<HEAD>
<!-- ------------------------------------------------------------------- -->
<!-- http://www.biodata.com/douglas/form.html - Modified 9/20/94 -=DCM=- -->
<!-- ------------------------------------------------------------------- -->
<TITLE>Prototypical HTML Forms</TITLE>
<H1>Prototypical HTML Forms</H1>
</HEAD>
This document displays the various form gadgets currently supported.
<P>
<FORM ACTION="http://hoohoo.ncsa.uiuc.edu/htbin-post/post-query"
METHOD="POST">
<HR>
<H1>Text Fields</H1>
Basic text entry field:
<INPUT TYPE="text" NAME="entry1" VALUE=""> 
<P>
Text entry field with default value:
<INPUT TYPE="text" NAME="entry2" VALUE="This is the default."> 
<P>
Text entry field of 40 characters:
<INPUT TYPE="text" NAME="entry3" SIZE=40 VALUE=""> 
<P>
Text entry field of 5 characters, maximum:
<INPUT TYPE="text" NAME="entry5" SIZE=5 MAXLENGTH=5 VALUE=""> 
<P>
Password entry field (*'s are echoed):
<INPUT TYPE="password" NAME="password" SIZE=8 MAXLENGTH=8 VALUE=""> 
<HR>
<H1>Textareas</H1>
A 60x3 scrollable textarea:
<P>
<TEXTAREA NAME="textarea" COLS=60 ROWS=3>NOTE:
Default text can be entered here.
</TEXTAREA>
<HR>
<H1>Checkboxes</H1>
Here is a checkbox
<INPUT TYPE="checkbox" NAME="Checkbox1" VALUE="TRUE">,
and a checked checkbox
<INPUT TYPE="checkbox" NAME="Checkbox2" VALUE="TRUE" CHECKED>
. 
<HR>
<H1>Radio Buttons</H1>
Radio buttons (one-of-many selection):
<OL>
<LI>
<INPUT TYPE="radio" NAME="radio1" VALUE="value1">

First choice. 
<LI>
<INPUT TYPE="radio" NAME="radio1" VALUE="value2" CHECKED>

Second choice. (Default CHECKED.)
<LI>
<INPUT TYPE="radio" NAME="radio1" VALUE="value3">
Third choice. 
</OL>
<HR>
<H1>Option Menus</H1>
One-of-many (Third Option selected by default):
<SELECT NAME="first-menu">
<OPTION>First Option
<OPTION>Second Option
<OPTION SELECTED>Third Option
<OPTION>Fourth Option
<OPTION>Last Option
</SELECT>
<P>
Many-of-many (First and Third selected by default):
<SELECT NAME="second-menu" MULTIPLE>
<OPTION SELECTED>First Option
<OPTION>Second Option
<OPTION SELECTED>Third Option
<OPTION>Fourth Option
<OPTION>Last Option
</SELECT>
<P>
<B>NOTE: Hold down CTRL and click to multiple-select.</B>
<!-- You can also assign VALUEs using TYPE="hidden" -->
<INPUT TYPE="hidden" NAME="hidden" VALUE="invisible">
<HR>
<H1>Special Buttons</H1>
Submit button (mandatory):
<INPUT TYPE="submit" VALUE="Submit Form">
<P>
Reset button (optional):
<INPUT TYPE="reset" VALUE="Clear Values">
<P>
</FORM>
<HR>
<H1>References</H1>
Here's a link to
<A HREF="http://www.ncsa.uiuc.edu/SDG/Software/Mosaic/Docs/fill-out-forms/overview.html">
a handy HTML forms reference
</A>.
<P>
<HR>
<ADDRESS>
Prototypical HTML Form /
<A HREF="http://www.biodata.com/douglas/people/douglas.html">
douglas@BioData.COM 
</A>
</ADDRESS>
</HTML>



Listing Two
<HTML>
<HEAD>
<TITLE>ALP Membership Search</TITLE>
</HEAD>
<BODY>
<H1><img src="../alp/symbol.gif"> ALP Membership Search</H1>
<BR>
<ul>
<li>Enter at most 15 characters in a box (e.g. <code>Melbo</code>). <p>
<li>At least one box should contain something. <p>
<li>Matches are lines in the membership list which contain all 
the box entries. <p>
<li>The first 10 matches will be returned, together with the total
number of matches. <p>
<li>Click the <b>Start Search</b> button to start the search. <p>
<li>All the boxes can be cleared by clicking on the <b>Clear</b> button. <p>
</ul>
<BR>
<H2><img src="gball.gif"> Search Boxes</H2>

<FORM ACTION="http://www.cs.mu.oz.au/cgi-bin/qgp" METHOD="POST">
<INPUT TYPE="text" NAME="pat1" SIZE="15" MAXLENGTH="15" VALUE=""> 
<INPUT TYPE="text" NAME="pat2" SIZE="15" MAXLENGTH="15" VALUE="">
<INPUT TYPE="text" NAME="pat3" SIZE="15" MAXLENGTH="15" VALUE="">
<INPUT TYPE="text" NAME="pat4" SIZE="15" MAXLENGTH="15" VALUE="">
<INPUT TYPE="text" NAME="pat5" SIZE="15" MAXLENGTH="15" VALUE="">
<P>
<BR>
<INPUT TYPE="submit" VALUE="Start Search">
<INPUT TYPE="reset" VALUE="Clear">
<P>
</FORM>
<HR>

<br>
<img src="gball.gif">
USE OF THIS MEMBERSHIP LIST FOR COMMERCIAL OR PROMOTIONAL PURPOSES IS
PROHIBITED. <p>

<img src="gball.gif">
If you have any problems using this service, contact
<a href="http://www.cs.mu.oz.au/~ad">Andrew Davison</a>. <p>

<img src="gball.gif">
If you would like changes made to the membership list,
contact the ALP Administrative Secretary. <p>
<HR>
<ADDRESS>
<a href="../alp/alp-news/dir.html">To Membership List Info</a>
</ADDRESS>
</BODY>
</HTML>


Listing Three

/* Echo name=value substrings posted from Form */

/* HTML utilities written by Rob McCool */

#include <stdio.h>
#include <stdlib.h>

#define LF 10
#define CR 13
#define MAX_ENTRIES 5 /* number of input fields */

typedef struct {
 char *name;
 char *val;
} entry;

char *makeword(char *line, char stop);
char *fmakeword(FILE *f, char stop, int *len);
void unescape_url(char *url);
char x2c(char *what);
void plustospace(char *str);

main() {
 entry entries[MAX_ENTRIES]; /* HTML name-val pairs */
 int x, cl, etnum;

 printf("Content-type: text/html%c%c",LF,LF);
 if(strcmp(getenv("REQUEST_METHOD"),"POST")) {
 printf("This script should be referenced with a METHOD of POST.\n");
 printf("If you don't understand this, read ");
 printf("<A HREF=\"http://www.ncsa.uiuc.edu/SDG/Software/Mosaic/Docs/fill-out-forms/overview.html\">forms overview</A>.%c",LF);
 exit(1);
 }
 if(strcmp(getenv("CONTENT_TYPE"),"application/x-www-form-urlencoded")) {
 printf("This script can only be used to decode form results. \n");
 exit(1);
 }
 cl = atoi(getenv("CONTENT_LENGTH"));

 etnum = 0;
 for(x=0;cl && (!feof(stdin));x++) {
 entries[x].val = fmakeword(stdin,'&',&cl);
 plustospace(entries[x].val);
 unescape_url(entries[x].val);
 entries[x].name = makeword(entries[x].val,'=');

 etnum++;
 }
 printf("<H1>Query Results</H1>");
 printf("You submitted the following name/value pairs:<p>%c",LF);
 printf("<ul>%c",LF);

 for(x=0; x < etnum; x++)
 printf("<li> <code>%s = %s</code>%c",entries[x].name,entries[x].val,LF);
 printf("</ul>%c",LF);
}
/* HTML utilities */
char *makeword(char *line, char stop) {
 int x = 0,y;
 char *word = (char *) malloc(sizeof(char) * (strlen(line) + 1));

 for(x=0;((line[x]) && (line[x] != stop));x++)
 word[x] = line[x];
 word[x] = '\0';
 if(line[x]) ++x;
 y=0;
 while(line[y++] = line[x++]);
 return word;
}
char *fmakeword(FILE *f, char stop, int *cl) {
 int wsize;
 char *word;
 int ll;

 wsize = 102400;
 ll=0;
 word = (char *) malloc(sizeof(char) * (wsize + 1));

 while(1) {
 word[ll] = (char)fgetc(f);
 if(ll==wsize) {
 word[ll+1] = '\0';
 wsize+=102400;
 word = (char *)realloc(word,sizeof(char)*(wsize+1));
 }
 --(*cl);
 if((word[ll] == stop) || (feof(f)) || (!(*cl))) {
 if(word[ll] != stop) ll++;
 word[ll] = '\0';
 return word;
 }
 ++ll;

 }
}
void unescape_url(char *url) {
 register int x,y;
 for(x=0,y=0;url[y];++x,++y) {
 if((url[x] = url[y]) == '%') {
 url[x] = x2c(&url[y+1]);
 y+=2;
 }
 }
 url[x] = '\0';
}
char x2c(char *what) {
 register char digit;
 digit = (what[0] >= 'A' ? ((what[0] & 0xdf) - 'A')+10 : (what[0] - '0'));
 digit *= 16;
 digit += (what[1] >= 'A' ? ((what[1] & 0xdf) - 'A')+10 : (what[1] - '0'));
 return(digit);
}
void plustospace(char *str) {
 register int x;
 for(x=0;str[x];x++) 
 if(str[x] == '+') str[x] = ' ';
}

Listing Four


/* Search file FNM via a HTML form. This version logs user in file RFNM */
/* Executable is in /local/dept/wwwd/scripts/qdir */
/* HTML utilities written by Rob McCool */
/* The rest by Andrew Davison (ad@cs.mu.oz.au), December 1994 */

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <string.h>

#define LF 10
#define CR 13

#define MAX_ENTRIES 5 /* number of input fields */
#define PATNO 5 /* max number of patterns */
#define RESNO 10 /* max number of matching lines */
#define CMDLEN 200 /* max length of cmd */
#define LINELEN 240 /* max length of input line */
#define NAMELEN 40 /* max length of a file name */

#define FNM "/home/staff/ad/www_public/code/dir.alp" /* file searched */
#define RFNM "/home/staff/ad/www_public/code/people.txt" /* log file */

typedef struct {
 char *name;
 char *val;
} entry;

void get_pats(entry entries[], int etnum, char pat[][LINELEN], int *pno);
char *build_re(char pat[][LINELEN], int tot, char re[]);
void back_to_form(void);
void record_details(void);

char *makeword(char *line, char stop);
char *fmakeword(FILE *f, char stop, int *len);
void unescape_url(char *url);
char x2c(char *what);
void plustospace(char *str);

main() 
{
 char gcmd[CMDLEN]; /* fgrep command string */
 char restexpr[CMDLEN]; /* part of fgrep string */
 char wcmd[CMDLEN]; /* line count cmd string */
 char result[LINELEN]; /* matching line */
 char patterns[PATNO][LINELEN]; /* patterns */

 char tmp_gfname[NAMELEN], tmp_wfname[NAMELEN];
 FILE *gtfp, *wtfp; /* temp file ptrs */
 int rno, pno, nmatch, nlines;

 entry entries[MAX_ENTRIES]; /* HTML name-val pairs */
 int x, w, cl, etnum;

 printf("Content-type: text/html%c%c",LF,LF);
 if(strcmp(getenv("REQUEST_METHOD"),"POST")) {
 printf("This script should be referenced with a METHOD of POST.\n");
 printf("If you don't understand this, read ");
 printf("<A HREF=\"http://www.ncsa.uiuc.edu/SDG/Software/Mosaic/Docs/fill-out-forms/overview.html\">forms overview</A>.%c",LF);
 exit(1);
 }
 if(strcmp(getenv("CONTENT_TYPE"),"application/x-www-form-urlencoded")) {
 printf("This script can only be used to decode form results. \n");
 exit(1);
 }
 cl = atoi(getenv("CONTENT_LENGTH"));
 etnum = 0;
 for(x=0;cl && (!feof(stdin));x++) {
 entries[x].val = fmakeword(stdin,'&',&cl);
 plustospace(entries[x].val);
 unescape_url(entries[x].val);
 entries[x].name = makeword(entries[x].val,'=');
 etnum++; 
 }
 record_details(); /* log the user */
/* collect non-empty strings into patterns[] */
 get_pats(entries, etnum, patterns, &pno);

 printf("<H1>Search Results</H1>");
 printf("<BR>Maximum of 10 matching lines are shown for any search.<p>");
 printf("<BR>The following strings are being used for the search:<p>%c",LF);
 printf("<ul>%c",LF);
 for(x=0; x < pno; x++) 
 printf("<li> %s%c",patterns[x],LF);
 printf("</ul>%c",LF);

/* get at most RESNO matching lines */
 tmpnam(tmp_gfname);
 build_re(patterns,pno,restexpr);
 sprintf(gcmd,"fgrep '%s' %s %s > %s",patterns[0], FNM, restexpr, tmp_gfname);
 system(gcmd);

 printf("<BR><b>The lines found are:</b><P>");
 printf("<ul>%c",LF);
 gtfp = fopen(tmp_gfname,"r");
 rno = 0;
 while (rno < RESNO) {
 if (fgets(result, LINELEN, gtfp) == NULL)
 break;
 printf("<li> "); puts(result); printf("<P>");
 rno++;
 }
 printf("</ul>%c",LF);
 fclose(gtfp);
/* count the total number of matching lines */
 tmpnam(tmp_wfname);
 sprintf(wcmd, "wc -l %s > %s", tmp_gfname, tmp_wfname);
 system(wcmd);
 wtfp = fopen(tmp_wfname,"r");
 fscanf(wtfp,"%d", &nlines);
 fclose(wtfp);
 if (nlines > RESNO)
 printf("<BR><b>%d lines printed from a total of %d.</b><P>",RESNO,nlines);
 else if (rno == 0)
 printf("<BR><b>No matching lines.</b><P>");
 else
 printf("<BR><b>%d line(s) printed.</b><P>", rno);
 back_to_form();
 remove(tmp_gfname);
 remove(tmp_wfname);
}
void get_pats(entry entries[], int etnum, char pat[][LINELEN], int *pno)
{
 int x;
 *pno = 0;
 for (x=0; x < etnum; x++) {
 if (entries[x].val[0] != '\0') {
 strcpy(pat[*pno], entries[x].val);
 (*pno)++;
 }
 }
 if (*pno == 0) {
 printf("<H1>Search String Error!</H1>");
 printf("<BR>Must specify at least 1 pattern<p>");
 back_to_form();
 exit(1);
 }
}
char *build_re(char pat[][LINELEN], int total, char re[])
{
 char part [NAMELEN];
 int idx;
 re[0]='\0';
 for (idx=1; idx<total; idx++){
 sprintf(part," | fgrep '%s'",pat[idx]);
 strcat(re,part);
 }
 return re;
}
void back_to_form(void)
{
 printf("<HR><BR><i><a href=\"http://www.cs.mu.oz.au/~ad/code/form-gp.html\">Back to Form</a></i>.");
}
void record_details(void)
{
 char *ruser, *rid, *rhost, *raddr;
 struct tm *tp;
 time_t now;
 FILE *rfp;

 rfp = fopen(RFNM,"a");

 ruser = getenv("REMOTE_USER");
 if (ruser == NULL || ruser[0] == '\0')
 fprintf(rfp,"no_ruser ");
 else
 fprintf(rfp, "%s ",ruser);
 rid = getenv("REMOTE_IDENT");
 if (rid == NULL || rid[0] == '\0')
 fprintf(rfp,"no_rid ");
 else
 fprintf(rfp, "%s ",rid);
 rhost = getenv("REMOTE_HOST");
 if (rhost == NULL || rhost[0] == '\0')
 fprintf(rfp,"no_rhost ");
 else
 fprintf(rfp, "%s ",rhost);
 raddr = getenv("REMOTE_ADDR");
 if (raddr == NULL || raddr[0] == '\0')
 fprintf(rfp,"no_raddr ");
 else
 fprintf(rfp, "%s ",raddr);
 now = time(NULL);
 tp = localtime(&now);
 if (tp == NULL)
 fprintf(rfp,"no_ltime\n");
 else
 fprintf(rfp, "%s",ctime(&now));
 fclose(rfp);
}
/* HTML utilities */
char *makeword(char *line, char stop) {
 int x = 0,y;
 char *word = (char *) malloc(sizeof(char) * (strlen(line) + 1));
 for(x=0;((line[x]) && (line[x] != stop));x++)
 word[x] = line[x];
 word[x] = '\0';
 if(line[x]) ++x;
 y=0;

 while(line[y++] = line[x++]);
 return word;
}
char *fmakeword(FILE *f, char stop, int *cl) {
 int wsize;
 char *word;
 int ll;


 wsize = 102400;
 ll=0;
 word = (char *) malloc(sizeof(char) * (wsize + 1));

 while(1) {
 word[ll] = (char)fgetc(f);
 if(ll==wsize) {
 word[ll+1] = '\0';
 wsize+=102400;
 word = (char *)realloc(word,sizeof(char)*(wsize+1));
 }
 --(*cl);
 if((word[ll] == stop) || (feof(f)) || (!(*cl))) {
 if(word[ll] != stop) ll++;
 word[ll] = '\0';
 return word;
 }
 ++ll;
 }
}
void unescape_url(char *url) {
 register int x,y;
 for(x=0,y=0;url[y];++x,++y) {
 if((url[x] = url[y]) == '%') {
 url[x] = x2c(&url[y+1]);
 y+=2;
 }
 }
 url[x] = '\0';
}
char x2c(char *what) {
 register char digit;
 digit = (what[0] >= 'A' ? ((what[0] & 0xdf) - 'A')+10 : (what[0] - '0'));
 digit *= 16;
 digit += (what[1] >= 'A' ? ((what[1] & 0xdf) - 'A')+10 : (what[1] - '0'));
 return(digit);
}
void plustospace(char *str) {
 register int x;
 for(x=0;str[x];x++) 
 if(str[x] == '+') str[x] = ' ';
}


Comparing CASE Tools


Programming by design




Jeffrey L. Armbruster


Jeffrey is team leader of the Windows Development Team at Aircraft Technical
Publishers in Brisbane, California. He can be reached on CompuServe at
72711,3565.


Adopting an object-oriented design methodology can be the single most
important step you take toward improving your software-development process. An OOP
methodology crystallizes (or exposes) a system's philosophy and psychology.
Without a methodology, a system is shrouded in mystery. 
What makes a good CASE tool? Detailed opinions appear in the accompanying text
box entitled, "Expert Opinions." In general, most people agree that a tool
should: 
Offer several methodologies and be able to switch between them on the fly.
Check for logic errors.
Animate the execution of the diagram.
Be inexpensive.
Be easy to use.
Allow easy browsing through the class hierarchy.
Allow you to brainstorm your diagrams in a nonlinear fashion.
Allow you to enter code onto a diagram, then update the diagram.
Separate object-model components into categories of responsibility.
Provide version control for diagrams.
Support work groups (multiple users).
Reverse-engineer existing code.
Have hooks into other API Frameworks (such as Microsoft's MFC, Borland's OWL,
and Rogue Wave's DBtools.H++).
Print large diagrams (that is, map large diagrams across 8.5x11-inch pages).
Generate code.
A methodology provides you with a set of rules that guide you toward
organizing, designing, and building an application. It is not a recipe or a
formula. A methodology also supplies a set of symbols that represent classes and the
relationships between them. The symbols allow you to model the behavior of a
system's components. Any competent developer can easily map the needed events
for an application; a design methodology and the model with which it is built
will help uncover inefficiencies and identify opportunities for code reuse and
inheritance. In short, a methodology gives both an eagle's-eye and a
microscopic view of an application.
The CASE tools I examine here--ProtoSoft's Paradigm Plus 2.0, Select Software
Tools' Select OMT, and Object International's Together C++--let you place
symbols to represent classes. Connector symbols representing inheritance,
composite classes, or messages can then be inserted between the classes. The
end result is a model of the behavior of components within your system. The
diagramming symbols used by the tools represent the philosophy of a particular
methodology for designing classes and defining their relationships within a
program. Instead of viewing the system at the microscopic (that is, code)
level, you see it at an eagle's height, using symbols. Since design is
iterative, making changes to the system is painless, as the CASE tool reprints
the system for you. Table 1 lists the features of each tool discussed in this
article. 


The PriceView Example


To examine the CASE tools, I've designed "PriceView," an application that
collects pricing information in grocery stores. As a service to customers, the
program provides them with a price audit. 
PriceView (PV) consists of two components: one application that collects
product prices in grocery stores and transmits the data to the corporate
level, and another that generates reports from collected data stored in the
corporate repository. In this article, I'll concentrate on the report
generator. 
Inputs to PV include customer name and address, store name and address, the
stores to be price-audited, the type of audit ("Full-Store" or
"Customer-Provided List"), items to audit, and audit data from the stores. For
a Full-Store price audit, PV audits all of the items found in the store. For a
Customer-Provided List, the list of items to be audited are transmitted via
modem to the in-store auditor, unless another customer has requested a
Full-Store audit.
PV has several outputs: a management report that includes the number of stores
audited during a given week, the number of items audited, and the number of
customers that ordered an audit for each store. This report tells management
PV's gross sales and income for the week. The "Price-Audit Manager Report" is
created to schedule workers to collect the data at the various stores. The
"Customer Item-Price Report" lets customers see the price-audited items in
spreadsheet form.
Object-oriented methodologies show class relationships in a system--an
application's potential behavior--more than they show program work flow. For
the event-driven behavior of modern GUIs, an OOP methodology detailing class
relationships is more relevant than a work-flow or data-driven diagram.
I've set up the PV model as a hierarchy of classes, with the PriceViewCorp
class as a starting point. PriceViewCorp has an aggregate relationship (Booch,
Rumbaugh) with the classes Store, Report, and Customer. An aggregate
relationship forms a "has_a" or "part_of" relationship. In this situation,
PriceViewCorp is the parent, or owner, of the parts has_a Store, has_a
Customer, and has_a Report. (I won't complicate this example with the
additional detail of attributes and messages/functions for each class.) Store
uses Report, AuditType, and AuditData to form an aggregate relationship. The
AuditType class has two generalization/specialization (Coad/Yourdon) classes
connected to it: FullStore and CustomerProvidedList. A
generalization/specialization relationship indicates class inheritance. So the
FullStore class is_a type of AuditType, and CustomerProvidedList class is_a
type of AuditType. Finally, the abstract/generalized class Report has the
Management, CustomerReport, and AuditManager classes making specialization
connections to it. The class Customer has the AuditSubscription class as an
aggregate relationship.


Paradigm Plus 2.0


Paradigm Plus 2.0 requires that you attach a dongle to your machine to run
the software. Paradigm Plus (PP) also requires 8 Mbytes of hard-disk space and
the standard Windows 3.1 configurations. PP can be configured as a single- or
multiuser tool. While installation is generally straightforward, network
installation can cause headaches because the software acts as a network file
server, database server, license server, and client workstation. However, the
documentation is complete enough to walk you through the process. I opted for
the single-user version and the installation was hassle free (except for
crawling behind my floor-based tower to plug in the dongle).
PP comes with a user's guide/tutorial, a methods manual, and an installation
manual. The methods manual helps you understand the diagramming symbols if you
have not read a methodology text from one of the methodologists supported by
the tool.
PP supports six methodologies, including Rumbaugh et al.'s object modeling
and design, Booch, Fusion, Coad/Yourdon, and Shlaer/Mellor. The methods manual briefly
illustrates the symbols used with each methodology, but it's enough to be
familiar with one to start using the application.
In PP, you select one of the object-oriented methodologies, name your project,
then place and position symbols on the screen to identify the classes and
their relationships to PP; see Figure 1. You click on one of the symbols from
the symbol toolbar to put Paradigm Plus into the drawing mode of that symbol.
The cursor changes to help you see which symbol-mode you have activated. You
then place the symbol on your diagram. You can name it by typing in the name
or you can keep adding symbols to quickly brainstorm your idea into PP. I
named my classes and tried to generate C++ code, but Paradigm Plus returned an
error message until I added attributes and functions/messages to the classes.
At any point when building your model, PP will generate Ada, C++, Smalltalk,
or SQL code from your diagram. Mastering a methodology and using its
principles and symbols to write your programs will improve the quality of your
code. You will reduce your errors and false starts and see the weaknesses and
strengths of your program before you've written a single line of code. And
because the CASE tool writes the code for you, you can easily rearrange the
symbols to improve your design. It is the difference between standing on a
street corner reading street signs and seeing all of the streets on a map. A
methodology provides a detailed, disciplined map.
The question many programmers ask, however, is whether CASE tools improve or
impede the development process. I believe they improve it. For instance, PP
provides an alignment tool for tidying up your diagram by centering or
justifying the symbols (a feature missing from the other CASE tools in this
article). 
Paradigm Plus also allows you to nest or level a diagram. This is an important
feature when you're working with a lot of classes on a single screen. You can
make an object explode or implode into a subdiagram, allowing you to zoom in
to or out of your model's detail. This allows you to layer your model into
Microsoft's MFC Document/View or Smalltalk's Model/View/Controller
architecture, for example. 
Additional Paradigm Plus features include reverse engineering of existing
C/C++ code; availability for Windows, Windows NT, Sun, HP, AIX, and OS/2; a
Basic-like scripting language to customize its behavior; multiuser support;
the ability to generate SQL code for Versant, Objectivity/DB, ONTOS,
ObjectStore, Oracle SQL, IBM's SOM, Raima Object Manager, and HP's Open ODB,
and the ability to generate C, C++, Ada, and Smalltalk code.

In general, Paradigm Plus provides a robust set of tools. The Object Browser,
for instance, lets you view and edit the relationships between all of the
objects within your diagrams. You can view and edit your project using the
matrix editor, a spreadsheet with the rows representing inherited classes, and
columns representing base classes. A class may be made into a generalization
of another class by double-clicking on an intersecting cell. I prefer
indicating the relationships in the diagram editor, but the matrix editor can
also be used to double-check yourself. The table editor gives you another
spreadsheet view of your project. The rows display the name of each class in
your diagram, and the columns display metaclass information (additional data
that Paradigm Plus allows you to collect for each class).


Select OMT


Select OMT uses the Rumbaugh et al. Object Modeling Technique (thus the
"OMT") exclusively. First, you create a project, which in turn creates a
directory to store the project's associated files. You then open either an
object-diagram, state-diagram, or general-graphics-diagram window. The
general-graphics-diagram window allows you to create a (Demarco) data-flow,
object-instance, or an event-flow diagram, but not to generate code. For the
PriceView example, I used the object-diagram window to construct the classes
in Figure 2. This was easy: You click on the object-diagram window, and a
pop-up menu prompts you to select Class, Ternary Association, Free Format
Text, or Free Format Box. If you select Class, Select OMT prompts you for the
class name, then paints a Rumbaugh class symbol on the screen. Some CASE tools
force you to fill out all of the class information (attributes, services,
operations, and the like) before allowing you to add another class--a real
annoyance. Select OMT allows you to brainstorm your entire model, prompting
you only for the barest essentials, permitting you to rapidly see your model
before worrying about the details. When you're ready to add the details, click
on the class and OMT pops up the class-editor dialog box. This is where you
can add/delete class attributes and operations, get a preview of the code that
Select OMT will generate for your class, and enter comments and descriptions
that will be included in your code. 
At some point, your diagram will become too large to fit on one screen. You
can either zoom out or use the Dictionary Item Selector to browse through all
of the classes in your model and view their context. From here, you can edit
the item and check its usage within your model. If your model is huge, the
dictionary lets you create a type filter so that the dictionary displays only
those items that meet your specified type. Select OMT has the standard paint
features for moving groups of classes and tidying up your diagram. At any
point, you can generate or simply preview your code. The code can be generated
to a file or copied to the clipboard and imported into your favorite
Windows-hosted editor.
Select OMT can also print a tiled diagram (mosaic printing), allowing you to
see the full scale of your project by tiling the printed sheets on the wall or
the floor. Select OMT's printing feature is the best in the group. In the
multiuser environment, you can lock out prying eyes by requiring a password.
Select OMT also allows you to view more than one diagram at a time. It is not
as feature rich as Paradigm Plus, but you can get a lot out of this tool for a
lot less money.
One peculiarity was the difference in the code Select OMT and Paradigm Plus
generated. If a class had an association/aggregation relationship with another
class, Paradigm Plus would generate a bidirectional pointer connection. That
is, class A would contain a private pointer to class B, and class B would
contain a private pointer to class A. Rumbaugh specifies this bidirectional
connection in his text. However, Select OMT simply adds a comment in both
classes denoting that they are part of an association/aggregation
relationship.


Together C++


Together C++ uses the Coad/Yourdon object methodology, which is so clean and
simple that you may be able to use Together C++ without reading a book that
explains the methodology. (Coad/Yourdon symbols are subject, abstract class,
object class, generalization-connector, specialization-connector, message
line, and a subset of the standard flowchart symbols.)
Together C++ opens with four windows. You diagram your model in the
object-model window; see Figure 3. The other three windows contain the classes
in your diagram, subjects, and graphical views of your project. The exciting
feature is that if you type code in the editor window, the diagram window is
updated, and vice versa. Your code and diagram always reflect each other. You
create a Class-Object either by clicking on the right mouse
button and selecting class or subject from a pop-up menu or by clicking on the
Class-Object icon in the floating toolbar. Together C++ will prompt you for
the name of the Class-Object and generate the .hpp and .cpp files. Together
C++ arranges the symbols within the object-model window; you can override this
feature by switching to manual or semi-automatic. In semi-automatic mode, you
arrange the symbols and Together C++ arranges the relationship connections. In
manual mode, you handle arranging the symbols and the placement of their
connections. Manual mode also allows you to show the class multiplicities
(one-to-many or one-to-one relationships between Whole/Part classes) for every
class on the diagram. The other products hide the multiplicities in the code
comments.
At any time, you can add comments to your code and Together C++ will record
them automatically. (Comments aren't so easy to add with the other tools.) If
you add attributes, services, or comments to your code using an external
editor, Together C++ will parse your code and update the graphics symbols of
the diagram model. If you want to use Together C++ with an existing project,
Together C++ will parse your code and build the diagram model for you. The
reengineered diagram will need to be rearranged (it will be all jumbled up
across the screen), but it will accurately reflect the model buried in your
code. In fact, of the three tools discussed here, Together C++ has the
fastest, most-accurate parser/reengineering tool.
Together C++ also contains a version-control system so that a team of
developers can use Together C++ for the same project. The team members share a
revision control system (RCS) directory on the network, and Together C++
contains a menu for locking and unlocking the project's new and updated code
and diagram files. It also keeps a history of the changes. When the diagram
begins to look like a spider's web of connections, you can selectively hide
symbols within the diagram, making it easier to read and update. You can
easily bring all of the hidden symbols back with a single menu command.
Together C++ contains many browser windows that allow you to see all of the
files connected to the project, including the .cpp and .hpp files and all of
the classes, attributes, services, and types of connections to other classes
through generalization or specialization.
Like the other CASE tools examined here, you can control the default options
used by the program, such as setting attribute types from ints to chars for
the generated code, building comment templates, changing colors, and so on.


Conclusion


None of the CASE tools I've examined here is perfect. It would be great to
take the best features from each and build a single toolset. Yet, any one of
these tools will go a long way toward making you a better designer, developer,
and programmer. Paradigm Plus is the most comprehensive. It doesn't lock you
into a single methodology, and it allows you to layer your diagram, with each
layer revealing greater detail. However, Together C++ contains the best
parser--if you enter code in one window, Together C++ updates your entire
diagram. It is also the best at parsing code for an existing project. But it
is weak at layering--you have to hide each class and its subject one at a
time. This is not as clean and quick as Paradigm Plus. On the other hand,
Paradigm Plus's steep price tag may be a sticking point for many developers.
Finally, Select OMT is the best at printing your diagram, the easiest to get
up to speed with, and the least expensive.


For More Information


Paradigm Plus 2.0
ProtoSoft Inc.
17629 El Camino Real, Ste. #202
Houston, TX 77058
713-480-3233
Select OMT
Select Software Tools Inc.
1526 Brookhollow, Ste. #84
Santa Ana, CA 92705
800-577-6633
Together C++
Object International Inc.
8140 N. MoPac 4-200
Austin, TX 78759
512-795-0202
Expert Opinions
When it comes to CASE tools, everyone has an opinion. Here's what some leading
methodologists and programmers have to say about computer-aided software
engineering:
Ed Yourdon. The best CASE tools allow the analyst to model requirements or
software architecture in diagrams (and then have the tool generate the code).
Some essential elements are:
Flexibility of diagramming, so the CASE user can see whatever he wants, in
whatever form he wants to see it.
Good error checking to enforce the OOA/OOD methodology.
A low price, so that you can experiment with the methodology without feeling
you've made a "religious" commitment to it.
Animation is a must-have. It lets you create diagrams and then simulate the
behavior of the system by executing the diagrams.
Larry Constantine. An excellent CASE tool is more than a diagramming tool. It
must actually support the work of real developers working within particular
methodologies. The CASE vendor needs a deep understanding of the techniques
and modeling processes that can only be gained from actually using the
supported methodologies. The tool's user-interface design should reflect
careful attention to detail. Common tasks should be simple to perform and
reflect how developers think about software problems. In the hands of skilled
software developers, a CASE tool should fade into the background, becoming an
extension of the developer's thought processes.
An outstanding CASE tool permits the analyst/developer to move seamlessly back
and forth between different models and between different views of the same
underlying model. A user should be able to develop a data-flow model in one
notation and instantly switch to another; to move through interconnected
models--class hierarchy, object communication, functional decomposition of
methods, and code--as smoothly as navigating through hypertext. The best CASE
tools support the iterative, nonlinear thinking and exploratory processes of
real system development and evolution (as opposed to the linear and limited
life-cycle models enshrined in textbooks and methodology courses).
Peter Coad. An excellent CASE tool should have the object model in one window
and C++ in the other. You can edit in either window, and the two update
themselves continuously. 
I build object models by identifying purposes, finding objects, establishing
responsibilities, and establishing dynamics with scenarios. Therefore view
management and filtering are necessary for viewing specific object-model
components (problem domain, human interaction, data management, task
management, system interaction), subjects (groupings of objects that are
meaningful to look at together), and each scenario, with its objects and its
scenario-specific responsibilities.
Grady Booch. The ideal CASE tool should aid in the creation and visualization
of architectures (instead of just producing pretty pictures). It should be
tied to back-end tools, for both forward and reverse engineering, and it must
manage the tedium of crafting complex systems by having enough semantic
knowledge to check for consistency and correctness of the system being
architected.
"Must-have" features include deep semantic knowledge of the notation; coupling
to other tools via APIs, mechanisms such as OLE, scripts, or programmatic
means; multiuser support; and the ability to scale up to complex things.

James Rumbaugh. Transparency is a key element of an excellent CASE tool. The
mechanics of using the tool should fade into the background--manipulating the
notation directly should be as easy as using a piece of paper. Many Macintosh,
UNIX, and Windows programs embody this quality, including programs for desktop
publishing, financial management, spreadsheets, drawing tools, some
methodology tools, and so on. The lack of transparency has been a drawback of
many CASE tools in the past. 
Must-have features include a good balance between simplicity and power. This
is easy to state, fairly easy to recognize, but much harder to achieve. When a
tool tries to do everything, even simple tasks become difficult because the
user has to make so many choices. A good tool makes common things easy to do
at the expense of more unusual things. A good tool doesn't try to automate
everything (as some AI tools tried to do); instead, it should automate the
simple things and provide a straightforward way of accomplishing the difficult
things, perhaps by some textual escape hatch. A good tool has well-chosen
defaults so that the user can get started without making choices and can later
tailor the results by selecting options. For example, a C++ code generator
should come with some well-chosen defaults that work most of the time but can
be overridden.
Stanley Lippman. I appreciate your including me in a somewhat-august group.
However, I have never used a CASE tool, nor have any of my colleagues here. In
fact, I personally don't know of anyone writing code who uses a CASE
tool--although I know many excellent designers and programmers. Looking at
internal job postings and talking with people leads me to believe that CASE
tools tend to be mandated by management, in many cases due to a sense of
unease about controlling a software project new to OOP. Methodologies and CASE
tools seem to be most fervently espoused by those who are least technically
astute. This is a developer's point of view, of course; one who has never
worked in a large project (100+ developers) or contracted to supply a system
to a client--the domains where CASE tools may prove useful.
Stephen J. Mellor and Rod Montrose. Essential elements of an excellent CASE
tool include: group support (analysis is a group activity); configuration
control, at least to the diagram level; enforcement of a particular method
(this is in contrast to the popular CASE drawing tools that only provide
notation support); flexible reporting and printing; support for large diagrams
(software engineering is the only engineering discipline that insists on
keeping everything to an 8.5x11-inch sheet of paper); ability to view multiple
diagrams on the same screen of the current project and other past projects;
and database integrity. This last includes self-checking/repair,
administration utilities, and security levels. The only way to evaluate a CASE
tool's effectiveness is to use it on a moderately sized project with three to
four engineers. With this type of testing, initial impressions such as user
interface will be far less important than method support. Remember, the CASE
tool can only support what the method supports, and the benefits to a project
come from the practice of a method, not just its automation.
We approach must-have features from the Shlaer/Mellor angle. The method's three
main characteristics are: partitioning of a problem into domains; rigorous analysis
of each domain and execution of each domain's models; and translation of the
analysis of each domain into an implementation. In a CASE tool, this would
result in: support for domain charts, subsystem models (subsystem-relationship
model, subsystem-communications model, and subsystem-access model), and bridge
descriptions; and support for the analysis diagrams for each domain, including
the object-information model, object-state model, process model, object and
attribute (textual) descriptions, object-communications model, object-access
model, and thread-of-control chart. Ideally, CASE tools would have a detailed
checker that could critique the analysis models and support for simulation and
execution of each domain's analysis models. Many defects traditionally
discovered using debuggers would be uncovered using an analysis simulator, and
a translation engine would accept an architecture and the analysis models to
produce 60 to 80 percent of the final system code.
--J.L.A.
Table 1: CASE-tool comparison.
Feature                             Select OMT    Paradigm Plus   Together C++
Methodologies supported             Rumbaugh      Rumbaugh        Coad/Yourdon
                                                  Coad/Yourdon
                                                  Booch
                                                  Shlaer/Mellor
                                                  Fusion
Animate diagram execution           No            No              No
Cost                                Under $1000   Under $4000     Under $1000
Class-hierarchy browsing            Yes           Yes             Yes
Brainstorm diagrams nonlinearly     Yes           Yes             Yes
Dynamic updates between code        No            Not easily      Yes
  and diagram
Separate object-model components    Yes           Yes             Yes
  into categories of responsibility
Diagram version control             Yes           No              Yes
Support work groups                 Yes           Yes             Yes
Reverse-engineer existing code      Yes           Yes             Yes
Hook into other API frameworks      No            No              Yes*
  (MFC, OWL, Rogue Wave's
  DBtools.H++)
Print large diagrams                Yes           No              No
Generate 3GL code                   Yes           Yes             Yes
Hierarchical layering or nesting    No            Yes             No
  of diagrams
*Requires adjusting the diagram after it parses the MFC or OWL code.
Figure 1 Creating classes and their relationships using Paradigm Plus.
Figure 2 Constructing classes using the object-diagram window in Select OMT.
Figure 3 Object-model view in Together C++.


Flexible Testing Systems


Testing software in rapid application development environments




Herb Isenberg


Herb is the technical lead for the software testing group at Charles Schwab &
Co. He can be contacted at hisen@slip.net.


Rapid application development (RAD) environments give programmers the ability
to quickly create client/server and other complex applications. In doing so,
however, RAD environments introduce a new set of problems, especially when it
comes to software testing. 
Traditionally, testing has adhered to a rigid paradigm implemented in terms
of:
Embedded test-script code, characterized by the mixing, or embedding, of
test-case logic, navigation sequences, and data within the same test script.
Exact verification, which is based on the notion that expected results are
either true or false. 
Informal test-case format, the narrative style in which test cases are often
written. The style is informal in that it lacks a well-defined set of test
criteria for organizing test-case components. 
Low-tech testing tools, which are limited to "capture/playback" due to their
inability to easily perform field-level (or object-level) verification.
While valid in many development situations, each of these methods has
generally proven too inflexible to deal with today's rapid-development
methodologies and CASE/prototyping tools. In particular, functional testing of
applications ends up occurring very late in the development process, during or
after construction, due to its rigid requirement that screens and data be
complete and stable. At this point, it is not only difficult and costly to fix
and retest newly discovered bugs, but schedule constraints prevent many bugs
from being identified or corrected. Clearly, a flexible testing system is
needed that's adaptable enough for RAD. 
Such a testing system is even more important for designing and building
automated testing systems in environments where the interfaces (screens and
windows) continually change and data is constantly revised and modified. In
this article, I'll present an automated testing paradigm I've coined "flexible
testing system" (FTS) that represents a shift in how we test application
software. What makes this methodology possible is an emerging class of
intelligent, object-oriented testing tools. Specifically, I'll describe how to
build an FTS that can automate regression testing of business functionality,
while providing valuable information during user-acceptance testing cycles.
This FTS is application independent and accommodates data and environments in
constant transition. The FTS I'll describe is built around AutoTester,
an automated testing and verification tool that runs on DOS, Windows, and OS/2
Presentation Manager.
AutoTester records user events and generates a documented test script, rather
than simply capturing keystrokes and mouse events. An editor lets you create
or enhance scripts, including those for toolbars and standard features such as
cut and paste. The testing tool sports a "learn window" feature that
automatically "learns" a window and all controls contained within it and
enables testing of host-based applications through terminal-emulation
packages. AutoTester also provides script synchronization with application
execution, and supports testing of applications built with
KnowledgeWare/Sterling's ObjectView, Powersoft's PowerBuilder, and Gupta's
SQLWindows. 


Flexible Testing Systems: The Problem


I work in the Information Service Department (ISD) of Charles Schwab & Co. The
group I work with (the Business Systems Development Testing Group) provides
testing support for application developers and business users. The development
environment is quickly evolving from a centralized, legacy CICS mainframe
configuration to a Windows-based, multilayered, distributed architecture,
where CASE tools are employed to rapidly prototype and develop new
applications. The automated testing systems designed for the old configuration
couldn't keep pace in our RAD environment. For instance, our order-entry
system was being designed and constructed in the new development environment,
and test cases were failing because they couldn't locate fields correctly when
their screen positions changed. It was obvious that the testing system was not
working and that we had to come up with a better method.
Our solution was to create intelligent field-navigation and location routines
that utilized "relative context awareness." This allows the test case to vary
its execution based on the context at playback time instead of development
time. Eliminating the field-position problem enabled us to run hundreds of
test cases per hour, daily. This was the first big step toward creating an
FTS.
Our FTS methodology grew quickly. A bad-data issue turned into the dynamic,
multilevel-verification solution, which expanded our entire perspective on
automating testing.


FTS: The Solution


FTS is based on structured test-script code, multilevel verification, and a
dynamic test-case format. Structured test-script code separates test-case
logic, screen navigation, and data, as follows:
A screen program is written to service all the screen/windows in the system.
Screen programs exist independently of test cases and have three major
functions: defining the screen itself and its title or label, creating
variables for all the objects/fields on the screen, and navigation. The screen
program has the intelligence to verify when its screen is up and in focus, as
well as to locate and determine the status of any object on the screen. Screen
programs are CALLed by test cases whenever they need to service a screen (for
example, when entering or validating data).
The navigation component of a screen program is central to FTS methodology. It
is the ability to find screen objects that have unexpectedly changed position
that makes a testing system flexible. For example, assume that in Version 1.0
of some software, the customer-name field is the fourth field on the screen
and can be located by pressing the Tab key three times. In Version 2.0, the
customer-name field is the sixth field on the screen, and no one has
communicated this change to the testers. The question is, how do you design a
testing system so that this unexpected change does not interrupt the flow of
testing?
FTS methodology handles this by first locating the cursor in a "home" position
on the screen. This is usually the first field/object, located at the top-left
corner of the screen. FTS then captures the cursor's row/column coordinates at
the home position and stores them in temporary variables (the testing tool
should be able to automatically capture row/column coordinates). FTS then
moves the cursor to the next field/object (using the Tab or other navigation
key) and checks if it is on the correct field; if not, it checks the current
row/column position against the home position. It continues navigating the
screen until the cursor is on the correct field or back to its home position.
If the cursor returns to home position without locating the field, then the
field probably has either been removed from the screen or had a drastic name
change. At this point, FTS cancels the current test case and begins the next
one.
This FTS navigation process locates fields that have changed position
unexpectedly. FTS also incorporates the concept of "traverse and mark," which
is like navigation with the added benefit of marking where the field should
be, reporting whether it was found in that position or not, continuing to
traverse the screen, and marking where the field is found, independent of
expectations. Table 1 lists the FTS AutoTester navigation code using
cursor-relative LOOK functionality.
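The navigation loop described above can be sketched as follows; a list of field names stands in for the screen, and index arithmetic stands in for Tab keystrokes and row/column capture (AutoTester works against the live screen, so this models only the control flow):

```python
def find_field(fields, target):
    """FTS-style navigation: start at the home field (top-left), Tab through
    the screen, and stop when the target field is found or the cursor has
    wrapped back to home (field removed or renamed -> cancel the test case)."""
    home = 0                            # home position: first field on screen
    if fields[home] == target:
        return home
    pos = home
    while True:
        pos = (pos + 1) % len(fields)   # press Tab; wraps at end of screen
        if fields[pos] == target:
            return pos                  # correct field located
        if pos == home:
            return None                 # back at home without a match
```

In the hypothetical Version 2.0 screen below, the customer-name field has silently moved from fourth to sixth; the same call still finds it, and a removed field is reported rather than hanging the run.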


Screen Programs


Another benefit of building screen programs is easy maintenance. When an
application screen changes, only the single corresponding screen program must
be updated. (Remember, test cases are independent of screen programs, so they
are not affected.) For example, if 1000 test cases have been built that access
the customer-entry screen and that screen changes, only the screen program
must be updated.


Data


Data for individual or logically grouped sets of test cases is stored in
separate ASCII text files. Each test case (which defines its own data
requirements) will READ an ASCII file containing data to either input or
verify.
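The READ side can be sketched as a small parser, assuming comma-delimited ASCII records (the actual file layout and field order belong to each test case and aren't given in the article):

```python
def read_test_data(text):
    """Parse comma-delimited ASCII test data into records, one per line.
    Blank lines and '#' comment lines are skipped, so users can annotate
    the file freely in any text editor."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        records.append([field.strip() for field in line.split(",")])
    return records
```

A test case then loops once per record, CALLing screen programs with each field value to input or verify.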
The major benefits of storing data in separate ASCII text files are
maintenance and ownership. Again, single-point maintenance can be achieved
by storing data for a large group of test cases in a single ASCII text file.
Additionally, data can quickly be added or modified to accommodate rapid
changes in the system being tested. No technical knowledge of the testing tool
is required to make these changes; the user need only know the application and
how to use a word processor or text editor. Users can maintain their own
testing system and even add new test cases in some instances. This takes the
burden off developers and gives the users more control and ownership. User
ownership greatly increases the productivity of the testing system over time,
and has far-reaching, beneficial effects on the overall organization.
Given the independence of test data and screen navigation, what remains in an
individual test-case script? A READ statement (loop) to pick up its data
requirements; CALLs to screen programs and logs in the order specified by the
test case; and information or logic specific to the individual test case
(flags that point to fields/objects required to execute particular business
functions for this test case).



Data Verification/Evaluation


Multilevel verification is the ability of the testing system to perform
dynamic data verification at multiple levels, and provide the information
(system or edit messages) necessary for evaluating and/or verifying the data.
(Verification refers to correctness of data, while evaluation refers to system
information such as system messages or edits.) The more levels of data
verification/evaluation, the more flexible the testing system.
Dynamic data verification is the process whereby the automated testing tool,
in real time, gets data from a field, compares it to an expected value, and
writes the result to a log file. It also expresses the ability of the testing
tool to make branching decisions based on the outcome of the comparison.
Ideally, the testing tool should combine the GET and COMPARE into one LOOK
function, to simplify dynamic-verification coding.
With an FTS, dynamic data verification/evaluation can be conducted at seven
different levels; see Table 2. Level 1 is the most direct, obvious type of
dynamic data verification. It does a GET/COMPARE (or LOOK) of a single field
and performs a straightforward, one-to-one, True/False data evaluation based
on an expected result.
Level 2 does a GET/COMPARE (or LOOK) for an entire line on the screen. All
text/data that exists within the boundaries of a predefined row is retrieved
from the screen and stored in a variable. The COMPARE is set to either
Inclusive (the compare is True if the line contains only the expected value)
or Exclusive (the compare is True if the expected value exists anywhere on the
line).
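The two COMPARE settings can be sketched as a single function (the function and mode names are illustrative, not AutoTester's):

```python
def compare_line(line, expected, mode):
    """Level-2 line COMPARE: 'inclusive' is True only if the line contains
    nothing but the expected value; 'exclusive' is True if the expected
    value appears anywhere on the line."""
    if mode == "inclusive":
        return line.strip() == expected
    return expected in line
```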
Level 3 LOOKs at a predefined, boxed area of the screen with the COMPARE set
to either Inclusive or Exclusive. Level 4 is a cursor-relative LOOK (an
important FTS feature) for verifying data relative to the position of the
cursor, independent of a field's absolute row/column coordinates.
Cursor-relative functionality increases the testing system's flexibility by
making it possible to find a specific field, even if its screen position has
changed. This is accomplished by measuring cursor position relative to field
location. It does not matter where a field is at any given moment, only where
the cursor is in relation to the field. For example, assume the customer-name
field is labeled "Cust Name;" upon entering the field, the cursor is three
bytes from the label "Cust Name." The cursor-relative function LOOKs to see
that the cursor is in fact three bytes from the label, independent of its
row/column coordinates. If so, there is a match--the cursor is correctly
positioned, and the test case continues from that point.
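The "Cust Name" check above reduces to a pure position calculation; only the offset from the label matters, never the absolute column. A sketch, assuming "three bytes from the label" means three bytes past the label's last character:

```python
def cursor_relative_look(screen_row, label, cursor_col, offset):
    """True if the cursor sits 'offset' bytes past the end of 'label' on
    this row, wherever on the row the label happens to be."""
    start = screen_row.find(label)
    if start < 0:
        return False                       # label gone: no match possible
    return cursor_col == start + len(label) + offset
```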
Level 5 looks for an exact system or application message displayed in the
message area. In this case, the message is captured to a log file only if it
is unexpected. Level 6 does not capture/log the message if any part of it is
found. Level 7 is the most commonly used evaluation type. All message/edits
are captured/logged when displayed, providing the most information for
evaluating/debugging a system's performance. The log indicates exactly when
and where the message occurs--screen name, function key, test operation,
date/time--along with the message. This level of information makes it easy for
you to identify bugs that may be masked over, even when the expected results
match correctly.


Dynamic Test-Case Format


Dynamic test-case format represents and displays the dynamic nature of test
cases and their major components: action, initial state (or screen), data, and
expected result(s). This format makes the FTS easier to automate since all the
test cases have the same structure. Other advantages include: clearly defined
DATA requirements; precise navigation; specific expected results (no
guesswork); and an increased likelihood of correctly testing the function. Example
1 illustrates a typical test-case format, while Table 3 describes the dynamic
test-case format's major components.


FTS Tool Requirements


To support the development of a flexible automated-test system, your test tool
must have four basic characteristics:
Fixed and relative context awareness.
Variables.
Variable indirection.
Logic and branching.
Context awareness refers to the ability of the tool to recognize where it is
in the application or system under test. In most GUI environments, this means
recognizing which window is active and which control has focus; in
character-based systems, it includes recognizing the location of the cursor or
defined text strings. Context awareness is important because it allows the
test to vary its execution based on the context at playback time instead of
development time, thus minimizing the impact of changes in the application on
test playback.
The concept of relative context awareness means that the context of one object
or control can be determined with reference to the position or location of
another. In a GUI environment, where windows can be sized and repositioned,
relative context means both the relative position of an item within a window
and its aspect ratio, or relationship to the size of the window. For
text-based applications, this means being aware of the location of a text
string, for example, relative to the position of the cursor. This relativity
of context awareness is critical for reducing or minimizing the effect of
otherwise cosmetic changes, such as moving a window or a field.
Variables are named elements or objects in the test script whose content can
be varied during execution. An obvious type of variable is an edit control in
a GUI application or an input field in a text system; a perhaps less-obvious
type is a window or screen. By assigning a variable to contain the actual
string of input characters, the contents of the control or field can be varied
from one test case or run to another without duplicating the entire test
script for each iteration. The potential values for a variable can then be
supplied either from an external file, from the user at the keyboard, as a
result of calculations, or from another window or screen in the application.
For example, a series of transactions may be created in a spreadsheet or
extracted from a database into an exchange file format, such as ASCII. Then,
by defining a loop in the test tool that reads each record consecutively,
substituting the file values for each defined variable in the test script, a
test sequence that processes a single transaction can be transformed into one
that processes an unlimited number of transactions. These external values can
supply not only inputs but also expected outputs, including responses such as
error messages or state transitions.
Variable indirection is another dimension of variability where a variable is
used to store the name of the target variable: The term "indirection" refers
to the fact that the first variable points to the second, so that it is
accessed indirectly. This level of abstraction greatly compresses the number
and size of test scripts by requiring only one script per common element or
control within a system. (Variable-indirection code by Marin Data Systems;
CompuServe 72172,3661.)
For example, a test script could be created to verify standard requirements of
a certain type of control, such as a check box: setting the focus with the
keyboard, mouse, or a mnemonic; verifying the status of the check box; and
setting the status to either checked or unchecked. By using one variable to
name each control and its parameters, then another to indicate which control
is active, a single set of tasks can be used to verify the behavior of any
check box in the system by simply passing the name of the control and its
parameters to the test script.
As an example, imagine a window named "Update Employee" with an Edit control
named "Last Name." To verify that the field contains the value, "Smith," the
text-file-based test script has the commands Verify Panel, Update Employee and
Verify Field, Last Name, "Smith." The window variables to support this
verification are UPDATE_EMPLOYEE.MAIN (the panel itself) and
UPDATE_EMPLOYEE.LAST_NAME (the edit-control field). The code for the test
script is:
1. Verify Panel, UPDATE_EMPLOYEE
2. Verify Field, LAST_NAME, "Smith"
The testing tool would then concatenate the panel name ("UPDATE_EMPLOYEE") and
field name ("LAST_NAME," the name of the window variable) as a text string in
a temporary text variable (for example, TEMP.TEXT). Then, by using variable
indirection (variable TEMP.TEXT contains the name of another variable,
UPDATE_EMPLOYEE.LAST_NAME), the value "Smith" could be verified directly.
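The same two-step resolution can be sketched with a dict standing in for the tool's variable store (names taken from the example above):

```python
# The tool's variable store, modeled as a dict: names map to current values.
variables = {
    "UPDATE_EMPLOYEE.MAIN": "Update Employee",     # the panel itself
    "UPDATE_EMPLOYEE.LAST_NAME": "Smith",          # the edit-control field
}

def verify_field(panel, field, expected):
    """Concatenate panel and field names into TEMP.TEXT, then resolve that
    name indirectly to reach the target variable's value."""
    temp_text = panel + "." + field                # "UPDATE_EMPLOYEE.LAST_NAME"
    return variables.get(temp_text) == expected    # indirect access
```

One such routine services every edit control in the system; only the names passed in change.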
The variable-indirection feature allows the automated-test designer to write a
small number of highly capable control programs that, along with the careful
use of English-like variable-naming standards, provides a self-documenting,
easy-to-read testing system.
"Logic" refers to the ability of the tool to make decisions based on
conditions at the time of playback and vary the sequence of test execution as
a result. The most-basic level includes If/Then/Else decisions that execute or
skip instructions within a single test script based on the outcome. For
example, the test script might first verify whether a particular control or
field has focus before input is attempted; if the focus were not correct, the
test script could set the focus before proceeding.
Another level of logic and branching includes the capability to branch from
one test script to another, then return to the original one and continue. This
is known as "nesting," because one or more test scripts can be nested within
another. For example, a test script that encounters an error during playback
might branch to another script whose function is to log the error, then return
to the original script and continue execution. The error-logging script might
also verify whether the error is a problem with context; if so, it is required
to restore the expected context before playback continues. To accomplish this,
it might also branch to a test script whose function is to recover the
context, or restore the state of the application to a known place. The
error-logging script is nested within the test script, and the
context-recovery script is nested within the error-logging script.
The advantage of logic and branching is that it supports modularity and
flexibility. "Modularity" means that common tasks, such as logging errors or
recovering context, can be developed once and shared across all other scripts.
This saves development time and permits single-point maintenance when a change
is needed. If all scripts use the same error-logging routine, a decision to
capture new system-status information as part of error documentation would
need to be made in only one place (instead of in every script that could
potentially encounter an error).
Logic and branching also maximizes flexibility, as it allows a single test
script to modify its behavior based on results at playback. Otherwise, a
separate test script would be required for each possible condition or pathway,
or the exact state of the system would have to be known beforehand. For
example, if the test script needed to create a new file of a certain name, it
could first verify that there was no existing file of the same name; if one
was found, the steps to delete it could be executed first. This flexibility
means the state of test data and files would not have to be as rigidly
controlled in order to maximize the probability that the test would execute
successfully.
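The file-creation example can be sketched directly. The function and file names here are illustrative, not from any particular tool:

```python
import os
import tempfile

# Playback-time branching: before creating a file, delete any leftover
# copy so the test does not depend on rigidly controlled setup.
def create_fresh(path, contents):
    if os.path.exists(path):   # condition evaluated at playback time
        os.remove(path)        # branch: clean up first, then proceed
    with open(path, "w") as f:
        f.write(contents)

path = os.path.join(tempfile.gettempdir(), "fts_demo.txt")
create_fresh(path, "first run")
create_fresh(path, "second run")   # succeeds even though the file exists
print(open(path).read())           # prints "second run"
```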


Conclusions


FTS is an evolving technology aimed at increasing longevity and decreasing
maintenance. Longevity reflects the automated testing system's ability to
intelligently adjust and respond to unexpected changes to the application
under test. 
Reduced maintenance applies not only to time but to necessity as well.
One-point maintenance saves time, while features such as heuristic navigation
reduce maintenance necessity.
FTS is beginning to provide solutions to testing challenges arising from new
development technologies and environments. It is important to remember that
flexible testing systems execute test cases, but do not define or create them.
Therefore, FTS is only a piece of the automation puzzle, but its principles
may one day be applied to a model for automating the definition and creation
of test cases. This moves us closer to realizing the goal of a system that
automatically tests itself.


For More Information


AutoTester Inc.
8150 N. Central Expressway, #1300
Dallas, TX 75206
214-368-1196
Table 1: FTS. The AutoTester navigation code using cursor-relative LOOK
function (AutoTester code by Steve Vance, stv@well.com).

Description                            AutoTester Code

Save cursor position.                  LocateCursor host $FOCUSWND W.COLSAVE W.ROWSAVE
Loop through all cursor positions.     Label "LOOKLOOP"
Look for field label "Name"            Look text "Name" host "$FOCUSWND" @ -9, 0, -4, ,0
  (cursor-relative LOOK function).       cursor area
Found label? No?                       If No
Go to next line                        >tab
  (only one field per row).
Are we back where we started?          >LocateCursor host $FOCUSWND W.COL W.ROW
                                       >Compare W.ROW W.ROWSAVE
                                       >If Not equal
Not done yet, keep looking.            >>Goto "LOOKLOOP"
Did NOT find field label:
Display screen message.                >Message "Name FIELD NOT FOUND" wait 1
Write out log file.                    Log $CURLOG "Name FIELD NOT FOUND"
Set Field flag to No.                  >Assign FLAG.FIELD = "NO"
Call outline to print the screen.      >Call "PRT_SCR"
Return to the calling test-case        Resume
  outline.
Table 2: FTS dynamic data (a) verification; (b) evaluation.
(a)
Level 1 Exact field-level verification (one-to-one).
Level 2 Line LOOK, inclusive or exclusive.
Level 3 Area LOOKs, inclusive or exclusive.
Level 4 Cursor-relative LOOK.
(b)
Level 5 Exact message with capture only on error.
Level 6 Partial message match with capture.
Level 7 Capture message only.
Example 1: Typical test-case format.
Test Case ID: CUST.01.
Function: Add a new Customer.
Data Assumptions: Customer database has been restored.
General Description: Add a new customer via the Customer
 Add screen, and validate that the
 new Customer was displayed correctly
 on the All Customer screen.
Table 3: Sample dynamic test-case format.


























PROGRAMMING PARADIGMS


Fluid Concepts and Creative Analogies




Michael Swaine


This spring, a book was published that proposes to change fundamentally the
direction of research in artificial intelligence. Fluid Concepts and Creative
Analogies, by Douglas Hofstadter and the Fluid Analogies Research Group
(BasicBooks, 1995), challenges many deep assumptions of AI work today, and
lays out a program of research that will annoy some, inspire others, and
entertain many.
This month's column follows Hofstadter, a Pulitzer Prize-winning author and
Indiana University computer scientist, as he takes on the entire
artificial-intelligence community. Along the way, I'll touch on what
Hofstadter has been up to the past 15 years, since the Pulitzer.
It begins, though, with a reminiscence.
Bloomington, Indiana in the late 1970s was the kind of place you think of when
you hear the words "college town." Rich with small-town flavor, seasoned by
the advantages a major university brings. (And of course the disadvantages:
Even people who hated football memorized the season schedule, so as to avoid
the traffic jams on football weekends.) There were places, like the Runcible
Spoon coffeehouse, that everyone knew about, and yet that somehow seemed to be
the province of the savvy few. The savvy few were cliques of students and
ex-students living a bohemian life of cappuccino and crash pads, of low income
and high intellectual stimulation. More ex-students than you would think.
Graduates and dropouts from the university seemed to find it hard to leave
Bloomington, and the size of the ex-student population was enormous.
I was one of them. For money, I was working at a computer store maintaining
Alpha Micro Systems and CP/M boxes. For intellectual stimulation, I was
hanging out at the Runcible Spoon, freelancing for an early hacker rag that I
thought of as Intelligent Machines Journal (though it had just changed its
name to InfoWorld), and maintaining ties with the computer-science department
from which I had recently received a master's degree.


Enter Douglas Hofstadter


I had stopped by the computer science department office in Lindley Hall one
summer afternoon on some business or other, and the departmental
secretary/student/assistant started raving about a new professor who had just
been hired and would be starting that fall semester. The faculty apparently
thought they had achieved a coup in landing this character, whose name meant
nothing to me. Among other things, the departmental secretary/et cetera told
me, he was the son of a Nobel Prize-winning physicist.
Later that day, I checked the fall schedule to see what this prodigy was
teaching. The description of his first class was so bizarre that I decided on
the spot to sign up to audit.
The class turned out to be all about Hofstadter's Big Book, a fat, ring-bound,
Xeroxed tome printed in an ugly monospace font with no page numbers anywhere.
He'd been shipping it around to publishers without success and was soliciting
feedback from the class.
Among the memories I have of that class, this one stands out: pages and pages
of puzzles. Mostly picture puzzles, like, which picture does not belong in
this group? And visual analogies; I remember being challenged to think how to
write a computer program that would solve visual analogies.
I was struck by the idea that something so frivolous and entertaining could be
meat for computer programs. I was soon to learn that pretty much everything
Hofstadter was interested in was like that: puzzling, clever, entertaining,
and, on the surface, frivolous. I was struck also by the sheer number of these
puzzles and soon learned that it was also a characteristic of Hofstadter not
to use just one example to make an important point if he could come up with
eighty. But I was chiefly struck by the difficulty of the visual-analogy
problems. Clearly, exploring even a small corner of this domain would take
dissertation-scale effort, and I was already making my plans to leave
Bloomington for California, and computer science for computer journalism.
The book, Gödel, Escher, Bach: An Eternal Golden Braid, was picked up by Basic
Books soon after that (back then, when intercaps were less universal, the
publisher spelled its name as two words). It won a Pulitzer Prize and was
reviewed widely, so it's probably not necessary to summarize that book here,
which is fortunate, since GEB is a complex weaving (sorry, braiding) of many
threads. But certain threads run through that book, the current book, and
Hofstadter's intervening work.
GEB was about intelligence and artificial intelligence, among other things,
including typeface design, wordplay, and analogies.
While he was writing the "Metamagical Themas" column in Scientific American,
Hofstadter focused on puzzles and wordplay (even the title is an anagram). But
the column (and the book of the same name) also had deep things to say about
creativity, the mind, and the Turing test.
The Mind's I, written with Daniel Dennett, presents some philosophical musings
on minds, brains, and programs.
This year, Hofstadter published his fourth fat book, and this one is where the
speculations and obsessions start to pay off.
Fluid Concepts and Creative Analogies is the report of Hofstadter and his
students' work over the past 15 years in creating programs that solve
analogy problems. The coauthor listed on the spine of the book is the Fluid
Analogies Research Group (FARG), which refers to Hofstadter's students and
ex-students of the past 15 years.
I'd better say up front that Hofstadter and FARG haven't come up with a
program that solves visual analogies. That's still in the future. Analogies
are central to the group's work, though. Hofstadter believes that analogies
are central to intelligence, and hence to artificial-intelligence research.
That belief underlies all the work of FARG, and, it seems to me, all the
conclusions in the book.


Seeking Whence


Hofstadter began exploring analogies in his childhood, but the beginning of
his serious attempts to create computer programs capable of dealing
"intelligently" with analogies was a project called "Seek-Whence."
The name Seek-Whence is, as you would expect from Hofstadter, a play on words.
The program solves number-sequence problems, like "What is the next item in
the sequence '1, 4, 9, 16, _'?" The program is supposed to seek the rule
whence the sequence came. When he presented the number-sequence problem to
students in his first AI class, Hofstadter gave them many examples of
sequences for which a successful program should be able to predict the next
term, such as:
1, 2, 2, 3, 3, 3, 4, 4, 4, 4, _
2, 3, 5, 7, 11, 13, 17, 19, _
3, 5, 11, 17, 31, 41, 47, 59, _
1, 1, 3, 4, 2, 2, 5, 6, 7, _
2, 1, 2, 1, 1, 4, 1, 1, 6, 1, 1, _
Hofstadter's approach was to model, as closely as possible, the way he himself
attacked such problems. He wasn't concerned with making the best sequence
solver, but with making the best model of what a human being does in solving
number-sequence problems. He wanted to model that mental activity.
The problem with modeling human mental activity is that you need some theory
about what human mental activity is. This presupposes the existence of some
kind of broad agreement about the nature of mental processes; otherwise how do
you characterize what it is you're modeling? And agreement about the nature of
mental processes does not exist. That's why psychologists and cognitive
researchers have repeatedly turned to overt behavior or neural physiology for
the content of their theories. There simply is no widely accepted theory of
mental processes, of what the mind is doing when it solves number-sequence
problems, for example.
Hofstadter's response to this dilemma was direct. He cut the Gordian knot and
relied on introspection. He knew what steps he himself took in solving these
problems, what blind alleys he was likely to explore, what errors he was prone
to. And that's what his Seek-Whence program was supposed to model.
This very personal approach led him into trouble at least once. Because
Hofstadter has a lot of "number savvy," he built a lot of number savvy (27 is
a cube, a number ending in 5 may be a power of 5, and so on) into his first
program. Reflecting on it later, he realized that this was exactly the wrong
approach if what he wanted to explore was general intelligence. The more his
program relied on knowledge of properties of numbers in solving the
number-sequence problems, the less it was using general strategies of problem
solving. He had, as he later characterized it, fallen into the expert-systems
trap: the idea that the key to intelligence is knowledge and more knowledge.
Hofstadter was morally certain that that view was just plain wrong.


Themes of the Research


He went back to the drawing board and took his students with him. Marsha
Meredith ultimately wrote the program that would be known as Seek-Whence, and
by the time it was done, a number of ideas that would be central to the entire
FARG research program had emerged, including:
The inseparability of perception and high-level cognition. To Hofstadter,
cognition is, at its heart, perception.
The idea that the output of this perception is multilevel cognitive
representations.

The idea of subcognitive pressures. More "important" cognitive representations
exert a probabilistically greater influence on the direction of processing.
A nondeterministic parallel architecture in which top-down and bottom-up
processing gracefully coexist.
The simultaneous exploration of many potential pathways based on an assessed
degree of promise.
The central role of the making of analogies in higher-level cognition.
The idea that cognitive representations are subject to "slippage," with
shallower representations more likely to slip than deeper representations.
The crucial role of the inner structure of concepts in all these goals.


Numb and Number


Two programs written by the FARG group over the next few years dealt in
greater detail with the first five of the central ideas just listed.
"Jumbo" is a program for solving anagrams, specifically those syndicated
newspaper puzzles called "Jumbles." The program Jumbo is so careful in
avoiding the "expert-systems trap" that it doesn't even have a
dictionary: extraordinary in a program that is attempting to identify words.
Some of the central ideas took explicit form in Jumbo.
The simultaneous exploration became what Hofstadter calls a "parallel terraced
scan," a search strategy inspired by the Hearsay II speech-understanding
project led by Raj Reddy. A parallel terraced scan explores different paths in
parallel to different depths, working, Hofstadter assures us, much like
sorority rush in Bloomington.
Another element of the architecture is the coderack. The procedural content of
Jumbo is encoded in minimalist codelets that reside in a coderack and are
invoked at random, but with a probability influenced by decisions the program
is making. These decisions are the cumulative result of the working of
codelets, so the whole process is, Hofstadter says, self-sensitive and
self-driven.
Whatever "intelligence" or problem-solving ability the program has emerges
from the semirandom action of these codelets, rather than being coded in. This
is exactly the claim of emergent intelligence sometimes made by neural-net
proponents, but Jumbo is fundamentally different from a neural net. For one
thing, it operates at a much higher level.
FARG next turned its collective eye upon the game of "Numble" in which the
goal is to construct a given number from a set of five other numbers and the
operations of addition, subtraction, and multiplication. Each of the five can
be used at most once, and numbers can be grouped with parentheses; for
example, Make 114 from 11, 20, 7, 1, and 6. The Numbo program tackled such
puzzles very much as Jumbo tackled Jumbles puzzles. That's the most
interesting thing about the program, in fact. Hofstadter and FARG are
interested in finding general mechanisms of creativity and intelligence, so it
is particularly significant that Jumbo and Numbo could almost be described as
being the same program, applied to different domains.
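For comparison, the Numble puzzle yields to a plain exhaustive search, which is nothing like Numbo's humanlike architecture but makes the problem concrete:

```python
from itertools import permutations, product

# Brute-force Numble solver: try every subset, order, and operator
# sequence (left-to-right grouping) of +, -, * over the given numbers.
def solve(target, numbers):
    ops = {"+": lambda a, b: a + b,
           "-": lambda a, b: a - b,
           "*": lambda a, b: a * b}
    for n in range(1, len(numbers) + 1):
        for perm in permutations(numbers, n):
            for opseq in product(ops, repeat=n - 1):
                value, expr = perm[0], str(perm[0])
                for op, num in zip(opseq, perm[1:]):
                    value = ops[op](value, num)
                    expr = "(" + expr + op + str(num) + ")"
                if value == target:
                    return expr
    return None

print(solve(114, [11, 20, 7, 1, 6]))   # one solution, e.g. ((20-1)*6)
```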


Runcible Ruminations


That idea is taken much further in the chapter that describes the Tabletop
program. Tabletop originated with Hofstadter explaining analogy problems to
one listener or another across the table at the Runcible Spoon coffeehouse.
Hofstadter pushed around cups and spoons on his side of the table and then
invited the listener to "do the same thing." Since the listener's side of the
table often had a different set and configuration of objects, the listener
often had to generalize the concept "do the same thing" in interesting ways.
Tabletop operated in a very small domain, but its "do the same thing" concept
scales up to larger domains. Hofstadter describes variations on the theme:
BattleOp, Ob-Platte, and other anagrams name possible programs for larger
domains. Ob-Platte would tackle problems like, "What is the Ob (river) of
Kansas?" (Answer, the Platte.) Or "What is the Vatican City of Indiana?" "What
is the Athens of Georgia?"


Hofstadter's Critique(s) of AI


There's a lot more detail in the book regarding these programs. And I haven't
described the most substantial of FARG's programs, CopyCat, a program that
solves letter-string analogy puzzles such as "The string abc is changed into
abd. Now do the same thing to the string xyz." "The string abc is changed into
abd. Now do the same thing to the string mrrjjj."
Then there's Letter Spirit, a program in the works for designing typefaces.
But it's more important to summarize some of the book's criticisms of
"traditional" artificial-intelligence work.
Among the points are these, paraphrased in my terms:
The inextricable role of perception in cognition has been largely overlooked.
It is necessary to model the process by which mental representations are
formed, but many AI programs take their representations made-to-order.
AI researchers often overstate the abilities of their programs.
The tendency is to tackle relatively large domains; Hofstadter argues for a
return to small domains, like Winograd's Blocks world or Hofstadter's Tabletop
domain.
AI programs need to focus on "understanding" their domains in as deep a sense
as possible. Most current work focuses more on solving problems.
Neural-net models are as likely to give us insight into intelligence as
quantum physics is to help us understand disease.
Hofstadter goes beyond this, with some scathing critiques of some specific
computer models. His approach, he says, differs from expert systems and neural
nets, taking a "middle ground" between these high- and low-level approaches.
Time will tell whether Hofstadter's research program will inspire others to
take the middle ground, but one thing is clear: He has presented an original
approach to AI and done so in a very entertaining way.


Solutions to the Puzzles


Here are the solutions to the number-sequence puzzles presented in this
column: 
1, 2, 2, 3, 3, 3, 4, 4, 4, 4, _ That's one 1, two 2s, three 3s, and so on.
2, 3, 5, 7, 11, 13, 17, 19, _ That's the primes.
3, 5, 11, 17, 31, 41, 47, 59, _ That's p(p(n)), the (nth prime)-th prime: the
second prime, the third prime, the fifth prime, the seventh prime, and so on.
1, 1, 3, 4, 2, 2, 5, 6, 7, _ No definite solution here, but it does show how
analogies can enter into the solution of sequences. You probably noticed the
similarity between the 1, 1 and the 2, 2 subsequences, right? And 3, 4 has
some sort of connection with 5, 6, 7.
2, 1, 2, 1, 1, 4, 1, 1, 6, 1, 1, _ Successive denominators in the simple
continued-fraction expansion of Euler's number, e. Obvious when you know the
answer, isn't it?
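That continued-fraction claim is easy to check with exact arithmetic (this verification sketch is mine, not Hofstadter's):

```python
from fractions import Fraction
from math import factorial

# Approximate e as an exact rational (sum of 1/k! for k = 0..30, far
# more precision than the first dozen terms need), then expand it as
# a simple continued fraction.
approx_e = sum(Fraction(1, factorial(k)) for k in range(31))

def continued_fraction(x, terms):
    out = []
    for _ in range(terms):
        a = int(x)        # integer part is the next term
        out.append(a)
        x = x - a
        if x == 0:
            break
        x = 1 / x         # invert the fractional part and repeat
    return out

print(continued_fraction(approx_e, 12))
# [2, 1, 2, 1, 1, 4, 1, 1, 6, 1, 1, 8] -- so the blank is 8
```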
Here's the solution to the Numbo puzzle: Make 114 from 11, 20, 7, 1, and 6.
Chances are you came up with 20*6-7+1 rather than the apparently simpler
(20-1)*6. Why is that? 
Here are some possible solutions to the letter-string puzzles: The string abc
is changed into abd. Now do the same thing to the string xyz. Sure, there's
xya, but how about wyz? You might consider at what level(s) you are drawing
analogies if you come up with that; Hofstadter speaks of the analogy between
the successor and predecessor relationships. 
The string abc is changed into abd. Now do the same thing to the string
mrrjjj. You might come up with mrrkkk, but isn't mrrjjjj better? But the
analogy you have to draw to come up with this is pretty abstract: You're
mapping alphabetical position onto run length. 
Here are the solutions to the Ob-Platte puzzles: 
What is the Vatican City of Indiana? Speedway, a tiny city with a single
significant edifice and a single attraction, entirely enclosed by the capital
city.
What is the Athens of Georgia? Athens, Georgia.

































































C PROGRAMMING


The Check is in the IMail




Al Stevens


This month's column describes the IMail application, a communications program
that I designed specifically for sending and receiving electronic mail through
a modem connected to an Internet site. I came up with the IMail application
when DDJ established an Internet host with accounts for the editors. (I'm
astevens@ddj.com, for instance.) Before that, I used the CompuServe
Information Service (CIS) for most of my online activities (71101,1262). My
satisfaction with that arrangement was being strained. When I needed to
discuss something with the powers at CIS, I tried several times to get through
to their middle management, and failed--even though I frequently mention the
service in this column and in my books. They don't return phone calls, even to
important journalists like yours truly. No respect.
Then came the GIF patent debacle. By now you know that CIS and Unisys struck
an agreement based on a 1985 patent that Unisys holds on the LZW compression
algorithm, an integral part of CIS's GIF graphical file format. As a
consequence of that agreement, CIS put the money squeeze on (mostly) small
developers who use GIF in their applications. The industry is still reacting
to that unseemly turn of events, and most developers are abandoning GIF in
favor of other formats, such as the recently defined "Portable Network
Graphics" format (PNG). The way CIS has handled this affair sets my teeth on edge.
Sometimes you just have to grit those teeth and hang in there. Despite my
disapproval of their treatment of programmers, I needed the services of CIS,
mainly as a way to communicate with readers electronically. CIS's e-mail
service is virtually universal; there are CIS gateways to most other mail
services. The ddj.com connection offered me an acceptable alternative.
Internet mail is even more universal.
The mail system at ddj.com uses the UNIX mail program, a primitive
command-line interface that lets you write, read, reply to, and save mail
messages in text files. The program works, but is not what you would call
feature-laden. To use the mail services of ddj.com routinely, I needed a
better mail-management program, and so I wrote the first version of IMail. You
might recall from last month that I wrote that first version in C so that I
could use D-Flat. That experience had me groaning at every turn. A little C++
angel sat on my shoulder and whispered into my ear every time I coded a C
construct that would have been easier in C++. Once the program was working, I
heeded that whisper and rewrote the program in C++, first building C++ wrapper
classes around D-Flat. Last month I described the wrapper. This month I'll
talk about the C++ application.


IMail: The Requirements


First let's consider the requirements for the program. It should store and
remember configuration items such as login identification, password, telephone
number, baud rate, communications port, data format, and modem-command
strings. The program should be a multiple-window application with windows to
display incoming and outgoing messages. It should remember how the user (me)
positions and sizes those windows, and it should preserve those settings
between executions. The program should maintain an address book of people to
whom I send mail and let me select from that book as a function of addressing
messages. The program should hold incoming and outgoing messages in their
respective mailboxes until they have been read or sent. An option should
permit me to automatically save copies of outgoing messages. The program
should allow me to reply to and forward messages that I am reading. There
should be user-defined file folders into which I can file mail messages. The
mail-creation editor should support the usual text-editing functions. The
program should be able to dial the host computer and upload and download all
pending mail messages without my interaction or intervention. The program
should support automatic message uploads and downloads, as well as interactive
logins with the ability to upload and download binary files by using the
XModem file-transfer protocol.


IMail: The Implementation


The IMail application uses the features of the D-Flat wrapper classes to
define menus and dialog boxes and to associate member functions with those
user-interface items. Listings One and Two are mailappl.h and mailappl.cpp,
the source-code files that implement the MailAppl class, which is derived from
the dfApplication class (listings begin on page 135). The class seems big, but
it consists mostly of member functions associated with menu commands through
the MSGMAP table. Many of these member functions do little more than open
dialog boxes, which do most of the work of managing mail messages. Those
dialog boxes are implemented in their own source-code files. I'm publishing
only the application class this month, but I'll discuss generally how the
whole program works.


Scripts


Interaction with the host--logging in, reading mail, sending mail, logging
off--is managed by a set of scripts. IMail watches for prompting strings from
the host and sends UNIX commands accordingly. The scripts are coded in a C++
source-code file named scripts.cpp. By isolating them there, I can change the
scripts to work with other mail protocols if I want. They are not scripts in
the truest sense of the word, however: They are not interpreted like the
scripts of other communications programs. Since this program is for
programmers who can be expected to have compilers, the scripts are written to
be compiled into the program along with the rest of the code.


Mailboxes


IMail uses mailbox classes that store messages and display lists of the
messages. Mail-reader classes display the contents of messages. The user can
read messages, delete them, and file them in other mailboxes. The program uses
two default mailboxes for incoming and outgoing messages: INBOX and OUTBOX.
Other mailboxes are implemented as folders into which the user files messages
to be saved. The user can establish and delete folders. The program maintains
folders as subdirectories under a fixed subdirectory named CABINET. Any
subdirectory under CABINET without a file extension is automatically treated
as a folder. One such folder is named SENT and optionally holds copies of all
the messages that the user sends. Another folder is named GENERAL, to be used
as needed. The user may delete the GENERAL folder but not the SENT folder.
Messages in the mailboxes are text-file copies of the messages as written by
the user or received from the user's correspondents. Each message is its own
text file, and the program assigns a unique filename to each message file. The
message filename has this format: MAILnnnn.MSG. The first message written to a
folder is named MAIL0000.MSG, the next is MAIL0001.MSG, and so on. When
messages are deleted or moved to other folders, their names can be reused for
new messages in the folder.
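A sketch of that naming scheme (in Python rather than the program's C++, with a temporary folder standing in for a mailbox): scan for the first unused number, so names freed by deletes are reused.

```python
import os
import tempfile

# Allocate the next MAILnnnn.MSG name in a folder: the first number
# not currently in use, so deleted messages' names get recycled.
def next_message_name(folder):
    n = 0
    while os.path.exists(os.path.join(folder, "MAIL%04d.MSG" % n)):
        n += 1
    return "MAIL%04d.MSG" % n

folder = tempfile.mkdtemp()
print(next_message_name(folder))    # MAIL0000.MSG in an empty folder
open(os.path.join(folder, "MAIL0000.MSG"), "w").close()
print(next_message_name(folder))    # MAIL0001.MSG
```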
I don't save a lot of mail. If I did, putting every message into its own text
file would get expensive. Not only would each message consume a directory
entry, but every message would also use disk space in multiples of the file
system's cluster factor, which on my computer is 8K. If I were a heavy mail
user and saver, I'd build an indexing system and concatenated message files,
perhaps even with conversation-thread management.


Incoming Messages


IMail receives mail messages when it sees the text string "you have mail" in
the input stream from the host. The UNIX system displays that message when you
log in if mail is indeed waiting. The program waits for the $ prompt and then
issues the following command as if you had typed it: cat $MAIL. The UNIX cat
command displays a text file on the screen, and the $MAIL environment variable
expands to the name of the logged-on user's incoming mail file in the UNIX
environment. IMail captures the input stream into a disk file named IMAIL.TMP.
When the messages are copied, IMail sends this command to delete the mail text
file from the UNIX system: rm $MAIL.
After IMail logs off and hangs up, it converts the messages in IMAIL.TMP into
message files in the incoming-mailbox folder. Each received message is stored
in the incoming mailbox in this format: a line of text with the sender's mail
address, a second line of text with the subject, and a third line with the
date the message was written. Subsequent lines of text constitute the message
body.
By reacting to the UNIX mail program's format for incoming messages, IMail is
able to preserve only those three lines from the typically abundant, verbose
message headers that are created by Internet mail transport programs and that
are meaningless to most people. This one feature is worth the whole program.
It removes one of my principal gripes about Internet mail--all that junk up
front that nobody reads. The program parses the first lines of text, watching
for lines with the strings "From:", "Subject:", and "Date:" in the first
characters of the line. The program extracts the pertinent fields from those
lines, ignoring all others until it sees a blank line, which marks the end of
the message header and the start of the message text.
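The header filtering just described amounts to a short loop. This sketch is mine, not IMail's C++; the sample header lines (other than my own address) are invented:

```python
# Keep only the From:, Subject:, and Date: lines from a message
# header; ignore everything else until the blank line that marks
# the start of the message body.
def parse_message(lines):
    kept = {}
    body_at = len(lines)
    for i, line in enumerate(lines):
        if line == "":
            body_at = i + 1        # blank line ends the header
            break
        for tag in ("From:", "Subject:", "Date:"):
            if line.startswith(tag):
                kept[tag] = line[len(tag):].strip()
    return kept, lines[body_at:]

raw = ["Received: from relay.example.com by ddj.com",  # junk, dropped
       "From: astevens@ddj.com",
       "Subject: IMail",
       "Date: 1 Aug 1995",
       "X-Mailer: UNIX mail",                          # junk, dropped
       "",
       "Message body starts here."]
header, body = parse_message(raw)
print(header["Subject:"], "/", body[0])
```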


Outgoing Messages


For each outgoing message, the IMail script simulates an interactive session
with the UNIX mail program. The script sends the "mail" command, a space
character, the recipient's mail address, and a carriage return. Then the
script waits for a "ject:" string that prompts for the subject, whereupon it
sends the subject text and a carriage return. Next, it sends the message text,
a final carriage return, and the Ctrl-D character to tell the mail program
that the message is complete. When the $ prompt comes back, the script copies
the message to the SENT folder if that option is selected and deletes the
message from the outgoing message box.
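Ignoring the prompt waits and delays, the byte stream that dialogue produces can be sketched as follows. The function and its name are hypothetical, not part of the IMail source.

```c
#include <stdio.h>

/* Assemble the bytes the script feeds to the UNIX mail program: the
   "mail" command with the address, the subject (sent after the "ject:"
   prompt), the body, and a final Ctrl-D (0x04) ending the message.
   Returns the number of bytes written. Hypothetical sketch. */
static int build_mail_session(char *out, size_t outsize, const char *addr,
                              const char *subject, const char *body)
{
    return snprintf(out, outsize, "mail %s\r%s\r%s\r\x04",
                    addr, subject, body);
}
```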
Both scripts have a lot of one-second delays built in between interchanges
with the host. These delays reflect what I have determined works best. Without
them, some incoming data characters are lost, and the program goes into a
timeout loop waiting for an expected string from the host.



The Communications Classes


To support the dial-up connection, I encapsulated the operations of the serial
port and the modem in two classes. The CommPort class includes a lot of
low-level stuff to connect to the input interrupt vector, read and write the
serial device's data and status ports, and manage things like XON/XOFF
protocols. The Modem class manages modem operations--dialing, hanging up,
sending modem commands, and watching the carrier-detect signal. An XModem
class handles uploads and downloads of binary files with the XModem
file-transfer protocol. These operations allow me to use ddj.com for FTP
downloads and for exchanging files with the staff at DDJ.


IMail: The Future


I developed IMail and the D-Flat wrapper classes with an objective in mind:
like the rest of the world, I'm gradually becoming more of a Windows user and
less of a DOS user. The architecture of this software leans toward that of a program
written under the Microsoft Foundation Classes. Eventually IMail will be a
Win32 application.


Source Code and Database


The source-code files for IMail and the D-Flat libraries are free. You can
download them from the DDJ Forum on CompuServe and on the Internet by
anonymous FTP; see "Availability," page 3.
If you cannot get to one of the online sources, send a 3.5-inch diskette and
an addressed, stamped mailer to me at Dr. Dobb's Journal, 411 Borel Avenue,
San Mateo, CA 94402, and I'll send you the source code. Make sure that you
include a note that says which project you want. The code is free, but if you
care to support my Careware charity, include a dollar for the Brevard County
Food Bank.

Listing One 

// ------- mailappl.h
#ifndef MAILAPPL_H
#define MAILAPPL_H

#include <fstream.h>

#define MAILPREFIX "MAIL"
#define MAILSUFFIX "MSG"
#define SENTFOLDER "SENT"
#define GENERALFOLDER "GENERAL"
#define INBOXSUBDIRECTORY "INBOX"
#define OUTBOXSUBDIRECTORY "OUTBOX"
#define CABINETSUBDIRECTORY "CABINET"
#define SENTSUBDIRECTORY CABINETSUBDIRECTORY "\\" SENTFOLDER
#define GENERALSUBDIRECTORY CABINETSUBDIRECTORY "\\" GENERALFOLDER
#define ADDRBOOK "ADDRBOOK.DAT"

#include <cstring.h>
#include "dfwrap.h"
#include "icmds.h"
#include "mailbox.h"
#include "contents.h"
#include "modem.h"

extern MBAR MainMenu;
extern DBOX ReadMailDB;
extern DBOX CreateMailDB;

class CreateMail;
class OnLineWnd;
class Modem;

// ------------ configuration items
struct IMailConfig {
 // ----- login configuration
 char phoneno[25]; // phone number
 char loginid[35]; // user's remote login id
 char password[15]; // user's remote password (encrypted)

 bool viewlogin; // true to view login scripts as they run
 bool saveoutgoing; // true to save outgoing mail in SENT
 // ----- communications configuration
 CommParameters commparms;
 // ----- modem configuration
 ModemParameters modemparms;
};
class MailAppl : public dfApplication {
 bool m_connected;
 bool m_closing;
 bool m_EscPressed;
 ofstream logfile;
 ofstream mailfile;
 CreateMail *mp_CreateMail;
 OnLineWnd *mp_OnLineWnd;
 InBox m_inbox;
 OutBox m_outbox;
 Folder m_folder;
 Modem *mp_Modem;
 bool LoadConfig();
 void SaveConfig();
 void EnDeCryptPassword();
 bool OnDownload();
 bool OnUploadXmodem();
 bool OnUploadASCII();
 bool GetUploadFile(ifstream& upfile, int mode = ios::in);
 bool OnViewLogin();
 bool OnDeliver();
 bool OnLogin();
 bool OnManual();
 bool OnHangup();
 bool OnLogTransmissions();
 bool OnCreateMail();
 bool OnAddrBook();
 bool OnKeyboard();
 bool OnAbout();
 bool OnInBox();
 bool OnOutBox();
 bool OnOpenFolder();
 bool OnFileCabinet();
 bool OnLoginOptions();
 bool OnCommOptions();
 bool OnModemOptions();
 bool OnClose();
 void LoginScript();
 void CollectMailScript();
 void DeliverMailScript();
 void ConvertMailScript();
 void GoInteractive(const CommParameters& cp,
 const char *phone, bool autolog);
 bool GetCommOptions(const string& ttl,
 CommParameters& commparms, char *phoneno);
 void DisableMenuCommands();
 void EnableMenuCommands();
 friend class ReadMail;
protected:
 MAPDEF(dfApplication)
public:
 static IMailConfig iCfg;
 // ----- document window configurations

 static DialogConfig CreateMailCfg;
 static DialogConfig ReadMailCfg;
 MailAppl();
 virtual ~MailAppl();
 bool isConnected()
 { return m_connected; }
 bool isModemDeclared()
 { return mp_Modem != 0; }
 const InBox& InBoxRef() const
 { return m_inbox; }
 const MailBox& OutBoxRef() const
 { return m_outbox; }
 Folder& FolderRef()
 { return m_folder; }
 void IMailPath(string& path) const;
 void AddrBookPath(string& path) const;
 void ConfigPath(string& path) const;
 void TempMailPath(string& path) const;
 void LogSerialInput(int ch);
};
extern MailAppl *mailappl;
#endif



Listing Two

// -------- mailappl.cpp
#include <fstream.h>
#include <strstrea.h>
#include "mailappl.h"
#include "addrbook.h"
#include "crmail.h"
#include "modem.h"
#include "loginopt.h"
#include "commopt.h"
#include "modemopt.h"
#include "fcabinet.h"
#include "onlinewn.h"

extern "C" {
char DFlatApplication[] = "IMail";
}
IMailConfig MailAppl::iCfg = {
 // ----- login configuration
 "", // phone number
 "", // user's remote login id
 "", // user's remote password (encrypted)
 true, // true to view login scripts as they run
 true, // true to save outgoing mail in SENT
 // ----- communications configuration
 {
 1, // serial port (1,2)
 0, // parity (0=N, 2=E)
 1, // stop bits (1)
 8, // data bits (7,8)
 9600 // baud rate
 },
 // ----- modem configuration

 {
 RESETMODEM, // reset string
 INITMODEM, // initialize string
 DIAL, // dial command
 HANGUP // hangup string
 }
};
// --- dialog box configurations to be saved and restored
DialogConfig MailAppl::CreateMailCfg;
DialogConfig MailAppl::ReadMailCfg;

MSGMAP(MailAppl)
 MSG(KEYBOARD, OnKeyboard)
 MSG(ID_DOWNLOAD, OnDownload)
 MSG(ID_UPLOADASCII, OnUploadASCII)
 MSG(ID_UPLOADXMODEM, OnUploadXmodem)
 MSG(ID_VIEWLOGIN, OnViewLogin)
 MSG(ID_DELIVER, OnDeliver)
 MSG(ID_LOGIN, OnLogin)
 MSG(ID_MANUAL, OnManual)
 MSG(ID_HANGUP, OnHangup)
 MSG(ID_LOGTRX, OnLogTransmissions)
 MSG(ID_CREATE, OnCreateMail)
 MSG(ID_ADDRBOOK, OnAddrBook)
 MSG(ID_LOGINOPTIONS, OnLoginOptions)
 MSG(ID_COMMUNICATIONS, OnCommOptions)
 MSG(ID_MODEMOPTIONS, OnModemOptions)
 MSG(ID_INBOX, OnInBox)
 MSG(ID_OUTBOX, OnOutBox)
 MSG(ID_FILECABINET, OnFileCabinet)
 MSG(ID_ABOUT, OnAbout)
 MSG(CLOSE_WINDOW, OnClose)
ENDMAP
// -------- construct the mail application
MailAppl::MailAppl() : dfApplication(string(), MainMenu),
 m_connected(false),
 m_closing(false),
 m_EscPressed(false),
 mp_CreateMail(0),
 mp_OnLineWnd(0),
 mp_Modem(0)
{
 string path;
 m_inbox.MailBoxPath(path);
 mkdir(path.c_str());
 m_outbox.MailBoxPath(path);
 mkdir(path.c_str());
 MailBox::CabinetPath(path);
 mkdir(path.c_str());
 mkdir((path+GENERALSUBDIRECTORY).c_str());
 mkdir((path+SENTSUBDIRECTORY).c_str());
}
// ---- destroy the mail application
MailAppl::~MailAppl()
{
 if (logfile.rdbuf()->is_open())
 logfile.close();
 delete mp_CreateMail;
 delete mp_OnLineWnd;

 delete mp_Modem;
}
// ---- encrypt and decrypt the password for the configuration
void MailAppl::EnDeCryptPassword(void)
{
 char *pwd = iCfg.password;
 srand(62490U);
 while (*pwd && pwd < iCfg.password + sizeof iCfg.password)
 *pwd++ ^= (rand() % 256);
}
// ---- load the program's configuration from the last run
bool MailAppl::LoadConfig()
{
 dfApplication::LoadConfig();
 string path;
 ConfigPath(path);
 ifstream cfgfile(path.c_str(), ios::binary);
 if (!cfgfile.fail()) {
 cfgfile.seekg(sizeof(CONFIG));
 cfgfile.read(reinterpret_cast<char*>(&iCfg), sizeof(iCfg));
 cfgfile.read(reinterpret_cast<char*>(&ReadMailCfg),
 sizeof(ReadMailCfg));
 cfgfile.read(reinterpret_cast<char*>(&CreateMailCfg),
 sizeof(CreateMailCfg));
 EnDeCryptPassword();
 }
 if (iCfg.viewlogin)
 SetCommandToggle((commands)ID_VIEWLOGIN);
 return true;
}
// ---- save the program's configuration for the next time
void MailAppl::SaveConfig()
{
 dfApplication::SaveConfig();
 string path;
 ConfigPath(path);
 ofstream cfgfile(path.c_str(), ios::ate | ios::binary);
 if (!cfgfile.fail()) {
 EnDeCryptPassword();
 cfgfile.write(reinterpret_cast<char*>(&iCfg), sizeof(iCfg));
 cfgfile.write(reinterpret_cast<char*>(&ReadMailCfg),
 sizeof(ReadMailCfg));
 cfgfile.write(reinterpret_cast<char*>(&CreateMailCfg),
 sizeof(CreateMailCfg));
 EnDeCryptPassword();
 }
}
// ---- build a path to the IMAIL application
void MailAppl::IMailPath(string& path) const
{
 path = _argv[0];
 int wh = path.find_last_of("\\");
 path.resize(wh+1);
}
// ---- build a path and filename for the configuration file
void MailAppl::ConfigPath(string& path) const
{
 IMailPath(path);
 path += DFlatApplication;

 path += ".cfg";
}
// ---- build a path and filename for the address book
void MailAppl::AddrBookPath(string& path) const
{
 IMailPath(path);
 path += ADDRBOOK;
}
// ---- build a path and filename for the temporary mail file
void MailAppl::TempMailPath(string& path) const
{
 IMailPath(path);
 path += DFlatApplication;
 path += ".tmp";
}
// ---- File/Download... command
bool MailAppl::OnDownload()
{
 if (mp_Modem != 0) {
 char FileName[128] = "*.*";
 while (SaveAsDialogBox(FileName, "*.*", FileName)) {
 if (access(FileName, 0) == 0) {
 strstream ermsg;
 ermsg << FileName
 << " exists. Replace it?"
 << ends;
 bool rtn = YesNoBox(ermsg.str());
 delete ermsg.str();
 if (!rtn)
 continue;
 }
 ofstream downfile(FileName, ios::binary);
 mp_Modem->DownloadXmodem(downfile);
 break;
 }
 }
 return false;
}
// ---- get a file name to upload from the user
bool MailAppl::GetUploadFile(ifstream& upfile, int mode)
{
 char FileName[128] = "*.*";
 while (OpenFileDialogBox(FileName, FileName)) {
 upfile.open(FileName, mode);
 if (upfile.fail()) {
 strstream ermsg;
 ermsg << "No such file as " << FileName << ends;
 ErrorMessage(ermsg.str());
 delete ermsg.str();
 continue;
 }
 return true;
 }
 return false;
}
// ---- File/Upload/Xmodem... command
bool MailAppl::OnUploadXmodem()
{
 if (mp_Modem != 0) {
 ifstream upfile;

 if (GetUploadFile(upfile, ios::binary))
 mp_Modem->UploadXmodem(upfile);
 }
 return false;
}
// ---- File/Upload/ASCII... command
bool MailAppl::OnUploadASCII()
{
 if (mp_Modem != 0) {
 ifstream upfile;
 if (GetUploadFile(upfile))
 mp_Modem->UploadASCII(upfile);
 }
 return false;
}
// ---- View/View login script command
bool MailAppl::OnViewLogin()
{
 iCfg.viewlogin = GetCommandToggle((commands)ID_VIEWLOGIN);
 return false;
}
// ---- CLOSE_WINDOW message
bool MailAppl::OnClose()
{
 if (m_connected) {
 if (YesNoBox("Log off and exit?")) {
 m_connected = false;
 m_closing = true;
 }
 return false;
 }
 return true;
}
// ---- command codes to be disabled during on-line session
static int dcmds[] = {
 ID_ADDRBOOK,
 ID_VIEWLOGIN,
 ID_DELIVER,
 ID_DOS,
 ID_CUT,
 ID_COPY,
 ID_PASTE,
 ID_DELETETEXT,
 ID_PARAGRAPH,
 ID_REPLACE,
 ID_SEARCHNEXT,
 ID_INTERACTIVE,
 ID_CREATE,
 ID_INBOX,
 ID_OUTBOX,
 ID_FILECABINET,
 ID_LOGINOPTIONS,
 ID_COMMUNICATIONS,
 ID_MODEMOPTIONS,
 ID_DISPLAY,
 ID_CLOSEALL,
 ID_WINDOW,
 0
};

// ----- disable menu commands during on-line session
void MailAppl::DisableMenuCommands()
{
 for (int i = 0; *(dcmds+i) != 0; i++)
 DeactivateCommand(&MainMenu, *(dcmds+i));
}
// ----- enable menu commands following on-line session
void MailAppl::EnableMenuCommands()
{
 for (int i = 0; *(dcmds+i) != 0; i++)
 ActivateCommand(&MainMenu, *(dcmds+i));
}
// ---- log onto the host for an interactive session
void MailAppl::GoInteractive(const CommParameters& cp,
 const char *phone, bool autolog)
{
 DisableMenuCommands();
 mp_Modem = new Modem(cp, iCfg.modemparms);
 if (iCfg.viewlogin || !autolog)
 mp_OnLineWnd = new OnLineWnd(mp_Modem);
 m_connected = mp_Modem->Dial(phone);
 if (m_connected && autolog)
 LoginScript();
 if (mp_OnLineWnd == 0)
 mp_OnLineWnd = new OnLineWnd(mp_Modem);
 while (m_connected) {
 dispatch_message();
 if (mp_Modem->InputCharReady())
 LogSerialInput(mp_Modem->ReadChar());
 else if (!m_closing)
 m_connected = mp_Modem->TestCarrier();
 }
 delete mp_OnLineWnd;
 mp_OnLineWnd = 0;
 delete mp_Modem;
 mp_Modem = 0;
 WriteStatus("Off line");
 EnableMenuCommands();
 if (m_closing)
 CloseWindow();
}
// ---- Connect/Send-receive Mail command
bool MailAppl::OnDeliver()
{
 DeactivateCommand(&MainMenu, ID_EXIT);
 DisableMenuCommands();
 mp_Modem = new Modem(iCfg.commparms, iCfg.modemparms);
 if (iCfg.viewlogin)
 mp_OnLineWnd = new OnLineWnd(mp_Modem);
 m_connected = mp_Modem->Dial(iCfg.phoneno);
 if (m_connected) {
 LoginScript();
 CollectMailScript();
 DeliverMailScript();
 }
 mp_Modem->HangUp();
 m_connected = false;
 delete mp_Modem;
 mp_Modem = 0;

 delete mp_OnLineWnd;
 mp_OnLineWnd = 0;
 WriteStatus("Off line");
 EnableMenuCommands();
 ConvertMailScript();
 ActivateCommand(&MainMenu, ID_EXIT);
 if (m_inbox.filecount) {
 WriteStatus("You have mail in your Inbox");
 beep();
 }
 return false;
}
// ---- Connect/Interactive Session/Automatic Login... command
bool MailAppl::OnLogin()
{
 GoInteractive(iCfg.commparms, iCfg.phoneno, true);
 return false;
}
// ---- get communications options from user
bool MailAppl::GetCommOptions(const string& ttl,
 CommParameters& commparms, char *phoneno)
{
 bool rtn = p_commoptions->doModal();
 delete p_commoptions;
 return rtn;
}
// ---- Connect/Interactive Session/Manual Login... command
bool MailAppl::OnManual()
{
 static CommParameters commparms;
 static char phone[25] = "";
 static bool doneit = false;

 if (!doneit) {
 commparms = iCfg.commparms;
 doneit = true;
 }
 if (GetCommOptions("Manual Dial", commparms, phone))
 GoInteractive(commparms, phone, false);
 return false;
}
// ---- Log serial input stream from host
void MailAppl::LogSerialInput(int ch)
{
 if (mp_OnLineWnd != 0) {
 mp_OnLineWnd->WriteChar(ch);
 if (logfile.rdbuf()->is_open())
 logfile.put(static_cast<char>(ch));
 }
 if (mailfile.rdbuf()->is_open())
 mailfile.put(static_cast<char>(ch));
}
// ---- Connect/Hang up command
bool MailAppl::OnHangup()
{
 if (m_connected && mp_Modem != 0) {
 mp_Modem->HangUp();
 m_connected = false;
 }

 return false;
}
// ---- File/Log Transmissions... command
bool MailAppl::OnLogTransmissions()
{
 if (logfile.rdbuf()->is_open())
 logfile.close();
 else {
 char *FileName = new char[128];
 strcpy(FileName, "*.log");
 if (OpenFileDialogBox(FileName, FileName))
 logfile.open(FileName, ios::app);
 else
 ClearCommandToggle((commands) ID_LOGTRX);
 delete[] FileName;
 }
 return false;
}
// ---- Messages/Create New... command
bool MailAppl::OnCreateMail()
{
 if (mp_CreateMail && mp_CreateMail->isRunning())
 mp_CreateMail->SetControlFocus(ID_MESSAGE);
 else {
 delete mp_CreateMail;
 mp_CreateMail = new CreateMail(m_outbox, "New Mail", true);
 mp_CreateMail->doModeless();
 }
 return false;
}
// ---- KEYBOARD message
bool MailAppl::OnKeyboard()
{
 if (p1 == ESC)
 m_EscPressed = true;
 return true;
}
// ---- File/Address Book... command
bool MailAppl::OnAddrBook()
{
 if (mp_CreateMail && mp_CreateMail->isRunning())
 mp_CreateMail->OnAddrBook();
 else {
 AddrBook *p_addrbook = new AddrBook;
 p_addrbook->doModal();
 delete p_addrbook;
 }
 return false;
}
// ---- Messages/Out Box... command
bool MailAppl::OnOutBox()
{
 m_outbox.OpenMailBox("OutBox");
 return false;
}
// ---- Messages/In Box... command
bool MailAppl::OnInBox()
{
 m_inbox.OpenMailBox("InBox");
 return false;
}
// ---- open a mail folder
bool MailAppl::OnOpenFolder()
{
 if (!m_folder.FolderName().is_null()) {
 static string title;
 title = "Folder: " + m_folder.FolderName();
 m_folder.OpenMailBox(title);
 }
 return false;
}
// ---- Messages/File Cabinet... command
bool MailAppl::OnFileCabinet()
{
 FolderReader* p_fr = new FolderReader();
 if (p_fr->doModal())
 OnOpenFolder();
 delete p_fr;
 return false;
}
// ---- Options/Login... command
bool MailAppl::OnLoginOptions()
{
 LoginOptions *p_loginoptions = new LoginOptions;
 p_loginoptions->doModal();
 delete p_loginoptions;
 return false;
}
// ---- Options/Communications... command
bool MailAppl::OnCommOptions()
{
 GetCommOptions("Communications Options", iCfg.commparms, iCfg.phoneno);
 return false;
}
// ---- Options/Modem... command
bool MailAppl::OnModemOptions()
{
 ModemOptions *p_modemoptions = new ModemOptions;
 p_modemoptions->doModal();
 delete p_modemoptions;
 return false;
}
// ---- Help/About command 
bool MailAppl::OnAbout()
{
 MessageBox(
 "About IMail",
 " ------------------------------------------------ \n"
 " zzz zzz z \n"
 " z z z z z \n"
 " z z z z z \n"
 " z z z z z z \n"
 " zzz zzz zzz \n"
 " ------------------------------------------------ \n"
 " IMail Manages ddj.com Mail ");
 return false;
}


































































ALGORITHM ALLEY


Sound Compression Using Quantized Deltas




Kyle A. York


Kyle is a programmer for McGraw-Hill School Systems and can be contacted at
noesis@ucscb.ucsc.edu.


Introduction 
by Bruce Schneier
Data-compression techniques can be divided into two major categories: lossy
and lossless. Lossy data compression allows for a certain loss of accuracy in
exchange for an increased compression rate. These techniques are primarily
used to compress graphics images and digitized voice--media that can afford to
lose data. Most lossy techniques allow you to tune your parameters, so you can
trade more-effective compression for greater accuracy. Until recently, lossy
compression was implemented primarily on dedicated hardware, but increases in
the power of desktop PCs, coupled with advances in algorithms have changed
this. 
Lossless-compression techniques guarantee an exact copy of the original after
compression and decompression. These techniques are used to compress data
files, programs, or any application for which even the loss of a single bit is
unacceptable.
This month, Kyle York examines lossy compression techniques that have been
optimized for sound. His false starts and dead ends illustrate the difficulty
of finding a particular algorithm to solve a given problem, even when both the
algorithms and the problems are well defined.
Computer-generated audio samples are created by taking an analog signal and
passing it through an analog-to-digital converter (ADC). The quality of the
sample is determined by the sampling rate and the resolution (number of bits
per sample).
As far back as 1933, AT&T mathematician Harry Nyquist determined that to
accurately reproduce a signal containing frequencies up to f, the sample rate
must be at least 2f. Since the human ear can hear up to about 20 kHz, you would
expect a need for at least 40-kHz sampling. Luckily, most applications require
much less than full-spectrum sampling.
Typical computer applications sample audio at between 8000 Hz (U.S. telephone
quality) and 44,000 Hz (audio CD quality), using either 8- or 16-bit samples
that are some representation of the voltage present in the signal. Normally,
it is a linear relationship; for example, if the voltage difference between 0
and 1 is 0.1mv, then the voltage difference between 100 and 101 is also 0.1mv.
The technique I developed works well for 8-bit samples. The 8-bit values
returned by the converter are usually signed, where -127..128 represents
-2.5 mV..2.5 mV (some converters return unsigned values, shifted by adding
127). Figure 1(a) is a small audio sample that has obvious patterns, although
the exact similarities are not nearly as clear.
I searched the literature for real-time, high-ratio audio-compression
techniques, and I found that the best rely on linear predictive coding (LPC).
The LPC algorithm uses preceding audio samples to predict the next sample.
This algorithm can be implemented as lossless compression by encoding the
difference between the expected and actual values. If the predictor is good,
these differences should be small and should compress well. Researchers have
spent much time attempting to determine an optimum prediction strategy, but
you can achieve reasonable results if you simply use the preceding two samples
to define a line and assume the current sample will be near that line (thus
the term "linear prediction").
Unfortunately, my application required that the (de)compression be fast enough
to perform in real time on a 286-class PC. Since any LPC algorithm is very CPU
intensive, this requirement immediately ruled them out. On a machine this
slow, about the best I can do is use a lookup table and a few bit
manipulations, but I was still determined to achieve a ratio of at least 4:1. 
I then explored other methods: reducing the sample rate by a factor of 4 (bad
idea), ignoring the low-order bits (equally bad--more than 70 percent of the
information is actually stored here), and ignoring the high-order bits (this
works, but decreases the volume, and when I later increased the volume, the
resulting static was unbearable). 
Finally, I turned to an algorithm described by Mark Nelson in The Data
Compression Book (M&T Publishing, 1991), which plots an exponential graph and
bands the samples. In other words, samples that fall in a certain band are all
mapped to a single value; see Figure 1(b). The banding weighted the lower
samples more heavily, and since most samples fall in the lower magnitudes, it
worked reasonably well.
Still, as with the other techniques, I continued to have a problem with
static. Finally, it occurred to me that I really wanted to concentrate on
modeling the waveform as closely as possible. When I did this, the static
virtually disappeared. 


Deltas 


It was obvious that banding was never going to be sufficient because it
changed the waveforms too drastically. It was also apparent, when I magnified
part of the graph, that the magnitude of the differences between samples
tended to be small, so my next attempt was simply to create a file which
contained the differences between the samples of my input file. When I looked
at the frequencies, I found that 40 percent of the differences were between
-8 and 8, so a Huffman compressor would work much better on this data.
Next, I looked at quantizing the deltas. My approach to quantizing is not
linear but rather exponential; see Figure 2. Compression from this is
immediately 2:1, and the mapping is simple enough to be done with a lookup
table; thus real-time performance can be achieved even on the slowest machine.
When I graphed the results,
96 percent of the samples were within 1 percent; see Figure 1(c).
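The exponential quantizer reduces each delta to a 3-bit exponent, which is what makes the immediate 2:1 packing possible. Here is a compact equivalent of the if-chain in Listing Two (my restatement; the listing additionally nudges the result toward the closest neighboring power of two):

```c
#include <stdlib.h>

/* Map a delta to the exponent n (0 <= n <= 7) of the largest power of
   two not exceeding its magnitude, as Listing Two's if-chain does. */
static int quantize_delta(int delta)
{
    int n = 0, mag = abs(delta);
    while (n < 7 && (1 << (n + 1)) <= mag)
        n++;
    return n;
}
```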
Since I was working with voice data (not music), up to about 1/3 of the file
was silence. By running the compressed data through a specialized run length
encoding (RLE) filter that detected the silence, I further reduced the sample
size. I defined silence as any length of 8 nybbles that reduced to 1 when
summed. This length was chosen to remove any spikes that might appear in a
sample, such as line noise. The block is short enough to not cause any serious
degradation in sound quality. 
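"Reduced to 1 when summed" is ambiguous; one plausible reading is that the expanded deltas over the 8-nybble block nearly cancel, leaving the waveform essentially flat. That test can be sketched as follows (a hypothetical interpretation, not code from the listings):

```c
/* Decode table from the listings: nybbles 0-7 map to negative powers of
   two, 8-15 to positive. */
static const int expand[16] = {
    -0x80, -0x40, -0x20, -0x10, -0x08, -0x04, -0x02, -0x01,
     0x01,  0x02,  0x04,  0x08,  0x10,  0x20,  0x40,  0x80
};

/* A block of 8 nybbles counts as silence if its expanded deltas sum to
   at most 1 in magnitude, so an isolated spike (such as line noise)
   does not break the run. Hypothetical reading of the silence test. */
static int is_silence(const unsigned char *nyb)
{
    int i, sum = 0;
    for (i = 0; i < 8; i++)
        sum += expand[nyb[i] & 0x0f];
    return sum >= -1 && sum <= 1;
}
```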
I implemented this approach in C. Listing One (listings begin on page 138) is
the include file that accompanies Listing Two which, in turn, converts .WAV
format files to .DQ format. Listing Three converts back from .DQ to .WAV.
Sample data files (one, an original output of 999 samples from a test file;
another, the same 999 samples after being DQed and unDQed; and a third
containing 999 samples after being compressed and expanded using the algorithm
in Nelson's book) are available electronically; see "Availability," page 3.


Future Enhancements


If the sound sample is music instead of voice, then the RLE filter will have
little effect. I've tinkered with a Huffman compressor, which increases
compression somewhat. It would probably be better to use an arithmetic
compressor, which I have estimated could reduce the results by 25 percent or
more, but this begins to reach beyond the capabilities of my platform.
Sound is surprisingly difficult to compress, which may explain why relatively
little work has been done in this area. It is much more subtle than video
compression and less obvious than text. But with virtually every new program adding
uncompressed sound files to our machines, the time has come to look for ways
to ease the load.
Figure 1: (a) Audio sample of approximately 400 data points; (b)
straightforward exponential compression can result in "banding"; (c)
exponential compression of the deltas better models the waveform.
Figure 2: Quantizing deltas.
Start an accumulator at 128 (the zero-voltage value for my converter).
For each sample:
    set magnitude = abs(sample - accumulator)
    choose the power 2^n (where 0 <= n <= 7) closest to magnitude
    if (sample >= accumulator)
        store n as 8+n
        add 2^n to the accumulator
    otherwise
        store n as 7-n
        subtract 2^n from the accumulator

Listing One 


/* DQ header file -- kyle a. york -- contains best guess for WAV format */
#ifndef dq_h
#define dq_h

typedef struct {
 char RIFFsig[4];
 long junk1;
 char WAVEsig[4];
 char fmtsig[4];
 long len;
 short style;
 short channels;
 long rate;
 long avg_Bps;
 short align;
 short bitsize;
} RIFFHEADER;
typedef struct {
 char datasig[4];
 long length;
} RIFFDATAHEADER;
/* this should be a 16-byte structure */
typedef struct {
 char descr[4];
 short bitSize;
 short sampleRate;
 long length;
 char misc[4];
} DQHEADER;
#endif



Listing Two

/* kyle a. york. convert WAV --> DQ. Notes: only handles a small subset of 
 WAV files (8-bit/1-channel/unsigned) drops the last sample if the number 
 of samples is odd
*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <conio.h>
#include <assert.h>

#include "dq.h"

#define NOTRIFF "Not a RIFF file."
#define NOTWAVE "Not a WAVE file."
#define MISSINGFMT "Missing format info"
#define UNKNOWNSTYLE "style is not unsigned"
#define NODATA "no data"
#define NOTEIGHTBIT "data is not 8-bit"
#define INFILEERROR "input file error"
#define OUTFILEERROR "output file error"
#define OUTOFMEMORY "out of memory"
#define ERRORFI "cannot open source file"
#define ERRORFO "cannot open destination file"
#define TOOMANYCHANNELS "too many channels (only 1 is allowed)"


void EXIT(char *tmp)
{
 fprintf(stderr, "%s\n", tmp);
 exit(1);
}
/* add default extension to a filename [ext] must include the preceding '.'
 returns NULL on out of memory */
#define EXTFLAG_DEFAULT 0
#define EXTFLAG_CHANGE 1
char *AddExtension(const char *src, const char *ext, const int flags)
{
 char *tmp = malloc(strlen(src)+strlen(ext)+1);
 if (tmp) {
 char *tPtr;
 strcpy(tmp, src);
 tPtr = strrchr(tmp, '.');
 if (tPtr && strchr(tPtr, '\\'))
 tPtr = NULL;
 if (tPtr) {
 if (flags & EXTFLAG_CHANGE)
 strcpy(tPtr, ext);
 } else
 strcat(tmp, ext);
 }
 return tmp;
}
/* eliminate lots of shifts with a simple lookup table */
static int expand[16] =
 {
 -0x80, -0x40, -0x20, -0x10, -0x08, -0x04, -0x02, -0x01,
 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80
 };
/* Format: dq source dest -- return 0 if no error */
int main(int argc, char **argv)
{
 RIFFHEADER riffHeader;
 RIFFDATAHEADER riffDataHeader;
 DQHEADER dqHeader;
 char *srcName,
 *dstName;
 FILE *fi,
 *fo;
 long len;
 int accum,
 delta,
 sign,
 nybble,
 prev,
 next,
 byte,
 sample;
 if ((argc != 2) && (argc != 3)) {
 printf("Format: %s srcfile {dstfile}\n"
 " where srcfile is a '.wav' file\n",
 argv[0]);
 exit(1);
 }
 srcName = AddExtension(argv[1], ".wav", EXTFLAG_DEFAULT);

 if (argc == 3)
 dstName = AddExtension(argv[2], ".dq", EXTFLAG_DEFAULT);
 else
 dstName = AddExtension(argv[1], ".dq", EXTFLAG_CHANGE);
 if (!srcName || !dstName)
 EXIT(OUTOFMEMORY);

 fi = fopen(srcName, "rb");
 if (!fi)
 EXIT(ERRORFI);
 fo = fopen(dstName, "wb");
 if (!fo)
 EXIT(ERRORFO);
 /* read & verify WAV header */
 if (fread(&riffHeader, sizeof(riffHeader), 1, fi) != 1)
 EXIT(INFILEERROR);
 else if (strncmp(riffHeader.RIFFsig, "RIFF", 4))
 EXIT(NOTRIFF);
 else if (strncmp(riffHeader.WAVEsig, "WAVE", 4))
 EXIT(NOTWAVE);
 else if (strncmp(riffHeader.fmtsig, "fmt ", 4))
 EXIT(MISSINGFMT);
 else if (riffHeader.style != 1)
 EXIT(UNKNOWNSTYLE);
 else if (riffHeader.channels != 1)
 EXIT(TOOMANYCHANNELS);
 else if (riffHeader.bitsize != 8)
 EXIT(NOTEIGHTBIT);
 /* skip any misc. bytes */
 for (len=riffHeader.len-16; len; len--)
 fgetc(fi);
 /* data header should follow. read & verify */
 if (fread(&riffDataHeader, sizeof(riffDataHeader), 1, fi) != 1)
 EXIT(INFILEERROR);
 else if (strncmp(riffDataHeader.datasig, "data", 4))
 EXIT(NODATA);
 /* setup my file header */
 memset(&dqHeader, 0, sizeof(dqHeader));
 memcpy(dqHeader.descr, "DQ ", 4);
 dqHeader.bitSize = 8;
 dqHeader.sampleRate = riffHeader.rate;
 dqHeader.length = (riffDataHeader.length/2)*2;
 if (fwrite(&dqHeader, sizeof(dqHeader), 1, fo) != 1)
 EXIT(OUTFILEERROR);
 /* init accumulator to zero value (0x80) */
 accum = 0x80;
 len = 0;
 while ((sample = fgetc(fi)) != EOF) {
 delta = abs(sample-accum); /* absolute difference */
 sign = (sample >= accum); /* used later */
 /* quantize [delta] to a power of 2 */
 if (delta >= 128)
 nybble=7;
 else if (delta >= 64)
 nybble=6;
 else if (delta >= 32)
 nybble=5;
 else if (delta >= 16)
 nybble=4;

 else if (delta >= 8)
 nybble=3;
 else if (delta >= 4)
 nybble=2;
 else if (delta >= 2)
 nybble=1;
 else
 nybble=0;
 /* check nybble-1...nybble+1 for closest
 eg. minimize (abs((accum +/- delta) - sample))
 */
 prev = max(nybble-1, 0);
 next = min(nybble+1, 7);
 if (sign) {
 nybble = 7-nybble;
 prev = 7-prev;
 next = 7-next;
 } else {
 nybble += 8;
 prev += 8;
 next += 8;
 }
 if (abs(accum-expand[prev]-sample) < abs(accum-expand[nybble]-sample))
 nybble = prev;
 else if (abs(accum-expand[next]-sample) < abs(accum-expand[nybble]-sample))
 nybble = next;
 /* pack 2-1 using the last bit in length as a flag. 1st nybble goes high */
 if ((len & 0x01) == 0)
 byte = nybble << 4;
 else if (fputc(byte | nybble, fo) == -1)
 EXIT(OUTFILEERROR);
 len++;
 /* adjust accumulator & repeat */
 accum -= expand[nybble];
 }
 fclose(fi);
 fclose(fo);
 free(srcName);
 free(dstName);
 return 0;
}



Listing Three

/* kyle a. york. convert DQ --> WAV. Notes: * does no data smoothing */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <conio.h>
#include <assert.h>

#include "dq.h"

#define NOTADQFILE "not a DQ file"
#define INFILEERROR "input file error"
#define OUTFILEERROR "output file error"
#define OUTOFMEMORY "out of memory"

#define ERRORFI "cannot open source file"
#define ERRORFO "cannot open destination file"

void EXIT(char *tmp)
{
 fprintf(stderr, "%s\n", tmp);
 exit(1);
}
/* add default extension to a filename [ext] must include the preceding '.'
 returns NULL on out of memory
*/
#define EXTFLAG_DEFAULT 0
#define EXTFLAG_CHANGE 1

char *AddExtension(const char *src, const char *ext, const int flags)
{
 char *tmp = malloc(strlen(src)+strlen(ext)+1);
 if (tmp) {
 char *tPtr;
 strcpy(tmp, src);
 tPtr = strrchr(tmp, '.');
 if (tPtr && strchr(tPtr, '\\'))
 tPtr = NULL;
 if (tPtr) {
 if (flags & EXTFLAG_CHANGE)
 strcpy(tPtr, ext);
 } else
 strcat(tmp, ext);
 }
 return tmp;
}
/* eliminate lots of shifts with a simple lookup table */
static int expand[16] =
 {
 -0x80, -0x40, -0x20, -0x10, -0x08, -0x04, -0x02, -0x01,
 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80
 };

#define adjust(a) (((a) < 0) ? 0 : ((a) > 255) ? 255 : (a))
/* Format: dq source dest -- return 0 if no error */
int main(int argc, char **argv)
{
 RIFFHEADER riffHeader;
 RIFFDATAHEADER riffDataHeader;
 DQHEADER dqHeader;
 char *srcName,
 *dstName;
 FILE *fi,
 *fo;
 int accum,
 byte;
 if ((argc != 2) && (argc != 3)) {
 printf("Format: %s srcfile {dstfile}\n"
 " where srcfile is a '.dq' file\n", argv[0]);
 exit(1);
 }
 srcName = AddExtension(argv[1], ".dq", EXTFLAG_DEFAULT);
 if (argc == 3)
 dstName = AddExtension(argv[2], ".wav", EXTFLAG_DEFAULT);

 else
 dstName = AddExtension(argv[1], ".wav", EXTFLAG_CHANGE);

 if (!srcName || !dstName)
 EXIT(OUTOFMEMORY);

 fi = fopen(srcName, "rb");
 if (!fi)
 EXIT(ERRORFI);
 fo = fopen(dstName, "wb");
 if (!fo)
 EXIT(ERRORFO);
 /* read & verify DQ header */
 if (fread(&dqHeader, sizeof(dqHeader), 1, fi) != 1)
 EXIT(INFILEERROR);
 else if (strncmp(dqHeader.descr, "DQ ", 4))
 EXIT(NOTADQFILE);
 /* create necessary RIFF/WAVE headers */
 memset(&riffHeader, 0, sizeof(riffHeader));
 memcpy(&riffHeader.RIFFsig, "RIFF", 4);
 memcpy(&riffHeader.WAVEsig, "WAVE", 4);
 memcpy(&riffHeader.fmtsig, "fmt ", 4);
 riffHeader.style = 1;
 riffHeader.bitsize = dqHeader.bitSize;
 riffHeader.len = 16;
 riffHeader.channels = 1;
 riffHeader.rate = dqHeader.sampleRate;
 riffHeader.avg_Bps = dqHeader.sampleRate;
 riffHeader.align = 1;

 memset(&riffDataHeader, 0, sizeof(riffDataHeader));
 memcpy(&riffDataHeader.datasig, "data", 4);
 riffDataHeader.length = dqHeader.length;

 if (fwrite(&riffHeader, sizeof(riffHeader), 1, fo) != 1)
 EXIT(OUTFILEERROR);
 else if (fwrite(&riffDataHeader, sizeof(riffDataHeader), 1, fo) != 1)
 EXIT(OUTFILEERROR);
 /* init accumulator to zero value (0x80) */
 accum = 0x80;
 while ((byte = fgetc(fi)) != EOF) {
 int sample1 = expand[(byte >> 4) & 0x0f],
 sample2 = expand[byte & 0x0f];
 accum -= sample1;
 fputc(adjust(accum), fo);
 accum -= sample2;
 fputc(adjust(accum), fo);
 }
 fclose(fi);
 fclose(fo);
 free(srcName);
 free(dstName);
 return 0;
}









PROGRAMMER'S BOOKSHELF


Software Engineering and Z




Reginald B. Charney


Reg is president of Charney & Day. He can be reached on CompuServe at
70272,3427 or at charney@pipeline.com.


Software Development with Z, by J.B. Wordsworth, is an introduction to
modern-day formal methods and their practicality for software development.
Although Wordsworth uses only the Z specification language (supported by a
variant of Dijkstra's guarded command language), you still get a good sense
for strongly typed specification languages. The book also serves as a tutorial
on formal methods in general and the Z language in particular.
While the book is easily read and understood, you do need to familiarize
yourself with set theory and its notation, since Z uses more than 90 different
symbols, most of which are nonmnemonic and unavailable in any normal typeface.
Some operators are overloaded, and the naming conventions are not always
helpful. For example, names like je1 and k1 are used instead of b1 and b2.
Software Development with Z starts out by covering the basics of discrete
mathematics: sets, logic, and relationships. It then examines Z schema and the
guarded command language and finally discusses design and implementation
issues.
Wordsworth uses two examples throughout the book. One discusses a library, its
books, and the people borrowing them, while the second is based on school
classes and people enrolled in them. The examples are discussed informally at
first, and are developed more rigorously as the book progresses. This approach
maintains coherence and makes the book easy to follow. A complete application
appears at the end of the book, less the actual code. 
There are exercises at the end of each chapter, with answers for most of the
questions. There's also an appendix containing a bibliography and a complete
list of all special symbols used in Z.


Z Language Specification


Z lets you record software specifications, data design, and algorithmic
implementation decisions. It is not a programming language, and is best used
for defining specifications, rather than designs.
More specifically, Z is a mathematical software-specification language
developed in the 1970s at Oxford University in England. In the early 1980s,
software developers at IBM's Hursley Research Centre started working on Z,
releasing commercial CICS online-transaction software for banks and airlines
in the early 1990s. Wordsworth was a part of IBM's original Z development
team.
Figures 1 through 4 show schema specifications in Z. The first line always
declares the schema name. The remaining lines in the upper part of the schema
declare data types used in the schema. The lower part of the schema consists
of the predicates that must be true for the schema to be valid.
Figure 1 is a typical schema specification. It defines the starting conditions
that must exist for an oil-terminal control system. The following are the
interpretations of the code by line number:
1. OTCSys is the schema name. 
2. waiting is of the type "sequence of Tankers."
3. docked is of the type "operation of relating a Tanker to a unique Berth." 
4. known is of the type "one set out of all possible sets of Tankers." 
5. No Tanker can be waiting and docked at the same time. 
6. The number of waiting Tankers is the number of Tankers in the sequence of
Tankers. 
7. If Tankers are waiting, then all Berths are filled by docked Tankers. 
8. Docked Tankers are a subset of the Berths.
9. All known Tankers are either docked or waiting.
Figure 2 shows that the precondition for any arriving Tanker is that it is not
yet known.
1. Arrive0 is the name of the schema. 
2. The declarations used in OTCSys apply here as well. 
3. t is an input variable of the type "Tanker." 
4. t is not already known.
When a Tanker arrives at the oil terminal, either Berths are available and it
is docked (see Figure 3) or all Berths are occupied and the Tanker must be
queued (see Figure 4).
For Figure 3:
1. Docked is the name of the schema. 
2. The contents of schema Arrive0 are inserted into schema Docked. 
3. b is an output variable of the type "Berth."
4. r is an output variable of the type "Response" (there either is or is not
room for the Tanker).
5. A Berth is available.
6. If line 5 is true, the output response is ok for this schema. 
7. An unoccupied Berth is output.
8. The list of docked tankers is updated by appending the current tanker to
the set of those already docked (x and x' represent the before and after
states of variable x). 
9. There is no change in the state of those waiting.
For Figure 4:
1. Queued is the name of the schema. 
2. The complete contents of schema Arrive0 are inserted into schema Queued. 
3. r is an output variable of type "Response." 

4. Tankers are docked at all Berths. 
5. If line 4 is true, then the response is wait.
6. The current Tanker is appended to the sequence of waiting Tankers.
7. There is no change in the state of the docked Tankers.


Guarded Command Language


Given these schemas, you can write this logical equation in Z:
ArrivedOK ≙ Docked ∨ Queued. Because Z is only a specification language,
Dijkstra's guarded command language is used to implement the specifications.
This language consists of statements having two parts, a guard and a command,
separated by an arrow (→). The command is executed only if the guard
expression is true. For example, to increment a variable x by 1, we would
normally write x:=x+1. In the guarded command language, you would write
x < MAX_INT → x:=x+1, which states that the increment operation is valid only
if the current value of x is less than the maximum allowable integer value.
The guarded command language has three control structures: sequence,
alternation, and iteration. These control structures are made up of
constructs, each of which consists of a guard expression and a body. A guard
expression must be true for the body of the statement to be executed. Either
the guard or the body can be omitted.


The Social Side


Wordsworth emphasizes that formal methods are part of a social process of
software development, which should include a group of people representing
various parts of the development cycle. Formal methods are not means for
developing software without difficult decisions, says Wordsworth, but are
methods for recording those decisions once made. Z documents consist
principally of English text with small pieces of mathematical formalisms.
I highly recommend the beautifully written section on writing good
specifications, as well as Wordsworth's description of how to verify a
specification. Verification is part of the testing process. His comments on
testing are reminiscent of one of Edward Yourdon's main points in Decline and
Fall of the American Programmer (Prentice Hall, 1991): Tracking bugs is
important, not just to fix problems, but to modify the processes that created
them.
Software Development with Z achieves its three principal aims: To encourage
programmers to explore the use of formal methods in perfecting their craft, to
provide an insight into the practical problems of applying mathematics to real
software development, and to explain the features of Z. 
Software Development with Z
J.B. Wordsworth
Addison-Wesley, 1992, 334 pp.
$32.25 
ISBN 0-201-627-57-4
Figure 1 Starting conditions for an oil-terminal control system.
Figure 3 Tanker is docked.
Figure 2 Condition for arriving Tankers.
Figure 4 Tanker is queued.



































SWAINE'S FLAMES


Vanity Home Pages


I hadn't talked with my Cousin Corbett recently, so I was mildly surprised to
get an e-mail message from him. Looking at the subject line, I was even more
surprised. It read: Express Yourself with a Vanity Home Page!
As I read through the message it became clear that this was no cousinly note,
but rather a press release on some sort of business. Apparently I was on
Corbett's press-contacts list. I decided to get to the bottom of this.
When I dialed his number, an unfamiliar voice answered, and I found myself
running a gauntlet of strangers before finally getting through to the man
himself.
"Corbett, what's this Vanity Home Page garbage?"
He seemed to have trouble recognizing my voice at first, which annoyed me no
end.
"Oh, Mike. Always good to hear from you. What can I do for you this morning?"
"What's this Vanity Home Page garbage?"
"Why, that's our new business, Personal Web Presentations, Inc. Basically,
it's a consulting service for people who want to create their own personal Web
pages. Our slogan is 'Express Yourself with a Vanity Home Page!'"
"Yeah, I've seen the press release. Who needs that kind of service?"
"Everybody. We help design pages, explain hooks, teach writing in the area of
self-expression, the whole package."
"Hooks?"
"Sure. Because home pages don't go out looking for readers, you need a clever
name to pull them in. The Awesome List. The Last Homely House. Ask Kato.
MUDslinger. Swaine's World. XXX Rated. Free Money. It also helps if you can
get other people to put links to your page on their pages. We help with that.
Then once readers have linked to your page, you need nifty graphics and other
sugar to keep them interested."
He was going to continue, but I interrupted. "Yeah, but who needs this stuff?
Why would anybody pay for your services?"
"A lot of our customers are looking for better jobs. We help them design
electronic résumés. We're looking into VRML, to design résumés with
attract mode."
"Slow down, I'm writing this down. VRML, that's a proposed markup language for
creating Web pages that are virtual-reality environments. Attract mode, if I
remember right, is the mode video games are in when you're not playing them?"
"Right. But the real opportunity is more than a market--it's a mission. We're
involved in nothing less than a rekindling of the art of personal writing.
Letter writing has virtually died, although it has enjoyed a small rebirth in
e-mail. Writing instructors kept the journal or diary form alive. Now the Web
is bubbling over with autobiographies and diaries."
"And the Web is an appropriate place for that?" I asked.
"Yes. It's the hypertext links that make it so interesting. You can enrich
your own essays by incorporating links to deep thoughts of Shakespeare or Newt
Gingrich that you subscribe to."
"Sort of an alternative to thinking: links for dittoheads." 
"I like that," he said. "Let me write it down."
He had another angle to share with me: "Interlaced GIFs are really hot now. By
bringing the image in gradually, with increasing resolution, they produce an
interesting, revealing effect, a kind of striptease. One of my clients has a
very interesting home page where she--"
Just then he got interrupted and said he'd call me back. I haven't heard from
him since. But I see more personal home pages every day.
Michael Swaine
editor-at-large
MikeSwaine@eworld.com































OF INTEREST
The recently released Cogent Prolog 3.0 features a Logic Server API that
enables Windows programmers to add rule-based Prolog modules to conventional
applications. This capability is especially relevant for programs that involve
problem diagnosis and resolution, configuration/tuning, planning and
evaluation, intelligent advice, and language parsing.
Cogent Prolog 3.0 is designed to provide plug-in, rule-based services for C,
C++, Visual Basic, PowerBuilder, Access, dBase, and many other development
tools. For example, you can add functions to retrieve records from an SQL
database, or functions that invoke features of the Windows SDK.
The Cogent Prolog development system for Windows and DOS sells for $298.00.
Both 16-bit and 32-bit (extended) DOS modes are supported. Cogent Prolog is
also available for the Alpha under OSF/1 or OpenVMS.
Amzi! 
40 Samuel Prescott Drive 
Stow, MA 01775
508-897-7332
amzi@world.std.com
WexTech and Mainsoft have teamed up to create the WexTech Hyperformance
Viewer, a UNIX version of the WexTech Doc-To-Help extensions for the Windows
Help Engine. The viewer helps authors using the Doc-To-Help tools to publish
for UNIX workstations the same content files that support PCs. Additionally,
Hyperformance Viewer will display any standard Windows Help file created with
RoboHelp or ForeHelp. 
WexTech Systems
310 Madison Ave., Suite 905
New York, NY 10017
212-949-9595
ArchiveLib 1.01a, a Windows-compatible data-compression and archive library
for C/C++ programmers from Greenleaf Software, has begun shipping. The
toolkit, which is described as an object-oriented data-compression run-time
library, consists of about 100 functions that let programmers compress ASCII
or binary data for storage. Under Windows, ArchiveLib, which is also available
as a language-independent DLL, sells for $279.00.
Greenleaf Software
16479 Dallas Pkwy., Suite 570
Dallas, TX 75248
214-248-2561
The NAG Numerical PVM Library, a collection of numerical routines designed for
distributed-memory parallel machines, has been released by the Numerical
Algorithms Group. The library uses the Basic Linear Algebra Communication
Subprograms (BLACS) and PVM for message passing, and includes ScaLAPACK
routines for solving linear equations, symmetric Eigenproblem and SVD
routines, sparse linear-equation solvers, multidimensional quadrature
routines, data-distribution support routines, and more. The library is
initially available for SunSPARC, IBM RS/6000, Silicon Graphics IRIX 5, DEC
Alpha, HP 9000/700, and similar platforms. 
Numerical Algorithms Group
1400 Opus Place, Suite 200
Downers Grove, IL 60515-5702
708-971-2337
InfoPower 1.0, a set of data-aware VCL components for Delphi-based database
development, has been announced by Woll2Woll Software. The InfoPower
components include a super database grid, lookup combo box, advanced
filtering, table sort, auto-expanding memo, incremental search, and a
table-lookup/locate dialog-box component. According to Woll2Woll, the toolset
is composed of native Delphi components which are automatically linked into
compiled EXE files. InfoPower 1.0 sells for $199.00.
Woll2Woll Software
1032 Summerplace Drive
San Jose, CA 95122
408-293-9369
Ryan McFarland has announced the release of RM/CodeBench, an integrated
Windows-based development environment for RM/ Cobol. RM/CodeBench lets
RM/Cobol developers seamlessly edit/compile/debug applications. The interface
includes pull-down menus, a toolbar, dialog boxes, status prompts, MDI, and
execution animation.
Ryan McFarland
8911 N. Capital of Texas Hwy.
Austin, TX 78759
800-762-6265 
Young Minds has begun shipping an updated version of CD Studio, its CD-ROM
recording system. CD Studio is a UNIX-based system for recording CDs at the
desktop or across a network. The system supports the extended UNIX file
structure by creating ISO 9660/Rock Ridge formatted discs. The new version
supports writing at up to 900-Kbytes/second. Additionally, CD Studio now
supports a comm port that sends recording information back to the workstation.
Young Minds
1910 Orange Tree Lane, Suite 300
P.O. Box 6910
Redlands, CA 92375
909-335-1350
Racotek has released the Keybuilder SDK for wireless communication application
development. The Windows-hosted Keybuilder environment includes a GUI code
generator, sample apps, online help, testing tool, and support for
PowerBuilder and Visual Basic. Keybuilder sells for $995.00.
Racotek
7301 Ohms Lane, Suite 200
Minneapolis, MN 55439
612-832-9800
The Nucleus PLUS real-time kernel from Accelerated Technology provides
advanced kernel support for the PowerPC microprocessor. This support includes
multitasking capabilities such as task communication, task synchronization,
memory management, and application timers. Nucleus PLUS supports both the Diab
Data D-CC/PowerPC-optimized C compiler and the SDS SingleStep source-level
debugger.
Accelerated Technology
P.O. Box 850245
Mobile, AL 36685
205-661-5770
Simply Solutions has introduced Visual DLL, a Visual Basic add-on that lets
you create Windows DLLs without resorting to tools such as a C compiler or the
Windows SDK. DLLs created with Visual DLL can be called from languages such as
C, Pascal, and Fortran. Visual DLL automatically creates C header files and
Basic Declare statements for access from third-party applications. In
addition, Visual DLL creates binary library modules for distribution to
customers and third-party developers. 
Simply Solutions
3337 Bristol, Suite 143
Santa Ana, CA 92704
310-575-5047 
simply@netcom.com
The Development Group for Advanced Technology has released Sql Index Coverage
Analyzer for Microsoft's SQL Server, Oracle, and Sybase SQL Server. The Sql
Index Coverage Analyzer, which identifies potential performance problems
before application deployment, analyzes all dynamic-SQL, stored-procedure, and
trigger-SQL statements for index coverage in the table selection and joint
criteria. 

The Sql Power Tools suite consists of seven tools: Sql Index Coverage
Analyzer, Sql Inspector & DeadLock Predictor, Sql Relationship & Impact
Analysis Browser, Sql Data Base Documenter, Sql Stored Procedure Encapsulator,
Sql Data Base Migrator, and Sql Application Data Base Diff for Data Base
Administrators and SQL Developers.
The Development Group for Advanced Technology
12 Bonfield
Upper Saddle River, NJ 07458
201-825-9511 
http://www.nis.net/sqlpower!
SQA has released SQA TeamTest 3.1, an upgrade to its Windows automated
client/server testing tool. Version 3.1 supports PowerBuilder 4.0 and provides
enhancements to SQA's object-oriented recording technology.
SQA TeamTest 3.1 integrates six major testing areas: test planning, test
development, test execution, results analysis, defect tracking, and summary
reporting and analysis. SQA TeamTest can be used to test any Windows product
or application, and offers special integration with PowerBuilder and other
client/server development tools. SQA TeamTest 3.1 sells for $2495.00 per seat.

SQA Inc.
10 State Street
Woburn, MA 01801
800-228-9922
Visigenic Software has released a set of ODBC tools for developers needing
access to multiple databases. Based on Microsoft's Open Database Connectivity
(ODBC) spec, Visigenic's suite of products includes the Visigenic ODBC
DriverSet, Visigenic ODBC SDK, and Visigenic ODBC Test Suites. Additionally,
Visigenic has licensed Microsoft's SQL Server middleware technology to develop
and market Microsoft SQL Server ODBC drivers for several UNIX platforms,
Macintosh, PowerMac, OS/2, and other non-Microsoft operating environments.
The Visigenic ODBC DriverSet lets you provide cross-platform access to
multiple SQL databases by using a single, standard API. The ODBC DriverSet is
composed of drivers that offer application access to Informix, Ingres, Sybase
SQL Server, and Oracle, with Microsoft SQL Server soon to follow. The
Visigenic drivers, based on the ODBC 2.0 specs, are compliant with all Core,
Level 1, and key Level 2 API functions. 
The Visigenic ODBC 2.0 SDK allows you to write database-independent C/C++
applications or database drivers, and deploy them across any platform. With
ODBC, an application can communicate simultaneously with multiple databases
from different vendors all through a single, standard interface. The ODBC SDK
comes with a Driver Manager, header files, programmer's reference, and sample
programs. The ODBC Test Suite includes individual C test programs for all
Core, Level 1, and Level 2 API calls in the ODBC 2.0 specification. 
The Visigenic DriverSet, currently available on Windows, Solaris, SunOS,
HP/UX, and IBM AIX, starts at $395.00 for Windows and $595.00 for UNIX.
Visigenic's ODBC 2.0 SDKs are available for Solaris, SunOS, HP/UX, and IBM AIX
at a cost of $995.00 per developer. The ODBC Test Suites, available on Windows
and UNIX, sell for $25,000 per platform.
Visigenic Software Inc.
951 Mariners Island, Suite 460
San Mateo, CA 94404
415-286-1900
Delrina has begun shipping the Delrina FormFlow SDK, which allows developers
to incorporate electronic-form features into new or existing applications.
Users can quickly and easily add the database connectivity, e-mail
integration, and forms- routing capabilities of FormFlow to their
client/server applications. 
The SDK includes more than 300 external calls specific to Visual Basic and C,
and includes an API for e-mail, fax, and data messaging, sample code, and
extra utilities such as a forms auditor. With the FormFlow SDK, users also
receive Delrina WinComm Pro online communication software and the Delrina
WinFax Pro Phonebook Server API. The FormFlow SDK sells for $249.00 and
includes one year of technical support.
Delrina Corp. 
6320 San Ignatio
San Jose, CA 95119 
800-268-6082
Revision Labs, a software-testing lab, has released a report entitled A Guide
to Test Automation Tools for GUI Software. The report includes articles and
tables relevant to testing-tool evaluation and selection. In particular, the
report covers AutoTester, WinRunner, Microsoft Test, QA Partner, SQA TeamTest,
Hightest, Evaluator, Automator QA, and others. The report evaluates the
testing tools on a set of more than 90 feature criteria, including recording,
verification, programming, execution, report generation, maintenance,
portability, and support. The report sells for $95.00.
Revision Labs
15220 NW Greenbrier Parkway, Suite 305
Beaverton, OR 97006
503-531-4020
rli@teleport.com
Common Lisp/CLOS has been officially approved as an ANSI standard, making it
the first object-oriented programming language to achieve ANSI
standardization. X3J13 is the X3 technical committee responsible for drafting
the ANSI Standard for Common Lisp. Committee chairman is Guy Steele, who was
joined by representatives from Apple Computer, Chestnut Software, DEC, Franz,
Hewlett-Packard, IBM, Sun, Xerox, the University of Utah, and Aoyama Gakuin
University in Japan. Major Common Lisp/CLOS vendors include Franz, Digitool,
Harlequin, Gold Hill, and Venue.
ANSI X3
1250 Eye St. NW, Suite 200
Washington, DC 20005
202-626-5740
x3sec@itic.nw.do.us
The Houston Advanced Research Center (HARC) has released for licensing its
HARC-C image-compression software that's based on wavelet technology. HARC
claims that the software has a compression ratio of 300:1. Implemented in C,
HARC-C contains a number of kernel modules so that applications can be
developed by calling function libraries. It is available for UNIX (including
Linux), DOS, Windows, OS/2, and NT. (For background information on wavelet
technology, see the article "The Wavelet Packet Transform," by Mac A. Cody,
DDJ, April 1994.)
Houston Advanced Research Center
4800 Research Forest Drive
The Woodlands, TX 77381
713-367-1348
Teletech Systems is launching a contest for help-file developers who can
create the best help file of 30 topics or less using VB HelpWriter, Teletech's
help-authoring tool for Visual Basic. Entries will be judged on entertainment
value, originality, presentation, creative use of Winhelp features, and
integration with a VB program. The winner gets a $1000.00 shopping spree of
programming tools from the VBxtras tools catalog. Runners-up receive copies of
VB HelpWriter Professional. Enter by June 15, 1995. A free copy of VB
HelpWriter Lite is available on CompuServe in the MSBASIC forum and on the
Internet at ftp.cica.indiana.edu.
Teletech Systems
750 Birch Ridge Dr.
Roswell, GA 30076
404-475-6985
API Vision 1.0 is a new Windows API-level debugging tool from Berkeley
Toolworks. API Vision dynamically displays calls to the Windows API, including
those from inside Windows itself. Parameters passed to the API calls are
decoded and displayed using symbolic names from the SDK help file whenever
possible. Additionally, the tool includes filtering by task, module, segment
and address, filtering for message and message-like APIs, logging, timings,
driver and multimedia support, and an API file finder. API Vision 1.0 sells
for $199.00.
Berkeley Toolworks
2600 Tenth Street, Suite 415
Berkeley, CA 94710
510-649-9891
apivis@berktool.com









EDITORIAL


Teacher, Teacher, I Declare...


If there's one place computer games have made inroads, it's in education.
That's not to say you can expect to see grandmotherly fourth-grade teachers
using DOOM or Myst to teach geography. (If the truth were known, young minds
probably wouldn't object.) Instead, you're more likely to find games such as
the venerable The Oregon Trail playing an integral part of an elementary
history curriculum. The Oregon Trail, originally developed by the Minnesota
Department of Education and now under the auspices of the for-profit MECC, has
been adopted by more than one-third of all U.S. school districts. Most
recently, the game was updated to include the Mormon and California trails,
all delivered on CD-ROM. 
Games such as The Oregon Trail aren't being used solely in schools. The home
software market is booming, with games and home-education software leading the
pack. According to the SPA, home-education software sales are up more than 50
percent over last year. Fueling this growth are PCs, which have found their
way into 33 percent of U.S. homes. With sales growth four times the industry
average, home-education software is currently the fastest-growing category of
software sales. 
This surge in home-education software is both good and bad news. On one hand,
the trend indicates that many parents are interested enough in their
children's well-being to invest in PCs and supplemental educational materials.
On the other hand, you could say the trend reflects widespread skepticism of
schools' performance. 
The biggest game in town for the upper grades is the Internet. In melding
technology and education, students and teachers are finding that using the
Internet as an educational resource is fast, fun, and relatively easy.
Students can access the most up-to-date information, which traditional
textbooks won't provide for years to come--if ever. 
Interestingly, schools are also leading the way in some Internet-related
scenarios. The Independence, MO, School District, for instance, is the first
public district in the nation using wireless communication to provide Internet
services to remote sites. Instead of spending $150,000 per year to link 25
buildings to the Internet, the school system leases a dedicated line from the
phone company for $5000 per year. That line is connected to the district host,
which in turn broadcasts Internet services via an antenna to other district
buildings. Students use the system for everything from asking teachers
questions via e-mail to researching and submitting homework assignments.
The Independence experiment has been funded in part by a state grant of
$80,000. For the past few years, state and federal agencies have been pumping
a lot of money into K-12 programs that have an eye towards technology. In
fiscal 1995 alone, Congress has committed $40 million to link technology with
improvement in education. The Federal government provides about $750 million
for purchasing educationally related computer hardware and software. Add to
this another $450 million from Title I funds for hardware/software, and all of
a sudden we're talking real money.
Granted, some of this money has been spent unwisely (I recently read about one
school-district warehouse stuffed to the gunnels with brand new, never-unboxed
80286-based PCs). Still, if we're going to waste money, I'd just as soon see
it frittered away on education than, say, exorbitant Congressional pension
plans or franking privileges (or, for that matter, "educational" television
shows starring Congressional leaders). Nevertheless, education/technology
programs may be in for some hard times in the coming months as Congress goes
about reevaluating how it will divvy up our education-allocated dollars. Every
education-related committee in the House and Senate will be reviewing
programs, and cuts will likely result. Similarly, upcoming changes in
telecommunications rules and regulations will impact education programs like
those in Independence. One of the main telecom issues is whether or not
schools will be in the running for universal access to the Internet.
No one said education was supposed to be fun, but then, no one said it was
supposed to be a shuttlecock, batted back and forth in high-stakes political
games, either. What technology brings to the learning process is not only an
efficient means of putting students in contact with basic information, but
also a familiarity with the tools that will be commonplace in the 21st
century. In short, when it comes to downsizing what has grown to be an
intrusive government, there are better places to start than educational
programs.
Jonathan Erickson
editor-in-chief













































Collision Detection


Getting the most out of your collision tests




Dave Roberts


Dave is the author of PC Game Programming Explorer (Coriolis Group Books,
1994). He can be reached via CompuServe at 75572,1151.


Collision detection is fundamental to most fast-action arcade games. After
all, the program has to know when a missile slams into an alien spaceship or
when the giant frog leaps onto a player. If coded incorrectly, however,
collision detection can take an inordinate amount of time, decrease the
animation frame rate, and ruin an otherwise enjoyable game. In this article,
I'll examine the basic collision-detection problem, describe situations that
can cause collision-detection code to run slowly, and suggest ways to keep
your code skipping along.


Explosive Growth


There are two main reasons why collision-detection code runs slowly: Either
the code performs more collision tests than necessary or the individual
collision tests themselves are slow. For the first case, imagine a game with
two objects on the field of play. You want to determine if the objects have
collided. With only two objects, collision detection is easy--you simply test
whether object1 is touching object2 or object2 is touching object1. One of
these tests is obviously redundant--if object1 is touching object2, then
object2 is also touching object1.
A game with three objects requires three collision tests: object1 with
object2, object1 with object3, and object2 with object3. With four objects the
number of tests increases to six. Table 1 shows how many tests must be
performed to check from 2 to 20 objects. The numbers were derived from the
formula (n^2-n)/2, where n is the number of objects; given n objects, you can
determine which objects collide by exhaustively performing n^2 tests. Since you
don't compare an object with itself, you eliminate n tests, giving n^2-n. And
since testing object x with object y is the same as testing object y with
object x, you eliminate half of the remaining tests, yielding the final
formula.
O-notation describes how the run-time or memory requirements of a problem grow
as the number of inputs or outputs, n, changes. In this case, n is the number
of game objects whose collision status must be calculated, so the algorithm is
O(n^2): As n increases, the number of required tests, and with it the run time
of the collision-testing code, grows roughly as n^2.
This means that having several objects active at once can slow your
collision-testing code if you aren't careful. Doubling the number of objects
on the screen roughly quadruples the number of tests you must perform. But it
could be worse: Algorithms that are O(n^3) or O(2^n) increase run time much
more quickly. Of course, it could also be better: An O(n) algorithm's run time
grows linearly with the number of inputs, so doubling the number of inputs
doubles the run time. An O(1) algorithm is pure joy because it runs at the
same rate no matter how many inputs are fed to it.


Eliminating Collision Tests


Techniques for reducing the number of collision tests include game-rule
elimination and eliminations based on spatial position such as axis sorting
and the sector method.
Game rules typically dictate which objects are allowed to collide, and you can
exploit those rules to reduce the number of collision tests. For instance,
imagine writing a game where the player controls a spaceship that shoots
aliens. Since you're the programmer, you make the rules. You may decide that
aliens can't collide with each other and that the player's ship can't collide
with its own missiles. Alien missiles probably won't hit aliens. Player
missiles won't hit other player missiles, and alien missiles won't hit other
alien missiles, although perhaps you should allow player missiles to hit
alien missiles.
Now assume that you have 1 player spaceship, 5 player missiles, 20 aliens, and
10 alien missiles all active in the game at a given point in time. That's 36
total objects, which would require 1296 tests (n^2) if every object had to be
compared with every other object. Our formula reduces that to 630 tests
((n^2 - n)/2), but taking the game rules into account reduces this number even
further.
The game rules dictate that the player spaceship must be tested against each
alien (20 tests) and each alien missile (10 tests); the player missiles must
be tested against the aliens (100 tests) and the alien missiles (50 tests).
That's it--180 tests total. Since aliens, player missiles, and alien missiles
can't collide with objects of the same type, 450 tests (71 percent) of the 630
tests are eliminated.
The real advantage to this approach can be seen when you add one more alien to
the screen. Using the original formula, this would require an increase of 36
tests; but because of the game rules, only six must be added (the alien tested
against the player spaceship and each player missile).
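The 180-test figure above can be computed directly from the category counts. A small sketch (the function name and parameter names are mine):

```c
/* Count only the pair tests the game rules allow: the ship against aliens
   and alien missiles, and player missiles against aliens and alien
   missiles. Same-type pairs are never tested. The counts in the article's
   example are 1 ship, 5 player missiles, 20 aliens, 10 alien missiles. */
int count_rule_based_tests(int ships, int player_missiles,
                           int aliens, int alien_missiles)
{
    return ships * aliens                      /* 1 * 20 =  20 */
         + ships * alien_missiles              /* 1 * 10 =  10 */
         + player_missiles * aliens            /* 5 * 20 = 100 */
         + player_missiles * alien_missiles;   /* 5 * 10 =  50 */
}
```

For the article's example, count_rule_based_tests(1, 5, 20, 10) yields 180, versus the 630 tests of the exhaustive pair loop.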


Spatial-Test Elimination 


Some game rules require that every object be tested with every other one. In
such situations, it's best to use spatial-test elimination techniques.
Spatial-test elimination works by sorting game objects according to their
position on the screen or play field (if the play field is larger than the
screen itself). Objects are then tested for collisions with other objects near
them. Objects farther away are not tested because they could not possibly be
colliding.
Spatial elimination requires extra computation to sort the various objects and
perform additional bookkeeping. The key is to make the bookkeeping run quickly
and eliminate as many tests as possible. When bookkeeping saves more time than
it adds, spatial elimination is worth the effort. It is better suited for
games in which a large number of objects are active at once. If a game has
only a few objects, there aren't many potential collision tests to eliminate,
and the added bookkeeping can exceed the gain. 


Axis Sorting


When many sprites are on the screen at one time, they are rarely at the same
location. An easy way to eliminate collision tests is to sort the sprites in
ascending order using each sprite's x- or y-axis coordinate as a key. To check
for collisions, simply test a given sprite against the next few on the list.
Stop testing when you reach a sprite further down the list whose key is
greater than the first sprite's key plus its width or height. For example,
suppose you have five sprites (as in Table 2) sorted by x-coordinate. First,
you'd test sprite #1 with sprite #2, then with sprite #3. At sprite #4, you'd
stop because sprite #4 is located at x-coordinate 40, while sprite #1's right
side is at x-coordinate 19 (x-coordinate 10 + width 10 - 1). Since every
sprite after sprite #4 has an x-coordinate at least as great as sprite #4's,
there is no need to check any further. You'd then continue with sprite #2,
and so on, for a total of four tests.
Listing One is an implementation of this method.
This method relies on the assumption that sprites are usually distributed
across the screen. If the sprites are closely grouped along the sorting axis,
this method degenerates to the (n^2 - n)/2 situation.
You could just as easily sort the sprites by y-coordinate. Since the screen's
y-resolution is usually lower than its x-resolution, however, sorting by
x-coordinate tends to spread the sprites out more. Before committing to either
axis, examine your situation to determine which will generally give better
results.
I haven't covered actual techniques for sorting the sprites because this is an
implementation issue that varies depending on how the sprites in the game
move. A common technique is to keep a doubly linked list of sprites in sorted
order at all times. After the movement routine calculates the sprite's new
position for the next animation frame, it moves the sprite forward or backward
in the linked list to keep it sorted. Typically, a sprite's position will only
move a few pixels per frame, leading to only one or two position changes
relative to the other sprites in the linked list. Frequently, no position
changes will be needed. Note that general-purpose sorting routines are not
usually necessary or even beneficial. Most of the time, the linked list is
nearly sorted anyway. General-purpose sorts like quicksort are overkill and
can even show their worst-case running times when the objects are nearly
sorted.
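A minimal sketch of that bookkeeping (the Sprite struct and function names here are illustrative, not taken from Listing One). For brevity it unlinks the moved sprite and rescans from the head; a production version would bubble the node from its old neighbors, since it rarely moves more than a position or two:

```c
#include <stddef.h>

/* Doubly linked sprite node kept sorted by Left (the sort key). */
typedef struct Sprite {
    struct Sprite *Prev, *Next;
    int Left;
} Sprite;

/* Re-sort one sprite after its Left coordinate changed; returns the
   (possibly new) list head. */
Sprite *resort_sprite(Sprite *head, Sprite *s)
{
    Sprite *p;

    /* unlink s from its current position */
    if (s->Prev) s->Prev->Next = s->Next; else head = s->Next;
    if (s->Next) s->Next->Prev = s->Prev;
    s->Prev = s->Next = NULL;

    /* re-insert before the first sprite with a greater key */
    if (head == NULL || s->Left <= head->Left) {
        s->Next = head;
        if (head) head->Prev = s;
        return s;
    }
    p = head;
    while (p->Next != NULL && p->Next->Left < s->Left)
        p = p->Next;
    s->Next = p->Next;
    s->Prev = p;
    if (p->Next) p->Next->Prev = s;
    p->Next = s;
    return head;
}
```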


The Sector Method


The sector method divides the screen or play field into a grid of sectors. In
the simplest case, the screen is divided into quadrants. You determine which
sector each sprite is located in according to its position on the screen. You
then perform standard collision testing among all the sprites within a given
sector. Assuming that sprites are usually well distributed around the screen,
most sectors will only contain a few sprites; some may not contain any. For
instance, Figure 1 shows 6 sprites on a screen divided into 16 sectors.
Sprites #1 and #2 are located in the same sector and would be tested against
each other; the same goes for sprites #4 and #5. Sprite #3 is all alone in its
sector and would not be tested against anything.

The primary difficulty of the sector method is in handling sprites that fall
into more than one sector--sprite #6, for instance. Typically, the sprite is
included in all sectors it covers. As long as sprites are smaller than
sectors, a single sprite can only be located in one to four sectors. The
locations of the sprite's corners determine in which sectors the sprite is
located. If sprites can be larger than individual sectors, determining which
sectors a sprite covers can involve more calculations.
Try to make sector widths and heights a power of 2 no smaller than the maximum
size of the sprites. This approach makes determining a sprite's sector
location much easier because you can simply divide its coordinate values by a
power of 2. Dividing by a power of 2 is the same as right shifting the
coordinate value, so no slow division instructions or routines need to be
invoked. Sectors don't have to be square: The vertical size of a sector can be
different than its horizontal size. This lets you optimize the number of
sectors versus the sector size to more closely fit the needs of your
particular game.
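For example, with hypothetical 64x32-pixel sectors on a 320-pixel-wide screen (the constants below are illustrative, not from the article), a sprite corner's sector index reduces to two right shifts and a multiply:

```c
/* Power-of-2 sector dimensions let sector lookup avoid division. */
#define SECTOR_W_SHIFT 6                        /* sector width  = 64 = 1 << 6 */
#define SECTOR_H_SHIFT 5                        /* sector height = 32 = 1 << 5 */
#define SECTORS_PER_ROW (320 >> SECTOR_W_SHIFT) /* 5 sectors across the screen */

/* Map a pixel coordinate to a linear sector index, row-major order. */
int sector_index(int x, int y)
{
    return (y >> SECTOR_H_SHIFT) * SECTORS_PER_ROW + (x >> SECTOR_W_SHIFT);
}
```

Applying this to each of a sprite's four corners yields the one to four sectors the sprite occupies.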


Hybrid Techniques


Sometimes no single technique can acceptably reduce the number of collision
tests. This happens when many sprites cluster around each other. Using both
game rules and spatial techniques can help out in these cases. For instance,
if you're using the sector technique and find that many sprites end up in the
same sectors, use game rules to reduce the number of collision tests among
sprites within each sector.


Quick Tests


Generally, collision-testing methods are either bounding or pixel based.
Bounding methods are fast, imperfect tests. Pixel-based methods are more
precise, but much slower. Novice game programmers often make the mistake of
using pixel-based methods for all collision tests. Even with aggressive
test-reduction techniques, a pixel-based collision algorithm can slow a game
down if it's used for every test.
Bounding methods don't compare actual sprite pixels with each other. Instead,
a geometric object, typically a rectangle, is used to represent the sprite
object in the test. The rectangle is sized to enclose just the pixels that
make up the sprite. The advantage is that a few simple tests can determine
whether two rectangles touch. Listing Two shows how to test two rectangles for
overlap. The test determination takes no more than four comparisons. In fact,
since logical expressions are evaluated in a short-circuit fashion in C, most
tests will stop before the whole logical expression has been evaluated. The
order of the tests in Listing Two is arbitrary. If objects in a specific game
are typically above and below one another and collide horizontally, you can
put the vertical tests first. Finally, the function call may take up much of
the execution time in Listing Two. You can easily make CollisionTestRect into
a C preprocessor macro using the ?: ternary operator. This allows the test to
be expanded inline, saving the time required for a function call.
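As a sketch of that suggestion (the macro name and the use of 0/1 in place of FALSE/TRUE are mine), the macro version performs the same four comparisons as Listing Two with no function-call overhead. Because macro arguments are evaluated more than once, avoid side effects in the arguments:

```c
/* Same rectangle layout as Listing Two. */
typedef struct {
    int Left;
    int Top;
    int Right;
    int Bottom;
} RECT;

/* Inline form of CollisionTestRect: 0 = no overlap, 1 = overlap.
   Short-circuit evaluation still stops at the first failing test. */
#define COLLISION_TEST_RECT(r1, r2)                               \
    (((r1)->Left > (r2)->Right || (r2)->Left > (r1)->Right ||     \
      (r1)->Top > (r2)->Bottom || (r2)->Top > (r1)->Bottom) ? 0 : 1)
```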
The only problem with bounding methods is that they are imperfect. Unless your
sprite is shaped very much like a rectangle, for instance, some pixels
enclosed by the bounding rectangle won't be a part of your sprite. Those
pixels may be a part of your sprite bitmap, but they are transparent.
Technically, these pixels are not part of the object but will be considered so
for the purpose of testing collisions. Thus, a collision might be indicated
between two objects when only a few transparent pixels have actually
intersected.
Perfect collision detection requires pixel-based collision detection, in which
the pixels of the bitmap are tested to see if they overlap in the current
animation frame. Some methods use an auxiliary buffer, where sprites are drawn
into the buffer as they are drawn to video memory and tests are performed as
each pixel is drawn. This approach requires a lot of memory, however, and can
slow down sprite-drawing code.
Another method uses much less memory by first calculating a collision map for
each sprite. A collision map is a bitmap with one bit representing each pixel
of the original bitmap, which may use far more storage per pixel (a 256-color,
byte-per-pixel bitmap, for instance). For each solid pixel in the original
bitmap, the corresponding bit of the collision map is set to 1; for each
transparent pixel, the corresponding bit is set to 0.
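A sketch of how one row of such a map might be built, assuming pixel value 0 is the transparent color (the function name and that convention are mine):

```c
typedef unsigned char UINT8;

/* Build one row of a 1-bit-per-pixel collision map from a row of a
   256-color, byte-per-pixel bitmap. Bit 7 of each map byte corresponds
   to the leftmost of its eight pixels; solid pixels set their bit. */
void build_map_row(const UINT8 *pixels, int width, UINT8 *map_row)
{
    int x;
    for (x = 0; x < width; x++) {
        if (x % 8 == 0)
            map_row[x / 8] = 0;                 /* clear each new map byte */
        if (pixels[x] != 0)                     /* solid (nontransparent)? */
            map_row[x / 8] |= 0x80 >> (x % 8);  /* set the pixel's bit */
    }
}
```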
The program takes bytes from the two collision maps and aligns them according
to the original sprites' positions using shifts and rolls. A logical AND is
then performed between the two collision-map bytes. If any bits of the result
are set to 1, then nontransparent pixels are colliding. Listing Three shows
code that tests two collision maps.
Unfortunately, pixel-based collision testing runs slowly. It requires much
more code than the bounding-rectangle test, and consequently, more run time.
The ideal test would run as fast as the bounding rectangle test but retain the
accuracy of the collision map test. Fortunately, this is possible.


Fast and Accurate


Quick elimination is the secret to fast collision detection. Most collision
tests indicate a negative result. If a game runs at a frame rate of 20 frames
per second (fps) and two objects collide about once a second, that's only one
collision every 20 animation frames. Even if there are many objects on screen
that collide fairly frequently, a given object doesn't collide with most of
the others. Therefore, it makes no sense to use an accurate but slow
collision-test method for every test. A bounding-rectangle collision test can
determine that two objects aren't anywhere near each other as well as a
pixel-based method, and is much faster. The only problem is that it will
sometimes falsely indicate a collision when transparent pixels of the sprite
within the bounding rectangle overlap.
Using both a bounding-rectangle test and a pixel-based test in tandem yields
both speed and accuracy. First, the combined algorithm performs a
bounding-rectangle test. If the sprites don't collide (the usual case), then
the bounding-rectangle test indicates a negative result very quickly and the
program proceeds to the next pair of sprites. If the bounding-rectangle test
indicates a collision, however, the program performs a pixel-based test to
determine whether the bounding-rectangle test was correct. The pixel-based
test runs much more slowly than the bounding-rectangle test, but it is invoked
infrequently.
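To make the idea concrete, here is a self-contained miniature of the tandem test using hypothetical 8x8 sprites with one collision-map byte per row. Real code would use the routines from Listings Two and Three; the SPRITE8 struct and function names here are mine:

```c
typedef unsigned char UINT8;

/* Hypothetical 8x8 sprite: an upper-left corner plus one collision-map
   byte per row (bit 7 = leftmost pixel, 1 = solid). */
typedef struct {
    int Left, Top;      /* bounding box is 8x8 pixels at (Left, Top) */
    UINT8 Map[8];
} SPRITE8;

/* Two-stage test: cheap bounding-box rejection first; the exact
   shift-and-AND pixel test runs only when the boxes overlap. */
int collision_test_tandem(const SPRITE8 *a, const SPRITE8 *b)
{
    int dx = b->Left - a->Left;
    int dy = b->Top - a->Top;
    int lo, hi, ra;

    /* stage 1: boxes don't touch -- fast negative, no pixel work */
    if (dx <= -8 || dx >= 8 || dy <= -8 || dy >= 8)
        return 0;

    /* stage 2: make 'a' the leftmost sprite, then AND the aligned rows */
    if (dx < 0) {
        const SPRITE8 *t = a; a = b; b = t;
        dx = -dx; dy = -dy;
    }
    lo = dy > 0 ? dy : 0;       /* rows of 'a' that overlap rows of 'b' */
    hi = dy < 0 ? 7 + dy : 7;
    for (ra = lo; ra <= hi; ra++)
        if (a->Map[ra] & (UINT8)(b->Map[ra - dy] >> dx))
            return 1;           /* solid pixels actually overlap */
    return 0;
}
```

Note how two sprites whose boxes overlap but whose solid pixels don't, such as two thin vertical strips four pixels apart, correctly report no collision.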


Acknowledgments


The author gratefully acknowledges Coriolis Group Books for use of some
material from the book PC Game Programming Explorer.
Table 1: Number of collision-detection tests that must be performed for a
given number of objects.
 Objects Collision Tests 
 2 1
 3 3
 4 6
 5 10
 6 15
 7 21
 8 28
 9 36
 10 45
 15 105
 20 190
Table 2: Sprites sorted in ascending order by x-coordinate.
 Sprite Sprite 
 Number X-coordinate Width 
 1 10 10
 2 15 5
 3 18 8
 4 40 10
 5 45 10
Figure 1 Six sprites and their sector locations.

Listing One 

typedef struct _SPRITE_DATA {
    struct _SPRITE_DATA * Next;
    int Top;        /* sprite location */
    int Left;
    int Width;      /* sprite dimensions */
    int Height;
} SPRITE_DATA;

/*
    Function: CollisionTestSorted
    Description:
        Tests a linked list of sorted sprites to see if they
        potentially overlap. If so, they are collision tested.
*/
void CollisionTestSorted(SPRITE_DATA * SpriteList)
{
    SPRITE_DATA *s1, *s2;
    int s1Right;

    s1 = SpriteList;
    while (s1 != NULL) {
        s1Right = s1->Left + s1->Width - 1;
        /* Compare s1 with all following sprites until the left edge */
        /* of a following sprite is located beyond the right */
        /* edge of s1. */
        s2 = s1->Next;
        while (s2 != NULL && (s1Right >= s2->Left)) {
            CollisionTest(s1, s2);
            s2 = s2->Next;
        }
        s1 = s1->Next;
    }
}



Listing Two

typedef struct {
    int Left;
    int Top;
    int Right;
    int Bottom;
} RECT;

/*
    Function: CollisionTestRect
    Description:
        Tests two bounding rectangles to see if they overlap.
        Returns TRUE if so, FALSE otherwise.
*/
BOOL CollisionTestRect(RECT * r1, RECT * r2)
{
    if (r1->Left > r2->Right || r2->Left > r1->Right ||
        r1->Top > r2->Bottom || r2->Top > r1->Bottom) {
        return FALSE;
    }
    else {
        return TRUE;
    }
}



Listing Three

typedef unsigned int UINT16;
typedef unsigned char UINT8;

typedef struct {
    UINT16 Width;   /* width in bytes (pixel width / 8; 1 bit per pixel) */
    UINT16 Height;
    UINT8 Data;     /* first byte of variable-length data */
} COLLISION_MAP;

/*
    Function: CollisionTestBitmap
    Description:
        Tests two objects using COLLISION_MAPs. The upper left corner
        of each object is specified with (x1, y1) and (x2, y2).
*/
BOOL CollisionTestBitmap
    (
    COLLISION_MAP far * Object1,
    COLLISION_MAP far * Object2,
    int x1,
    int y1,
    int x2,
    int y2
    )
{
    UINT8 far * Data1;
    UINT8 far * Data2;
    COLLISION_MAP far * SwapTemp;
    int DeltaX;
    int DeltaY;
    int Shift;
    int Skip;
    UINT16 WidthCounter1;
    UINT16 WidthCounter2;
    UINT16 HeightCounter1;
    UINT16 HeightCounter2;
    UINT8 Object1Data;
    UINT16 ShiftRegister;   /* 16 bits wide: holds two adjacent map bytes */
    UINT8 OldObject2Data;
    UINT8 NewObject2Data;
    UINT8 FinalObject2Data;

    assert(Object1 != NULL);
    assert(Object2 != NULL);

    DeltaX = x2 - x1;
    DeltaY = y2 - y1;

    /* swap objects to make the algorithm work */
    if (DeltaX < 0) {
        SwapTemp = Object1;
        Object1 = Object2;
        Object2 = SwapTemp;

        DeltaX = -DeltaX;
        DeltaY = -DeltaY;
    }

    Data1 = (UINT8 far *) &(Object1->Data);
    Data2 = (UINT8 far *) &(Object2->Data);

    HeightCounter1 = 0;
    HeightCounter2 = 0;

    /* skip rows off the object with the least Y-value */
    if (DeltaY > 0) {
        Data1 += Object1->Width * DeltaY;
        HeightCounter1 += DeltaY;
    }
    else if (DeltaY < 0) {
        Data2 += Object2->Width * -DeltaY;
        HeightCounter2 -= DeltaY;
    }

    Shift = DeltaX % 8;     /* amount to shift object 2 data to right */
    Skip = DeltaX / 8;      /* number of bytes to skip at beginning of */
                            /* object 1 data line */

    while (HeightCounter1 < Object1->Height &&
           HeightCounter2 < Object2->Height) {

        /* potentially skip a few bytes 'cause obj 1 is to left of obj 2 */
        WidthCounter1 = Skip;
        Data1 += Skip;

        WidthCounter2 = 0;
        OldObject2Data = 0;

        while (WidthCounter1 < Object1->Width &&
               WidthCounter2 < Object2->Width) {

            /* get data */
            Object1Data = *Data1++;
            NewObject2Data = *Data2++;
            /* shift object 2 data to correct delta X differential */
            ShiftRegister = ((UINT16) OldObject2Data << 8) |
                            (UINT16) NewObject2Data;
            ShiftRegister >>= Shift;
            FinalObject2Data = ShiftRegister & 0xFF;

            /* return if we have a collision */
            if (Object1Data & FinalObject2Data) {
                return TRUE;
            }

            OldObject2Data = NewObject2Data;
            WidthCounter1++;
            WidthCounter2++;
        }

        /* correct pointers at end of line */
        Data1 += Object1->Width - WidthCounter1;
        Data2 += Object2->Width - WidthCounter2;

        HeightCounter1++;
        HeightCounter2++;
    }

    /* we got through all that with no collision */
    return FALSE;
}























































Theatrix: A C++ Game Class Library


Encapsulating arcade-game operations




Al Stevens


Al is a DDJ contributing editor and can be contacted on CompuServe at
71101,1262.


PC game programming is currently very popular among programmers largely
because of the overwhelming success of games such as DOOM and Myst. Games
themselves have always been part of the personal-computer phenomenon. Among the
first PC games to be widely used were Microsoft's Flight Simulator, a
graphical simulation where you fly a Cessna 182, and Adventure, a text-mode
tour through a cave of dragons, dwarfs, mazes, chasms, and other imaginative
obstacles. Those games ran very well on the small PCs of their time--4.77-MHz
8088 machines with 640 Kbytes of RAM, 360-Kbyte diskette drives, and little or
no hard-disk space. By comparison, today's desktop machines are
supercomputers, and the best contemporary games take full advantage of the
processing power and high-resolution graphics of mainstream configurations. 
Theatrix is a C++ class library that encapsulates the operations of typical
arcade games. The name comes from the metaphor that the library
implements--games are viewed as theatrical productions, with directors,
players, and scenery. You build a game by designing these components with
graphics tools and by deriving from the Theatrix classes, modifying the
behavior of the classes to provide the actions in the game. An event-driven
programming model sends controller and timer-event messages to the game's
directors and players.
The Theatrix library is the subject of a book I am writing in association with
the library's author, Stan Trujillo. Stan brought the library to me for my
opinion. Its level of abstraction was impressive: In a couple of days and with
only about 500 lines of C++ code, I built an arcade-style game with background
scenery and seven sprites that move around the screen in the fashion of an
animated cartoon. Most of the work was the artistic part--designing the
scenery and the sprites with a paint package.
The book project came about when we realized that a programmer can build a
comprehensive set of game-creation tools from widely available freeware and
shareware programs. There are paint programs, 3-D modelers, ray-tracers,
image-format converters, graphics libraries, sound editors, and so on. All
that was missing was a class library to encapsulate the organization of the
graphical elements into a game scenario. Stan's Theatrix library filled that
need, and we agreed to publish it in a book that includes the source code for
the library, several demonstration games, and the shareware and freeware
programs on a companion CD-ROM. The book is to be entitled C++ Games
Programming and will be published in mid-1995 by M&T Books.


Abstraction


Theatrix provides several levels of interface. Each lower level in the class
hierarchy encapsulates and hides more of the details of the game
implementation, raising the programmer's level of abstraction in his or her
view of the problem. At the highest level of abstraction, you create scenes
that include players under the control of directors. You are unaware of (and
do not care) how the library manages the low-level details of page flipping,
z-ordering, sound generation, and the like. Those details are hidden in the
Theatrix class implementations.
In this article, I'll describe Theatrix at its highest level of abstraction
and provide example code that uses the library at that level. When the book is
available, you will be able to use these examples to build the simple demo
game discussed here, as well as others, by using the library and tools on the
companion CD-ROM. When the library is complete, the entire source code will be
available electronically; see "Availability," page 3.


A Graphics Library


The lowest level of game control is handled by a graphics video package. To
support the objectives of Theatrix, the package must be able to display
full-screen, static graphics scenes and superimpose smaller frames of graphic
sprites at refresh rates fast enough to suggest movement--to achieve
animation. We chose FastGraph, a graphics library from Ted Gruber Software
(Las Vegas, NV) known for its efficiency and performance. To experiment with
Theatrix classes, you will need at least the shareware version of FastGraph,
which is available for download from the graphics forums on CompuServe and
other online services. For serious Theatrix-based game development, you should
get the commercial version of FastGraph. As with all shareware programs, the
downloaded version includes ordering information for the registered version.
A future version of Theatrix will work with the graphics APIs of Win32,
allowing portable games that compile and run on both DOS and Windows with few
or no source-code changes. The same executables will run with Windows NT,
Windows 95, and Windows 3.1 with WinG and Win32s. That work is underway, but
not yet ready for widespread consumption.


Game Action


A game usually consists of several ancillary displays--menus, help screens,
and options screens--but they all eventually get down to the action. That's
the part that I'll discuss here. The game's action is organized into scenes,
with each scene under the control of a director. The scene director is a
game-dependent class derived from the Theatrix SceneDirector class. Only one
scene is playing out at any given time. Each scene has a background display
and one or more players--sprites--derived from the Theatrix Player class.
Players move around the screen and do things in response to external events
such as keystrokes and ticks of the clock. You build action into a game by
deriving player objects that remember their current, game-dependent mode and
change their modes, image, and position at regular intervals based on those
modes. Players can communicate with one another by using messages or simply
calling member functions. Players and directors communicate with one another
in the same ways.
There are several distinctions between the graphical representation of a
scene's background and that of the players: The background is stationary,
occupies the entire screen, and represents the lowest (most-distant) z-order;
the players are smaller, superimposed over the background, and positioned
anywhere on the screen. Players move around, maintaining distinct z-order
relationships with one another and are always in front of the background. The
z-order determines which player passes in front when two players intersect on
the screen. Players can enter and leave the scene through portals (doors, for
example) defined as clipping parameters for the display of the player. There
are usually several graphic renderings for each player and one image for the
scene's background. The player tells Theatrix which player image to display at
any given interval, and the combination of images generates animated
sequences.


Game Timers and Events


Theatrix uses the 18.2-ticks-per-second clock frequency to manage animation.
The scene director includes an on_timer function called at that interval. The
players each have an update_position function that is called at intervals
specified in numbers of ticks. The system automatically calls update_position,
a virtual member function of the Player base class. Players and directors may
also register to be called when external events occur. For example, a player
may use a specified keystroke to initiate an action, such as firing a weapon
or changing the direction of movement. The registration specifies the event
and the member function in the player's class to be called when the event
occurs. The derived Player class declares the registrations with macros.


Game Architecture 


A Theatrix-based game metaphor usually follows this scenario: The game program
instantiates a derived SceneDirector object, which constructs itself and
instantiates one or more derived Player objects. The director and the players
register for events with macros at compile time. Each player has a z-order
based on its order of instantiation within the director, and the player's
constructor specifies how frequently the player is to be called to update its
position and image.
Once every clock tick, Theatrix calls the director's on_timer function. The
director monitors the action in the game and sends messages to players to tell
them what to do next. At regular intervals, Theatrix calls each player so that
the player can update its image and position. 
The director and players respond to events for which they have registered.
These events cause the director and players to change either their own mode
parameters or those of other elements in the game. The players' timer-driven
functions respond to these modes and call Theatrix functions to change their
position and image.


Inside Theatrix



Theatrix maintains three screen buffers. One always contains a rendering of
the background scenery with no players in view. This buffer never changes; its
purpose is to provide segments of the untouched background with which to erase
a player's current image. The second buffer is
a working buffer in which Theatrix builds the next screen to be displayed. The
third buffer is the one that the user is currently viewing. Most Theatrix
games use the VGA's Mode X, which has 320x240-pixel resolution, 256 colors,
and three video buffers. Theatrix takes advantage of the fact that
memory-to-memory writes are faster when both buffers are video buffers.
During construction, a scene-director object tells Theatrix the name of a PCX
file that contains its graphical rendering. Theatrix uses this file to build
the first versions of the three buffers. When player objects construct, they
tell Theatrix the name of a file of images that contain the animation still
frames for the player. Theatrix reads these image clips and stores them in
extended memory, if the computer has it, or in conventional memory otherwise.
Each player may also use an optional file of sound clips in Sound Blaster VOC
format; Theatrix likewise stores these clips in extended or conventional
memory. These files of image and sound clips are in a Theatrix-specific
format. You build the files with utility programs that organize the clips into
the format of the database.
Instantiated players are either on or off stage. At each tick of the clock,
Theatrix iterates through the current scene director's list of on-stage
players. If the player's refresh interval has expired, Theatrix calls the
player's update_position function to allow the player to modify its position
and image. If the circumstances of the game tell the player to make any
changes, it does so by calling Theatrix functions to change its image number,
screen coordinates, and possibly its clipping coordinates. These functions
only post the changed values; they do not take any immediate action on the
image itself. When the player returns, Theatrix uses the player's updated
image number and screen position (or existing ones, if no update occurred) to
superimpose the image onto the background in the working screen buffer (not
the visible one). After iterating through all the players, Theatrix swaps the
working screen buffer with the visible one, which displays the updated frame
with scenery and all the players in their new configurations. Then, while the
user views the updated screen for an instant, Theatrix iterates through the
players again, using their current image coordinates and size to erase the
player's image by copying a rectangle from the constant scenery buffer to the
working buffer. This process prepares the working buffer for the next refresh
frame of the entire scene.
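The erase step described above can be sketched as a simple rectangle copy from the constant scenery buffer into the working buffer. The function name and linear-buffer layout here are illustrative only; Theatrix's actual buffers live in Mode X video memory:

```c
#include <string.h>

typedef unsigned char UINT8;

/* Restore the patch of untouched background under a player's last image
   by copying a rectangle from the constant scenery buffer into the
   working buffer. 'pitch' is the buffer width in bytes; (x, y) is the
   rectangle's upper-left corner, w x h its size. */
void erase_player_rect(UINT8 *work, const UINT8 *scenery, int pitch,
                       int x, int y, int w, int h)
{
    int row;
    for (row = 0; row < h; row++)
        memcpy(work    + (y + row) * pitch + x,
               scenery + (y + row) * pitch + x, (size_t) w);
}
```

Because the scenery buffer is never drawn into, each copy is guaranteed to produce clean background regardless of how many sprites overlapped that area in earlier frames.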
A scene director can change the z-order of its players. It does so by
monitoring the progress of the action and deciding that a player needs a
different position in the z-coordinate system with respect to the other
players. Players are maintained in a linked list among the director's data
members. To change the z-order, the director changes the position of a player
in the list. Theatrix provides interface functions to support this operation.


Graphical Elements


A game's graphical elements consist of PCX files. A scene is a PCX file that
fills the 320x240-pixel screen. Players are represented by smaller PCX files
organized into player-oriented graphics databases. Each image of a player
includes the image's 256-color representation on a solid black background,
which the low-level graphics function library uses for a transparent color.
Elements of the background scene display through the transparent parts of a
player. This permits you to build a player of any shape in a rectangular PCX
file.
Building those pictures is the biggest part of game construction (although
designing a good game scenario is no trivial task). Your choice of tools
depends on your artistic abilities and the look you want for the game. We
built some games by using a paint program to construct all the graphical
elements, which gave the game an arcade look and feel. Years ago, I was a
newspaper cartoonist, and those skills, rusty though they are, came in handy
during this project.
We built other game graphics by using 3-D modeling and ray tracing to achieve
photo-realistic images. Games with surrealistic scenery, spacecraft, robots,
and accurate perspective are best built this way, although designing and
rendering such objects can involve a substantial time investment.
Your choice of a tool for building pictures has no effect on the game's
performance, however. To Theatrix, the scenes and players are all PCX files
that display on a 256-color, 320x240-pixel Mode-X screen.


An Example


Listing One is skater.cpp, the C++ code that implements a demo of a game.
Although the game doesn't do much, the code contains all the elements of a
more-complex game built under Theatrix. As Figure 1 shows, the game's
background is a skating pond. There are three players in the game: Two of them
stand still while the third skates around them in a figure eight. If you press
the Enter key, the skating player breaks through the ice, making a splashing
sound.
Listing One begins by deriving the Skater class from Theatrix's Player class.
Two data-member integers maintain a step count (actually a skid count, since
the skater is skating) and the current action mode. There are eight modes
during the normal course of the game, representing the eight segments of the
figure eight, which is actually a squared eight to simplify the example. The
DECLARE_CUELIST macro declares the presence of a list of cue registrations
that assign event messages to the class. The CUELIST macro series that follows
the class declaration declares the cues that the player receives. In this case
there is only one cue, which occurs when the user presses the Enter key. The
KEYSTROKE macro specifies a key value and a function to execute when the key
is pressed. This method registers all objects of a class for a common set of
events. Individual objects of a class can register other events independently
of one another by calling functions in Theatrix. Besides keystroke events,
there are timer and logical-message events.
The Skater constructor specifies the name of the graphics and sound-effects
files that a skater uses to animate itself and make sounds. The
Skater::update_position function is called from Theatrix once each timer tick
to modify the skater's position and image, if appropriate. The function tests
the mode data member to see what to do next. Each segment of the figure eight
involves an image and a number of steps through the x- or y-coordinate,
depending on whether the segment is horizontal or vertical. I built the images
in three sizes to suggest perspective as the skater gets further away from or
nearer to the user in the figure-eight pattern.
If the mode member is greater than 8, the user has pressed the Enter key, and
the skater crashes through the ice. The program controls that sequence by
incrementing modes until it gets to Mode 13. The program manages all animation
sequences by calling setxy and set_imageno with appropriate parameters for
each execution of update_position.
A Stander class is derived from Player to represent the two stationary
figures. This class shares the graphics file with the Skater class, and it has
no sound-effects file.
The Pond class is derived from Theatrix's SceneDirector class. It declares
instances of Stander classes and sets their position and image numbers. The
Pond class also instantiates the Skater object, which manages its own
representation and movement.
The Pond::on_timer function overrides SceneDirector's virtual function. The
overriding function gets called once for each timer tick, or approximately 18
times per second. Its purpose is to watch the progress of the game and give
direction as needed. In this case, the on_timer function monitors the skater's
progress. When the skater enters the forward, center, or rear lateral segments
of the skating pattern, the director must change the skater's z-order so that
the sprites display appropriately. Only the director can change the z-order.
If the player tried to do it from within its update_position function, the
change could be a problem. That function is executing from within an iteration
through the list of players. Changing the z-order changes the sequence of
players in the list, which could have an undesirable side effect on the
iteration. A player can tell the director to change the player's z-order, but
only from a different function, perhaps an event-driven one.
Theatrix is a good example of the powers of abstraction. Listing One contains
fewer than 200 lines of C++ code. However, when combined with a sound clip and
some PCX files, the code implements most of an arcade game's contents. You can
download an executable version of the skater game and run it to see what you
can do with Theatrix and some shareware utilities. The graphics are primitive,
roughed out in about an hour with a laptop computer and the wonderful NeoPaint
shareware program from Neosoft (Bend, OR). You could use a 3-D modeler and ray
tracing to build other PCX files and give the game a completely different,
photorealistic look, yet the code would be virtually unchanged.
Figure 1 Sample Theatrix game.

Listing One 

// ------- skater.cpp
#include <theatrix.h>
const int sidesteps = 25; // number of steps in lateral movement
const int fwdsteps = 12; // number of steps in front/rear movement
const int sstepincr = 5; // lateral x coordinate increments
const int fstepincr = 3; // front/rear y coordinate increments
// ------- a moving sprite
class Skater : public Player {
 short int steps; // number of steps taken current segment
 short int mode; // 1-8 = skating pattern #; 11-13 = splash
 friend class Pond;
 void OnEnter(int);
protected:
 DECLARE_CUELIST
public:
 Skater();
 virtual ~Skater() { }
 void update_position();
};
// ---- event message map for Skater class
CUELIST(Skater)
 KEYSTROKE('\r', OnEnter)
ENDCUELIST
// ---- construct a moving sprite
Skater::Skater() : Player("skater.gfx", "skater.sfx")
{
 setxy(90,145); // initial position on pond
 set_imageno(1); // first skater frame

 appear();
 steps = 0;
 mode = 1;
}
// --- skater frame animation entry point (once every tick)
void Skater::update_position()
{
 switch (mode) {
 case 1:
 case 3:
 case 5:
 case 7:
 // --- side to side movement
 if (++steps == sidesteps) {
 steps = 0;
 set_imageno(++mode);
 break;
 }
 if (mode & 2) // modes 3 and 7: to the left
 setx(getx() - sstepincr);
 else // modes 1 and 5: to the right
 setx(getx() + sstepincr);
 break;
 case 2:
 case 4:
 case 6:
 case 8:
 // --- front or back movement
 if (++steps == fwdsteps) {
 steps = 0;
 if (mode == 8)
 mode = 0;
 set_imageno(++mode);
 break;
 }
 if (mode < 6) // modes 2 and 4: away from the screen
 sety(gety() - fstepincr);
 else // modes 6 and 8: toward the screen
 sety(gety() + fstepincr);
 break;
 case 9:
 setinterval(3); // slow down refresh rate
 set_imageno(13); // 1st frame of ice-breaking splash
 mode++;
 play_sound_clip(1);
 break;
 case 10:
 set_imageno(12); // 2nd frame of ice-breaking splash
 mode++;
 break;
 case 11:
 set_imageno(13); // 3rd frame of ice-breaking splash
 mode++;
 break;
 case 12:
 set_imageno(11); // hole in ice
 mode++;
 steps = 0;
 break;

 default:
 if (steps++ == 30)
 stop_director();
 break;
 }
}
// ---- pressed Enter, break the ice
void Skater::OnEnter(int)
{
 if (mode < 9) {
 int yp = 35;
 if (mode == 3 || mode == 7)
 yp = 25;
 else if (mode > 3 && mode < 7)
 yp = 15;
 setxy(getx()-10, gety() + yp);
 mode = 9;
 }
}
// ---- stationary sprites
class Stander : public Player {
public:
 Stander() : Player("skater.gfx") { }
};
// ---- the scene: a skating pond
class Pond : public SceneDirector {
 Stander *stander1;
 Stander *stander2;
 Skater *skater;
 void on_timer();
public:
 Pond();
 ~Pond();
};
// ---- construct the scene
Pond::Pond() : SceneDirector("pond.pcx")
{
 // --- most distant stationary sprite
 stander1 = new Stander;
 stander1->set_imageno(10);
 stander1->setxy(180,100);
 stander1->appear();
 // --- closest stationary sprite
 stander2 = new Stander;
 stander2->set_imageno(9);
 stander2->setxy(140,130);
 stander2->appear();
 // --- moving sprite
 skater = new Skater;
}
Pond::~Pond()
{
 delete skater;
 delete stander2;
 delete stander1;
}
// ---- called after each timer tick
void Pond::on_timer()
{

 SceneDirector::on_timer();
 if (skater->steps == 0) {
 switch (skater->mode) {
 case 1:
 // -- front lateral segment
 // set moving sprite in front of others
 MoveZToFront(skater);
 break;
 case 3:
 case 7:
 // -- center lateral segment
 // set moving sprite between other two
 ChangeZOrder(skater,stander2);
 break;
 case 5:
 // -- rear lateral segment
 // set moving sprite behind others
 ChangeZOrder(skater,stander1);
 break;
 default:
 break;
 }
 }
}
int main(int argc,char *argv[])
{
 process_cmdline(argc,argv);
 Pond *pond = new Pond; // scene director
 begin("Pond"); // launch scene
 delete pond;
 return 0;
}































Video for Windows and WinG


Writing a custom draw handler




Christopher Kelly


Chris is senior engineer at Symbionics Video Ltd., a technology development
company that focuses on computer-video applications. He can be contacted at
cpk@symbionics.co.uk.


While Video for Windows (VfW) has been available for some time, the only
programming documentation for it is a help file supplied with the VfW
developers kit. This lack of information is unfortunate since VfW is
remarkably interesting and offers numerous opportunities for creative
programmers.
For instance, writing a custom draw handler is a commonly used technique and
is the basis of WinToon, the Microsoft cartoon engine. (WinToon is essentially
a canned sprite playback engine for animators. Walt Disney's Lion King
software, for example, is a WinToon app.) In this article, I'll develop a
custom draw handler and use it in conjunction with Microsoft's games
interface, WinG, to scroll text across a video window.


The Media Control Interface


With the release of Windows 3.1, Microsoft made multimedia support a core part
of the operating environment. Windows 3.1 was initially geared toward
wave-form audio and Musical Instrument Digital Interface (MIDI) devices.
Consequently, a central part of the architecture is the Media Control
Interface (MCI), which provides a uniform method of accessing multimedia
devices.
From a programmer's perspective, MCI makes multimedia devices look like
software VCRs, with commands such as play, pause, seek, and stop. The control
of each MCI device is encapsulated in a driver called an "MCI command
interpreter." This design permits the addition of new command interpreters as
new devices become available.
The first version of MCI came with support for wave-form audio and MIDI
devices. When VfW was released, it was shipped with its own command
interpreter. A later version of VfW, built on top of MCI, provided the
preregistered window class, MCIWnd, that supports video playback. MCIWnd makes
writing VfW applications a straightforward process.


A Sample VfW Program


As Listing One shows, a video-playback program with considerable functionality
can be written in less than 50 lines of code. Most of the code, in fact, has
nothing to do with VfW and is merely the minimum code required to write a
Windows program.
By using a modal dialog box, you avoid having to register a window class and
create a message loop. An MCIWnd child window is created to fill the client
area of the dialog box. This window displays the video image and provides a
number of buttons for loading a video file and controlling its playback.
You create the MCIWnd window by calling the function MCIWndCreate() in
response to the WM_INITDIALOG message received by the dialog box. This
function is flexible and accepts a number of flags that control window
attributes, such as whether it has a menu or slider control. The
MCIWNDF_NOTIFYSIZE flag requests the window to send notifications (using the
MCIWNDM_NOTIFYSIZE message) to the dialog whenever the child window changes
size. The dialog box responds to this message by resizing itself precisely to
enclose the child MCIWnd window within its client area.
The final issue is handling palette changes brought about when the focus
shifts between applications. In the dialog box, you must respond to
WM_PALETTECHANGED and WM_QUERYNEWPALETTE messages and route them to the MCIWnd
window for processing.
To build the program, you will need to get the VfW developer kit, which is
available, among other sources, from the Microsoft Developers Network Level 2
CD-ROM. Note that the program I develop here requires the vfw.h header file
and the vfw.lib library available from the MSDN. You also must include the
mmsystem.lib library, a standard part of the Windows SDK that should come with
your compiler.
This program is a fully functional playback application. It displays a small
window with a play/stop button, menu button, and slider bar. The play/stop
button is disabled until a video file is loaded. The menu button displays a
pop-up menu, initially containing a single item for loading a video file. Once
the file is loaded, the menu offers a variety of options, including
controlling the video size and the audio volume. There are a couple of things
you might like to try with these controls:
1. Hold down Ctrl while pressing the play button. This causes the video to
play full screen. Be aware that not all display drivers support full-screen
playback.
2. Hold down Shift while pressing the play button. This plays the video
backward. Although jerky, it does work.
The program also will play other multimedia files such as wave-form (.WAV)
files.


What's Happening Under the Hood?


The example program is deceptively simple. Notice that the video appears to be
playing in the background. This is because it is played under the control of a
hidden program, MMTASK.TSK, started by the MCI subsystem. MMTASK.TSK is a
program despite not having an .EXE extension. 
Consider the sequence of actions MCI carries out to display the video in the
way that we have seen:
1. Read the compressed video and audio data from the file.
2. Decompress the video.
3. Decompress the audio.
4. Send the decompressed video data to the display hardware.
5. Send the decompressed audio data to the audio hardware.
The video and its associated audio are stored on disk in audio video
interleaved (AVI) format, a special case of the Resource Interchange File
Format (RIFF). (For a discussion of RIFF, see "Inside the RIFF Specification,"
by Hamish Hubbard, DDJ, September 1994.) Conceptually, an AVI file appears as
a number of streams of data. For example, one stream will contain the video
and another the audio. For performance reasons, the video and audio frames are
interleaved in the file on a frame-by-frame basis. MCI reads the streams using
a VfW subsystem called AVIFile, which provides a rich set of functions for
reading and writing AVI files.
At this point, the audio and video are still compressed and must be
decompressed before rendering. VfW has two subsystems that handle this task.
The installable compression manager (ICM) handles video, and the audio
compression manager (ACM) handles audio. These two subsystems have a lot in
common: Each uses a driver architecture in which the task of decompressing the
data is delegated to DLLs known as "codecs." Once decompressed, the video and
audio can be sent to the display and audio hardware. At this point, the video
frame is a device independent bitmap (DIB) and is displayed using a
high-performance bitblt function in the DrawDib subsystem. When VfW is
installed or the display-driver mode is changed, DrawDib profiles the various
methods of performing a bitblt and selects the quickest. The latest release of
VfW (Version 1.1d) will use the display control interface (DCI) for accessing
the frame buffer directly as long as a DCI provider is present. 
That explanation gives a somewhat simplified view of what actually happens.
Some video codecs, known as "rendering drivers," are capable of sending data
directly to the display. They may even make use of video hardware for part of
the decompression process, such as color-space conversion or scaling. The
relationship between the ICM and DrawDib is quite tight. DrawDib will accept
compressed video data and automatically send it to the ICM for decompression.
The binding between the ACM and the Windows sound subsystem is equally tight.
A component called the "wave-form mapper" intercepts any compressed audio data
sent to the sound subsystem and routes it to the ACM for decompression.


A More Complicated VfW Program


The next example is similar to the last, except it scrolls the text "Hello
World" across the video window. This effect is achieved by adding a custom
draw handler to intercept the DIB just before it is displayed on the screen.
It then draws the text into the DIB before rendering it using DrawDib. This
illustrates a general technique that can be used for a variety of effects. For
example, it is the method used by Michael Windser to implement WinToon. Like
WinToon, this example uses WinG for drawing on the DIB.

Before going further, it is necessary to explain what a custom draw handler
is. To do that, we must take a further step back and explain what an
installable driver is.


Installable Drivers and Draw Handlers


An installable driver is a DLL that has a particular entry point that must be
called DriverProc(). The driver is registered with Windows using the Drivers
applet in the control panel. If you start this applet, you can see the list of
installable drivers present on your machine. Several components already
mentioned--the audio and video codecs, the wave-form mapper, and the MCI
command interpreters--are installable drivers. Windows uses the DriverProc to
send messages to a driver in much the same way that it calls a window
procedure to send messages to a window, although the set of messages is
completely different.
The DriverProc has the following parameters:
driverID is an instance value that the driver returns to Windows when the
driver is opened. Windows then passes this value back to the driver on all
subsequent calls to DriverProc.
hDriver is a unique value assigned by Windows to identify the driver.
msg identifies the message.
lParam1 and lParam2 are 32-bit values whose meaning depends on the value of
msg.
The messages can be divided into standard messages and driver-type-specific
messages. Standard messages are sent to all installable drivers, whereas
type-specific messages are sent only to drivers of a particular type. For
example, a defined set of type-specific messages is sent to all video codecs.
A draw handler is similar to the DriverProc of an installable driver in that
it must have the same prototype, and it receives the same messages. A draw
handler does not, however, need to be named DriverProc.


What is WinG?


Microsoft is anxious to make Windows a good platform for games. In support of
this initiative, Microsoft has developed WinG, the games-programming
interface, which was released late last year. A technique commonly used by
games writers is to compose an image in an off-screen buffer before copying it
to the display. This composition may involve using standard drawing primitives
or, for some operations, direct manipulation of the bits of the image. Using
both drawing primitives and direct manipulation is difficult in Windows
because the bitmaps used by the graphic device interface (GDI) graphics engine
are device dependent. WinG solves this problem by providing specialized device
contexts and bitmaps. You can draw into a WinG bitmap using the standard GDI
drawing primitives or directly manipulate it as a DIB. In the next example,
I'll use both of these access methods. 


In at the Deep End


The "Hello World" example program in Listing Two is based on the first example
with the addition of a draw handler. 
Since WinG can only cope with 256-color palettized displays, this program must
be run in a 256-color display mode. WinMain() calls the function
Is256ColorDisplay(), which performs the necessary check. Most of the
remaining code you need to write deals with the
draw handler together with the functions that it calls to process messages. A
word of warning about the prolog code for the draw handler: Because it is
called from within the context of MMTASK.TSK, smart callbacks will not work.
You must call MakeProcInstance() to generate the code to correctly load the
data segment on entry. This type of application is one of the few places where
it is still necessary to use instance thunks in Windows programming.
You pass the MCIWNDF_NOTIFYMEDIA flag to MCIWndCreate(), requesting the MCIWnd
window to send notifications (MCIWNDM_NOTIFYMEDIA) to the dialog whenever a
new file is loaded. Install your draw handler in response to this message.
Many of the messages that your draw handler receives require little or no
processing. In Listing Two, these messages are grouped together for
convenience at the beginning of the draw handler.
The first message that your draw handler will receive is DRV_OPEN, and in
response, you should allocate a data structure of type DrawInfo. This will be
used to store information required for processing later messages. The address
of this data structure should be returned from the draw handler and it will be
passed as the driverId parameter of subsequent messages. In C++, structures
can have methods as well as data members. For DrawInfo, this convenience
allows you to write methods to handle each of the messages received by the
draw handler. The constructor for the DrawInfo structure should allocate the
WinG device context that will be used for drawing and also call DrawDibOpen()
to register with DrawDib. The destructor for DrawInfo must release any
resources acquired by the draw handler and also deregister with DrawDib.
The next message you get is ICM_DRAW_SUGGESTFORMAT, asking which DIB
formats you are prepared to accept. The proper response is, "8-bit-per-pixel
uncompressed DIBs." VfW will attempt to find a codec that will convert the DIB
to this format before passing it to you.
Before asking your draw handler to draw any DIBs, Windows will send the
ICM_DRAW_BEGIN message, allowing you to perform any necessary preparation.
There are several things you must then do. The DrawDib subsystem must be
prepared by calling the function DrawDibBegin(), and you must call
WinGCreateBitmap() to create a WinG bitmap, which will be used for drawing the
text onto the DIB. This also is the point where you should store the source
and destination rectangles in the DrawInfo structure. The ICM_DRAW_BEGIN
message may be sent to you several times, and you should prevent resource
leakage by deleting any WinG bitmap that you may have allocated in an earlier
call.
Before you can draw any DIBs, the palette must be initialized correctly. You
will receive the ICM_DRAW_REALIZE message asking you to realize the palette.
You should call DrawDibRealize to do this for you. 
And now to the main work of the draw handler: You will be sent the ICM_DRAW
message whenever you must render a DIB. For this, you should compose the DIB
to be displayed in the WinG bitmap, then call DrawDibDraw() to display it. To
compose the DIB, you should first copy the DIB you are given into the WinG
bitmap, then use the GDI function TextOut() to write the text on top. In the
example, the code to compose the DIB is confined to the method
ComposeFrame(), making it easy for you to modify the composition and devise
your own interesting effects. 
Your draw handler must handle any palette change requests that may occur. Call
DrawDibChangePalette to do this.
Finally, some advice on how to build the example application: You will need
the VfW and WinG developer kits, which are both on the Microsoft Developers
Network Level 2 and Multimedia Jumpstart 2.0 CD-ROMs. In addition, the WinG
developer kit is available on the Internet from Microsoft's FTP site
(ftp.microsoft.com) and in the Windows Multimedia forum on Compuserve (GO
WINMM).
If the example works but displays garbage text rather than "Hello World," then
you are probably using smart callbacks. See your compiler documentation for
information on how to turn these off.


Where to Go from Here


For more information about VfW, you should refer to the help file that comes
with it. At first sight, this can be somewhat intimidating because there is no
architectural overview. However, the effort is worthwhile, since there is a
wealth of information hidden within. It is also worth spending some time
examining the sample applications that come with the VfW developer kit.

Listing One 
#define STRICT
#include <windows.h>
#include <string.h>
#include <vfw.h>

static HINSTANCE hInstanceG = 0; // Data instance handle.
static HWND hMCIWndG = 0 ; // Handle of the MCI display window.

// Function prototypes
static void ResizeWindowToFit(HWND hWnd);
// Make DlgProc extern "C" to prevent C++ name mangling.
extern "C" 
BOOL CALLBACK DlgProc(HWND hWnd, UINT msg, WPARAM wParam, LPARAM lParam);


int PASCAL WinMain(HINSTANCE hInstance, HINSTANCE hPrevInst, LPSTR pCmdLine, 
 int cmdShow)
{
 return DialogBox(hInstanceG = hInstance,"AVISEE",0,DlgProc); 
}

// Dialog Procedure
BOOL CALLBACK DlgProc(HWND hWnd, UINT msg, WPARAM wParam, LPARAM lParam)
{
 switch(msg)
 {
 case WM_INITDIALOG:
 hMCIWndG = MCIWndCreate(hWnd,hInstanceG,
 WS_CHILD | WS_VISIBLE | MCIWNDF_NOTIFYSIZE,0);
 ResizeWindowToFit(hWnd);
 return TRUE;
 case WM_CLOSE:
 EndDialog(hWnd,0);
 return TRUE;
 case WM_PALETTECHANGED:
 case WM_QUERYNEWPALETTE:
 SendMessage(hMCIWndG,msg,wParam,lParam);
 return TRUE;
 case MCIWNDM_NOTIFYSIZE:
 ResizeWindowToFit(hWnd);
 return TRUE;
 }
 return FALSE;
}
static void ResizeWindowToFit(HWND hWnd)
{
 RECT rect;
 GetWindowRect(hMCIWndG,&rect);
 AdjustWindowRect(&rect,GetWindowLong(hWnd,GWL_STYLE),FALSE);
 SetWindowPos(hWnd,0,0,0,rect.right-rect.left,rect.bottom-rect.top,
 SWP_NOMOVE | SWP_NOZORDER);
}


Listing Two
#define STRICT
#include <windows.h>
#include <windowsx.h>
#include <string.h>
#include <vfw.h>
#include <mmsystem.h>
#include <digitalv.h>
#include <mciavi.h>
#include <wing.h>

// Global Variables
static HINSTANCE hInstanceG = 0; // Data instance handle.
static HWND hMCIWndG = 0 ; // Handle of the MCI display window.
static FARPROC pDrawHandlerThunkG=0; // Instance thunk for draw handler.

// Private data structure used for storing drawing information.
// This is C++ so it can have methods.
struct DrawInfo
{

// Methods
 DrawInfo();
 ~DrawInfo();
 LRESULT Begin(ICDRAWBEGIN FAR *pBegin);
 LRESULT Draw(ICDRAW FAR *pDrawStruct);
 LRESULT End();
 LRESULT ChangePalette(LPBITMAPINFOHEADER pInfoHeader);
 LRESULT GetPalette();
 LRESULT Realize(HDC hDC, BOOL background);
 BOOL CanHandleFormat(LPBITMAPINFOHEADER pInfoHeader);
 void ComposeFrame(LPBITMAPINFOHEADER pInfoHeader, LPVOID pImageBits);
 LRESULT SuggestFormat(ICDRAWSUGGEST FAR *pSuggest);
// Data members
 LPVOID pBuffer_;
 HDRAWDIB hDD_;
 HDC hDC_;
 HDC hWinGDC_;
 HBITMAP hWinGBitmap_;
 HBITMAP hOldBitmap_;
 int xDst_; // Destination rectangle
 int yDst_; 
 int dxDst_;
 int dyDst_;
 int xSrc_; // Source rectangle
 int ySrc_; 
 int dxSrc_;
 int dySrc_;
 char aCaption_[32]; // Text to write
 int captionX_; // Current position of
 int captionY_; // the text on the window.
 int windowWidth_; // Width of the video window.
} ;
// Function prototypes
// Make exported functions extern "C" to prevent C++ name mangling.
extern "C" 
{
BOOL CALLBACK DlgProc(HWND hWnd, UINT msg, WPARAM wParam, LPARAM lParam);
LRESULT CALLBACK DrawHandler(DWORD id, HDRVR hDriver, UINT msg, 
 LPARAM lParam1, LPARAM lParam2);
}
static void ResizeWindowToFit(HWND hWnd);
static void CopySystemPalette(LPRGBQUAD pColors);
static BOOL Is256ColorDisplay();
static BOOL InstallDrawHandler(HWND hMCIWnd);
static LRESULT HandleDriverOpen(ICOPEN FAR *pOp);
static LRESULT HandleDriverClose(DrawInfo *pDraw);

int PASCAL WinMain(HINSTANCE hInstance, HINSTANCE hPrevInst, 
 LPSTR pCmdLine, int cmdShow)
{
 if(Is256ColorDisplay())
 DialogBox(hInstanceG = hInstance,"AVISEE",0,DlgProc); 
 else
 MessageBox(0,"This program requires a 256 color display", "AVISEE",MB_OK);
 return 0;
}

// Dialog Procedure
BOOL CALLBACK DlgProc(HWND hWnd, UINT msg, WPARAM wParam, LPARAM lParam)

{
 switch(msg)
 {
 case WM_INITDIALOG:
 // Create the video window.
 hMCIWndG = MCIWndCreate(hWnd,hInstanceG,
 WS_CHILD | WS_VISIBLE | MCIWNDF_NOTIFYSIZE |
 MCIWNDF_NOTIFYMEDIA,0);
 ResizeWindowToFit(hWnd);
 return TRUE;
 case WM_CLOSE:
 EndDialog(hWnd,0);
 return TRUE;
 case WM_PALETTECHANGED:
 case WM_QUERYNEWPALETTE:
 // Pass on palette messages.
 SendMessage(hMCIWndG,msg,wParam,lParam);
 return TRUE;
 case MCIWNDM_NOTIFYSIZE:
 ResizeWindowToFit(hWnd);
 return TRUE;
 case MCIWNDM_NOTIFYMEDIA:
 InstallDrawHandler((HWND)wParam);
 return TRUE;
 }
 return FALSE;
}
static void ResizeWindowToFit(HWND hWnd)
{
 RECT rect;
 GetWindowRect(hMCIWndG,&rect);
 AdjustWindowRect(&rect,GetWindowLong(hWnd,GWL_STYLE),FALSE);
 SetWindowPos(hWnd,0,0,0,rect.right-rect.left,rect.bottom-rect.top,
 SWP_NOMOVE | SWP_NOZORDER);
}
static void CopySystemPalette(LPRGBQUAD pColors)
{
 PALETTEENTRY aPal[256];
 HDC hDC = GetDC(0);
 GetSystemPaletteEntries(hDC,0,256,aPal);
 // Unfortunately RGBQUAD and PALETTEENTRY have the colors in the
 // opposite order so we have to copy them one by one.
 for(int i=0; i<256; i++)
 {
 pColors[i].rgbRed = aPal[i].peRed;
 pColors[i].rgbGreen = aPal[i].peGreen;
 pColors[i].rgbBlue = aPal[i].peBlue;
 pColors[i].rgbReserved = 0;
 }
 ReleaseDC(0,hDC);
}
static BOOL Is256ColorDisplay()
{
 BOOL ok = TRUE;
 HDC hDC = GetDC(0); // Get DC for desktop window.
 // Check it is a palettized display.
 if((GetDeviceCaps(hDC,RASTERCAPS) & RC_PALETTE)==0)
 ok = FALSE;
 // Check it is 256 colors (8 bits per pixel).

 if(GetDeviceCaps(hDC,BITSPIXEL)*GetDeviceCaps(hDC,PLANES)!=8)
 ok = FALSE;
 ReleaseDC(0,hDC);
 return ok;
}
static BOOL InstallDrawHandler(HWND hMCIWnd)
{
 BOOL ok = TRUE;
 MCI_DGV_SETVIDEO_PARMS parms;

 // We may be called before MCIWndCreate has returned, in which case the
 // MCI window handle will not yet have been assigned to hMCIWndG.
 if(!hMCIWndG)
 hMCIWndG = hMCIWnd;
 // If we haven't created the instance thunk, then do so.
 if (!pDrawHandlerThunkG) 
 pDrawHandlerThunkG = MakeProcInstance((FARPROC)DrawHandler,hInstanceG);
 parms.dwValue = (DWORD)pDrawHandlerThunkG;
 parms.dwItem = MCI_AVI_SETVIDEO_DRAW_PROCEDURE;

 // MCIWnd does not provide a function for installing a draw handler,
 // so we get the MCI device ID and send it the MCI_SETVIDEO command.
 UINT deviceID = MCIWndGetDeviceID(hMCIWndG);
 if(deviceID)
 {
 mciSendCommand(deviceID,MCI_SETVIDEO,
 MCI_DGV_SETVIDEO_ITEM | MCI_DGV_SETVIDEO_VALUE,
 (DWORD) (MCI_DGV_SETVIDEO_PARMS FAR*)&parms);
 }
 return ok;
}
// The Draw Handler
LRESULT CALLBACK __export DrawHandler(DWORD id, HDRVR hDriver, UINT msg, 
 LPARAM lParam1, LPARAM lParam2)
{
 DrawInfo *pDraw = (DrawInfo*)id;
 switch (msg)
 {
 // Many of the driver messages require no processing so we
 // will get them out of the way first.
 case DRV_LOAD:
 case DRV_FREE:
 case DRV_DISABLE:
 case DRV_ENABLE:
 case DRV_INSTALL:
 case DRV_REMOVE:
 case DRV_CONFIGURE:
 return 1;
 case DRV_QUERYCONFIGURE:
 case ICM_GETSTATE:
 case ICM_SETSTATE:
 return 0;
 case ICM_CONFIGURE:
 case ICM_ABOUT:
 return ICERR_UNSUPPORTED;
 // Open and close we need to handle - this is where we allocate
 // and free our private data structure.
 case DRV_OPEN:
 return (lParam2) ? HandleDriverOpen((ICOPEN FAR *)lParam2):1;

 case DRV_CLOSE:
 return HandleDriverClose(pDraw);
 // Code for drawing.
 case ICM_DRAW_BEGIN:
 return pDraw ? pDraw->Begin((ICDRAWBEGIN FAR *)lParam1) 
 : ICERR_UNSUPPORTED;
 case ICM_DRAW:
 return pDraw ? pDraw->Draw((ICDRAW FAR *)lParam1) : ICERR_UNSUPPORTED;
 case ICM_DRAW_END:
 return pDraw ? pDraw->End() : ICERR_UNSUPPORTED ;
 case ICM_GETINFO:
 return ICERR_UNSUPPORTED;
 case ICM_DRAW_QUERY:
 return (pDraw && pDraw->CanHandleFormat((LPBITMAPINFOHEADER)lParam1)) 
 ? ICERR_OK : ICERR_BADFORMAT;
 case ICM_DRAW_SUGGESTFORMAT:
 return pDraw ? pDraw->SuggestFormat((ICDRAWSUGGEST FAR *)lParam1) 
 : ICERR_UNSUPPORTED;
 case ICM_DRAW_REALIZE:
 return pDraw ? pDraw->Realize((HDC)lParam1,(BOOL)lParam2) 
 : ICERR_UNSUPPORTED;
 case ICM_DRAW_GET_PALETTE:
 return pDraw ? pDraw->GetPalette() : ICERR_UNSUPPORTED;
 case ICM_DRAW_CHANGEPALETTE:
 return pDraw ? pDraw->ChangePalette((LPBITMAPINFOHEADER)lParam1) 
 : ICERR_UNSUPPORTED;
 }
 if (msg < DRV_USER)
 // Send all other standard installable driver messages for
 // default processing.
 return DefDriverProc(id,hDriver,msg,lParam1,lParam2);
 else
 // Anything else we don't support
 return ICERR_UNSUPPORTED;
}
static LRESULT HandleDriverOpen(ICOPEN FAR *pOpen)
{
 LRESULT retVal = 0L;
 if(pOpen)
 {
 // We only accept video streams and we do not
 // handle compression and decompression.
 if (pOpen->fccType == streamtypeVIDEO &&
 pOpen->dwFlags != ICMODE_COMPRESS &&
 pOpen->dwFlags != ICMODE_DECOMPRESS)
 {
 // Allocate a private structure for storing information.
 DrawInfo *pDraw = new DrawInfo;
 if(pDraw)
 {
 pOpen->dwError = ICERR_OK;
 retVal = (LRESULT)(DrawInfo FAR *)pDraw;
 }
 else
 pOpen->dwError = ICERR_MEMORY;
 }
 }
 return retVal; 
}

static LRESULT HandleDriverClose(DrawInfo *pDraw)
{
 delete pDraw; // Destructor tidies up.
 return 1;
}
// Methods for class DrawInfo
DrawInfo::DrawInfo():
 pBuffer_(0),
 captionX_(0),
 captionY_(0),
 hWinGDC_(0),
 hWinGBitmap_(0)
{
 hDD_ = DrawDibOpen();
 hWinGDC_ = WinGCreateDC();
 wsprintf(aCaption_,"Hello world");
}
DrawInfo::~DrawInfo()
{
 // Free any resources we still have.
 if(hDD_) 
 DrawDibClose(hDD_);
 if(hWinGDC_ && hWinGBitmap_)
 DeleteObject(SelectObject(hWinGDC_,(HGDIOBJ)hOldBitmap_));
 if(hWinGDC_)
 DeleteDC(hWinGDC_);
}
LRESULT DrawInfo::Begin(ICDRAWBEGIN FAR *pBegin)
{
 struct 
 {
 BITMAPINFOHEADER infoHeader;
 RGBQUAD colorTable[256];
 } infoHeader;

 if(CanHandleFormat(pBegin->lpbi))
 {
 // We may be called several times without a corresponding call to
 // ICM_DRAW_END, so we must delete the WinG bitmap if it already exists.
 if(hWinGBitmap_)
 {
 DeleteObject(SelectObject(hWinGDC_,(HGDIOBJ)hOldBitmap_));
 hWinGBitmap_ =0;
 DrawDibEnd(hDD_);
 }
 hDC_ = pBegin->hdc;
 xDst_ = pBegin->xDst; yDst_ = pBegin->yDst;
 dxDst_ = pBegin->dxDst; dyDst_ = pBegin->dyDst;
 xSrc_ = pBegin->xSrc; ySrc_ = pBegin->ySrc;
 dxSrc_ = pBegin->dxSrc; dySrc_ = pBegin->dySrc;
 captionY_ = pBegin->dyDst/2;
 windowWidth_ = pBegin->dxDst;

 SetStretchBltMode(hDC_,COLORONCOLOR);

 if (DrawDibBegin(hDD_,hDC_,dxDst_,dyDst_,pBegin->lpbi,dxSrc_,dySrc_,0))
 {
 hmemcpy(&infoHeader,pBegin->lpbi,sizeof(BITMAPINFOHEADER));
 // Get the system palette entries.

 CopySystemPalette(infoHeader.colorTable);
 // Create the WinG bitmap.
 hWinGBitmap_ = 
 WinGCreateBitmap(hWinGDC_,(LPBITMAPINFO)&infoHeader,&pBuffer_);
 if(hWinGBitmap_ && pBuffer_)
 {
 // Select the WinG bitmap into the WinG device context.
 hOldBitmap_ = 
 (HBITMAP)SelectObject(hWinGDC_,(HGDIOBJ)hWinGBitmap_);
 return ICERR_OK;
 }
 else
 return ICERR_MEMORY;
 }
 else
 return ICERR_UNSUPPORTED;
 }
 else
 return ICERR_BADFORMAT;
}
LRESULT DrawInfo::Draw(ICDRAW FAR *pDrawStruct)
{
 UINT wFlags;
 wFlags = DDF_SAME_HDC;
 if ((pDrawStruct->dwFlags & ICDRAW_NULLFRAME) ||
 pDrawStruct->lpData == NULL) 
 {
 if(pDrawStruct->dwFlags & ICDRAW_UPDATE)
 wFlags = DDF_UPDATE;
 else
 return ICERR_OK;
 }
 if (pDrawStruct->dwFlags & ICDRAW_PREROLL)
 wFlags = DDF_DONTDRAW;
 if (pDrawStruct->dwFlags & ICDRAW_HURRYUP)
 wFlags = DDF_HURRYUP;
 // Compose the DIB in the WinG bitmap.
 ComposeFrame((LPBITMAPINFOHEADER)pDrawStruct->lpFormat,pDrawStruct->lpData);
 // Blt the WinG bitmap to the screen.
 if (!DrawDibDraw(hDD_,hDC_,xDst_,yDst_,dxDst_,dyDst_,
 (LPBITMAPINFOHEADER)pDrawStruct->lpFormat,
 pBuffer_,xSrc_, ySrc_,dxSrc_, dySrc_,wFlags)) 
 {
 if (wFlags & DDF_UPDATE)
 return ICERR_CANTUPDATE;
 else
 return ICERR_UNSUPPORTED;
 }
 return ICERR_OK;
}
void DrawInfo::ComposeFrame(LPBITMAPINFOHEADER pInfoHeader, LPVOID pImageBits)
{
 if(pBuffer_)
 {
 // Copy the bitmap we are given into the WinG bitmap.
 hmemcpy(pBuffer_,pImageBits,pInfoHeader->biSizeImage);
 SetBkMode(hWinGDC_,TRANSPARENT);
 SetTextColor(hWinGDC_,RGB(255,0,0));
 // Draw 'Hello World' on top.

 TextOut(hWinGDC_,captionX_,captionY_,aCaption_,lstrlen(aCaption_));
 // Update the position to draw the text - causes scrolling.
 captionX_ = (captionX_+1)%windowWidth_;
 } 
}
LRESULT DrawInfo::End()
{
 return ICERR_OK;
}
LRESULT DrawInfo::GetPalette()
{
 return (LRESULT)(UINT)DrawDibGetPalette(hDD_);
}
LRESULT DrawInfo::ChangePalette(LPBITMAPINFOHEADER pInfoHeader)
{
 PALETTEENTRY aPalette[256];
 LPRGBQUAD pColors = (LPRGBQUAD)((LPBYTE)pInfoHeader + pInfoHeader->biSize);
 // That annoying RGB ordering problem again.
 for (int i=0; i<(int)pInfoHeader->biClrUsed; i++) 
 {
 aPalette[i].peRed = pColors[i].rgbRed;
 aPalette[i].peGreen = pColors[i].rgbGreen;
 aPalette[i].peBlue = pColors[i].rgbBlue;
 aPalette[i].peFlags = 0;
 }
 DrawDibChangePalette(hDD_,0,(int)pInfoHeader->biClrUsed,aPalette);
 return ICERR_OK;
}
LRESULT DrawInfo::Realize(HDC hDC, BOOL background)
{
 hDC_ = hDC;
 return (hDC_ && hDD_) ? DrawDibRealize(hDD_, hDC_,background) 
 : ICERR_UNSUPPORTED;
}
BOOL DrawInfo::CanHandleFormat(LPBITMAPINFOHEADER pInfoHeader)
{
 return (pInfoHeader && pInfoHeader->biCompression == BI_RGB && 
 (pInfoHeader->biPlanes*pInfoHeader->biBitCount==8)) 
 ? TRUE : FALSE;
}
LRESULT DrawInfo::SuggestFormat(ICDRAWSUGGEST FAR *pSuggest)
{
 if (pSuggest->lpbiSuggest == NULL)
 return sizeof(BITMAPINFOHEADER) + 256 * sizeof(RGBQUAD);
 // We only want 8 bits-per-pixel uncompressed RGB DIBs.
 pSuggest->lpbiSuggest->biCompression = BI_RGB;
 pSuggest->lpbiSuggest->biPlanes = 1;
 pSuggest->lpbiSuggest->biBitCount = 8;
 return sizeof(BITMAPINFOHEADER) + 
 pSuggest->lpbiSuggest->biClrUsed * sizeof(RGBQUAD);
}












Music and Sound for Interactive Games


Enhancing the power of your software




John W. Ratcliff


John is a graphic artist, designer, and programmer whose credits include
computer games such as 688 Attack Sub and SSN-21 Seawolf from Electronic Arts.
You can contact him on CompuServe at 70253,3237 or on his BBS at 314-939-0200.


Music and sound effects are the most powerful tools available for you to
emotionally impact users. Without them, users don't know how to "feel." Music
and sound help users understand context in your software. The shower scene in
Psycho would be meaningless without the accompanying music. How would you know
when to be scared in a horror film? On the edge of your seat in a suspense
movie? Near tears at the end of Old Yeller? What would make you leap out of
your seat screaming as a dinosaur rips through a Land Rover or an alien
monster comes tearing down a hallway without the accompanying sound and music?
While we know this intuitively, you can still test this for yourself in a
literal fashion. Rent a videotape of Terminator II, Conan the Barbarian, Star
Wars, Aliens, or Jurassic Park. Every time you hear your heart racing during
the film, close your eyes and think very hard about what you are hearing.
Listen to how the primary melody of the film's music score is interwoven and
allowed to build and evolve during different portions of the film. During a
really loud action sequence, turn the volume off. You will feel the tension of
the situation vanish as if you had closed the spigot to a faucet of rushing
water. 
Nothing is done with sound and music in film that we would not want to emulate
with software--with one exception. In computer games, we want the sound
effects, dialogue, foley, and music to be both interactive and contextual to
the environment. By adding interactive elements to the soundtrack, the
emotional content becomes magnified. A sound track has four major components:
dialogue, sound effects, foley, and music. Let's take a brief look at each and
examine how you should apply them to your game.
Dialogue. Most of the dialogue you hear in movies has been redone in a studio
after the scene was shot. This allows actors to focus on how they sound and
lets the sound designer control the exact balance of the audio in the finished
sound track. Clearly, it is important that all of your dialogue be done
professionally in a recording studio. However, unless you have been given a
huge budget, you probably can't afford Hollywood actors and an expensive
studio. The alternative is sound engineers who offer full audio services.
These professionals not only provide you with composing services but also
provide voice actors, custom digital sound effects, and mixing. They can even
deliver the audio in computer-data format at the resolution you need,
customized for the various hardware platforms. 
Sound effects. While many sound-effects libraries are available, some games
call for custom effects. Common sense suggests you use clip sound libraries
where you can, but go to a professional sound engineer to get effects that
exactly match the content of your game.
Foley. Foley effects are ambient, supporting sound effects that you don't even
notice when you watch a film--but you would notice them if they weren't there.
Foley effects include footsteps, cars, wind, birds, or any other environmental
sounds that support the content. In Hollywood, every single footstep and
rustle of fabric is added to the sound track and synchronized frame by frame
to the film as a post-production process. Foley effects create a greater sense
of "virtual reality" than the most exacting computer graphics. They are
greatly enhanced if used with special processing like Qsound, reverb, and
other digital-signal processing effects. Reverb is a technique where sounds
are fed through a signal-processing phase to approximate the echo and
reflections found in a real environment. For example, on the Creative Labs
AWE32 sound card you can program the exact characteristics of the shape of a
room through MIDI events. Instantly, all foley effects will sound as though
they were occurring in a room of that shape and size. 
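Reverb of the kind described here can be sketched in software as a feedback
delay line. This is an illustrative approximation only, not the AWE32's
actual processing; the function name and parameters are ours:

```c
#include <stddef.h>

/* Minimal feedback-delay reverb sketch. Each output sample is the
   input plus an attenuated echo from `delay` samples earlier; the
   feedback factor controls how long the echoes persist, roughly
   modeling the reflections of a room. */
void reverb(const float *in, float *out, size_t n,
            size_t delay, float feedback)
{
    for (size_t i = 0; i < n; i++) {
        float echo = (i >= delay) ? out[i - delay] * feedback : 0.0f;
        out[i] = in[i] + echo;
    }
}
```

Feeding an impulse through this loop produces a train of echoes that decay
by the feedback factor each pass, which is the audible signature of reverb.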
Foley and digital sound effects are the most highly interactive tools you can
apply to your sound track. With foley, you let the user hear footsteps,
gunshots, growls of a monster around a corner, wind blowing, birds chirping,
and street noise. As long as these sounds are in real time, contextual to
where they are in your virtual reality, they will draw the user very deeply
into the world you have created. This magnification of the virtual-reality
experience through the use of interactive sound effects overpowers goggles,
gloves, head-tracking devices, or any of the other virtual-reality gadgetry
out there. Removing the soundtrack plunges the user back into the days of
silent movies.
Music. In film, the musical score unfolds in a linear fashion. The composer
knows exactly the amount of time required to build up to that great suspense
scene. But in interactive games, the suspense scene is unknown--it depends on
when the user opens the door marked "Pit From Hell." While some games simply
score a different song for each level, providing almost no interactivity,
others branch in and out of MIDI sequences to create seamless transitions.
Some have even attempted algorithmic music, which is actually created in real
time by the computer. 
Probably the best middle-ground approach is to come up with all possible
variations of emotion you wish to communicate in the product, and then have
your composer score as if it were for film. Your composer should provide
branching points into and out of these sequences to communicate the emotional
context in pseudo real time. These branches will not be instantaneous, but
will model the underlying context of the game state very closely, such that
when you enter a danger or suspense state, the music will branch to reflect
that emotion. 
Another approach is to simply use the music to communicate the base ambiance
for the current level, and make heavy use of interactive foley, dialogue, and
sound effects to communicate the action. Obviously, gunshots, explosions, and
screams of terror will convey that information to the user very well. 
For years, PC developers have had to settle for audio devices that could do
little more than beep, warble, and belch. The only emotional reaction we could
elicit from the user was a deep desire to find the "turn music off" button.
The first generation of sound cards wasn't much of an improvement. Although
newer sound cards, such as the Adlib Personal Music System, did allow us to
add important interactive audio cues to a game, they had limited emotional
range. The fundamental weakness inherent in a cheesy FM synthesis device
allowed our orchestrations to carry about as much emotional content as
a grade-schooler's Flutophone.
With the proliferation of CD-ROM, digital sound cards, and wave-table
synthesis MIDI devices, the situation has improved dramatically. Now we can
use sound and music in ways that contain more emotional content than a Steven
Spielberg movie, which, compelling as it is, is a passive experience. We watch
the dinosaur attack the Land Rover, but we have no control over the situation.
In an interactive game, we are afforded the opportunity to try to get away
from the dinosaur. As we attempt to escape the vicious beast, the music and
sound effects communicate that emotional distress in direct correlation to our
own actions. This results in a heightened sense of awareness that only an
interactive environment can bring. 
One of the best examples of interactive digital sound in a gaming environment
is id Software's DOOM. How many of you have jumped back in your chair when you
heard the eerie "growls" and "snorts" of a monster somewhere around a corner?
Although you didn't see the monster, simply hearing it precipitated an
emotional response so strong that when the beast lurches out and you cut it
down in a hail of bullets, you feel a much greater sense of accomplishment.
These kinds of subtle audio cues allow you to orchestrate the emotional
response in the user. Done properly, this effect will bring the game player
much deeper into the environment you are trying to create.
At this time I should sound a note of warning: While good use of sound and
music can greatly enhance your software, it is easy to do it wrong. Sound and
music that are of poor quality or that don't support the emotional direction
of your product are a waste of time, money, and disk space. Bad or
unprofessional production values, while they may not destroy a product, will
leave the user with an overall poor impression, regardless of how well done
the rest of the elements might be.
Here are some suggestions of how you can make the sound and music in your game
as effective as possible:
Use professional sound effects. Either hire a sound-effects specialist or be
extraordinarily choosy about utilizing clip sounds. Do not steal your sound
effects from movies, records, or television; this is copyright violation. Your
software will not be accepted by a publisher, and you may even get sued. Just
because you pull a great Star Trek sound effect off of a BBS doesn't mean you
have rights to use it. 
Use professional music. Either hire an interactive-media composer or use
high-quality music clips that fit your project. Remember, just as you wouldn't
hire a bass player to play the saxophone, you should be aware that the talent
to compose for MIDI and interactive environments is unique. Being a great
musician and providing quality MIDI composition are not one and the same.
Production values need to be high, and squeezing quality out of limited music
devices is a talent your composer will need.
Make certain that all of the music supports the emotional content, theme, and
direction of the game at any given time. Think about the interactive nature of
your music and how you want it to shift in context, according to game play.
Look at games and movies of a similar genre. Every time you watch a movie, try
to be aware of how your emotions are manipulated by the music and sound
effects. 
Listen to your composer. Look for a composer who has an established track
record composing for the target hardware and is familiar with interactive
media. Communicate very strongly to your composer exactly what you want. Give
your composer specific music, either from CD or film, that matches the
emotional content you want to communicate. 
Effective use of sound and music in interactive games makes the difference
between "experiencing" and merely "playing" them. 


Types of Audio


The following are several ways to implement audio on the PC architecture:
Digital sound. Ever since the release of the Creative Labs' SoundBlaster, the
PC architecture has had a solid platform for implementing digital sound. Other
entries, including the Covox Speech Thing, the Walt Disney Sound Source, and
the MediaVision ProAudio Spectrum card, all provide this capability. With
digital sound, your program can play back anything that can be recorded with a
microphone--digital-sound effects, human speech, music, and the like. Digital
sound requires enormous amounts of memory and disk storage, but it has still
been used very effectively as a method for delivering sound effects and
voice-recorded responses.
FM synthesis. The earliest popular PC sound card was the Adlib Personal Music
System, which contained a Yamaha YM3812 (OPL2) FM synthesis chip. This device
can create waveforms by using oscillators that allow you to apply frequency
modulation and attack, decay, sustain, and release operators to a
semiprogrammable waveform, including a white-noise generator. If this sounds
complicated, it is! This device is phenomenally difficult to program, and even
your best programming efforts sound pretty lame. Fortunately, the importance
of FM synthesis is declining in the wake of the new generation of General MIDI
wave-table synthesis devices. A number of systems allow the YM3812 to emulate
a MIDI device, thus saving you from having to deal with its arcane nature.
MIDI. The Musical Instrument Digital Interface (MIDI) specification is an
internationally supported, de facto standard that defines a serial interface
for connections between music synthesizers, musical instruments, and
computers. MIDI, which is maintained by the MIDI Manufacturers Association
(Los Angeles, CA), is based both on hardware (I/O channels, cables, and the
like) and software (encoded messages defining device, pitch, volume, and so
forth). According to the specification, the receiving device in a MIDI system
interprets the musical data even though the sending device has no way of
knowing what the receiver can do. But this can be a problem if the receiving
device doesn't have the capability to interpret the data correctly. General
MIDI addresses this problem by identifying hardware capabilities in advance.
All general-MIDI devices provide the same 128 instrument sounds, including
musical instruments, percussion, and sound effects. General-MIDI systems
support simultaneous use of all 16 MIDI channels with a minimum of 24
simultaneously sounding voices, and they have a specified set
of music controllers. This means that with general MIDI, the sender knows what
to expect of the receiver. Consequently, a file created with one general-MIDI
device is recognizable when played on any other--without losing notes or
changing instrumental balance. 
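The encoded messages the specification defines are compact. As a sketch
(the struct and function names here are illustrative, not from any MIDI
toolkit), a note-on event packs channel, pitch, and velocity into three
bytes:

```c
/* A MIDI channel message is a status byte (high nibble = message type,
   low nibble = channel) followed by data bytes. 0x90 is the standard
   note-on status; data bytes are limited to 7 bits. */
typedef struct { unsigned char status, note, velocity; } MidiNoteOn;

MidiNoteOn note_on(int channel, int note, int velocity)
{
    MidiNoteOn m;
    m.status   = (unsigned char)(0x90 | (channel & 0x0F));
    m.note     = (unsigned char)(note & 0x7F);
    m.velocity = (unsigned char)(velocity & 0x7F);
    return m;
}
```

This compactness is why MIDI streams are so small compared with digital
audio: three bytes trigger a note that might take kilobytes to record.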
General-MIDI synthesizers available include the Roland Sound Canvas, the
Roland RAP-10, the Creative Labs Waveblaster, the Logitech SoundWave, the
Ensoniq SoundScape, the Gravis Ultrasound, the Turtle Beach Multisound, the
Turtle Beach Maui card, and the Sierra Semiconductor Aria card. Additionally,
general-MIDI emulation is available for FM-synthesis devices such as the
SoundBlaster via third-party developer toolkits like MIDPAK or the Audio
Interface Library.
The future of interactive music appears to be the general-MIDI platform, which
allows you to hire a composer to create fully orchestrated scores that will
play back at high quality on a large installed base of sound cards. MIDI data
streams are small, have a relatively low interrupt rate, and require low CPU
bandwidth.
CD RedBook audio. One benefit of CD-ROM drives is that they can play standard
CD audio tracks. However, you cannot have your software run from the CD and
play music at the same time. Access to the data and audio portions of the CD
is mutually exclusive, and you cannot switch CD audio tracks instantaneously
to achieve any semblance of interactivity or smooth transition. However, many
developers find benefits in placing portions of their music score on the CD as
an audio track, and you may find some uses for it in your design. 
Software digital mixing. With the exception of the Gravis Ultrasound and the
Creative Labs AWE32, almost every sound card on the market supports only a
single channel of digital audio. In the context of an interactive environment,
you want to play many sound effects at once. The way to do this is to
implement a software-based digital mixer. Since sound is additive, this is
pretty simple: Take all the sounds playing at any given time, add them
together into a buffer, clip for overflow, and pass that buffer off to the
sound card. A number of development packages support software-based digital
mixing in their API specification.
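The mixing loop just described might be sketched like this (the function
name and the 16-bit sample format are illustrative, not from any
particular development package):

```c
#include <limits.h>

/* Additive software mixer sketch: sum each active voice into a wider
   accumulator, then clip for overflow before handing the buffer off
   to the sound card's single digital channel. */
void mix_voices(const short *const *voices, int nvoices,
                short *out, int nsamples)
{
    for (int i = 0; i < nsamples; i++) {
        long acc = 0;
        for (int v = 0; v < nvoices; v++)
            acc += voices[v][i];
        if (acc > SHRT_MAX) acc = SHRT_MAX;   /* clip for overflow */
        if (acc < SHRT_MIN) acc = SHRT_MIN;
        out[i] = (short)acc;
    }
}
```

The wider accumulator matters: two full-scale 16-bit samples sum past
the range of a short, and without clipping the wraparound is audible as
harsh distortion.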
Customized, downloadable patches. On the Gravis Ultrasound and the Creative
Labs AWE32, an application can download musical instruments or digital sound
effects into memory on the sound card itself. Once on the card, you can
trigger these sounds simply by issuing a MIDI event. This is a very powerful
concept because not only do you get multichannel support, customized
instruments, and a lower burden on both system RAM and CPU, but you can also
manipulate those sound effects in real time using pitch shifting, pan-pot
controls, and even chorus and reverb effects. 
MOD files. MOD files are a proprietary music-file format originally developed
for the Commodore Amiga. MOD files effectively create a specification for
software-based wave-table synthesis. A MOD file contains multiple channels of
music, as well as the actual wave files used to perform each instrument. A
software interpreter digitally mixes and frequency shifts each sound effect in
real time to produce a single, digital-audio stream to be sent to the sound
card.
MOD files sound great on all sound cards. They don't require MIDI devices--any
system with a digital channel will get the same high-quality music. However,
MOD files lack adequate authoring tools and have huge memory requirements for
quality files. MOD files used to be a major CPU burden due to the overhead of
a multichannel digital mixer and interpreter, but today's PCs have a great
deal more processor power, and recent MOD interpreters are extremely
efficient.
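The frequency shifting a MOD interpreter performs can be sketched as
fixed-point resampling of the instrument's wave data. This is
illustrative only (nearest-sample, no interpolation or looping, which
real interpreters add):

```c
#include <stddef.h>

/* Step through the source sample at a fractional rate using 16.16
   fixed-point arithmetic: a step of 0x10000 plays at original pitch,
   0x20000 plays an octave up, 0x8000 an octave down. */
void resample(const short *src, size_t srclen,
              short *dst, size_t dstlen, unsigned long step_16_16)
{
    unsigned long pos = 0;             /* 16.16 fixed-point index */
    for (size_t i = 0; i < dstlen; i++) {
        size_t idx = pos >> 16;
        dst[i] = (idx < srclen) ? src[idx] : 0;  /* silence past end */
        pos += step_16_16;
    }
}
```

Running one such loop per channel and summing the results is, in
essence, the whole software wave-table engine.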


Conclusion



Game developers are fortunate to be able to draw upon the cumulative
experience of composers ranging from Mozart, John Williams, and Basil
Poledouris, to the Beatles, Pink Floyd, and the Benedictine Monks of Santo
Domingo De Silos. We can also leverage the expertise of third-party audio
vendors who specialize in the mechanics of programming sound devices at the
hardware level. Several systems exist that relieve you of this burden and
allow you to focus on the sound and music you want to deliver.
Digital Sound Engineering for Game Development
Rob Wallace
Rob, who is executive producer of Wallace Music & Sound, can be contacted at
WallMus@ix.netcom.
Back in 1990, I decided to expand my music services to include sound effects
and voice tracks for game developers and publishers. I was experienced in
creating analog foley sound, voice tracks, and sound effects for radio, TV,
and film production, but I discovered that translating analog-audio
engineering skills to the digital domain created some unexpected challenges.
Here, I'll present techniques and recommend tools which should enable you to
make your waveforms the best they can be. 
To create and edit professional sounds for computer games, you'll first need
the right equipment for waveform production. This equipment includes a sizable
hard drive (750 Mbytes, minimum), off-line storage and shipment devices (QIC
80 drive, Syquest 270, or 2/4/8 gigabyte DAT drive), and a commercial-quality,
16-bit sound card (like the Turtle Beach Rio).
For high-resolution applications such as Redbook Audio, you'll need a
commercial stereo compressor/limiter (I use a dbx 166) and a graphic equalizer
with a minimum of 12 dB suppression/attenuation. For low-resolution sound
effects and voice tracks, I suggest the Alesis 3630 stereo compressor/limiter.
You'll also want an analog mixer, such as the Mackie 1202, along with amps,
connectors, and speakers.
As for software, you'll need one or more sound-effects libraries. I use the
Sound Ideas Libraries, Hollywood Edge Cartoon Trax Library, and my own
collection of foley and sound effects acquired over the years. Lastly, you'll
want a waveform creator and editor. After working with all of the Macintosh-
and PC-based toolkits, I've settled on Sound Forge 3.0, from Sonic Foundry,
which reads/writes standard audio file formats, converts one format to
another, changes sampling rates and bit depths, and synthesizes MIDI files
into .WAV files. It also lets you capture sounds through sound boards or
samples from external synthesizers. The program includes all standard
audio-control features, including chorus, compress, double, echo, filter,
limit, and stretch. 
The inherent noise introduced by aliasing and downsampling demands effective
equalization and signal-compression techniques to achieve acceptable results.
Nyquist's theorem states that the highest frequency you can capture is half
the sampling rate. At an 11.025-kHz sampling rate, the best high-end response
you'll achieve is 5512.5 Hz. Since the audible frequencies of a human voice
cluster between 1000 and 3000 Hz, you'd think that making waveforms of the
human voice would be easy. Not so, because you first have to filter out all
frequencies above 5512.5 Hz if you're going to make 11.025-kHz resolution
waveforms. Also, by its nature, aliasing introduces undesirable parallel
frequencies into the waveform. Aliasing is analogous to filming a whirling
helicopter blade at 90 frames per second. The visual effect looks like the
blade is moving in reverse, or has a "chunky" look--quite different from when
viewing the blade live.
To prevent aliasing and produce the clearest, most-aesthetic low-resolution
8-bit digital waveform, you severely notch out the frequencies above the
Nyquist-theorem number. The depth of the notching in dB that you apply depends
on the complexity, timbre, and harmonics inherent in the original sound. This
equalization must occur prior to digitizing the sound.
Once the equalization is applied, you digitize the sound at the bit depth and
frequency rate needed. Listen to the playback carefully. If the results sound
hollow or booming, you have applied too many dB of equalization suppression or
notched too many frequencies above the Nyquist frequency. Here you begin to
learn, in depth, the craft of creating usable waveforms. The guiding principle
is: The lower the bit depth and sampling rate, the tougher it is to achieve an
acceptable sound. The timbre, complexity, harmonics, original sample quality,
and dynamic range of the sound you are digitizing will also influence the end
result.
Experimenting with different levels of suppression or attenuation of
frequencies will give you a sense of the most usable sample and will help you
to make educated judgments when making new samples. The challenge becomes
harder depending upon how low the bit depth and sampling frequency plunges. 
A sample waveform can be further manipulated with digital signal processing
(DSP). You can apply DSP to the sound using external hardware (like a Yamaha
SPX 900) before the signal reaches the analog-to-digital converter (ADC) on
your sound card. Alternatively, DSP can be applied algorithmically to the digitized
waveform after it is created. For game development, I recommend the
algorithmic approach. Unless you have a high-end professional-audio DSP unit
(like the Lexicon), it is easy to introduce noise that becomes intolerable
when aliased. This is particularly noticeable when creating pitch-change DSP.
The only problem with algorithmic DSP is that processing time can become
lengthy when your samples contain massive amounts of data (800K or greater).
With the Lexicon, the DSP is done in real time, so you get to hear and tweak
the effect prior to digitizing. 
Delay, reverb, chorusing, flanging, noise gating, distortion, pitch change,
and amplitude modulation are common DSP effects that enhance sound, voice, and
musical waveforms. In game applications the waveform must be properly
compressed. I have learned that digital samples in games need to be fat and
almost always at maximum amplitude. This minimizes the inherent hiss in
low-resolution applications. Because the sound-wave data is as loud as it can
be, you are making all the possible qualities of the sound available to the
user by reducing the dynamic range of the effect. 
Compression is best applied before digitizing. For game applications, the
Alesis 3630 produces killer waveforms. You simply set compression to just
under the highest ratio it will compress, then ensure that the output volume
peaks around 0 dB and that input on the sound card matches the 0 dB level of
the compressor output. To do this, get a Shure tone generator (model A15TG),
which will produce a constant tone so you can balance and create a gain
structure. 
Always test your compression by making a sample and looking at the waveform to
see that it is fat and peaks, even flattens a bit, at the top. Then listen for
digital distortion, which appears as a crackling pop, or a hard-edged scratch
resonance. To cure this, lower the output volume of the compressor or increase
the compression ratio (or both). It is still possible to get some dynamic
range in low-resolution applications. A cricket chirp loop doesn't need to be
at maximum amplitude or even compressed very much because it is a subtle
ambient effect--but voice tracks need to be fat and maximized. 
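The "fat, maximum amplitude" goal can be approximated in software, after
digitizing, by peak normalization. This is a sketch of the idea only; a
hardware compressor like the 3630 does a more sophisticated job before
the ADC:

```c
#include <stdlib.h>

/* Peak-normalize sketch: scale a 16-bit buffer so its loudest sample
   hits full scale, using all the amplitude the format offers. */
void normalize(short *buf, int n)
{
    long peak = 0;
    for (int i = 0; i < n; i++) {
        long a = labs((long)buf[i]);
        if (a > peak) peak = a;
    }
    if (peak == 0) return;           /* silence: nothing to scale */
    for (int i = 0; i < n; i++)
        buf[i] = (short)((long)buf[i] * 32767 / peak);
}
```

Unlike compression, this preserves the dynamic range within the sample;
it only guarantees that the loudest moment reaches full scale.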
It is now possible to alter pitch and compress or expand the same waveform in
time. This means that you can make one actor sound like a different person by
changing his delivery characteristics. By applying delay, flange, and
chorusing, your own voice can be used to create sounds for horrific space
beasts and demons of every description. 
Nothing beats the actual experience of creating sound effects for a game and
then hearing them while you run the application. This is the acid test for
your sound design. You may have to go back and change or recreate sounds, but
when it all comes together and works, the effect is dazzling.













































Attached Sprites


An efficient method for sprite animation




Diana Gruber


Diana is senior programmer at Ted Gruber Software, publishers of the Fastgraph
programmers' graphics library, and author of the book Action Arcade Adventure
Set (Coriolis Group Books, 1994). Diana can be contacted at Fastgraph@aol.com.


Game programmers are always looking for efficient ways to accomplish sprite
animation. As the science of game programming evolves, certain techniques,
such as the use of "attached" sprites, have become standard in many games.
Consider the case of a dogfight like that in the Quickfire demo in Figure 1. A
player-controlled airplane confronts one or more enemy airplanes and shoots
bullets at them; when an airplane is hit, it explodes and dies. To achieve a
pleasing visual effect, the airplane does not vanish immediately, but rather
catches fire and gradually becomes engulfed in flames before dissolving into a
puff of smoke.
Data structures are used to keep track of airplanes as they move across the
scrolling background. Each data structure holds an airplane's current x- and
y-position, its speed, a pointer to the function that controls its action
(called the "action function"), and a pointer to a bitmap which describes the
current image of the airplane.
An interesting thing happens when an airplane explodes. To achieve the
explosion effect, a single sprite must become two sprites--one, the airplane,
which remains unchanged; and the other, the explosion, which starts as a small
ball of fire in the nose of the airplane, then grows over the course of
several frames to a large, smoky fireball that eventually covers the entire
airplane. Finally, the airplane disappears as the fireball covers it, and all
that is left is the fireball, which dissipates and eventually disappears as
well.
The motion of the fireball depends on the motion of the airplane, which has
random elements and therefore cannot easily be predicted. It is important that
the fireball sprite knows where the airplane sprite is. If two airplanes are
exploding at the same time, it is also necessary to match each explosion with
the proper airplane. Therefore, you need a mechanism to pass information from
the airplane to the explosion.
Similarly, the airplane needs to get information from the explosion. In
particular, the airplane needs to know how big the fireball is, so it will
know when it is time to disappear.
The easiest way to pass information between the airplane and the explosion is
to use an attached sprite. The data structures of both sprites attach
themselves to each other by simply using a structure member to point to each
other. The action of one sprite is then conveniently influenced by the status
of the other sprite.
All the objects--airplanes, explosions, and bullets--are stored in a linked
list. Because of the nature of the game, nodes are constantly being added to
and removed from the list. This happens when bullets go off the edge of the
screen, enemies are killed, and so on. An object and its attached sprite may
exist anywhere in the list, as in Figure 2.
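The listings below do not show what happens when a node leaves the list while
its partner survives. Here is a minimal sketch of that cleanup, using a
hypothetical remove_node() helper and a pared-down OBJ with only the relevant
members (the real structure appears in Example 1):

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified object node: only the members relevant to list
   maintenance and sprite attachment are shown. */
typedef struct OBJstruct OBJ, *OBJp;
struct OBJstruct {
    OBJp next;
    OBJp prev;
    OBJp attached_sprite;
};

/* Hypothetical helper: unlink a node and clear the back-pointer in
   its partner, so the partner never dereferences a freed object.
   'top' is the head-of-list pointer (top_node in Listing One, which
   keeps the newest node at the top with next == NULL). */
static void remove_node(OBJp *top, OBJp node)
{
    if (node->attached_sprite != NULL)              /* detach partner */
        node->attached_sprite->attached_sprite = NULL;
    if (node->prev != NULL) node->prev->next = node->next;
    if (node->next != NULL) node->next->prev = node->prev;
    if (*top == node) *top = node->prev;
    free(node);
}
```

Clearing the partner's back-pointer first is the important step: without it,
an explosion could keep reading position data from an airplane that no longer
exists.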
Notice that only the enemy that is currently exploding has an attached sprite
and that it is possible for the player to have an attached sprite. When the
player is hit, a small fireball appears, even though the player does not die
from the strike. When there is no explosion, the attached sprite is set to
NULL.
The OBJstruct structure holds the information about each sprite object,
including airplanes, bullets, and explosions. In this structure (see Example
1), the first two members are the pointers to adjacent nodes in the linked
list. The next two members, x and y, specify the current position of the
sprite. These values change in each frame according to the speed and direction
of the sprite, which are described in the next five members. The frame member
tracks the sprite's current animation state: whether the plane is upright or
turning, for example. The next four elements specify the tile extents and are
used to determine when an object has moved off the edge of the screen. 
The image member is a pointer to the object's bitmap data, which is the actual
physical representation of the sprite. This is stored in the SPRITE structure,
as in Example 2. This structure holds all the information necessary to display
the sprite, including its width and height, and the offset values. The offsets
are used to adjust the position of the sprite and are especially useful with
explosions, which need to be centered around their midpoint rather than
displayed from a corner.
The action member of the object structure is a pointer to a function, such as
the do_explosion() function in Listing One. This function is an action
function and is executed once each frame. It determines the current state of
the object, such as going, falling, or dying.
The final member of the object structure, OBJp attached_sprite, is a pointer
to another object structure; in other words, the pointer to the attached
sprite. The attachment is always bidirectional: each object points back to
whichever object points to it.
The code that controls the creation of the explosion is shown in the function
start_explosion(); see Listing One. As you can see, this function spawns an
object and adds it to the linked list in the traditional way. It also forms
the attachment between the airplane sprite (objp) and the explosion sprite
(node); see Example 3 and Figure 3. 
The motion of the explosion is controlled by the function do_explosion(). This
function first examines the current state of the explosion. If the state of
the animation has reached the third frame, the explosion is big enough to
cover the airplane. At that point, it is time to kill the airplane by setting
its action function to kill_enemy(), as in Example 4.
After the enemy airplane has been killed, the attached sprite is set to NULL,
indicating there is no longer an airplane attached to the explosion. The
explosion may now move independently for the next few frames, until it also is
killed.
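kill_enemy() itself is not shown in Listing One, but its required behavior
follows from the description above. A hedged sketch, with a pared-down object
structure and an illustrative alive flag standing in for the real removal
logic:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified object: just the members this sketch touches. */
typedef struct OBJstruct OBJ, *OBJp;
typedef void ACTION(OBJp);
struct OBJstruct {
    OBJp attached_sprite;
    ACTION *action;
    int alive;   /* illustrative stand-in for the real game state */
};

/* Hypothetical kill_enemy: per the article, it must sever the
   attachment so that do_explosion() sees a NULL attached_sprite
   and lets the fireball drift on its own. */
static void kill_enemy(OBJp objp)
{
    if (objp->attached_sprite != NULL) {
        objp->attached_sprite->attached_sprite = NULL;  /* free the explosion to drift */
        objp->attached_sprite = NULL;
    }
    objp->alive = 0;   /* the real code would unlink and free the node */
}
```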
During those three frames when both the airplane and the explosion are
visible, the x- and y-coordinates of the explosion are determined by the x-
and y-coordinates of the airplane, as in Example 5. These coordinates include
a 16-pixel horizontal adjustment and a four-pixel vertical adjustment to
center the explosion over the nose of the airplane.
Attached sprites have many applications. When a character needs to hold an
object, such as a gun, attached sprites can greatly simplify the code. This
also saves room, which is always at a premium when designing games. If you
have a sprite with 30 positions (running, jumping, falling, standing, and so
forth) and you add a gun to each of those positions, you will need to generate
30 more sprites. If the sprites have an average width of 30 pixels and height
of 40 pixels, this will use 36 Kbytes of sprite space. If you can reuse the
nonshooting sprites by simply adding an attached gun arm to each one, the
savings in RAM and disk space will be significant. (For more information about
sprite animation, see Chapters 12 and 13 in my book, Action Arcade Adventure
Set.)
Figure 1 Sample mid-air battle between airplanes (from the Quickfire demo).
Example 1: The OBJstruct structure.
typedef struct OBJstruct
{
 OBJp next;
 OBJp prev;
 int x;
 int y;
 int xspeed;
 int max_xspeed;
 int yspeed;
 int direction;
 int frame;
 int tile_xmin;
 int tile_xmax;
 int tile_ymin;
 int tile_ymax;
 SPRITE *image;
 ACTIONp action;
 OBJp attached_sprite;
};
Example 2: The SPRITE structure.
typedef struct _sprite
{
 char *bitmap;
 int width;
 int height;
 int xoffset;
 int yoffset;
} SPRITE;
Figure 2 Objects in the list may point to each other or to nothing.
Figure 3 The airplane and the explosion point to each other.
Example 3: The portion of the start_explosion() function that attaches two
sprites to each other.
/* set up the links between the explosion and the enemy plane */
node->attached_sprite = objp;
objp->attached_sprite = node;
Example 4: Setting a sprite's action function.
if (objp->attached_sprite != (OBJp)NULL)
 objp->attached_sprite->action = &kill_enemy;
Example 5: Determining x- and y-coordinates.
/* The position of the explosion depends on the position of the airplane */
if (objp->attached_sprite != (OBJp)NULL)
{
 objp->x = objp->attached_sprite->x+16;
 objp->y = objp->attached_sprite->y-4;
}

Listing One 

/******************** sprite declarations *************************/
int nsprites;
typedef struct _sprite
{
 char *bitmap;
 int width;
 int height;
 int xoffset;
 int yoffset;

} SPRITE;
SPRITE *sprite[40];

/* forward declarations */
struct OBJstruct;
typedef struct OBJstruct OBJ, near *OBJp;

/* pointer to object action function */
typedef void near ACTION (OBJp objp);
typedef ACTION *ACTIONp; 

/* data structure for objects */
typedef struct OBJstruct 
{
 OBJp next; 
 OBJp prev; 
 int x;
 int y;
 int xspeed;
 int max_xspeed;
 int yspeed;
 int direction;
 int frame;
 int tile_xmin;
 int tile_xmax;

 int tile_ymin;
 int tile_ymax;

 SPRITE *image;

 ACTIONp action; 
 OBJp attached_sprite;
};
SPRITE *explosion[11];

/**********************************************************************/
void near start_explosion(OBJp objp)
{
 OBJp node;

 /* allocate space for the object */
 node = (OBJp)malloc(sizeof(OBJ));
 if (node == (OBJp)NULL) return;

 /* assign values to the structure members */

 /* after the plane has been killed, the explosion moves at a slower
 speed because smoke drifts slower than metal */
 node->xspeed = objp->xspeed/2;
 node->yspeed = objp->yspeed/2;

 /* tile extents */
 node->tile_xmin = 2;
 node->tile_xmax = 21;
 node->tile_ymin = 0;
 node->tile_ymax = 14;

 /* the sprite will be the first frame explosion bitmap */
 node->image = explosion[0];
 node->x = objp->x+16;
 node->y = objp->y-4;
 node->frame = -1;

 /* insert at the top of the linked list */
 node->prev = top_node;
 node->prev->next = node;
 top_node = node;
 node->next = (OBJp)NULL;

 /* set up the links between the explosion and the enemy plane */
 node->attached_sprite = objp;
 objp->attached_sprite = node;

 /* assign the action function */
 node->action = do_explosion;
}
/**********************************************************************/
void near do_explosion(OBJp objp)
{
 /* If the explosion has reached the frame 3 state, at which point
 the bitmap is bigger than the airplane, it is time to kill the
 airplane. */

 if (objp->frame > 3)
 {
 /* if the attached sprite is NULL that means the airplane was
 already killed */
 if (objp->attached_sprite != (OBJp)NULL)
 objp->attached_sprite->action = &kill_enemy;

 objp->x += objp->xspeed;
 objp->y += objp->yspeed;
 }
 else
 {
 /* The position of the explosion depends on the position of
 the airplane */
 if (objp->attached_sprite != (OBJp)NULL)
 {
 objp->x = objp->attached_sprite->x+16;
 objp->y = objp->attached_sprite->y-4;
 }

 /* it is possible for the explosion to be at less than frame 3
 but there is no attached sprite. That happens when the enemy
 plane has drifted off the edge of the screen. */
 else
 {
 objp->x += objp->xspeed;
 objp->y += objp->yspeed;
 }
 }

 /* Increment the explosion frame */
 objp->frame++;

 /* define which sprite will be displayed this frame */
 objp->image = explosion[objp->frame];

 /* We have 10 frames for the explosion */
 if (objp->frame > 10)
 {
 objp->image = explosion[10];
 objp->action = kill_explosion;
 }
}




















Using the VESA BIOS 2.0 Linear Frame Buffer


Performance enhancement with and without bank switching




Brian Hook and Kendall Bennett


Brian is a graphics researcher specializing in the field of real-time 3-D
computer graphics. He currently works for a 3-D graphics hardware firm and is
the author of Building a 3D Games Engine in C++ (John Wiley & Sons, 1995). He
can be contacted at bwh@netcom.com. Kendall is the lead developer at SciTech
Software, which has developed the Universal VESA VBE, MGL graphics library,
and recently, WinDirect. He can be reached at KendallB@ScitechSoft.com.


High-performance graphics applications under DOS typically require direct
access to the graphics card's frame buffer. On the VGA, this memory is
accessed in the memory region from A000:0000 to A000:FFFF. Since this window
exposes only 64 Kbytes at a time, accessing all of the memory on the card
requires a banking scheme. With a banking scheme, a window
into a part of the frame buffer is addressed at A000:0000, but the piece of
frame buffer this window points to can slide around. While this allows all of
the memory on the video card to be accessed, it requires that the frame buffer
be dealt with in 64-Kbyte chunks--a programming hassle at best, a serious
performance degradation at worst. To combat this problem, the Video
Electronics Standards Association (VESA) has implemented, as part of its VESA
BIOS Extension (VBE) 2.0, a method by which a pointer to a linear frame buffer
can be obtained by an application running on a VBE 2.0-compliant graphics
system.


Compatibility and the VESA BIOS


Manipulating a video card's banks requires in-depth programming knowledge of
its video chipset. Supporting a multitude of video cards, however, can be
arduous. To circumvent this, VESA designed and implemented a standard BIOS
interface that supports bank switching and other functions for a wide range of
video cards. This allows you to program for the VESA BIOS without supporting a
specific video card's idiosyncratic bank-switching mechanisms. If a video card
is VESA compatible, you can be reasonably sure that your code will run.


The Banking Performance Penalty


While the original VESA BIOS specification provided a hardware-independent
method to access all of a video card's display RAM, there was a significant
performance penalty: banking is inherently slow. For starters, bank switching
itself is expensive, since the VESA BIOS interface is accessed via software
interrupt 0x10. Not only is the interrupt slow, but there is a potential
context switch from protected mode to real mode and back, compounding the
expense. This, coupled with the significant bookkeeping overhead that bank
switching imposes (bank boundaries must be watched for at all times during
operations such as line drawing, rectangle clears, and so on), makes banked
frame buffers inefficient for getting to a video card's RAM.
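To see where that bookkeeping comes from, consider a minimal sketch of a
banked pixel write. set_bank() here is a stand-in that merely counts switches;
real code would issue VBE function 4F05h through int 10h, and this sketch
assumes 64-Kbyte bank granularity:

```c
#include <assert.h>
#include <stdint.h>

#define BANK_SIZE 0x10000UL   /* 64-Kbyte window at A000:0000 */

static int cur_bank = -1;
static int bank_switches = 0;  /* how often we pay the int 10h cost */

/* Stand-in for the VBE set-bank call (function 4F05h via int 10h). */
static void set_bank(int bank)
{
    if (bank != cur_bank) {
        cur_bank = bank;
        bank_switches++;       /* in real code: a slow software interrupt */
    }
}

/* Write one pixel through the 64K window: every access must first
   split the linear offset into a bank number and an in-bank offset. */
static void banked_putpixel(uint8_t *window, uint32_t offset, uint8_t color)
{
    set_bank((int)(offset / BANK_SIZE));
    window[offset % BANK_SIZE] = color;
}
```

Every primitive that can cross a 64K boundary (line draws, rectangle fills,
blits) must carry this divide/modulo logic, which is exactly the overhead a
linear frame buffer eliminates.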


Linear Frame Buffer versus Banking


To address the problems of banked frame-buffer access, the VESA committee has
designed and ratified the VBE 2.0 specification. This major overhaul of the
VBE interface introduces two performance-enhancing capabilities. The first is
protected-mode bank switching, which removes the need for expensive context
switching when banking. The second capability removes the need for banking
altogether by handling the frame buffer as a single chunk of contiguous
memory, assuming that the underlying hardware is capable of supporting such
access. This is important, since VBE 2.0 doesn't guarantee the existence of
linear frame-buffer access--it only provides an interface to the linear frame
buffer if it exists. Linear frame-buffer support possesses many of the same
advantages the 32-bit flat model has over the 16-bit segmented memory model of
the Intel processor--simpler addressing, no segment/bank swapping, and access
to a larger address space.
As a fortunate side effect, VBE 2.0's linear frame-buffer access usually
provides significantly improved performance on PCI-bus-based video systems.
Specifically, PCI burst mode is usually not available when writing to the VGA
frame buffer at A000:0000. When working through a video card's linear frame
buffer, however, burst mode is available, and performance can double (or more)
during mass-data transfers.


Accessing the Linear Frame Buffer


Acquiring a pointer to a graphics system's linear frame buffer is a simple but
lengthy process, requiring care since a misstep at any point renders the frame
buffer invalid. The steps involved are:
Getting VESA Super-VGA information, such as VBE revision number.
Determining if the desired linear video mode is available.
Creating a 48-bit far pointer to access the linear frame buffer.


Getting VBE Super-VGA Information


The function VBE_detect() first executes VESA function 0 (Get SuperVGA
Information), which fills in a VGA info block. The VBE is accessed via the
standard video interrupt 10h; however, AH is set to 4Fh so that the VBE knows
to intercept the call, and AL is set to the VBE function number. After the
call has been performed, the routine returns the VESA version as a BCD value.
The video-mode list returned in the VGA info block must be copied into another
buffer, because the VGA info table will be clobbered by any calls that use
this area of memory (for example, a call to get information on a specific
video mode).
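The validation a VBE_detect()-style routine performs on the returned block can
be sketched as follows. Only the first two fields are declared here; the real
info block is 512 bytes, "VBE2" should be placed in the signature field before
the call, and the int 10h call itself would go through the DPMI plumbing shown
in Listing One:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* First fields of the info block filled in by VBE function 4F00h.
   (Sketch only; the real block is 512 bytes and, per the article,
   its mode list must be copied out before the buffer is reused.) */
typedef struct {
    char     VESASignature[4];   /* "VESA" on return ("VBE2" on entry) */
    uint16_t VESAVersion;        /* BCD: 0x0200 for VBE 2.0 */
} VBE_vgaInfo;

/* Illustrative check: returns the BCD version (e.g. 0x0200), or 0 if
   the BIOS reply is not a usable VBE 2.0 info block. */
static int VBE_checkInfo(const VBE_vgaInfo *info)
{
    if (memcmp(info->VESASignature, "VESA", 4) != 0)
        return 0;
    if (info->VESAVersion < 0x0200)
        return 0;   /* linear frame-buffer support needs VBE 2.0+ */
    return info->VESAVersion;
}
```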


Finding a Video Mode


In VBE 2.0, VESA has dropped the policy of introducing hardcoded video modes.
Instead, you can query the hardware directly for a video mode with a certain
set of attributes. The AvailableModes() function searches the video-mode list
for those video modes that fit a specific application's criteria.
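The per-mode test at the heart of such a search might look like this sketch.
mode_matches() is illustrative, not the article's AvailableModes(); the
attribute bits come from the VBE 2.0 specification, with bit 7 flagging
linear frame-buffer availability, and a real search would fetch each mode's
info block with function 4F01h:

```c
#include <assert.h>
#include <stdint.h>

/* The subset of the VBE mode-information fields the search needs. */
typedef struct {
    uint16_t ModeAttributes;   /* bit 7 set => linear frame buffer exists */
    uint16_t XResolution;
    uint16_t YResolution;
    uint8_t  BitsPerPixel;
} VBE_modeInfo;

#define VBE_MODE_SUPPORTED 0x0001
#define VBE_MODE_GRAPHICS  0x0010
#define VBE_MODE_LINEAR    0x0080

/* Does this mode satisfy the caller's criteria? */
static int mode_matches(const VBE_modeInfo *mi, int xres, int yres, int bpp)
{
    uint16_t need = VBE_MODE_SUPPORTED | VBE_MODE_GRAPHICS | VBE_MODE_LINEAR;
    if ((mi->ModeAttributes & need) != need)
        return 0;
    return mi->XResolution == xres && mi->YResolution == yres
        && mi->BitsPerPixel == bpp;
}
```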



Get a Pointer to the Frame Buffer


The GetPtrToLFB() function is responsible for returning a 48-bit far pointer
to the linear frame buffer. The process is somewhat lengthy but easy to
understand. DPMI service 0 is first used to allocate a selector, as
implemented by DPMI_allocSelector(). Note that this function immediately sets
the selector's access rights to 32-bit page granular. Next, the newly
allocated selector must have its base address set to the address of the frame
buffer.
This gets a little sticky, because the physical address of the frame buffer
(as given in the VBE_modeInfo structure) is not the same as the linear address
the selector expects as a base address. It's therefore necessary to use DPMI
service 0x800 to map the frame buffer's physical address into the processor's
currently running linear address space. The function
DPMI_mapPhysicalToLinear() performs this chore.
With the linear address in hand, the selector's base address is set using DPMI
service 7, as DPMI_setSelectorBase() demonstrates. The only thing left now is
setting the selector's limit, which is done using DPMI function 8
(DPMI_setSelectorLimit()). LFBPROF sets the limit to 4 Mbytes, since this is
the most memory any modern VESA-compatible video card will likely have.
(Because we made the selector 32-bit page granular, the limit passed must be
in 4K increments and set to the value of limit-1.)
At this point, a selector has been secured that maps directly into the video
card's frame buffer. This selector forms the basis of a far pointer. The macro
MK_FP() in dos.h (which is Watcom C++ specific) creates a 48-bit far pointer
out of the selector. This pointer allows linear access to the frame buffer.
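The page-granularity arithmetic is the part that is easy to get wrong, so this
sketch isolates it; page_granular_limit() is an illustrative helper, not part
of LFBPROF, and the DPMI/selector steps from the article appear as comments
since they need the services in Listing One:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096UL

/* For a 32-bit page-granular selector, DPMI function 8 expects the
   byte limit to be (n * 4096) - 1, so round the region size up to a
   whole number of pages first. */
static uint32_t page_granular_limit(uint32_t size)
{
    uint32_t pages = (size + PAGE_SIZE - 1) / PAGE_SIZE;
    return pages * PAGE_SIZE - 1;
}

/* The full sequence from the article, sketched as comments:
     sel    = DPMI_allocSelector();                 // services 0 and 9
     linear = DPMI_mapPhysicalToLinear(physAddr,    // service 0x800;
                 page_granular_limit(4UL << 20));   // physAddr from the
                                                    // VBE_modeInfo block
     DPMI_setSelectorBase(sel, linear);             // service 7
     DPMI_setSelectorLimit(sel,
                 page_granular_limit(4UL << 20));   // service 8
     LFBPtr = MK_FP(sel, 0);   // 48-bit far pointer (Watcom)       */
```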


Accessing the Frame Buffer


The linear frame buffer can now be accessed either directly in assembly or
using Watcom's far-memory routines (for example, _fmemcpy() and _fmemset()).
The sample program, LFBPROF ("Linear Frame Buffer Profiler"), uses Watcom inline
assembly to gain access to the frame buffer, since we must guarantee that data
is packed in 32-bit dwords as it goes across the bus. The functions
LfbMemcpy() and LfbMemset() in LFBPROF.H are implemented as Watcom inline
assembly and are guaranteed 32-bit memory copying and setting routines.
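The idea behind those routines can be sketched in portable C. The real
LfbMemcpy()/LfbMemset() are Watcom inline assembly (REP MOVSD/STOSD); this
illustrative copy_dwords() assumes dword-aligned buffers, which the frame
buffer is:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Force the bulk of the transfer into 32-bit stores, so a
   byte-at-a-time memcpy() cannot hide the card's true bus width.
   Assumes dst and src are dword aligned. */
static void copy_dwords(void *dst, const void *src, size_t n)
{
    uint32_t       *d = (uint32_t *)dst;
    const uint32_t *s = (const uint32_t *)src;
    size_t i, dwords = n / 4;

    for (i = 0; i < dwords; i++)        /* 32-bit units across the bus */
        d[i] = s[i];
    for (i = dwords * 4; i < n; i++)    /* byte tail, if any */
        ((uint8_t *)dst)[i] = ((const uint8_t *)src)[i];
}
```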


LFBPROF


LFBPROF implements everything we've discussed to this point, including banked
frame-buffer access. LFBPROF is a video-card benchmarking program that tests
the ability of the underlying hardware to handle system-to-video copies and
frame-buffer clears in both banked and linear frame-buffer modes. (For more
information on benchmarking frame buffers, see the accompanying text box
entitled, "Frame Buffer Performance Metrics.") Listings One and Two contain
the complete source to LFBPROF.C and LFBPROF.H, respectively. The compiler
used was Watcom C/C++ 10.0a using the 32-bit flat model, with DOS4GW.EXE as
the DOS extender. Because the VBE 2.0 is so new, few hardware manufacturers
have implemented the spec in their ROMs, so SciTech Software's Universal VESA
BIOS Extensions TSR (UniVBE) must be used as the VESA BIOS interface provider.
This package is shareware and available at most DOS ftp sites, including
ftp.scitechsoft.com, and on CompuServe GO IBMPRO.
LFBPROF takes two arguments on the command line, which are the resolution of
the desired video mode; for example, to test 640x480, the command line is
LFBPROF 640 480. 
If no arguments are given, a list of available video modes is printed. Note
that only 8-bit linear frame-buffer modes are tested, although it would be
relatively straightforward to add support for 15-bit and higher modes to
LFBPROF. Listing Three is a sample makefile that can be used to compile
LFBPROF.
LFBPROF's main() is subdivided into three basic parts: initialization,
benchmarking, and shutdown. Initialization is responsible for determining VBE
2.0 compliance, checking on the availability of the desired video mode, and
initializing the graphics mode (which includes securing a selector to the
linear frame buffer).
The benchmark tests a video card's frame-buffer clearing and setting speed
using LFBPROF's LfbMemcpy() and LfbMemset() routines in both linear and banked
modes. The benchmark runs for ten seconds so that any granularity in the
system timer can be factored out.
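Converting the raw counts into the reported figures is simple arithmetic; a
sketch with illustrative helper names (the real program derives elapsed time
from clock() over its ten-second run):

```c
#include <assert.h>

/* Turn raw benchmark counts into the two figures LFBPROF prints. */
static float mb_per_sec(long ops, long bytes_per_op, float seconds)
{
    return (float)ops * (float)bytes_per_op / (1024.0f * 1024.0f) / seconds;
}

static float frames_per_sec(long ops, float seconds)
{
    return (float)ops / seconds;
}
```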
Shutdown restores VGA mode 3 (80-column text mode) and finally computes and
prints the results of the tests in both Mbytes/sec and frames/sec.
Frame-Buffer Performance Metrics
The ability to quantify frame-buffer performance is important because such
performance is often directly proportional to the performance of a game or
other type of graphics application. However, measuring frame-buffer
performance accurately (and in a form where the results can be interpreted
meaningfully) is tricky and often controversial.
Let's take as an example a very naive VGA frame-buffer performance test. Such
a test would consist of repeatedly blitting out a 320x200, 8-bit, off-screen
buffer to the VGA mode 0x13 real-mode address space from A000:0000 to
A000:FFFF. Time would be measured using the clock() standard-library function,
with enough test-loop iterations to factor out the fairly coarse granularity
typical of PC system timers (18.2 ticks/sec). The inner loop of such a test
would be straightforward, as Example 1(a) shows. This would seem to give a
very good indication of a system's capability to update VRAM, but this isn't
necessarily true. A benchmark attempts to determine not only the video card's
peak speed, but also its likely real-world performance. Often, the two are not
even close.
Sequential Transfer versus Random Access
Our sample test has several significant flaws when it comes to real-world
performance measurement. The first, and most obvious, is that it only tests
the speed of sequential writes to the video card from system RAM. This is fine
for many applications, but for programs that do random writes to video, this
type of measurement can be misleading. For example, video chips with
interleaved memory access (S3 805i, Tseng Labs ET4000/w32i) typically have
extremely fast sequential transfer rates because the interleaved RAM is
optimized for just this type of action. Random writes on these boards,
however, are not nearly as fast as their sequential write times might imply.
Additionally, sequential transfers usually trigger PCI burst-mode operation on
PCI bus systems, which significantly increases transfer speed, but only during
blits. This can easily lead to misinterpretation of transfer rates as measured
by a naive benchmarking program, since burst transfer rates are significantly
higher than random-access rates.
Also, the standard-library memcpy() function used in our naive benchmark isn't
guaranteed to decompose the system-to-video copy into 32-bit read/writes. If
memcpy() were implemented using MOVSB, for example, an 8-bit video card would
likely have the same performance as a 32-bit video card (assuming techniques
such as byte-merging were disabled in system-chipset setup).
Another deficiency is that this benchmark only tests VGA mode 0x13
performance. This does not necessarily scale accurately to other video modes,
such as tweaked, planar, VGA 8-bit modes ("ModeX") or high-resolution VESA
modes. There are many reasons for this. For starters, different chipsets
behave differently depending on the video mode; some chipsets have poor planar
performance but reasonable packed-mode performance. Another reason is that
some video cards handle different video modes with different video chips; the
Diamond Viper, for example. Video mode 0x13 is handled by a secondary VGA
processor, such as an OAK or Weitek 5x86 standard VGA chip, whereas VESA mode
640x400x8 (used by Microsoft Flight Simulator 5) uses the Weitek P9000
accelerator natively. As a result, mode 0x13 on the Viper is extremely
lackluster, yet the high-resolution VESA mode is almost breathtakingly fast.
Also, the sheer size difference of the different video modes (mode 0x13
requires only 64,000 bytes, whereas a mode such as 1024x768x32-bit can consume
over three million bytes) can greatly affect cache coherency.
Note that DRAM and VRAM boards do not perform the same in all video modes. In
very-low-resolution modes like VGA mode 0x13, little time is required by the
CRTC controller to refresh the display, leaving nearly all the DRAM bandwidth
available for CPU access. However, in the very-high-resolution modes like
1280x1024x256, much more of the DRAM bandwidth is required by the CRTC
controller; hence, frame-buffer performance will drop significantly. With
boards based on dual-ported VRAM, the CPU and CRTC controller can both gain
access to the memory at the same time; therefore, the performance generally
does not degrade as the resolution increases (making VRAM boards popular for
high-end CAD applications that need high-resolution video modes).
Cache Coherency
External (L2) cache coherency is extremely important in characterizing
frame-buffer performance. Modern computer systems can have caches anywhere
from nonexistent to 512K in size. This cache proves very important in
real-world tests, but in an application-specific (and thus unpredictable)
manner. In general, if the effect of a factor on an application's performance
is unpredictable, then an attempt should be made to remove it from the
benchmark. A good benchmarking program should try to minimize the effects of
the L2 cache as much as possible. The naive benchmark presented earlier does
not do this, and as a result will find significantly better performance on a
system with a 256K cache than on one without any cache, solely because the
source buffer will reside in the L2 cache for the duration of the program. To
defeat cache coherency, two techniques can be used. The first is to simply
step through multiple buffers. In Example 1(b), cycling through multiple
buffers instead of using a single 64K buffer pretty well destroys cache
coherency. This makes for lower benchmark figures, but they reflect real-world
performance more accurately.
The second method of defeating cache coherency is to simply use a video mode
larger than the system's external cache. A video mode such as 640x480x16bpp
consumes 600K, well beyond the size of a typical L2 cache.
When measuring frame-buffer performance, most local buses will operate at a
different speed, depending on the processor installed. For example, a PCI-bus
Pentium/66 will generally blit faster than a PCI-bus Pentium/90 because the
former clocks the PCI bus at 33 MHz (clock halved) and the latter, at 30 MHz
(clock thirded). This can't be accounted for by the benchmark, but it should
be noted when analyzing performance characteristics of different video cards
measured in different systems.
Conclusion
With so many factors, it would seem nearly impossible to devise a single,
ideal benchmark. However, a comprehensive benchmark isn't necessarily better
than an accurate, informative one. The most important criterion for any
measurement tool is that it specify clearly what is being measured, how it is
being measured, and what applications will find the benchmark's data relevant.
Accurate, comprehensive data is useless unless it is easy to understand and
translates into relevant, meaningful results. LFBPROF does not claim to
provide comprehensive data on a video card's performance; instead, it simply
states the performance of a video card when using VBE 2.0's linear
frame-buffer feature for clearing and system-to-video copying operations. Such
performance is very important for games, but often absolutely irrelevant for
other applications such as CAD or GUIs.
--B.H. and K.B.
Example 1: (a) Inner loop of test; (b) cycling through multiple buffers.
(a)
for ( i = 0; i < NUM_ITERATIONS; i++ )
{
 memcpy( video, source_buffer, SRC_BUF_SIZE );
}
(b)
for ( i = 0; i < NUM_ITERATIONS; i++ )
{
 memcpy( video, source_buffer[i%NUM_SRC_BUFS], SRC_BUF_SIZE );
}

Listing One 

/****************************************************************************
* VBE 2.0 Linear Framebuffer Profiler
* By Kendall Bennett and Brian Hook
* Filename: LFBPROF.C
* Language: ANSI C

* Environment: Watcom C/C++ 10.0a with DOS4GW
* Description: Simple program to profile the speed of screen clearing
* and full screen BitBlt operations using a VESA VBE 2.0
* linear framebuffer from 32 bit protected mode.
* For simplicity, this program only supports 256 color
* SuperVGA video modes that support a linear framebuffer.
****************************************************************************/

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <conio.h>
#include <dos.h>
#include "lfbprof.h"

/*---------------------------- Global Variables ---------------------------*/
int VESABuf_len = 1024; /* Length of VESABuf */
int VESABuf_sel = 0; /* Selector for VESABuf */
int VESABuf_rseg; /* Real mode segment of VESABuf */
short modeList[50]; /* List of available VBE modes */
float clearsPerSec; /* Number of clears per second */
float clearsMbPerSec; /* Memory transfer for clears */
float bitBltsPerSec; /* Number of BitBlt's per second */
float bitBltsMbPerSec; /* Memory transfer for bitblt's */
int xres,yres; /* Video mode resolution */
int bytesperline; /* Bytes per scanline for mode */
long imageSize; /* Length of the video image */
char far *LFBPtr; /* Pointer to linear framebuffer */

/*------------------------- DPMI interface routines -----------------------*/

void DPMI_allocRealSeg(int size,int *sel,int *r_seg)
/****************************************************************************
* Function: DPMI_allocRealSeg
* Parameters: size - Size of memory block to allocate
* sel - Place to return protected mode selector
* r_seg - Place to return real mode segment
* Description: Allocates a block of real mode memory using DPMI services.
* This routine returns both a protected mode selector and
* real mode segment for accessing the memory block.
****************************************************************************/
{
 union REGS r;

 r.w.ax = 0x100; /* DPMI allocate DOS memory */
 r.w.bx = (size + 0xF) >> 4; /* number of paragraphs */
 int386(0x31, &r, &r);
 if (r.w.cflag)
 FatalError("DPMI_allocRealSeg failed!");
 *sel = r.w.dx; /* Protected mode selector */
 *r_seg = r.w.ax; /* Real mode segment */
}

void DPMI_freeRealSeg(unsigned sel)
/****************************************************************************
* Function: DPMI_freeRealSeg
* Parameters: sel - Protected mode selector of block to free
* Description: Frees a block of real mode memory.
****************************************************************************/

{
 union REGS r;

 r.w.ax = 0x101; /* DPMI free DOS memory */
 r.w.dx = sel; /* DX := selector from 0x100 */
 int386(0x31, &r, &r);
}
typedef struct {
 long edi;
 long esi;
 long ebp;
 long reserved;
 long ebx;
 long edx;
 long ecx;
 long eax;
 short flags;
 short es,ds,fs,gs,ip,cs,sp,ss;
 } _RMREGS;

#define IN(reg) rmregs.e##reg = in->x.reg
#define OUT(reg) out->x.reg = rmregs.e##reg

int DPMI_int86(int intno, RMREGS *in, RMREGS *out)
/****************************************************************************
* Function: DPMI_int86
* Parameters: intno - Interrupt number to issue
* in - Pointer to structure for input registers
* out - Pointer to structure for output registers
* Returns: Value returned by interrupt in AX
* Description: Issues a real mode interrupt using DPMI services.
****************************************************************************/
{
 _RMREGS rmregs;
 union REGS r;
 struct SREGS sr;

 memset(&rmregs, 0, sizeof(rmregs));
 IN(ax); IN(bx); IN(cx); IN(dx); IN(si); IN(di);

 segread(&sr);
 r.w.ax = 0x300; /* DPMI issue real interrupt */
 r.h.bl = intno;
 r.h.bh = 0;
 r.w.cx = 0;
 sr.es = sr.ds;
 r.x.edi = (unsigned)&rmregs;
 int386x(0x31, &r, &r, &sr); /* Issue the interrupt */

 OUT(ax); OUT(bx); OUT(cx); OUT(dx); OUT(si); OUT(di);
 out->x.cflag = rmregs.flags & 0x1;
 return out->x.ax;
}
int DPMI_int86x(int intno, RMREGS *in, RMREGS *out, RMSREGS *sregs)
/****************************************************************************
* Function: DPMI_int86x
* Parameters: intno - Interrupt number to issue
* in - Pointer to structure for input registers
* out - Pointer to structure for output registers

* sregs - Values to load into segment registers
* Returns: Value returned by interrupt in AX
* Description: Issues a real mode interrupt using DPMI services.
****************************************************************************/
{
 _RMREGS rmregs;
 union REGS r;
 struct SREGS sr;

 memset(&rmregs, 0, sizeof(rmregs));
 IN(ax); IN(bx); IN(cx); IN(dx); IN(si); IN(di);
 rmregs.es = sregs->es;
 rmregs.ds = sregs->ds;

 segread(&sr);
 r.w.ax = 0x300; /* DPMI issue real interrupt */
 r.h.bl = intno;
 r.h.bh = 0;
 r.w.cx = 0;
 sr.es = sr.ds;
 r.x.edi = (unsigned)&rmregs;
 int386x(0x31, &r, &r, &sr); /* Issue the interrupt */

 OUT(ax); OUT(bx); OUT(cx); OUT(dx); OUT(si); OUT(di);
 sregs->es = rmregs.es;
 sregs->cs = rmregs.cs;
 sregs->ss = rmregs.ss;
 sregs->ds = rmregs.ds;
 out->x.cflag = rmregs.flags & 0x1;
 return out->x.ax;
}
int DPMI_allocSelector(void)
/****************************************************************************
* Function: DPMI_allocSelector
* Returns: Newly allocated protected mode selector
* Description: Allocates a new protected mode selector using DPMI
* services. This selector has a base address and limit of 0.
****************************************************************************/
{
 int sel;
 union REGS r;

 r.w.ax = 0; /* DPMI allocate selector */
 r.w.cx = 1; /* Allocate a single selector */
 int386(0x31, &r, &r);
 if (r.x.cflag)
 FatalError("DPMI_allocSelector() failed!");
 sel = r.w.ax;

 r.w.ax = 9; /* DPMI set access rights */
 r.w.bx = sel;
 r.w.cx = 0x8092; /* 32 bit page granular */
 int386(0x31, &r, &r);
 return sel;
}
long DPMI_mapPhysicalToLinear(long physAddr,long limit)
/****************************************************************************
* Function: DPMI_mapPhysicalToLinear
* Parameters: physAddr - Physical memory address to map
* limit - Length-1 of physical memory region to map
* Returns: Starting linear address for mapped memory
* Description: Maps a section of physical memory into the linear address
* space of a process using DPMI calls. Note that this linear
* address cannot be used directly, but must be used as the
* base address for a selector.
****************************************************************************/
{
 union REGS r;

 r.w.ax = 0x800; /* DPMI map physical to linear */
 r.w.bx = physAddr >> 16;
 r.w.cx = physAddr & 0xFFFF;
 r.w.si = limit >> 16;
 r.w.di = limit & 0xFFFF;
 int386(0x31, &r, &r);
 if (r.x.cflag)
 FatalError("DPMI_mapPhysicalToLinear() failed!");
 return ((long)r.w.bx << 16) + r.w.cx;
}
void DPMI_setSelectorBase(int sel,long linAddr)
/****************************************************************************
* Function: DPMI_setSelectorBase
* Parameters: sel - Selector to change base address for
* linAddr - Linear address used for new base address
* Description: Sets the base address for the specified selector.
****************************************************************************/
{
 union REGS r;

 r.w.ax = 7; /* DPMI set selector base address */
 r.w.bx = sel;
 r.w.cx = linAddr >> 16;
 r.w.dx = linAddr & 0xFFFF;
 int386(0x31, &r, &r);
 if (r.x.cflag)
 FatalError("DPMI_setSelectorBase() failed!");
}
void DPMI_setSelectorLimit(int sel,long limit)
/****************************************************************************
* Function: DPMI_setSelectorLimit
* Parameters: sel - Selector to change limit for
* limit - Limit-1 for the selector
* Description: Sets the memory limit for the specified selector.
****************************************************************************/
{
 union REGS r;

 r.w.ax = 8; /* DPMI set selector limit */
 r.w.bx = sel;
 r.w.cx = limit >> 16;
 r.w.dx = limit & 0xFFFF;
 int386(0x31, &r, &r);
 if (r.x.cflag)
 FatalError("DPMI_setSelectorLimit() failed!");
}
/*-------------------------- VBE Interface routines -----------------------*/
void FatalError(char *msg)
{
 fprintf(stderr,"%s\n", msg);
 exit(1);
}
static void ExitVBEBuf(void)
{
 DPMI_freeRealSeg(VESABuf_sel);
}
void VBE_initRMBuf(void)
/****************************************************************************
* Function: VBE_initRMBuf
* Description: Initialises the VBE transfer buffer in real mode memory.
* This routine is called by the VESAVBE module every time
* it needs to use the transfer buffer, so we simply allocate
* it once and then return.
****************************************************************************/
{
 if (!VESABuf_sel) {
 DPMI_allocRealSeg(VESABuf_len, &VESABuf_sel, &VESABuf_rseg);
 atexit(ExitVBEBuf);
 }
}
void VBE_callESDI(RMREGS *regs, void *buffer, int size)
/****************************************************************************
* Function: VBE_callESDI
* Parameters: regs - Registers to load when calling VBE
* buffer - Buffer to copy VBE info block to
* size - Size of buffer to fill
* Description: Calls the VESA VBE and passes in a buffer for the VBE to
* store information in, which is then copied into the users
* buffer space. This works in protected mode as the buffer
* passed to the VESA VBE is allocated in conventional
* memory, and is then copied into the users memory block.
****************************************************************************/
{
 RMSREGS sregs;

 VBE_initRMBuf();
 sregs.es = VESABuf_rseg;
 regs->x.di = 0;
 _fmemcpy(MK_FP(VESABuf_sel,0),buffer,size);
 DPMI_int86x(0x10, regs, regs, &sregs);
 _fmemcpy(buffer,MK_FP(VESABuf_sel,0),size);
}
int VBE_detect(void)
/****************************************************************************
* Function: VBE_detect
* Parameters: vgaInfo - Place to store the VGA information block
* Returns: VBE version number, or 0 if not detected.
* Description: Detects if a VESA VBE is out there and functioning
* correctly. If we detect a VBE interface we return the
* VGAInfoBlock returned by the VBE and the VBE version number.
****************************************************************************/
{
 RMREGS regs;
 short *p1,*p2;
 VBE_vgaInfo vgaInfo;
 /* Put 'VBE2' into the signature area so that the VBE 2.0 BIOS knows
 * that we have passed a 512 byte extended block to it, and wish
 * the extended information to be filled in.
 */
 strncpy(vgaInfo.VESASignature,"VBE2",4);
 /* Get the SuperVGA Information block */
 regs.x.ax = 0x4F00;
 VBE_callESDI(&regs, &vgaInfo, sizeof(VBE_vgaInfo));
 if (regs.x.ax != 0x004F)
 return 0;
 if (strncmp(vgaInfo.VESASignature,"VESA",4) != 0)
 return 0;
 /* Now that we have detected a VBE interface, copy the list of available
 * video modes into our local buffer. We *must* copy this mode list, since
 * the VBE will build the mode list in the VBE_vgaInfo buffer that we have
 * passed, so the next call to the VBE will trash the list of modes. */
 p1 = LfbMapRealPointer(vgaInfo.VideoModePtr);
 p2 = modeList;
 while (*p1 != -1)
 *p2++ = *p1++;
 *p2 = -1;
 return vgaInfo.VESAVersion;
}
int VBE_getModeInfo(int mode,VBE_modeInfo *modeInfo)
/****************************************************************************
* Function: VBE_getModeInfo
* Parameters: mode - VBE mode to get information for
* modeInfo - Place to store VBE mode information
* Returns: 1 on success, 0 if function failed.
* Description: Obtains information about a specific video mode from the
* VBE. You should use this function to find the video mode
* you wish to set, as the new VBE 2.0 mode numbers may be
* completely arbitrary.
****************************************************************************/
{
 RMREGS regs;

 regs.x.ax = 0x4F01; /* Get mode information */
 regs.x.cx = mode;
 VBE_callESDI(&regs, modeInfo, sizeof(VBE_modeInfo));
 if (regs.x.ax != 0x004F)
 return 0;
 if ((modeInfo->ModeAttributes & vbeMdAvailable) == 0)
 return 0;
 return 1;
}
void VBE_setVideoMode(int mode)
/****************************************************************************
* Function: VBE_setVideoMode
* Parameters: mode - VBE mode number to initialise
****************************************************************************/
{
 RMREGS regs;
 regs.x.ax = 0x4F02;
 regs.x.bx = mode;
 DPMI_int86(0x10,&regs,&regs);
}
/*-------------------- Application specific routines ----------------------*/
void far *GetPtrToLFB(long physAddr)
/****************************************************************************
* Function: GetPtrToLFB
* Parameters: physAddr - Physical memory address of linear framebuffer
* Returns: Far pointer to the linear framebuffer memory
****************************************************************************/
{
 int sel;
 long linAddr,limit = (4096 * 1024) - 1;

 sel = DPMI_allocSelector();
 linAddr = DPMI_mapPhysicalToLinear(physAddr,limit);
 DPMI_setSelectorBase(sel,linAddr);
 DPMI_setSelectorLimit(sel,limit);
 return MK_FP(sel,0);
}
void AvailableModes(void)
/****************************************************************************
* Function: AvailableModes
* Description: Display a list of available LFB mode resolutions.
****************************************************************************/
{
 short *p;
 VBE_modeInfo modeInfo;

 printf("Usage: LFBPROF <xres> <yres>\n\n");
 printf("Available 256 color video modes:\n");
 for (p = modeList; *p != -1; p++) {
 if (VBE_getModeInfo(*p, &modeInfo)) {
 /* Filter out only 8 bit linear framebuffer modes */
 if ((modeInfo.ModeAttributes & vbeMdLinear) == 0)
 continue;
 if (modeInfo.MemoryModel != vbeMemPK ||
 modeInfo.BitsPerPixel != 8 ||
 modeInfo.NumberOfPlanes != 1)
 continue;
 printf(" %4d x %4d %d bits per pixel\n",
 modeInfo.XResolution, modeInfo.YResolution,
 modeInfo.BitsPerPixel);
 }
 }
 exit(1);
}
void InitGraphics(int x,int y)
/****************************************************************************
* Function: InitGraphics
* Parameters: x,y - Requested video mode resolution
* Description: Initialise the specified video mode. We search through
* the list of available video modes for one that matches
* the resolution and color depth we are looking for.
****************************************************************************/
{
 short *p;
 VBE_modeInfo modeInfo;

 for (p = modeList; *p != -1; p++) {
 if (VBE_getModeInfo(*p, &modeInfo)) {
 /* Filter out only 8 bit linear framebuffer modes */
 if ((modeInfo.ModeAttributes & vbeMdLinear) == 0)
 continue;
 if (modeInfo.MemoryModel != vbeMemPK ||
 modeInfo.BitsPerPixel != 8 ||
 modeInfo.NumberOfPlanes != 1)
 continue;
 if (modeInfo.XResolution != x || modeInfo.YResolution != y)
 continue;
 xres = x;
 yres = y;
 bytesperline = modeInfo.BytesPerScanLine;
 imageSize = bytesperline * yres;
 VBE_setVideoMode(*p | vbeUseLFB);
 LFBPtr = GetPtrToLFB(modeInfo.PhysBasePtr);
 return;
 }
 }
 printf("Valid video mode not found\n");
 exit(1);
}
void EndGraphics(void)
/****************************************************************************
* Function: EndGraphics
* Description: Restores text mode.
****************************************************************************/
{
 RMREGS regs;
 regs.x.ax = 0x3;
 DPMI_int86(0x10, &regs, &regs);
}
void ProfileMode(void)
/****************************************************************************
* Function: ProfileMode
* Description: Profiles framebuffer performance for simple screen clearing
* and for copying from system memory to video memory (BitBlt).
* This routine thrashes the CPU cache by cycling through
* enough system memory buffers to invalidate the entire CPU 
* external cache before re-using the first memory buffer again.
****************************************************************************/
{
 int i,numClears,numBlts,maxImages;
 long startTicks,endTicks;
 void *image[10],*dst;

 /* Profile screen clearing operation */
 startTicks = LfbGetTicks();
 numClears = 0;
 while ((LfbGetTicks() - startTicks) < 182)
 LfbMemset(FP_SEG(LFBPtr),0,numClears++,imageSize);
 endTicks = LfbGetTicks();
 clearsPerSec = numClears / ((endTicks - startTicks) * 0.054925);
 clearsMbPerSec = (clearsPerSec * imageSize) / 1048576.0;

 /* Profile system memory to video memory copies */
 maxImages = ((512 * 1024U) / imageSize) + 2;
 for (i = 0; i < maxImages; i++) {
 image[i] = malloc(imageSize);
 if (image[i] == NULL)
 FatalError("Not enough memory to profile BitBlt!");
 memset(image[i],i+1,imageSize);
 }
 startTicks = LfbGetTicks();
 numBlts = 0;
 while ((LfbGetTicks() - startTicks) < 182)
 LfbMemcpy(FP_SEG(LFBPtr),0,image[numBlts++ % maxImages],imageSize);
 endTicks = LfbGetTicks();
 bitBltsPerSec = numBlts / ((endTicks - startTicks) * 0.054925);
 bitBltsMbPerSec = (bitBltsPerSec * imageSize) / 1048576.0;
}
void main(int argc, char *argv[])
{
 if (VBE_detect() < 0x200)
 FatalError("This program requires VBE 2.0; Please install UniVBE 5.1.");
 if (argc != 3)
 AvailableModes(); /* Display available modes */

 InitGraphics(atoi(argv[1]),atoi(argv[2])); /* Start graphics */
 ProfileMode(); /* Profile the video mode */
 EndGraphics(); /* Restore text mode */

 printf("Profiling results for %dx%d 8 bits per pixel.\n",xres,yres);
 printf("%3.2f clears/s, %2.2f Mb/s\n", clearsPerSec, clearsMbPerSec);
 printf("%3.2f bitBlt/s, %2.2f Mb/s\n", bitBltsPerSec, bitBltsMbPerSec);
}



Listing Two

/****************************************************************************
* VBE 2.0 Linear Framebuffer Profiler
* By Kendall Bennett and Brian Hook
* Filename: LFBPROF.H
* Language: ANSI C
* Environment: Watcom C/C++ 10.0a with DOS4GW
* Description: Header file for the LFBPROF.C program.
****************************************************************************/

#ifndef __LFBPROF_H
#define __LFBPROF_H

/*---------------------- Macros and type definitions ----------------------*/
#pragma pack(1)

/* SuperVGA information block */
typedef struct {
 char VESASignature[4]; /* 'VESA' 4 byte signature */
 short VESAVersion; /* VBE version number */
 long OemStringPtr; /* Pointer to OEM string */
 long Capabilities; /* Capabilities of video card */
 long VideoModePtr; /* Pointer to supported modes */
 short TotalMemory; /* Number of 64kb memory blocks */

 /* VBE 2.0 extensions */
 short OemSoftwareRev; /* OEM Software revision number */
 long OemVendorNamePtr; /* Pointer to Vendor Name string */
 long OemProductNamePtr; /* Pointer to Product Name string */
 long OemProductRevPtr; /* Pointer to Product Revision str */
 char reserved[222]; /* Pad to 256 byte block size */
 char OemDATA[256]; /* Scratch pad for OEM data */
 } VBE_vgaInfo;

/* SuperVGA mode information block */

typedef struct {
 short ModeAttributes; /* Mode attributes */
 char WinAAttributes; /* Window A attributes */
 char WinBAttributes; /* Window B attributes */
 short WinGranularity; /* Window granularity in k */
 short WinSize; /* Window size in k */
 short WinASegment; /* Window A segment */
 short WinBSegment; /* Window B segment */
 long WinFuncPtr; /* Pointer to window function */
 short BytesPerScanLine; /* Bytes per scanline */
 short XResolution; /* Horizontal resolution */
 short YResolution; /* Vertical resolution */
 char XCharSize; /* Character cell width */
 char YCharSize; /* Character cell height */
 char NumberOfPlanes; /* Number of memory planes */
 char BitsPerPixel; /* Bits per pixel */
 char NumberOfBanks; /* Number of CGA style banks */
 char MemoryModel; /* Memory model type */
 char BankSize; /* Size of CGA style banks */
 char NumberOfImagePages; /* Number of image pages */
 char res1; /* Reserved */
 char RedMaskSize; /* Size of direct color red mask */
 char RedFieldPosition; /* Bit posn of lsb of red mask */
 char GreenMaskSize; /* Size of direct color green mask */
 char GreenFieldPosition; /* Bit posn of lsb of green mask */
 char BlueMaskSize; /* Size of direct color blue mask */
 char BlueFieldPosition; /* Bit posn of lsb of blue mask */
 char RsvdMaskSize; /* Size of direct color res mask */
 char RsvdFieldPosition; /* Bit posn of lsb of res mask */
 char DirectColorModeInfo; /* Direct color mode attributes */

 /* VBE 2.0 extensions */
 long PhysBasePtr; /* Physical address for linear buf */
 long OffScreenMemOffset; /* Pointer to start of offscreen mem*/
 short OffScreenMemSize; /* Amount of offscreen mem in 1K's */
 char res2[206]; /* Pad to 256 byte block size */
 } VBE_modeInfo;
#define vbeMemPK 4 /* Packed Pixel memory model */
#define vbeUseLFB 0x4000 /* Enable linear framebuffer mode */

/* Flags for the mode attributes returned by VBE_getModeInfo. If
 * vbeMdNonBanked is set to 1 and vbeMdLinear is also set to 1, then only
 * the linear framebuffer mode is available. */

#define vbeMdAvailable 0x0001 /* Video mode is available */
#define vbeMdColorMode 0x0008 /* Mode is a color video mode */
#define vbeMdGraphMode 0x0010 /* Mode is a graphics mode */
#define vbeMdNonBanked 0x0040 /* Banked mode is not supported */
#define vbeMdLinear 0x0080 /* Linear mode supported */

/* Structures for issuing real mode interrupts with DPMI */
struct _RMWORDREGS {
 unsigned short ax, bx, cx, dx, si, di, cflag;
 };
struct _RMBYTEREGS {
 unsigned char al, ah, bl, bh, cl, ch, dl, dh;
 };
typedef union {
 struct _RMWORDREGS x;
 struct _RMBYTEREGS h;
 } RMREGS;
typedef struct {
 unsigned short es;
 unsigned short cs;
 unsigned short ss;
 unsigned short ds;
 } RMSREGS;

/* Inline assembler block fill/move routines */
void LfbMemset(int sel,int off,int c,int n);
#pragma aux LfbMemset = \
 "push es" \
 "mov es,ax" \
 "shr ecx,2" \
 "xor eax,eax" \
 "mov al,bl" \
 "shl ebx,8" \
 "or ax,bx" \
 "mov ebx,eax" \
 "shl ebx,16" \
 "or eax,ebx" \
 "rep stosd" \
 "pop es" \
 parm [eax] [edi] [ebx] [ecx];

void LfbMemcpy(int sel,int off,void *src,int n);
#pragma aux LfbMemcpy = \
 "push es" \
 "mov es,ax" \
 "shr ecx,2" \
 "rep movsd" \
 "pop es" \
 parm [eax] [edi] [esi] [ecx];

/* Map a real mode pointer into address space */
#define LfbMapRealPointer(p) (void*)(((unsigned)(p) >> 12) + ((p) & 0xFFFF))

/* Get the current timer tick count */
#define LfbGetTicks() *((long*)0x46C)

#pragma pack()

#endif /* __LFBPROF_H */



Listing Three

# Very simple makefile for LFBPROF.C using Watcom C++ 10.0a with DOS4GW

lfbprof.exe: lfbprof.c lfbprof.h
 wcl386 -zq -s -d2 lfbprof.c










Implementing Games for Windows


Using the WinG API and the WaveMix DLL




James Finnegan


James is a developer specializing in operating-systems internals. He can be
reached at P.O. Box 436, Falmouth, MA 02541 or via Internet at
FINNEGANJ@delphi.com. Reprinted courtesy of Microsoft Systems Journal (C)1995
Miller Freeman.


At first glance, it seems that Windows, with its graphical nature, device
independence, access to boatloads of memory, and various levels of
multitasking, would be a perfect environment for games. But game developers
have not been flocking to Windows. To improve Windows' video and sound
performance, Microsoft released two APIs: WinG and WaveMix, both of which are
available for free on CompuServe (GO WINMM LIB 10) and the Internet
(ftp.microsoft.com). WinG provides high-performance, device-independent
graphics capabilities in the form of DLLs and a GDI device driver. WaveMix is
a DLL that lets you mix sound files or resources into a single sound output at
run time and, optionally, to hook sounds up to events within your application.
In this article, I'll review some traditional PC game-animation techniques and
show how WinG can be used to implement them in Windows. I'll also use both
WinG and WaveMix in an application called "WinGTest," which should give you
enough of a foundation to start working with these APIs on your own. You'll
see that together, these APIs let you develop powerful games and other
multimedia apps.


Game Animation


Most game animation implemented on PCs and video games is cast-based, whereby
a game player actively manipulates movable screen objects (members of the
cast). This differs from frame-based animation, as in a Video for Windows AVI
clip, where precomposed full-screen images are animated.
The movable screen objects in cast-based animations, commonly referred to as
"sprites," are bitmapped images that are usually animated against a background
image to add realism. Any developer who has tried to implement this type of
animation knows that getting visually acceptable results (smooth, flicker-free
sprite movement) requires quite a bit of work.
On dedicated home and coin-operated video games (and some computers),
specialized hardware is used to implement sprite animation. This makes the
programmer's job easier, since less knowledge of a particular animation
technique is required to move an object from place to place.
On PCs, specialized sprite-animation hardware is generally not at your
disposal. You therefore have to roll your own routines to continually place an
object on the screen, remove it, and place it at a new location. Under DOS,
direct access to the video adapter's memory and access to the adapter's
controller registers gave you explicit control over various attributes and
operations of the adapter (such as its palette of colors), enough control to
implement your own animation routines.
With direct access to the pixels that make up the screen image, you had to
build routines that manipulated these pixels to implement sprite animation.
These routines were necessary to "hide" the removal and replacement of an
animated sprite, which would otherwise create flicker. Typically, one of three
different animation techniques was used: XORing, page flipping, and double
buffering.
XOR animation involves writing a sprite directly to the video adapter's memory
by XORing the source (the sprite) and the destination (the memory that makes
up the screen). Removing the sprite is simply a matter of XORing the same
image in the same place. Doing this successively gives you sprite animation.
Since the video adapter's memory is being accessed directly, the on-screen
results are virtually instantaneous. However, this method only works against a
solid background, since the XORing alters, rather than hides, any background
scene that the sprite is placed over.
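The draw-once/erase-once symmetry of XOR animation can be sketched against an in-memory 8-bit buffer (the buffer and function names here are illustrative; under DOS the destination would be the video adapter's memory):

```c
#include <assert.h>
#include <string.h>

/* Hypothetical 8bpp framebuffer dimensions for illustration. */
#define SCR_W 16
#define SCR_H 16

/* XOR a w x h sprite into the buffer at (x,y). Calling this twice with
 * identical arguments restores the original pixels exactly, which is
 * what makes XOR animation self-erasing. */
static void xor_sprite(unsigned char *screen, const unsigned char *sprite,
                       int x, int y, int w, int h)
{
    int row, col;
    for (row = 0; row < h; row++)
        for (col = 0; col < w; col++)
            screen[(y + row) * SCR_W + (x + col)] ^= sprite[row * w + col];
}
```

Note that this also shows why the technique fails over a background scene: the drawn pixels are `background ^ sprite`, not the sprite's own colors.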
Page flipping involves dividing the video adapter's memory into two or more
"pages," where a page represents an entire screenful of graphics. While one
page is being displayed, another page is being constructed. When the page is
ready or when a certain amount of time has elapsed, the visible and hidden
pages are swapped by altering the graphics base address of the video adapter.
Since the screen is continually being refreshed, changing the memory address
results in the display changing.
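The mechanics of page flipping can be simulated with two in-memory "pages"; on real hardware nothing is copied when flipping, the adapter's display start address is simply reprogrammed. All names below are illustrative:

```c
#include <assert.h>
#include <string.h>

/* Two stand-in pages; real pages would be regions of video memory. */
static unsigned char pageA[64], pageB[64];
static unsigned char *visible_page = pageA;  /* what the "CRT" shows   */
static unsigned char *hidden_page  = pageB;  /* where drawing happens  */

/* Swap which page is displayed; drawing continues on the other one. */
static void flip_pages(void)
{
    unsigned char *t = visible_page;
    visible_page = hidden_page;
    hidden_page  = t;
}
```

A frame is composed entirely off screen in `hidden_page`, then `flip_pages()` makes it visible in one atomic step, so the viewer never sees a half-drawn image.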
The third technique is double-buffering, in which an application-managed
buffer is used to construct the screen image. When the buffer is complete, the
image (or a portion of it) is copied to the video adapter's memory (which
directly defines the screen image), updating the display accordingly. This is
conceptually similar to page flipping, except the performance is somewhat
different, since memory is being copied en masse to the display hardware.
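A minimal double-buffering sketch, with a plain byte array standing in for video memory (function names are illustrative):

```c
#include <assert.h>
#include <string.h>

/* Compose a frame in a system-memory back buffer: clear it, then draw
 * sprites on top (drawing omitted here). */
static void compose_frame(unsigned char *backbuf, int n, unsigned char bg)
{
    memset(backbuf, bg, n);
}

/* Copy the finished frame to the "screen" in one bulk pass; since the
 * screen is never shown partially drawn, there is no flicker. */
static void present_frame(unsigned char *screen,
                          const unsigned char *backbuf, int n)
{
    memcpy(screen, backbuf, n);
}
```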
All of this direct hardware access trades device independence for raw
performance. Although this was acceptable under DOS, it isn't under Windows,
where device independence, rather than specific hardware access, yields broad
compatibility.


Memory Woes


As if direct hardware access isn't bad enough, games have another hurdle to
overcome. Graphics of any kind usually hog memory. Bitmapped images tend to be
large, and the recent demands for increased screen resolution, with increased
game complexity, don't help the situation much (particularly when dealing with
DOS). People have traditionally addressed the problem of increased memory
demands with DOS extenders.
Windows has gained acceptance as a replacement platform for DOS for business
applications. Unfortunately, this has not been the case with games.
The two key concerns facing PC game developers are: providing smooth, fast
animation, and access to a lot of memory. Memory management is easy in
Windows. It is the graphics performance--particularly when trying to implement
animation routines--that is the problem.
Even though Windows offers high-level graphics primitives through GDI, it is
not well suited for high-performance animation. This is largely because of
Windows device independence. To support different hardware in a consistent
manner, you need device drivers. This means two things: First, the device
driver has to do something to convert a generic function into something device
specific. That extra code takes time to execute. Second, your application's
performance is at the mercy of those device drivers. Even though some device
drivers stink, their poor performance often goes unnoticed on machines running
only business applications.
For animation, particularly with games, extra code and poor performance are
enemies. Combine that with Windows' lack of flexibility in drawing to display
contexts (you can only use GDI functions), and Windows seems like a bad choice
for games.
Even if GDI and its device drivers performed exceedingly well, the functions
Windows offers are not always appropriate for all types of graphics. For
instance, GDI does no 3-D transformations. Nor does it do anything specific
for animation. Good or bad, the direct memory access that DOS provided to
developers yielded a huge number of software-based graphics solutions.


DIBs versus DDBs


Windows 3.0 introduced the device-independent bitmap (DIB) to address bitmap
portability issues. A DIB defines a bitmap's dimension, colors, and pixels in
a single structure. Since the characteristics of a DIB are self-encapsulated,
rendering it on different devices usually yields visually comparable results.
In addition, you have complete access to the entire DIB, which means that you
can fool with anything, including the pixels that make up its bitmap, at will.
The GDI API, however, deals with device-dependent bitmaps (DDBs) that are
represented by a device context (DC), which GDI uses to do most of its
manipulations. This means anything you want to do to a DDB must be done
through GDI. This is primarily because the DC is allocated and maintained by
the output device's device driver. The memory used to define an image,
particularly with video and printer drivers, may be physically inaccessible to
your application. In addition, particularly with video drivers, the image may
not be stored contiguously, or in a format that you can determine. For
example, some video adapters divide up their memory into bit planes, where the
bits that make up the pixels are divided into individual color values and
stored separately. This is done for quicker access within the physical memory
frame that the video adapter uses. Other video adapters use a packed-pixel
format (where pixels are stored linearly, much like a DIB). Manipulating the
DDB without the aid of GDI (really the device driver) is not possible.
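The packed-pixel layout that makes DIBs directly addressable can be sketched for the 8bpp case. Two conventions apply: scanlines are padded to a 4-byte (DWORD) boundary, and DIBs are bottom-up by default, so logical row 0 is the last row in memory. The helper names are illustrative:

```c
#include <assert.h>

/* Bytes per scanline of an 8bpp DIB: width rounded up to a DWORD. */
static int dib_pitch(int width_px)
{
    return (width_px + 3) & ~3;
}

/* Byte offset of pixel (x,y) in a bottom-up 8bpp DIB: logical row y
 * is stored (height-1-y) rows into the pixel buffer. */
static int dib_offset(int x, int y, int width_px, int height_px)
{
    return (height_px - 1 - y) * dib_pitch(width_px) + x;
}
```

It is exactly this predictable addressing that a DDB lacks, since its storage format is the device driver's private business.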
In short, you have the DIB, which you can fool with any way you want, and the
DDB, which Windows manipulates. There is little "glue" in between. For
instance, there are GDI functions that move a DIB to a DC; however, they
perform poorly and inconsistently across some device drivers. In addition,
there is no way to call GDI functions, such as Ellipse or Polygon, to
manipulate the DIB at a higher level.
To alleviate the latter problem, Windows 3.1 ships with DIB.DRV, a GDI device
driver with no associated output device. To GDI, DIB.DRV looks just like
another output device. This driver allows you to allocate a DIB and create a
memory DC to go with it, so you can manipulate the DIB directly while still
manipulating it with GDI function calls.
Although DIB.DRV is useful, it does not help move a DIB to the display easily
and quickly. You may think that since DIB.DRV allows you to associate a DC to
a DIB, you could call GDI's BitBlt to move the bitmap from one DC to the
other, but the actual BitBlt function is implemented in the device driver, not
GDI. Thus, the video driver's BitBlt can only copy from DCs that it knows;
DIB.DRV's DCs are not among them. You are then left to contend with
StretchDIBits. If StretchDIBits is not implemented in the device driver, GDI
will fake it by calling SetBitmapBits and StretchBlt. Needless to say, this
results in inconsistent performance across different hardware. 


Enter WinG


What you really need is DIB flexibility (with DIB.DRV-like functionality as a
bonus), with the speed of the DDB BitBlt. WinG ("G" for games) provides that
and more. The WinG toolkit provides access to a DIB, which provides DOS-like
double-buffering flexibility in a device-independent fashion. Using WinG, your
app copies the DIB to the screen so quickly that you get DOS-like performance
on most hardware.

WinG offers a number of advantages over programming in DOS. First, Windows
offers all of the memory benefits of a DOS extender and more. Second,
video-device independence lets you consistently access resolutions higher than
those under DOS.
DOS games usually have to take a lowest-common-denominator approach to video
graphics, so that most DOS games rarely go beyond VGA's standard mode X (the
"undocumented" 320x240x256 mode used in many games). To fill all of that
new-found screen real estate, WinG offers fast Blt stretching.
WinG gives you direct-access performance by utilizing the best path to your
hardware. To determine which path will be used, WinG analyzes your PC upon
installation. If WinG recognizes your hardware (as a video chipset from one of
about eight different manufacturers, such as Tseng Labs and Western Digital),
it will obtain a pointer directly to the video graphics memory (traditionally
at A000), which it will write to. WinG obtains this pointer by employing
DVA.386, the VFlatD device, which creates a selector to this memory address.
In the future, this interface will be replaced by the Display Control
Interface (DCI), which is designed to supply a consistent method of obtaining
the video memory pointer (among other things) for use by APIs like WinG.
WinGBitBlt and WinGStretchBlt exist only to get your bits from the DIB to the
screen. In the absence of a known video card, the WinG profiler times the
various ways to get bitmaps of different sizes to the screen. It determines
whether a top-down or bottom-up DIB rendering is better. For each case, WinG
will use the fastest combination of GDI functions and driver calls. On some
cards this might involve using direct video access (DVA.386); on others
StretchDIBits is optimized for pipelined data transfer to the video card. WinG
doesn't care; the fastest road is the right one. These results, along with the
current video driver's name and version number, are stored in a setting within
your Windows configuration (WIN.INI under Windows 3.1) information. This
performance analysis is done at installation time, and is only performed again
if the video driver or its version changes.


The WinG API


Fortunately, the actual run-time configuration is largely hidden in a small,
device-independent API. This API is conceptually similar to DIB.DRV, but
unlike DIB.DRV, it includes a high-level BitBlt routine to quickly copy DIBs
to a given display DC. This optimized BitBlt function, called WinGBitBlt, is
the core of WinG.
The WinG API also includes two functions for implementing a halftone palette.
This type of palette selects a set of colors that will emulate 24-bit true
color in an 8-bit, 256-color device. See Appendix A of "Writing HOT Games for
Microsoft Windows" included on the FTP server in GAMESUM.ZIP, for a detailed
description of each API.
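One common way such a fixed 256-color palette is organized is as a uniform color cube, where each 24-bit color maps to its nearest cell. The sketch below uses a 6x6x6 cube purely as an illustration of the idea; WinG's actual halftone palette layout and dithering are its own and may differ:

```c
#include <assert.h>

/* Map 8-bit R, G, B components onto a 6x6x6 color cube (216 entries).
 * Each component is quantized to 0..5 and the three values are combined
 * into a single palette index. Illustrative only, not WinG's table. */
static int cube_index(int r, int g, int b)
{
    int r6 = r * 5 / 255;
    int g6 = g * 5 / 255;
    int b6 = b * 5 / 255;
    return r6 * 36 + g6 * 6 + b6;
}
```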
Despite all these features (and its name), WinG isn't a high-level gaming API.
Things such as bitmap animation or collision detecting are not part of WinG.
This keeps WinG (much like DOS) from being tied to a particular animation
method. For instance, many games are 2-D, while others (like Atari's Marble
Madness) are isometric (2-D with a 3-D-like background-- a form of fake 3-D),
while still others (like id's DOOM) are true 3-D, with translation and scaling
of objects in a 3-D scene. Since many of these types of routines have already
been developed for DOS applications, porting these techniques to WinG is
relatively easy.
Unfortunately, WinG lacks some routines that just about all games need. For
instance, the routine to copy a sprite with a transparent color (since bitmaps
are rectangular by definition, a transparent color allows you to display
arbitrarily shaped sprites) is not part of WinG. DIB-oriented manipulation
functions are also absent. However, many of these routines are included with
the WinG sample applications, so you can just cut and paste.
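A transparent (color-keyed) sprite copy of the kind described above is short to write against an 8bpp DIB back buffer; this is a generic sketch with illustrative names, not code from the WinG samples:

```c
#include <assert.h>
#include <string.h>

/* Copy a rectangular w x h sprite into an 8bpp buffer at (x,y),
 * skipping every source pixel equal to the color key, so the sprite
 * appears arbitrarily shaped against the background. */
static void blt_transparent(unsigned char *dst, int dst_pitch,
                            const unsigned char *src, int w, int h,
                            int x, int y, unsigned char key)
{
    int row, col;
    for (row = 0; row < h; row++)
        for (col = 0; col < w; col++) {
            unsigned char c = src[row * w + col];
            if (c != key)
                dst[(y + row) * dst_pitch + (x + col)] = c;
        }
}
```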
WinG implements a double-buffering scheme, so techniques such as page flipping
cannot be implemented. This is not a big deal, since page flipping and double
buffering are very similar. Their differences under DOS have to do only with
performance issues. Porting existing page-flipping code shouldn't be
difficult. Something similar to page flipping will be available (in a
device-independent form) in DCI, although this most likely will be hidden
behind the WinG API.
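Abstractly, the distinction comes down to this: page flipping changes which buffer the display scans out, while WinG-style double buffering copies a finished off-screen buffer to the screen. A tiny sketch of the flipping half (the names are mine, for illustration):

```c
/* Two off-screen buffers plus an index naming the one on display.
   Names are illustrative. */
typedef struct {
    unsigned char *buffers[2];
    int visible;              /* which buffer the display is scanning */
} FlipPair;

/* Page flipping: the freshly drawn buffer becomes visible by swapping
   indices; no pixels are copied. Under WinG the equivalent step is
   instead a WinGBitBlt copy of the single off-screen buffer to the
   display DC. */
void PageFlip(FlipPair *p)
{
    p->visible ^= 1;   /* draw buffer and visible buffer trade places */
}
```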
Of course, you can't access low-level video registers under WinG either. You
really shouldn't need to, since most access has to do with direct palette
manipulation, changing memory base addresses, and so on, none of which have to
be done with WinG. These limitations are only a concern if you are doing a
straight port from DOS. WinG does not include any timer-oriented functions,
but these are supplied by other parts of Windows. For instance, the multimedia
API supplies many of the preemptive timer functions that you would need to
implement games. Also, WinG has no support for sound.
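As an illustration of the kind of timing logic such a periodic timer would drive, here is a fixed-timestep accumulator sketch. The structure, names, and step rate are assumptions of mine, not part of any Windows API.

```c
/* A fixed-timestep accumulator of the kind a periodic timer callback
   might feed. Names and the fixed-step idea are illustrative. */
typedef struct {
    unsigned accumulatedMs;   /* wall-clock time banked so far          */
    unsigned stepMs;          /* fixed simulation step, in milliseconds */
    unsigned updatesRun;      /* simulation ticks executed              */
} GameClock;

/* Bank elapsed time, then run as many whole fixed steps as it covers. */
void GameClockAdvance(GameClock *clock, unsigned elapsedMs)
{
    clock->accumulatedMs += elapsedMs;
    while (clock->accumulatedMs >= clock->stepMs) {
        clock->accumulatedMs -= clock->stepMs;
        clock->updatesRun++;              /* one simulation tick */
    }
}
```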


32-Bit Hacking


One important note: WING.DLL and WINGDE.DLL contain highly optimized, 32-bit
code. Examination of either of these two DLLs with the EXEHDR utility shows
that many of their code segments are 32 bit. This is great for performance,
but you may wonder how it is done. For instance, if you could do the same in
your WinG code for critical functions, performance might improve
significantly. The secret is hidden within CMACRO32.INC, included on the MSDN
CD and in the WinG-toolkit samples. When you create an assembly function using
the cProc macro, the code in Example 1(a) is placed at the beginning of your
function.
Sixteen-bit Windows ignores the 32-bit segment flag that EXEHDR sees, which
means that the code segment is loaded as a 16-bit segment. When AX is added to
itself, the carry flag is set. The jumped-to code looks like Example 1(b).
This sets the descriptor in the LDT to USE32, causing the prologue code in
Example 1(a) to be interpreted as Example 1(c).
The LDT hacking code is not called again unless the code segment is discarded
and reloaded by Windows. The cEnd macro prefixes the RETF instruction with
the operand-size override (66H) byte. You must set the linker option "/NOPACKCODE"
for this to work successfully. If the USE32 object file is packed with your
16-bit C or C++ functions, the LDT hacking would mess up your 16-bit code:
Usually, your application will GP fault after returning from the called 32-bit
function.


Getting to Work


Developing a game of any type is fairly complex. Collision sensing, keeping
tabs on all those screen images and their states, rotation and translation of
sprites, and the like are pretty involved. To keep things simple, I'll present
a sprite-animation program called "WinGTest," which demonstrates how to
construct WinG DCs, associate bitmaps with the DCs, and shuttle data between
DIBs, WinG DCs, and the display DC. The program (available electronically, see
"Availability," page 3) allows users to drag a sprite across a background with
the mouse, updating the off-screen buffer and ultimately the screen as needed.
WinGTest is made up of two modules: WINGTEST.C and UTILS.C, which is sample
code included with the WinG SDK for DIB manipulation. Reviewing its code
should give enough clues to get you started on your own animated game
projects.
The first thing to do is pull together your DIBs. In my example, I load three
DIBs, one for the window background, and the other two for sprites. I use the
DibOpenFile function from UTILS.C. DibOpenFile will load either a disk file or
an embedded resource, returning a pointer to the loaded DIB's BITMAPINFOHEADER
structure. UTILS.H defines this structure as a PDIB, which it uses in other
API calls and macros to extract relevant information for you.
The next step is the actual creation of the WinG DIB and its associated DC.
Here you determine the best DIB format (top-down or bottom-up), as well as the
identity palette for your application. I have placed this code within the
processing of my WM_SIZE message, so I can dynamically resize the WinG DIB
accordingly. For the most part, the DIB orientation is not tremendously
important to you; it is there if you need to know (in case you are
implementing your own bitmap-manipulation functions). The WinG-API calls hide
the bitmap orientation from you.
The first function to call is WinGRecommendDIBFormat; see Example 2(a). This
function takes a pointer to a BITMAPINFOHEADER structure and returns the
optimal format for BitBlting DIBs to your display's DC. This information
assumes that you won't be stretching or using complex clipping regions, and
that you will be using an identity palette.
The only interesting bit of information returned is the DIB orientation. The
biHeight member of the BITMAPINFOHEADER structure will be -1 if the DIB should
be in a top-down format; otherwise, this field will be 1 to indicate a
bottom-up format. In the future, the biBitCount member will indicate the bits
per pixel for the output device. Keep in mind that, for now, this field will
only be 8, since this version of WinG only supports 8-bit, 256-color output
devices. For longer-term compatibility and optimal performance, you will want
to check this field and deal with it accordingly.
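A minimal sketch of that check, using a pared-down stand-in for the Windows BITMAPINFOHEADER (the real structure has many more fields, and the function below is mine, for illustration):

```c
/* Pared-down stand-in for BITMAPINFOHEADER, holding only the two
   fields discussed in the text. */
typedef struct {
    long  biHeight;    /* sign encodes the recommended DIB orientation */
    short biBitCount;  /* bits per pixel; only 8 in this WinG release  */
} MiniBitmapHeader;

/* Act on what WinGRecommendDIBFormat reported: reject depths this
   8-bit-only code path cannot handle, and record the orientation sign. */
int CheckRecommendedFormat(const MiniBitmapHeader *hdr, int *heightSign)
{
    if (hdr->biBitCount != 8)
        return 0;                               /* unsupported depth */
    *heightSign = (hdr->biHeight < 0) ? -1 : 1; /* orientation flag  */
    return 1;
}
```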
The next step is the creation of the identity palette. Windows reserves 20
colors within the 256-color palette for system-wide static colors. These
colors include the colors for title bars, push buttons, window frames, and so
on. These colors take up the first and last ten entries of the palette. They
are placed on either end so each can be XORed with its complement to allow
inversion. To be as friendly as possible to other applications, you should
leave these 20 colors in place. That leaves 236. If you need more colors, you
can get 254 of the 256 (black and white cannot be taken) by calling GDI's
SetSystemPaletteUse with SYSPAL_NOSTATIC, although this can make other
applications ugly. If you do this, I recommend that you make your games
full-screen to hide the hideous screen colors.
Once you have a suitable palette within a DIB, simply load the 236 colors into
an array of PALETTEENTRY structures; see Example 2(b).
To create the identity palette, each color should be flagged as PC_NOCOLLAPSE
(or PC_RESERVED, if you are going to be doing any palette animation). This
will keep the palette manager from combining identical colors into one entry.
The 20 system colors should also be derived at run time and saved in their
appropriate places, since the display driver determines these colors (they are
not fixed across all platforms); see Example 2(c). Once all 256 colors are in
place, the palette can be created as in Example 2(d). This palette can then be
selected and realized to the window's DC, just like any other GDI palette; see
Example 2(e).
This code is called in various places to ensure that the identity palette is
realized whenever the application is in the foreground.
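The 10 + 236 + 10 layout described above can be sketched with simplified stand-ins for the GDI structures; the structure, names, and flag value here are illustrative, not the real GDI definitions:

```c
#define PALETTE_SIZE  256
#define STATIC_COLORS  10     /* system colors reserved at each end */
#define MY_NOCOLLAPSE 0x04    /* stand-in value for PC_NOCOLLAPSE   */

/* Simplified stand-in for GDI's PALETTEENTRY. */
typedef struct { unsigned char r, g, b, flags; } MiniPaletteEntry;

/* Fill entries 10 through 245 with the application's 236 colors,
   flagged so the palette manager will not merge identical colors.
   Entries 0-9 and 246-255 stay free for the system's static colors,
   which are queried and filled in at run time. */
void BuildIdentityLayout(MiniPaletteEntry pal[PALETTE_SIZE],
                         const MiniPaletteEntry *appColors)
{
    for (int i = STATIC_COLORS; i < PALETTE_SIZE - STATIC_COLORS; i++) {
        pal[i] = appColors[i - STATIC_COLORS];
        pal[i].flags = MY_NOCOLLAPSE;
    }
}
```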


WinG DCs


In a typical WinG application, you would create a single WinG DC, which you
would use as your off-screen buffer. In my example, I will create two. One is
for my background bitmap, which I will stretch into the DC using standard GDI
calls. I am using this DC as a buffer so I can quickly restore my background
when a sprite is moved. The second WinG DC is for my off-screen buffer, which
I will use to create the image that will be Blted to the display.
My first WinG DC is created by calling WinGCreateDC(); see Example 3(a). I then
create a bitmap using the WinGCreateBitmap call. The dimensions of the bitmap
are contained within WinGDIBHeader, as in Example 3(b). This function returns
a DDB HBITMAP for use with GDI calls, as well as a live pointer to the
bitmap's actual bits. I place this value in a huge pointer to seamlessly allow
access to bitmaps larger than 64 Kbytes. This huge pointer can subsequently be
used either in assembler as an FWORD (16:32) pointer or with the C run-time
library calls that support it. The new bitmap is then selected into the WinG
DC; see Example 3(c).
Finally, the bitmap loaded at the start of this program is stretched to fill
the DC. As with all GDI devices, this standard GDI call is actually
implemented by the device driver, in this case WINGDIB.DRV. As you will see,
when working within WinG,
you can still rely on a few familiar GDI calls. I've used these GDI calls for
expediency. You may have more stringent performance requirements, in which
case you should roll your own BitBlting routine to copy source to destination
DIBs. Be aware that WinG is not considered a palette device by GDI; that's why
DIB_RGB_COLORS is used for the index parameter. Also note the use of the
UTILS.C functions here: the DibXxx functions that appear in Example 3(d) are
all implemented there. I am stretching the bitmap once here so that I can use
BitBlt to quickly
copy portions of it later. The same thing is done for the double buffer that
will be used for constructing the screen image, as in Example 3(e).
After creating the double buffer, the entire background image is copied in;
see Example 3(f). Once again, a standard GDI function is used. This is also
implemented in WINGDIB.DRV. Be aware that many of these GDI functions only
work between DCs of the same type. In other words, WINGDIB.DRV's BitBlt
function cannot be used to BitBlt to a printer or to the screen.
Within the WM_PAINT message processing, the off-screen buffer is copied to the
window through a call to WinGBitBlt; see Example 3(g).
WinGBitBlt currently copies only from WinG DCs to display DCs. The identity
palette is selected and realized again, for safety's sake. If the palette is
already realized, these functions have very little overhead. With the identity
palette in place, WinGBitBlt copies the DIB to the window quickly--as quickly
as a BitBlt using a DDB.


WinG Wrap-Up


Even though WinG is billed as an API for the future, it does not support DCI.
This is not a big deal yet. If you stick to the WinG API now, you are likely
to get DCI compliance without doing anything.
Even though WinG 1.0 ships with WING32.DLL, this DLL uses the Win32s
Universal Thunk, which is not supported under Windows NT. This means that your
16-bit WinG applications will run under Windows NT, but your Win32 ones won't
(yet).
One of the key decisions to make is whether to use GDI calls against the
HBITMAP and DC or roll your own drawing routines. As the WinG documentation
suggests, assume nothing! When performance is the only goal, you will almost
always be able to beat GDI by rolling your own functions. My experience shows
that directly manipulating the DIB and copying the results is almost always
faster than using GDI calls. There are, however, cases when it simply isn't
worth the time or money to roll your own code. My favorite example is TrueType
and text. Yes, you could probably write a slightly faster, more-specialized
version of the TrueType font-rendering code, but is it worth it? WinG's
primary benefit is that it enables you to work around the limitations of GDI,
all the while leveraging existing code and drivers.
Both Windows 95 and Windows NT 3.5 include a GDI API call named
CreateDIBSection, which essentially gives you the functionality of
WinGCreateDC and WinGCreateBitmap. CreateDIBSection will be the interface to
DCI in the future and existing WinG calls will map directly to
CreateDIBSection. Therefore, if you use the WinG API calls, you don't have to
worry about future compatibility. Also, if you are targeting Windows 95 or
Windows NT 3.5 exclusively, I suggest taking a good look at CreateDIBSection.
Its use isn't as obvious as WinG's API, but the power is there.



WaveMix


If you run WinGTest and have a sound card, you'll notice that WinGTest also
produces sound. Each sound comes entirely from WAV files. Most sound cards
support only a single WAV output. To play multiple WAV files simultaneously,
WinGTest uses a new DLL.
The WaveMix DLL, which first appeared in Microsoft Arcade, is available on the
Microsoft Multimedia JumpStart CD, as well as from Microsoft's Internet ftp
site and CompuServe forum. It's also included under the unsupported tools
section of the MSDN CD. WaveMix allows you to simultaneously play up to eight
PCM-sampled sounds. These sounds can be any uncompressed PCM (pulse code
modulation) wave file or resource. PCM is the method used to digitize analog
sounds. The amplitude of a sound is converted into an 8- or 16-bit number and
stored into a file. The amplitude sampling occurs at various intervals:
11,025, 22,050, or 44,100 times per second. When played back, the samples are
converted back into an analog waveform, which, depending on the sampling
resolution and rate, will be a close facsimile of the original sound.
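The storage cost follows directly from those numbers; a quick sketch of the arithmetic (the function is mine, for illustration):

```c
/* Bytes needed for one second of uncompressed PCM audio.
   sampleBits is 8 or 16; channels is 1 (mono) or 2 (stereo). */
unsigned long PcmBytesPerSecond(unsigned long samplesPerSec,
                                unsigned sampleBits, unsigned channels)
{
    return samplesPerSec * (sampleBits / 8UL) * channels;
}
```

For instance, CD-quality sound (44,100 samples per second, 16 bits, stereo) consumes 176,400 bytes per second.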
WaveMix works by capitalizing on the low-level audio services of the Windows
Multimedia API. It takes a series of PCM samples (up to eight) and
algebraically sums them, creating a single new PCM waveform to output to your
audio device. Some compromises occur. First, to achieve this in real time,
WaveMix supports only 8-bit sample output. This type of sample tends to be
noisy, but it should be fine for games. In addition, on most of today's
hardware, supporting more than a 22-kHz sample rate in real time is probably
not realistic.
WaveMix allows you to output a 44.1-kHz waveform, but you will probably notice
moments of silence between sounds (the process of mixing cannot keep up with
the playing of the sound).
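The algebraic summing can be sketched for 8-bit unsigned samples, which are centered on 128, with the sum clamped back into range. This is my illustration of the idea, not WaveMix's actual mixer:

```c
/* Mix several 8-bit unsigned PCM streams (silence = 128) into one
   output buffer by summing each stream's signed offset from center
   and clamping the result back into 8-bit range. */
void MixPcm8(unsigned char *out, const unsigned char *const *inputs,
             int numInputs, int numSamples)
{
    for (int s = 0; s < numSamples; s++) {
        int sum = 128;                        /* start at the center line */
        for (int i = 0; i < numInputs; i++)
            sum += (int)inputs[i][s] - 128;   /* signed contribution */
        if (sum < 0)   sum = 0;               /* clamp: too quiet */
        if (sum > 255) sum = 255;             /* clamp: too loud  */
        out[s] = (unsigned char)sum;
    }
}
```

The clamping is where loud, coincident sounds lose fidelity; it is one reason mixed 8-bit output tends to be noisy.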
For more information about PCM encoding, see "The Multimedia Extensions for
Windows-Enhanced Sound and Video for the PC," by Charles Petzold (MSJ, March
1991).
WaveMix does more than just add the waveforms together. To support real-time
mixing, it forms output buffers in a circular buffer, so while one buffer is
playing, the next one can be premixed. This premixing introduces some problems
when you want to insert another sound immediately into the output wave. To
facilitate this, WaveMix allows you to flush the premixed buffers and remix
them with the new sound in place.
Finally, WaveMix manages the mixing of different-length wave inputs. It will
appropriately stop playing on a specific channel when there is no more PCM
data to mix. All of this functionality is hidden behind a reasonably simple
API.


Pulling It Together 


To use the WaveMix API, you first open a session with a call to WaveMixInit or
WaveMixConfigureInit. WaveMixInit will configure the session using the
parameters in the WAVEMIX.INI file.
The WAVEMIX.INI file is broken up into a few sections. In the [General]
section, the WaveOutDevice setting specifies the multimedia output device to
use, starting at 0 (most people will only have one sound output device). If
this value is -1, WaveMix uses the wave mapper. The wave mapper is a Control
Panel applet that allows a user to define the best wave-output devices for
particular wave formats. The wave mapper adds a layer of processing overhead
that you'll most likely find unacceptable.
WaveMixInit obtains the product name of the wave-output device by calling the
waveOutGetDevCaps function from MMSYSTEM, the 16-bit Windows multimedia API.
It uses this string to look up the appropriate subheading in WAVEMIX.INI for a
particular product. This is important to note, since the stock WAVEMIX.INI
includes settings for only five sound cards. If yours isn't one of the five,
you have to add it; otherwise some rather unimpressive defaults will be used.
WaveMix uses these settings, filling in missing settings with defaults from
the [Default] section, to configure the wave-output device.
WaveMixConfigureInit permits you to override the SamplesPerSec setting with
either 11, 22, or 44. In addition, WaveMixConfigureInit permits you to specify
whether the output will be mono or stereo. This cannot be set in WAVEMIX.INI.
In my sample app, I call WaveMixConfigureInit to set the number of channels to
2 (stereo). The rest of the settings are derived from the WAVEMIX.INI file;
see Example 4(a).
Once you have a session handle from WaveMixInit or WaveMixConfigureInit, open
your PCM-encoded WAV files. WaveMixOpenWave supports the loading of both disk
files and embedded resources; see Example 4(b). The current version of WaveMix
only supports PCM-encoded WAV files. Since sound cards supporting various
types of wave compression are proliferating, I expect WaveMix to support these
formats in the future.
Before you can play a loaded WAV, you must open a channel in WaveMix. You can
open them one at a time, or open a series of them at once. In Example 4(c), I
open all of the channels that I need up front. WaveMix then has to be
activated with a call to WaveMixActivate, which allocates and releases the
sound output device, since only one application can use it at a time. The best
place to put this call is in the processing of the WM_ACTIVATE message. That
way, it will always be called when the application first starts, and WaveMix
will allocate and free the output device as the application goes from
foreground to background; see Example 4(d). After all of this, actually
playing the sounds is a snap. A call to WaveMixPlay (see Example 4(e)) is all
that is required.
The code in Example 4 can be called from any event, which makes hooking up
sounds to various application events easy. It's important to note that WaveMix
uses a hidden window with a WM_TIMER message to continually mix output
buffers. This means that if your application does not relinquish control to
Windows periodically and consistently, the output sound will start to skip. If
you have code in your application that does not relinquish control
periodically, then you must call WaveMixPump periodically (which I do in the
processing of WM_CLOSE).


Conclusion


Between WinG and WaveMix, you have at your disposal a rich set of tools to
develop Windows-based games. Best of all, they are both free. Although neither
API is an exhaustive implementation of everything that they can be, they do
show an evolutionary path toward improving game development under Windows.
With an eye on the future, these APIs give you the flexibility and performance
you need today, with compatibility that you'll want tomorrow. 
Example 1: (a) Function header generated by cProc macro; (b) jumped-to from
header, sets the CS descriptor in the LDT; (c) the header in (a) effectively
becomes this.
(a)

xor ax,ax
mov ah,80h
add ax,ax           ;; Will overflow in 16 bits
jc  short &n&_fix_cs

(b)

mov bx,cs
 ...
mov es,ax
mov di,sp
mov ax,000Bh
int 31h             ;;; DPMI: Get the CS descr
;;; change the following to USE32
or  byte ptr es:[di+6],40h
mov ax,000Ch
int 31h             ;;; DPMI: Set the CS descr

(c)

xor eax,eax
mov ah,80h
add eax,eax         ;; Doesn't overflow in 32 bits
jc  short &n&_fix_cs
Example 2: (a) calling WinGRecommendDIBFormat; (b) loading 236
colors into an array of PALETTEENTRY structures; (c) saving the 20 system
colors; (d) creating the palette; (e) selecting the palette.
(a)

WinGRecommendDIBFormat((BITMAPINFO far*)&WinGDIBHeader.bmiHeader);


(b)

for(iColorIndex = 10; iColorIndex < 246; iColorIndex++)
{
 WinGDIBHeader.bmiColors[iColorIndex].rgbRed =
  LogicalPalette.palEntries[iColorIndex].peRed =
  pColorTable[iColorIndex].rgbRed;
 WinGDIBHeader.bmiColors[iColorIndex].rgbGreen =
  LogicalPalette.palEntries[iColorIndex].peGreen =
  pColorTable[iColorIndex].rgbGreen;
 WinGDIBHeader.bmiColors[iColorIndex].rgbBlue =
  LogicalPalette.palEntries[iColorIndex].peBlue =
  pColorTable[iColorIndex].rgbBlue;
 WinGDIBHeader.bmiColors[iColorIndex].rgbReserved = 0;
 // This flag includes PC_NOCOLLAPSE
 LogicalPalette.palEntries[iColorIndex].peFlags = PC_RESERVED;
}

(c)

// Get the 20 static colors
hDC = GetDC(0);
GetSystemPaletteEntries(hDC,0,10,LogicalPalette.palEntries);
GetSystemPaletteEntries(hDC,246,10,LogicalPalette.palEntries + 246);
ReleaseDC(0,hDC);

(d)

ghMSJIdentityPalette = CreatePalette((LOGPALETTE far*)&LogicalPalette);

(e)

SelectPalette(hDC,ghMSJIdentityPalette,FALSE);
RealizePalette(hDC);
Example 3: (a) Creating a WinG DC; (b) creating a bitmap; (c) selecting the
new bitmap into the WinG DC; (d) stretching the bitmap; (e) constructing the
double buffer; (f) copying the background image into the DC; (g) copying the
off-screen buffer to the window.
(a)

ghWinGBackgroundDC = WinGCreateDC();

(b)

hBitmap = WinGCreateBitmap(ghWinGBackgroundDC,
 (BITMAPINFO far*)&WinGBackgroundDIBHeader,
 (void far*)&ghpWinGBackgroundBitmap);

(c)

gbmOldBackgroundBitmap = (HBITMAP)SelectObject(ghWinGBackgroundDC,hBitmap);

(d)

StretchDIBits(ghWinGBackgroundDC,0,0,giWindowX,giWindowY,0,0,
 DibWidth(gpdibBackgroundBitmap),DibHeight(gpdibBackgroundBitmap),
 DibPtr(gpdibBackgroundBitmap),DibInfo(gpdibBackgroundBitmap),
 DIB_RGB_COLORS,SRCCOPY);

(e)


ghWinGDC = WinGCreateDC();
hBitmap = WinGCreateBitmap(ghWinGDC,(BITMAPINFO far*)&WinGDIBHeader,
 (void far *)&ghpWinGBitmap);
gbmOldBitmap = (HBITMAP)SelectObject(ghWinGDC,hBitmap);

(f)

BitBlt(ghWinGDC,0,0,giWindowX,giWindowY,ghWinGBackgroundDC,0,0,SRCCOPY);

(g)

hDC = BeginPaint(hWnd,&ps);
SelectPalette(hDC,ghMSJIdentityPalette,FALSE);
RealizePalette(hDC);
WinGBitBlt(hDC,0,0,giWindowX,giWindowY,ghWinGDC,0,0);
EndPaint(hWnd,&ps);
Example 4: (a) Configuring the wave output device; (b) opening WAV files; (c)
opening 4 channels, one per file; (d) activating the device; (e) playing
sounds by calling WaveMixPlay.
(a) mcConfig.wSize = sizeof(MIXCONFIG);
 mcConfig.dwFlags = WMIX_CONFIG_CHANNELS;
 mcConfig.wChannels = 2; // Start up WaveMix
 ghMixSession = WaveMixConfigureInit(&mcConfig);

(b) glpMixl=WaveMixOpenWave(ghMixSession,"1.wav",NULL,WMIX_FILE);
 glpMix2=WaveMixOpenWave(ghMixSession,"2.wav",NULL,WMIX_FILE);
 glpMix3=WaveMixOpenWave(ghMixSession,"3.wav",NULL,WMIX_FILE);
 glpMix4=WaveMixOpenWave(ghMixSession,"4.wav",NULL,WMIX_FILE);

(c) WaveMixOpenChannel(ghMixSession,4,WMIX_OPENCOUNT);

(d) case WM_ACTIVATE:
 // WA_INACTIVE == FALSE;
 WaveMixActivate(ghMixSession, wParam);
 break;

(e) MixPlayParams.wSize = sizeof(MIXPLAYPARAMS);
 MixPlayParams.hMixSession = ghMixSession;
 MixPlayParams.hWndNotify = NULL;
 MixPlayParams.dwFlags = WMIX_HIPRIORITY;
 MixPlayParams.wLoops = 0;
 MixPlayParams.iChannel = 3;
 MixPlayParams.lpMixWave = glpMix4;
 WaveMixPlay(&MixPlayParams);




















RAMBLINGS IN REAL TIME


BSP Trees




Michael Abrash


Michael Abrash is the author of Zen of Graphics Programming and Zen of Code
Optimization. He is currently pushing the envelope of real-time 3-D on Quake
at id Software. He can be reached at mikeab@idsoftware.com.


The answer is: Wendy Tucker. The question that goes with that answer isn't
particularly interesting to anyone but me--but the manner in which I came up
with the answer might be.
I spent many of my childhood summers at Camp Chingacook, on Lake George in New
York. It was a great place to have fun and do some growing up, with swimming
and sailing and hiking and lots more.
When I was 14, Camp Chingacook had a mixer with a nearby girls' camp. As best
I can recall, I had never had any interest in girls before, but after the
older kids had paired up, I noticed a pretty girl looking at me, and, with
considerable trepidation, crossed the room to talk to her. To my amazement, we
hit it off terrifically. We talked nonstop for the rest of the evening, and I
walked back to my cabin floating on air. I had taken a first, tentative step
into adulthood, and my world would never be quite the same.
That was the only time I ever saw her, although I would occasionally remember
that warm glow and call up an image of her smiling face. That happened less
frequently as the years passed and I had real girlfriends. By the time I got
married, that particular memory was stashed in some back storeroom of my mind.
I didn't think of her again for more than a decade.
A few days ago, for some reason, that mixer popped into my mind as I was
trying to fall asleep, and I wondered, for the first time in 20 years, what
that girl's name was. The name was there in my mind, somewhere; I could feel
the shape of it, in that same back storeroom, if only I could figure out how
to retrieve it.
I poked and worried at that memory, trying to get it to come to the surface. I
concentrated on it as hard as I could and even started going through the
alphabet one letter at a time, trying to remember if her name started with
each letter. After 15 minutes, I was wide awake and totally frustrated. I was
also farther than ever from answering the question; all the focusing on the
memory was beginning to blur the original imprint.
At this point, I consciously relaxed and made myself think about something
completely different. Every time my mind returned to the mystery girl, I
gently shifted it to something else. After a while, I began to drift off to
sleep, and as I did, a connection was made, and a name popped, unbidden, into
my mind.
Wendy Tucker.
There are many problems that are amenable to the straight-ahead, purely
conscious sort of approach that I first tried to use to retrieve Wendy's name.
Writing code (once it's designed) is often like that, as are some sorts of
debugging, and technical writing, and balancing your checkbook. I find these
left-brain activities very appealing because they're finite and controllable;
when I start one, I know I'll be able to deal with whatever comes up and make
good progress, just by plowing along. Inspiration and intuitive leaps are
sometimes useful, but not required.
The problem is, though, that neither you nor I will ever do anything great
without inspiration and intuitive leaps, and especially not without stepping
away from what's known and venturing into territories beyond. The way to do
that is not by trying harder but, paradoxically, by trying less hard, stepping
back and giving your right brain room to work, then listening for and
nurturing whatever comes of that. On a small scale, that's how I remembered
Wendy's name, and on a larger scale, that's how programmers come up with
products that are more than me-too, checklist-oriented software.
Which, for a couple of reasons, brings us neatly to today's topic, binary
space partitioning (BSP) trees. First, games are probably the sort of software
in which the right-brain element is most important--blockbuster games are
almost always breakthroughs in one way or another--and some very successful
games use BSP trees, most notably the megahit DOOM. Second, BSP trees aren't
intuitively easy to grasp, and considerable ingenuity and inventiveness are
required to get the most from them.
First, I'd like to thank John Carmack, technical wizard of DOOM, for
generously sharing his knowledge of BSP trees.


BSP Trees


A BSP tree is, at heart, nothing more than a tree that subdivides space in
order to isolate features of interest. Each node of a BSP tree splits an area
or a volume (in two dimensions or three dimensions, respectively) into two
parts along a line or a plane; thus the name "binary space partitioning." The
subdivision is hierarchical; the root node splits the world into two
subspaces, then each of the root's two children splits one of those two
subspaces into two more parts. This continues, with each subspace being
further subdivided, until each component of interest (each line segment or
polygon, for example) has been assigned its own unique subspace. This is,
admittedly, a pretty abstract description, but the workings of BSP trees will
become clearer shortly; it may help to glance ahead to Figures 2 through 6.
Building a tree that subdivides space doesn't sound particularly profound, but
there's a lot that can be done with such a structure. BSP trees can be used to
represent shapes, and operating on those shapes is a simple matter of
combining trees as needed; this makes BSP trees a powerful way to implement
constructive solid geometry (CSG). BSP trees can also be used for hit testing,
line-of-sight determination, and collision detection.


Visibility Determination


I'm going to discuss only one of the many uses of BSP trees: their ability to
allow you to traverse a set of line segments or polygons in back-to-front or
front-to-back order as seen from any arbitrary viewpoint. This sort of
traversal can be very helpful in determining which parts of each line segment
or polygon are visible and which are occluded from the current viewpoint in a
3-D scene. Thus, a BSP tree makes possible an efficient implementation of the
painter's algorithm, whereby polygons are drawn in back-to-front order, with
closer polygons overwriting more distant ones that overlap, as shown in
Figures 1(b--d). (The line segments in Figure 1(a) represent vertical walls
viewed directly from above.) Alternatively, visibility determination can be
performed by front-to-back traversal working in conjunction with some method
for remembering which pixels have already been drawn. The latter approach is
more complex, but has the potential benefit of allowing you to early-out from
traversal of the scene database when all the pixels on the screen have been
drawn.
Back-to-front or front-to-back traversal in itself wouldn't be so
impressive--there are many ways to do that--were it not for one additional
detail: The traversal can always be performed in linear time, as we'll see
later on. In other words, you can traverse, say, a polygon list back-to-front
from any viewpoint simply by walking through the corresponding BSP tree once,
visiting each node once and only once, and performing only one relatively
inexpensive test at each node.
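In code, that walk is a short recursive routine: at each node, test which side of the splitting line the viewpoint is on, recurse into the far side, emit the node's own wall, then recurse into the near side. The 2-D node layout below is a sketch of mine, not code from any shipping engine:

```c
#include <stddef.h>

/* A 2-D BSP node: one wall lies on the splitting line
   nx*x + ny*y + d = 0, with children for the subspaces in front of
   (normal side) and behind it. */
typedef struct BspNode {
    double nx, ny, d;             /* the splitting line                */
    int wallId;                   /* wall lying on this splitting line */
    struct BspNode *front, *back;
} BspNode;

/* Visit walls back-to-front as seen from (vx,vy): recurse into the far
   subtree first, emit this node's wall, then recurse into the near
   subtree. Each node is visited exactly once -- linear time. */
void WalkBackToFront(const BspNode *node, double vx, double vy,
                     int *order, int *count)
{
    if (node == NULL)
        return;
    double side = node->nx * vx + node->ny * vy + node->d;
    const BspNode *farSide  = (side >= 0) ? node->back  : node->front;
    const BspNode *nearSide = (side >= 0) ? node->front : node->back;
    WalkBackToFront(farSide, vx, vy, order, count);
    order[(*count)++] = node->wallId;  /* everything farther already emitted */
    WalkBackToFront(nearSide, vx, vy, order, count);
}
```

Swapping the two recursive calls gives the front-to-back order instead.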
It's hard to get cheaper sorting than linear, and BSP-based rendering stacks
up well against alternatives such as z-buffering, octrees, z-scan sorting, and
polygon sorting. Better yet, a scene database represented as a BSP tree can be
clipped to the view pyramid very efficiently; huge chunks of a BSP tree can be
lopped off when clipping to the view pyramid, because if a splitting line or
plane lies entirely outside the view volume, then all surfaces on one or the
other side of the splitting surface must likewise be outside the view volume,
for reasons that will become clear as we delve into the workings of BSP trees.


Limitations of BSP Trees


Powerful as they are, BSP trees aren't perfect. By far the greatest limitation
of BSP trees is that they're time-consuming to build, enough so that, for all
practical purposes, BSP trees must be precalculated and cannot be built
dynamically at run time. In fact, a BSP-tree compiler that attempts to perform
some optimization (limiting the number of surfaces that need to be split, for
example) can easily take minutes or even hours to process large world
databases.
A fixed world database is fine for walkthrough or flythrough applications, but
not much use for games or virtual reality, where objects constantly move
relative to one another. Consequently, various workarounds have been developed
to allow moving objects to appear in BSP tree-based scenes. DOOM, for example,
uses 2-D sprites mixed into BSP-based, 3-D scenes; note, though, that this
approach requires maintaining z information so that sprites can be drawn and
occluded properly. Alternatively, movable objects could be represented as
separate BSP trees and merged into the BSP tree describing the static world
anew with each move. Dynamic merging may or may not be fast enough, depending
on the scene, but merging BSP trees tends to be quicker than building them,
because the BSP trees being merged are already spatially sorted.
Another possibility would be to generate a per-pixel z-buffer for each frame
as it's rendered, to allow dynamically changing objects to be drawn into the
BSP-based world. In this scheme, the BSP tree would allow fast traversal and
clipping of the complex, static world, and the z-buffer would handle the
relatively localized visibility determination involving moving objects. The
drawback of this is the need for a memory-hungry z-buffer; a typical 640x480
z-buffer requires a fairly appalling 600K, with equally appalling cache-miss
implications for performance.
Yet another possibility would be to build the world so that each dynamic
object falls entirely within a single subspace of the static BSP tree, rather
than straddling splitting lines or planes. In this case, dynamic objects can
be treated as points, which are then just sorted into the BSP tree on the fly
as they move. 
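Sorting a point into an existing BSP tree is just a walk down the tree, testing the point against each splitting line. Here's a minimal sketch in C; the Node layout is hypothetical (the article's real structures arrive next time), with each node storing one wall as a 2-D splitting line whose front subspace lies to the left of the start-to-end direction.

```c
#include <stddef.h>

/* Hypothetical node layout for illustration only. */
typedef struct Node {
    double x1, y1, x2, y2;           /* wall endpoints, start to end */
    struct Node *FrontChild;
    struct Node *BackChild;
} Node;

/* Cross product of the wall direction with the vector to the point;
   positive means the point lies on the wall's front (left) side. */
static double SideOf(const Node *n, double px, double py)
{
    return (n->x2 - n->x1) * (py - n->y1) - (n->y2 - n->y1) * (px - n->x1);
}

/* Filter a dynamic object's center point down through the tree to the
   node bounding its subspace; a point never straddles a splitting line,
   so no splitting is ever needed. */
const Node *LocatePoint(const Node *n, double px, double py)
{
    for (;;) {
        const Node *child = (SideOf(n, px, py) >= 0) ? n->FrontChild
                                                     : n->BackChild;
        if (child == NULL)
            return n;
        n = child;
    }
}
```

Because the walk touches only one node per tree level, re-sorting a moved object each frame is cheap compared to re-merging whole trees.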
The only other drawbacks of BSP trees that I know of are the memory required
to store the tree, which amounts to a few pointers per node, and the relative
complexity of debugging BSP-tree compilation and usage; debugging a large data
set being processed by recursive code (which BSP code tends to be) can be
quite a challenge. Visual tools that depict the process of spatial subdivision
as a BSP tree is constructed, like the BSP compiler I'll present next time,
can help a great deal with BSP debugging.


Building a BSP Tree


Now that we know a good bit about what a BSP tree is, how it helps in visible
surface determination, and what its strengths and weaknesses are, let's take a
look at how a BSP tree actually works to provide front-to-back or
back-to-front ordering. This month's discussion will be at a conceptual level,
with plenty of figures; next time we'll get into mechanisms and implementation
details.

I'm going to discuss only 2-D BSP trees from here on out, because they're much
easier to draw and to grasp than their 3-D counterparts. Don't worry, though;
the principles of 2-D BSP trees using line segments generalize directly to 3-D
BSP trees using polygons. Also, 2-D BSP trees are quite powerful in their own
right, as evidenced by DOOM, which is built around 2-D BSP trees.
First, let's construct a simple BSP tree. Figure 2 shows a set of four lines
that will constitute our sample world. I'll refer to these as walls viewed
from directly above, because that's an easily visualized context in which 2-D
BSP trees would be useful in a game. Note that each wall has a front side,
denoted by a normal (perpendicular) vector, and a back side. To make a BSP
tree for this sample set, we need to split the world in two, then each part
into two again, and so on, until each wall resides in its own unique subspace.
An obvious question, then, is how we should carve up the world of Figure 2.
There are many valid ways to carve up Figure 2, but the simplest is just to
carve along the lines of the walls themselves, with each node containing one
wall. This is not necessarily optimal in the sense of producing the smallest
tree, but it has the virtue of generating the splitting lines without
expensive analysis. It also saves on data storage, because the data for the
walls can do double duty in describing the splitting lines as well. (Putting
one wall on each splitting line doesn't actually create a unique subspace for
each wall, but it does create a unique subspace boundary for each wall; as
we'll see, this spatial organization provides for the same unambiguous
visibility ordering as unique subspaces would.)
Creating a BSP tree is a recursive process, so we'll perform the first split
and go from there. Figure 3 shows the world carved along the line of wall C,
into two parts: walls that are in front of wall C, and walls that are behind.
(Any of the walls would have been an equally valid choice for the initial
split; we'll return to the issue of choosing splitting walls next time.) This
splitting into front and back is the essential dualism of BSP trees. Next, in
Figure 4, the front subspace of wall C is split by wall D. This is the only
wall in that subspace, so we're done with wall C's front subspace.
Figure 5 shows the back subspace of wall C being split by wall B. There's a
difference here, though: Wall A straddles the splitting line generated from
wall B. Does wall A belong in the front or back subspace of wall B?
Both, actually. Wall A gets split into two pieces, which I'll call wall A and
wall E; each piece is assigned to the appropriate subspace and treated as a
separate wall. As shown in Figure 6, each of the split pieces then has a
subspace to itself, and each becomes a leaf of the tree. The BSP tree is now
complete.
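The construction we just walked through can be sketched in C. This is a deliberately simplified, hypothetical version (the real compiler comes next time): it always carves along the first wall in the list rather than choosing a splitter intelligently, partitions the remaining walls into front and back sets, and splits any straddler into two pieces, just as wall A was split into walls A and E.

```c
#include <stdlib.h>

/* Illustrative types only; not the article's real structures. Walls are
   2-D segments whose front side is the left of the x1,y1 -> x2,y2 direction. */
typedef struct Wall {
    double x1, y1, x2, y2;
    struct Wall *next;
} Wall;

typedef struct BSPNode {
    Wall *wall;
    struct BSPNode *FrontChild, *BackChild;
} BSPNode;

static double SideOfPoint(const Wall *s, double px, double py)
{
    return (s->x2 - s->x1) * (py - s->y1) - (s->y2 - s->y1) * (px - s->x1);
}

/* Split wall w where it crosses splitter s's line; *fp gets the piece in
   front, *bp the piece behind. Assumes w genuinely straddles the line. */
static void SplitWall(Wall *w, const Wall *s, Wall **fp, Wall **bp)
{
    double d1 = SideOfPoint(s, w->x1, w->y1);
    double d2 = SideOfPoint(s, w->x2, w->y2);
    double t  = d1 / (d1 - d2);                 /* parametric crossing point */
    double ix = w->x1 + t * (w->x2 - w->x1);
    double iy = w->y1 + t * (w->y2 - w->y1);
    Wall *a = malloc(sizeof *a), *b = malloc(sizeof *b);
    *a = *w; a->x2 = ix; a->y2 = iy;            /* start..intersection */
    *b = *w; b->x1 = ix; b->y1 = iy;            /* intersection..end   */
    if (d1 > 0) { *fp = a; *bp = b; } else { *fp = b; *bp = a; }
}

/* Recursively carve the world along the walls themselves, one wall
   (the splitting wall) per node. */
BSPNode *BuildBSPTree(Wall *walls)
{
    if (walls == NULL)
        return NULL;
    Wall *splitter = walls;
    Wall *rest = walls->next;
    Wall *front = NULL, *back = NULL;
    while (rest != NULL) {
        Wall *w = rest;
        rest = rest->next;
        double d1 = SideOfPoint(splitter, w->x1, w->y1);
        double d2 = SideOfPoint(splitter, w->x2, w->y2);
        if (d1 >= 0 && d2 >= 0) {               /* wholly in front */
            w->next = front; front = w;
        } else if (d1 <= 0 && d2 <= 0) {        /* wholly behind   */
            w->next = back; back = w;
        } else {                                /* straddler: split it */
            Wall *fp, *bp;
            SplitWall(w, splitter, &fp, &bp);
            fp->next = front; front = fp;
            bp->next = back;  back  = bp;
        }
    }
    BSPNode *node = malloc(sizeof *node);
    node->wall = splitter;
    node->FrontChild = BuildBSPTree(front);
    node->BackChild  = BuildBSPTree(back);
    return node;
}
```

Note that each recursion level does work proportional to the number of walls in its subspace, which is one reason tree building is expensive enough to push offline.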


Visibility Ordering


Now that we've successfully built a BSP tree, you might justifiably be a
little puzzled as to how any of this helps with visibility ordering. The
answer is that each BSP node can definitively determine which of its child
trees is nearer and which is farther from any and all viewpoints; applied
throughout the tree, this principle makes it possible to establish visibility
ordering for all the line segments or planes in a BSP tree, no matter what the
viewing angle.
Consider the world of Figure 2 viewed from an arbitrary angle, as in Figure 7.
The viewpoint is in front of wall C; this tells us that all walls belonging to
the front tree that descends from wall C are nearer along every ray from the
viewpoint than wall C is (that is, they can't be occluded by wall C). All the
walls in wall C's back tree are likewise farther away than wall C along any
ray. Thus, for this viewpoint, we know for sure that if we're using the
painter's algorithm, we want to draw all the walls in the back tree first,
then wall C, and then the walls in the front tree. If the viewpoint had been
on the back side of wall C, this order would have been reversed.
Of course, we need more ordering information than wall C alone can give us,
but we get that by traversing the tree recursively, making the same far/near
decision at each node. Figure 8 shows the painter's algorithm (back-to-front)
traversal order of the tree for the viewpoint of Figure 7. At each node, we
decide whether we're seeing the front or back side of that node's wall, then
visit whichever of the wall's children is on the far side from the viewpoint,
draw the wall, and visit the node's nearer child, in that order. Visiting a
child is recursive, involving the same far-near visiting order.
The key is that each BSP splitting line separates all the walls in the current
subspace into two groups relative to the viewpoint, and every single member of
the farther group is guaranteed not to occlude every single member of the
nearer. By applying this ordering recursively, the BSP tree can be traversed
to provide back-to-front or front-to-back ordering, with each node being
visited only once.
The type of tree walk used to produce front-to-back or back-to-front BSP
traversal is known as "inorder." (See my article, "Good Causes, Good Code," PC
Techniques, October 1994, or any book on data structures for a discussion of
inorder walking.) The only special aspect of BSP walks is that a decision has
to be made at each node about which way the node's wall is facing relative to
the viewpoint, so we know which child tree is nearer and which is farther.
Example 1 shows a function that draws a BSP tree back-to-front. The decision
whether a node's wall is facing forward, made by WallFacingForward() in
Listing One, can, in general, be made by generating a normal to the node's
wall in screenspace (perspective-corrected space as seen from the viewpoint)
and checking whether the z component of the normal is positive or negative, or
by checking the sign of the dot product of a viewspace
(nonperspective-corrected space as seen from the viewpoint) normal and a ray
from the viewpoint to the wall. In 2-D, the decision can be made by enforcing
the convention that when a wall is viewed from the front, the start vertex is
leftmost; then a simple screenspace comparison of the x-coordinates of the
left and right vertices indicates which way the wall is facing.
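The viewspace dot-product version of the facing test can be sketched as follows; the Point2 type and function name are invented for illustration, and the convention matches the figures (a wall's front is the left side walking from its start vertex to its end vertex).

```c
#include <stdbool.h>

typedef struct { double x, y; } Point2;   /* illustrative type */

bool WallFacingViewpoint(Point2 start, Point2 end, Point2 viewpoint)
{
    /* Front-side normal: the start-to-end direction rotated 90 degrees
       counterclockwise, pointing out of the wall's left (front) side. */
    double nx = -(end.y - start.y);
    double ny =   end.x - start.x;
    /* Ray from the viewpoint to the wall's start vertex. */
    double rx = start.x - viewpoint.x;
    double ry = start.y - viewpoint.y;
    /* A negative dot product means the ray runs against the normal, so
       the viewpoint is on the front side and sees the wall's front. */
    return nx * rx + ny * ry < 0;
}
```

This is the per-node decision the tree walk makes; everything else in the traversal is ordinary recursion.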
Finally, be aware that BSP trees can often be made smaller and more efficient
by detecting collinear surfaces (like aligned wall segments) and generating
only one BSP node for each collinear set, with the collinear surfaces stored
in, say, a linked list attached to that node. Collinear surfaces partition
space identically and can't occlude one another, so it suffices to generate
one splitting node for each collinear set.
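The collinearity test implied here is cheap: two wall segments partition space identically exactly when they lie on the same infinite line, which we can check by taking cross products of one wall's direction with the other wall's endpoints. A minimal sketch (names invented, epsilon chosen arbitrarily to absorb floating-point noise):

```c
#include <math.h>
#include <stdbool.h>

bool WallsCollinear(double ax1, double ay1, double ax2, double ay2,
                    double bx1, double by1, double bx2, double by2)
{
    const double EPS = 1e-9;
    double dx = ax2 - ax1, dy = ay2 - ay1;
    /* Both endpoints of wall B must lie on wall A's infinite line,
       i.e. have (near-)zero cross product with A's direction. */
    double c1 = dx * (by1 - ay1) - dy * (bx1 - ax1);
    double c2 = dx * (by2 - ay1) - dy * (bx2 - ax1);
    return fabs(c1) < EPS && fabs(c2) < EPS;
}
```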


BSP Trees, Continued


Next time, I'll build a BSP-tree compiler, then put together a rendering
system built around the BSP trees the compiler generates. In the meantime,
there's a World Wide Web page on BSP trees under construction at
http://www.graphics.cornell.edu/bspfaq; as I write this, the page contains
little more than an outline of things to come, but if the contents live up to
the promise of the outline, it could be worth checking out by the time you
read this.


Recommended Reading 


Short on space though I am, I'd be remiss if I didn't point out one of the
most valuable articles you're likely to come across this year. Chris Hecker's
column in the April 1995 issue of Game Developer magazine is by far the best
discussion of perspective texture mapping I've seen. Check it out; you won't
be sorry.


References


Foley, J., A. van Dam, S. Feiner, and J. Hughes. Computer Graphics: Principles
and Practice, Second Edition. Reading, MA: Addison-Wesley, 1990.
Fuchs, H., Z. Kedem, and B. Naylor. "On Visible Surface Generation by A Priori
Tree Structures." Computer Graphics (June 1980).
Gordon, D. and S. Chen. "Front-to-Back Display of BSP Trees." IEEE Computer
Graphics and Applications (September 1991).
Naylor, B. "Binary Space Partitioning Trees as an Alternative Representation
of Polytopes." Computer Aided Design (May 1990).
Figure 1: Visible surface determination via the painter's algorithm: (a) Walls
as viewed from above; (b) after drawing farthest wall; (c) after drawing
next-farthest wall; (d) after drawing nearest wall.
Figure 2: A sample set of walls, viewed from above.
Figure 3: Initial split along the line of wall C.
Figure 4: Split of wall C's front subspace along the line of wall D.
Figure 5: Split of wall C's back subspace along the line of wall B.
Figure 6: Final BSP tree.
Figure 7: Viewing the BSP tree from an arbitrary angle.
Figure 8: Back-to-front traversal of the BSP tree as viewed in Figure 7,
based on far/near tests at each node. "F" and "N" indicate the respective far
and near children of each node.
Example 1: Function that draws a BSP tree back-to-front.
void WalkBSPTree(NODE *pNode)
{
    if (WallFacingForward(pNode)) {
        /* Viewpoint sees the front side: the back tree is farther. */
        if (pNode->BackChild) {
            WalkBSPTree(pNode->BackChild);
        }
        Draw(pNode);
        if (pNode->FrontChild) {
            WalkBSPTree(pNode->FrontChild);
        }
    } else {
        /* Viewpoint sees the back side: the front tree is farther. */
        if (pNode->FrontChild) {
            WalkBSPTree(pNode->FrontChild);
        }
        Draw(pNode);
        if (pNode->BackChild) {
            WalkBSPTree(pNode->BackChild);
        }
    }
}






















































DTACK REVISITED


San Jose's High-Tech Ditch




Hal W. Hardenbergh


Hal is a hardware engineer who sometimes programs. He is the former editor of
DTACK Grounded and can be contacted through the DDJ offices.


The city of San Jose is going to dig a long ditch that will pass near my
house, according to the map published in the local newspaper. The minimum
feature design rules for this ditch will be about 2x10^6 microns (this
technical detail should allow me to sneak a ditch past DDJ editors). Nothing
unusual? Well, I live in Santa Clara, and it is the city of San Jose digging
the ditch.
You see, us folks in California are short of water. Rice farmers in California
aren't short of water--they get 85 percent of the state's ample supply. It
takes three gallons of water to grow each grain of rice. The California
legislature has more friends of rice farmers than friends of people, so
there's not much water left over for people. That's why San Jose is going to
dig the ditch.
The ditch is going to carry reclaimed waste water (from sewage plants) to be
used exclusively for watering lawns, mostly industrial-park lawns. Given that
rice is more important than people, this is a good idea.
But San Jose officials have a better idea: As long as they're digging a long
ditch that meanders all over the place, even into the city of Santa Clara, why
not run fiber-optic cable in that ditch? Yes, San Jose is going into the
cable/networking business as an entrepreneur.
San Jose is cleverly going to tax anybody other than the city of San Jose who
competes in San Jose using fiber cable. A fabulous idea: Tax your competitors!
(Wait until Borland and Novell hear about this.) The grounds are that fiber
cable is a utility, and utilities can be taxed by cities. This absolutely
assures that my section of the ditch, located in Santa Clara, will be taxed by
the city of Santa Clara. (There is no municipal idea so powerful as a new
tax.)
Al Gore tells me that the Information Highway is coming. With video capability
added, the Highway needs fiber-optic cable. It'll arrive complete with a lot
of federal censors and regulators. Schools and libraries will be connected for
free, but the feds won't foot the bill. Who'll pay for this largess and for
all the accompanying bureaucracy? Looked in the mirror lately?
I turn on Charlie Rose on PBS and I hear some talking heads tell me that a
consortium of Baby Bells are trying to recruit Howard Stringer, head of the
CBS TV network, to form a new, interactive-video cable enterprise. This is
typical of a dozen credible (meaning they can reasonably expect to raise a few
billion dollars) proposals; there are several dozen less credible.
The well-known futurist George Gilder wrote "Telecosm: The Bandwidth Tidal
Wave" in the December 1994 Forbes ASAP. It looks in part very much like a
public-relations release for "Tiger," a video-on-demand system backed by Bill
Gates, of whom you may have heard.
I'm afraid that some day soon I'll hear pounding on my front door; when I open
it, I'll find 17 people with 17 different cables competing for my attention.
The problem with this prospect is that they'll be thrusting 17 different
invoices at me.


Sipping from a Firehose


The thesis of Gilder's article is that network bandwidth is increasing much
faster than the internal bandwidth in our PCs. An accompanying graphic shows
the crossover in 1996, after which network bandwidth makes the PC look like
dogmeat: "...(network) bandwidth will expand 5 to 100 times as fast as the
rise of microprocessor speeds...a firehose of gigabits (billions of bits)."
Gates's Tiger is an all-software, PC-based approach to handling
video-on-demand and all that bandwidth. But maybe an all-software approach
won't work.
Consequently, Gilder introduces us to MicroUnity, "a flagrantly ambitious
(Silicon Valley) startup." MicroUnity is funded by, you guessed it, Microsoft.
MicroUnity plans to build not a microprocessor but a mediaprocessor, one that
can handle "not less than 400 billion bits per second" while replacing
special-purpose (video?) multimedia devices. This mediaprocessor will have to
be "hundreds of times faster than a Pentium." Flagrantly ambitious indeed!
Naturally (?), the video stream going out over those fiber cables will be
compressed using MPEG standards. For this to happen in real time, you need a
supercomputer that can execute video operations 1000 times faster than the raw
video bits (according to Gilder, but this sounds right). At the receiving end,
code division multiple access (CDMA) needs nearly as much video processing
power.
At this point, when I'm about to be buried in gee-whiz, Gilder admits that
Intel's Andy Grove "does not believe this possible." Whew!


Interactive Couch Potatoes?


I've often wondered just what those 17 fiber-optic cable purveyors will offer
me when they converge on my front porch. One answer is, apparently,
interactive TV. This has been tried on a small scale. I've yet to read of the
vast success of interactive TV. The only time I physically respond to TV (as
in to cheer) is on the (currently rare) occasions when USC stomps some
football opponent, as it did in one of the minor bowl games last season. Hey,
when I attended USC's University College (night school), USC was a football
powerhouse.
Okay, interactive TV won't work for the mass populace. Neither will its
sibling, home shopping. You need lots of money to do much home shopping and
lots of money is what the mass populace don't got.
Instantly available CNN-worldwide TV news (our invasion of Panama was on CNN
five minutes after the invasion started) is already here on the
non-fiber-optic cable we've had all along.
I hate to tell you this, but the smart money is on video-on-demand, which
translates into movies-on-demand, which means you can tap into your choice of
available movies. Your choice will start at your convenience, to a resolution
of 15 minutes. "Available movies" means the ones currently stored on the
network supplier's hard-disk drives. The Jazz Singer and Citizen Kane won't be
included. Given politics, Debbie Does Dallas won't be available either, even
though such movies are very popular as video rentals.


Check, Please


How much will all this cost? The best estimate for video-on-demand over
fiber-optic cable that I've seen so far is that it'll cost subscribers four
times as much as existing TV cable, about $120 a month. That's a lot of money
for federally regulated car chases on demand!
According to Gilder, three MPEG-compressed movies can be stored on a
9-gigabyte Seagate Barracuda disk drive. Given a "farm" of several hundred
disk drives at Network Central, that's lots of movies "on demand." Will the
movie you want to see tonight be currently stored on hard disk? (We don't need
real-time video compression to load movies onto those disk drives, so we can
lose the supercomputer.)


What is Success? What's Failure? 


IBM originally planned to sell a very few hundred thousand PCs. Meeting this
objective would have been judged a success. But IBM is selling millions of PCs
every year, and this is widely considered a failure. Let's keep this in mind. 
Gilder's Telecosm seems based on three tenets: 
N transistors on a given silicon die result in N^2 performance and value--a
tad overstated, but not egregiously so.

N linked computers result in N^2 performance and value (Gilder "credits"
Ethernet inventor Bob Metcalfe with this one). I think this is whacko; almost
every supercomputer maker using massively parallel processors--and there have
been a lot of them--has already gone bankrupt. The optimistic view is that N
linked computers result in almost-N performance.
Gilder may be adapting Metcalfe's view of the huge, loosely coupled Internet
and wrongly applying it to a tightly coupled set-top mediaprocessor. 
With bandwidth far exceeding the capacity of the desktop computer's CPU, the
CPU will simply be bypassed. Right. We call this "TV." (This tenet consigns
Tiger, Gates's all-software approach, to the outer darkness.)
The fact that one of Gilder's tenets is obviously wrong doesn't mean that the
yet-undefined fiber-optic-network industry will fail. On the other hand, many
millions of network subscribers won't guarantee the industry's success.


Building MicroUnity's Mediaprocessor


MicroUnity, funded by Microsoft, promised in 1994 to deliver 10,000 set-top
mediaprocessors in 1995. Don't laugh, it could happen. (Seriously, everyone
involved regards this as a high-risk gamble.) Here's how they are proceeding:
The mediaprocessor (MP) will have a very wide data bus. Pentium's 64-bit bus
will look narrow in comparison.
The MP will be built using design rules about five times smaller than the
Pentium; almost a tenth of a micron. This allows 25 times as many transistors
for a given die size, as compared to the Pentium.
The MP will take an existing industry trend--CPU operating voltages have
dropped from 5 volts to as low as 3.1 volts--to its logical conclusion and use
a 0.5-volt power supply.
The MP will use an idea (already proved in lower-integration devices) that I
think is brilliant: a signal layer using air, not silicon oxide, for
insulation. The dielectric constant is far lower, and the impedance is
proportionally higher. Even the signal-propagation speed is faster (trust me
on this stuff; this is my pidgin). Regrettably, dense CPUs need many signal
layers, and this approach, using air-insulated gold wires, will only work at
the topmost signal layer.
All the aforementioned ideas are fundamentally good; the devil is in the
details. If it were practical to make CPUs using 0.1-micron design rules,
0.5-volt power, and hugely wide data buses, then somebody would be doing it
right now. Intel, the leading microprocessor producer, is just now bringing
online a 0.5--0.4 micron production facility at its new fab in Rio Rancho,
just outside Albuquerque. Production at or near 0.1 micron? In 1995, that's
"Fantasy Island" stuff: "Da plane, da plane!"
For you hardware types, there's the ground bounce and nonexistent noise
margins associated with vastly wide data buses and a half-volt power supply
(the signal swing is, at most, 0.25 volts).
Now for the good news: A technology tour-de-force similar to the
mediaprocessor was successfully pulled off once. The year was 1982 and the
company was Hewlett-Packard. HP invested the then unheard-of sum of $90
million in the design of a 32-bit CPU using design rules that were then as
radical as those of the mediaprocessor today; see Figure 1.
This processor was highly successful. It had an unexpectedly high production
yield, or proportion of good chips per wafer. HP restricted its use to HP's
proprietary minicomputers and so succeeded, in terms of IBM's original
objective for the PC. Had HP offered the processor for general sale, it might
have ultimately sold millions and thus have been regarded as a failure: Why
spend $90 million to make your competitors' computers run faster?


Economic Cooperation and Competition


How will various network services (the Info Highway, home shopping,
interactive TV, video-on-demand) share the necessary infrastructure and income
stream? If video-on-demand is killed off by rented digital laser disks, are
the remaining services worth $120/month to the mass populace? I hate to
mention this in a family magazine, but the Politically Incorrect fact that
pornography is readily available at the rental store but not over federally
regulated networks is a powerful economic force against video-on-demand.
I think this whole thing will turn out just like HDTV: Politics and economics
will swamp the technology. And you have doubtless noticed that after a decade
of hoopla, you still don't have HDTV.
Figure 1: Microprocessor feature size (microns) versus date of first silicon
(dashed lines indicate radical technology jumps).
If network bandwidth is really increasing up to 100 times faster than
microprocessor performance, why not wait two to six weeks for network
bandwidth to increase so that video compression need not be used? That
eliminates both the high-performance video-compression processor required for
real-time MPEG and the need to decompress at the receiving end.
Look: Video compression is an expensive technique used to conserve a scarce
resource--bandwidth. If bandwidth is really increasing so swiftly, it ain't
scarce. Who else sees a credibility problem here?
**********
Nine-gigabyte drives go for $4500 these days, so it costs $1500 to store an
MPEG-compressed movie. If magnetic-disk-storage cost per megabyte continues to
drop by a factor of 2 each year, storage costs will drop to $200 per movie in
three years.
The problem is, in three years we'll be able to buy a CD-like, digital laser
disk that'll store a full-length movie for $15. I assume you'll be able to
rent one of these disks for $2 to $3, thus replacing low-quality, short-lived
video tape with high-quality, long-lived digital laser disks. This will be
powerful economic competition for video-on-demand.
Do you want to buy stock in a company that's going to invest hundreds of
millions, perhaps billions, of nonrecoverable dollars in the infrastructure
needed to deliver movies via fiber optics rather than by digital laser disk?































PATTERNS AND SOFTWARE DESIGN


Designing Objects for Extension




Richard Helm and Erich Gamma 


Richard and Erich are coauthors of Design Patterns: Elements of Reusable
Object-Oriented Software (Addison-Wesley, 1994). They can be reached at
Richard.Helm@dmr.ca and Erich_Gamma@Taligent.com, respectively.


The key to creating reusable software lies in knowing people's needs and
anticipating how those people might reuse your software to meet those needs.
To accomplish this, you must consider how your system might change over its
lifetime and be aware of typical causes of redesign. Among the many causes for
changes are the evolving requirements of current users and the needs of new
users. Other causes might be intrinsic to the business environment in which
your software is used, or stem from changes in technology or platforms. 
If places where such modifications occur are not isolated or decoupled from
the rest of the system, the resulting changes will cause modifications
throughout the software and risk the introduction of reuse errors. The design
issue then is to find and separate the potentially changing parts of an
application from its more stable parts. This idea is not new, and analogies
can be found in other domains--building and architecture, for instance.
In his wonderful and thoughtful book How Buildings Learn (Viking, 1994),
Stewart Brand considers buildings as consisting of six layers: site (its
physical space on ground), structure (exterior and load-bearing walls), skin
(exterior brickwork, cladding), services (electrical, plumbing), space plan
(interior walls, windows, ceilings), and stuff (furniture). Brand notes that
these layers evolve at different rates during the life of a building. Sites of
buildings often exist for hundreds of years, whereas the furniture tends to
move frequently. Brand also notes that these layers shear against each other
as they change at different rates. The slow-moving site and structure systems
dominate the fast-moving space-plan and stuff systems. He observes that "an
adaptive building has to allow slippage between the differently paced systems.
If not, the slow systems block the fast ones, and the fast ones tear up the
slow." If you lay your electrical and plumbing services within the concrete
slab of your house, you'll have trouble adding new wiring or fixing blocked
drains. 
The same ideas apply to software. There is typically a relatively stable, core
application architecture (event driven, client/ server, blackboard, or
whatever), around which exist less stable layers, from slow-moving subsystem
structures, to fast-moving details of the user interface. Just as in
buildings, the key to developing an extensible software system is for these
layers to be able to slip and shear against each other. When they cannot, you
often have a reuse error. There are multiple causes of reuse errors:
Algorithmic dependencies. Algorithms are often extended, optimized, and
replaced during development and reuse. Objects that depend on an algorithm
(both its behavior and data structures) will have to change when the algorithm
changes. Therefore, algorithms likely to change should be isolated from the
rest of the application.
Tight coupling. Classes that are tightly coupled are hard to reuse in
isolation, since they depend on each other. Tight coupling leads to monolithic
systems, where you can't change or remove a class without understanding and
changing many other classes. The system becomes a dense, brittle mass that's
hard to learn, port, and maintain. Techniques such as abstract coupling and
layering help create loosely coupled systems. Abstract coupling means that an
object talks to another object by using an interface defined by an abstract
class. This enables the object to communicate with objects of different
concrete subclasses.
Creating an object by specifying a class explicitly. Specifying a class name
when you create an object commits you to a particular implementation instead
of a particular interface. This commitment can complicate future changes if
the implementations change. One way to avoid such commitment is to make
requests to instantiate classes through a third-party factory object.
Dependence on specific operations. When you specify a particular operation in
a request, you commit to a particular way of satisfying that request. By
avoiding hard-coded requests for operations, you make it easier to change the
way a request gets satisfied.
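One of the causes above, creating an object by naming its concrete class, is worth a concrete illustration. In C terms (all names here are invented for the sketch), the factory idea amounts to clients requesting creation through a function-pointer interface rather than calling a particular constructor, so that swapping implementations touches only the factory:

```c
#include <stdlib.h>

typedef struct Shape {
    const char *kind;       /* stands in for real geometry/behavior */
} Shape;

/* The factory interface: clients request "a shape" without naming a
   concrete creation routine. */
typedef Shape *(*ShapeFactory)(void);

/* One concrete implementation, hidden behind the factory interface. */
static Shape *MakeCircle(void)
{
    Shape *s = malloc(sizeof *s);
    s->kind = "circle";
    return s;
}

/* Client code depends only on ShapeFactory, never on MakeCircle, so the
   implementation behind the interface can change without touching it. */
Shape *CreateShape(ShapeFactory factory)
{
    return factory();
}
```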
Design patterns can help you avoid many reuse errors by ensuring that a system
can change in specific ways. Each design pattern lets you vary some aspect of
system structure independently of other aspects, thereby making a system more
robust to a particular kind of change. There are patterns which concern the
extensibility of objects, the flexible creation of objects, the distribution
of responsibilities, and patterns in managing relationships between objects.
We will look at some of these patterns in later columns. This month we will
look at patterns for designing extensible objects. 


Creating Extensible Objects


Since an object is defined by its class, extensions to the object are usually
defined through class inheritance--by creating subclasses of the original
class. Inheritance is a compile-time mechanism for extending a class and
reusing implementations. While simple and supported in most object-oriented
languages, class inheritance does have some problems: 
It exposes subclasses to their parent class's implementation, which introduces
dependencies and breaks encapsulation of the parent with respect to the child.
Parent classes appear as White Boxes.
It implies a commitment to a particular implementation rather than an
interface (unless the class is derived from an abstract class with no
representation).
Extensions defined through inheritance are defined at compile time and can't
be redefined or changed at run time.
Inheritance doesn't extend a particular object; rather, it specifies a new
implementation of the object based on the implementation of an old one. To
actually use the extension, you must be able to instantiate the extended class
where you instantiated the old.
As we discussed in our previous column, object composition offers an
alternative to inheritance. Instead of composing classes to create new
functionality, object composition creates new functionality by combining
objects in new ways. 
However, object composition is a little more difficult to use than
inheritance. It requires careful attention to interfaces defined by objects.
It also increases the difficulty in understanding a system: Relationships
between objects are more implicit, most of the code only uses objects through
interfaces, and the implementation classes of the objects behind these
interfaces are unknown. Despite these difficulties, object composition can
offer a more flexible alternative to inheritance as a reuse mechanism. 
So how do you use object composition to create extensible objects? First, you
need to consider what it is you are extending: an object's behavior for
particular states, the algorithms it uses, properties and individual
operations, or its interface. There are four patterns that address these
extensions, three of which--Decorator, Strategy, and State--are discussed in
our book Design Patterns: Elements of Reusable Object-Oriented Software. The
fourth is a new pattern called "Extension Objects," which we introduce in this
column. To simplify the discussion and permit comparison, we will name the
class to be extended ExtendedObject.


Extending Algorithms


Objects employ algorithms to implement their operations. The Strategy pattern,
which we discussed in our previous column, allows an object to be extended
with new kinds of algorithms. The basic idea behind the Strategy pattern is
that each algorithm is encapsulated and accessed through a common interface to
create a family of strategies. At run time, the ExtendedObject instance is
configured with a particular Strategy class.
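Although the pattern is language-neutral, the Strategy idea reduces in C to a function pointer serving as the common interface. In this minimal sketch (all names invented), the object is configured at run time with one member of a family of interchangeable algorithms:

```c
/* The pluggable algorithm family shares one signature. */
typedef int (*CompressStrategy)(int rawSize);

typedef struct {
    CompressStrategy compress;    /* the strategy currently in effect */
} ExtendedObject;

/* Two interchangeable members of the strategy family. */
static int FastCompress(int rawSize)  { return rawSize / 2; }
static int TightCompress(int rawSize) { return rawSize / 4; }

/* The object delegates to whichever strategy it was configured with;
   callers never know which algorithm runs. */
int Compress(ExtendedObject *obj, int rawSize)
{
    return obj->compress(rawSize);
}
```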


Extending State-Specific Behavior


All objects have internal state. Sometimes the behavior of an object depends
on this state. At issue is how to represent this state-specific behavior,
isolate each state's behavior, and permit new states and behaviors to be
added. The State pattern does this as follows: It encapsulates all
state-specific behavior as a state object and creates state objects for each
distinct state. The ExtendedObject forwards state-specific behavior to the
state object. To change the ExtendedObject's behavior, configure it with a
different state object.
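A minimal sketch of this forwarding (the Open/Closed states below are illustrative assumptions, not part of the pattern):

```cpp
#include <cassert>
#include <cstring>

// Illustrative State sketch; OpenState/ClosedState are example states only.
class State {
public:
    virtual ~State() {}
    virtual const char* Handle() = 0;   // state-specific behavior lives here
};

class OpenState : public State {
public:
    const char* Handle() { return "open"; }
};

class ClosedState : public State {
public:
    const char* Handle() { return "closed"; }
};

// The extended object forwards state-specific requests to its state object.
class ExtendedObject {
public:
    ExtendedObject(State* s) : state(s) {}
    void ChangeState(State* s) { state = s; }          // new behavior = new state object
    const char* Request() { return state->Handle(); }  // forwarded to the current state
private:
    State* state;
};
```

Adding a new state means adding a new State subclass; ExtendedObject itself never changes.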


Extending Properties and Operations


How can you extend properties and operations? How can you add state and alter
the behavior of operations? The Decorator pattern does this by "wrapping" up
the ExtendedObject in a decorator object. The decorator presents the same
interface to clients as the decorated object (clients will therefore not
notice the presence of the decorator object). Most requests made of the
decorator are forwarded directly to the decorated object. However, for some
requests the decorator adds, removes, or modifies the behavior of the
decorated object. 
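A minimal sketch of such a wrapper (TextView and BorderDecorator are illustrative names we have chosen for the example):

```cpp
#include <cassert>

// Illustrative Decorator sketch; TextView/BorderDecorator are example names.
class Component {
public:
    virtual ~Component() {}
    virtual int Draw() = 0;
};

class TextView : public Component {
public:
    int Draw() { return 10; }            // the decorated object's behavior
};

// The decorator presents the same Component interface as the object it wraps.
class BorderDecorator : public Component {
public:
    BorderDecorator(Component* c) : inner(c) {}
    int Draw() { return inner->Draw() + 1; }  // forward, then add behavior
private:
    Component* inner;
};
```

A client holding a Component* cannot tell whether it is talking to the TextView or to its decorator, which is exactly the point.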


Extending Interface



All objects present interfaces to clients to allow manipulation. However, it
is difficult to anticipate how clients will want to use an object, and so it
is hard to create a general-purpose interface for all possible clients.
Attempts to do so will result in large, clumsy, bloated interfaces which tend
to detract from the abstraction the object represents. 
The Extension Objects pattern describes a solution to this problem of
extending interfaces. There are many different ways to write patterns. This
one is based on the format we introduced in our book.


Extension Objects 


Category: Object Structural
Intent. Enable clients to extend the interface of an object. Additional
interfaces are defined by extension objects.
Motivation. Consider a compound-document architecture as currently promoted
by OLE 2, OpenDoc, and soon Taligent's CommonPoint. A compound document is
made of components that are managed and arranged by container components. The
infrastructure for a compound document requires a common interface to various
components such as text, graphics, or spreadsheets. Let's assume that this
interface is defined by an abstract class Component. How can we define
additional interfaces for components that allow clients to use
component-specific functionality?
You cannot define all possible operations on components as part of the general
Component abstraction. First, it is not possible to foresee all operations
that component writers would like to perform. Second, even if we could, the
result would be a bloated interface reflecting all possible uses of
components. For example, a spell checker requires a specific interface to
enumerate the words of a text component. The spell checker should operate on
any component that provides this interface, independent of its concrete type.
One solution is to provide a mechanism allowing clients to define additional
interface extensions and to let clients query whether an object provides a
certain extension. There is a spectrum of techniques for how this mechanism
can be implemented.
The key idea is to define extensions to the interface as separate objects. The
extension object implements this extension interface, knows about the object
it extends, and implements the extension interface in terms of the extended
object's interface. Extension objects by themselves aren't enough--Component
must also provide an interface for discovering which extensions it supports. For our
purpose, the extensions will be identified by a name. To avoid conflicts, this
name should be registered at a central place. A client can query whether a
component provides a certain extension by calling the
GetExtension(extensionName) operation. If the component provides an extension
with the given name, it returns a corresponding extension object. All
extensions are derived from an abstract Extension class which provides only a
minimal interface used to manage the extension itself. For example, it can
provide an operation that a Component can call to notify its Extensions when
it is about to be deleted.
In the case of the spell checker, we define an extension named "TextAccessor."
The corresponding interface is defined by the abstract class TextAccessor. Its
key operations are GetNextWord(), which returns the next word in the text, and
ReplaceCurrentWord(), to replace a misspelled word. Let's assume that there
are two different implementations of text components: SimpleTextComponent and
FancyTextComponent. Both components want to provide spell-checking support. To
do so, both derive their own TextAccessor subclass that implements the
interface for their text implementation. The SimpleTextComponent and
FancyTextComponent classes implement GetExtension("TextAccessor") to return
their specific extension object.
Figure 1 summarizes the class relationships using OMT notation (abstract
operations and abstract classes are shown in italics). The implementation of
SimpleTextComponent::GetExtension() is shown in Example 1. The extension
object stores a reference back to the extended object. It typically implements
the extension by forwarding requests to the extended object. 
Based on this extension infrastructure, a spell checker for a compound
document is implemented as follows: Traverse the components in the document.
Ask each component for its TextAccessor extension. If the component returns a
corresponding TextAccessor extension object, use it (after downcasting it to
TextAccessor) to spell check the component. Otherwise, skip the component and
move on to the next.
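The traversal just described might be sketched as follows. The interfaces follow the article, but the trivial class bodies and the CheckDocument helper are assumptions made only to show the control flow:

```cpp
#include <cassert>
#include <cstring>
#include <vector>

// Sketch of the spell-checking traversal; bodies are stand-ins.
class Extension {
public:
    virtual ~Extension() {}
};

class TextAccessor : public Extension {
public:
    virtual const char* GetNextWord() = 0;
};

class Component {
public:
    virtual ~Component() {}
    virtual Extension* GetExtension(const char*) { return 0; }  // default: no extensions
};

class SimpleTextComponent : public Component {
    class Accessor : public TextAccessor {
    public:
        const char* GetNextWord() { return "word"; }
    };
public:
    Extension* GetExtension(const char* name) {
        if (strcmp(name, "TextAccessor") == 0)
            return new Accessor;            // extension allocated on demand
        return 0;
    }
};

// Traverse the document; spell check components that provide the extension,
// skip the rest. Returns how many components were checked.
int CheckDocument(const std::vector<Component*>& doc) {
    int checked = 0;
    for (size_t i = 0; i < doc.size(); ++i) {
        Extension* e = doc[i]->GetExtension("TextAccessor");
        if (e != 0) {
            TextAccessor* ta = static_cast<TextAccessor*>(e);  // the downcast
            (void) ta->GetNextWord();       // real spell checking would go here
            ++checked;
            delete e;
        }
    }
    return checked;
}
```

Components that don't provide the extension simply return 0 and are skipped, independent of their concrete type.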
Applicability. You use the Extension Objects pattern when: 
You need to support the addition of new interfaces. 
An interface to an abstraction should not be tied to a specific, existing
inheritance hierarchy.
An abstract class has a large, bloated interface reflecting its use by
multiple clients.
Structure. The structure for the Extension Objects pattern is shown in Figure
2.
Participants. The participants for the Extension Objects pattern are: 
ExtendedObject (Component), which defines an interface to query whether an
object has a particular named extension. 
ConcreteExtendedObject (FancyTextComponent, SimpleTextComponent), which returns
appropriate extension objects when queried for a specific extension.
Extension (Extension), the common base class for all extensions.
SpecificExtension (TextAccessor), which defines the interface of a specific
extension.
ConcreteExtension (FancyTextAccessor, SimpleTextAccessor), which implements the
extension interface by calling operations on the ConcreteExtendedObject. To
do so, it maintains a reference back to the ConcreteExtendedObject.
Collaborations. The client negotiates with an ExtendedObject for a named
extension. If it exists, it is returned. The client subsequently uses the
extension to access additional behavior of the ExtendedObject.
Consequences. The Extension Objects pattern has several consequences:
The base class ExtendedObject does not require a large interface for all its
clients. This avoids monolithic interfaces for the ExtendedObject.
Peer objects can use extension objects to negotiate a more specific interface
between them (one that is more efficient or supports better abstractions).
An interface is not attached to a particular class definition. Classes
providing an interface don't have to be related through inheritance. In the
spell-checker example, text components don't have to inherit from a general
TextComponent base class, but they can still participate in spell checking.
An extended interface is more complicated to use than one which is provided by
the extended object itself. It requires more work to obtain the interface.
Implementation. The implementation must define how the extension objects are
managed by ExtendedObject. A simple solution is to store an extension object
in an instance variable that is returned to clients when the extension is
requested. An alternative is to dynamically allocate the extension on demand
when it is requested. A further variation is to provide support for clients to
attach Extensions to existing objects. In this case, the ExtendedObject has to
maintain a dictionary that maps an attached extension to its name. 
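The attach-at-run-time variant might be sketched with a name-to-extension dictionary; the member names below (AttachExtension in particular) are assumptions for the example:

```cpp
#include <cassert>
#include <map>
#include <string>

// Sketch of the "attachable extensions" variant: the extended object maps
// extension names to extension objects.
class Extension {
public:
    virtual ~Extension() {}
};

class ExtendedObject {
public:
    void AttachExtension(const std::string& name, Extension* e) {
        extensions[name] = e;               // later attachments replace earlier ones
    }
    Extension* GetExtension(const std::string& name) {
        std::map<std::string, Extension*>::iterator it = extensions.find(name);
        return it == extensions.end() ? 0 : it->second;
    }
private:
    std::map<std::string, Extension*> extensions;
};
```

Registering extension names at a central place, as suggested above, keeps two parties from attaching different extensions under the same key.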
In C++, the extension returned from GetExtension has to be cast to its
corresponding extension class. If the C++ implementation provides run-time
type identification, this can be achieved with dynamic_cast. 
To permit the extension to have full access to the ExtendedObject, it can be
declared as its friend. 
Strings are a primitive way to identify extensions. Better solutions are to
use special interface identifiers or some internalized form of strings.
An alternative implementation is to use multiple inheritance and run-time type
identification. An extension interface can be defined as a mixin class. An
object that supports a given extension inherits from this mixin class and
implements the extension. See Figure 3.
The operations of the mixin class are all abstract and have to be implemented
by derived classes. To query an object for an extension, the client uses
run-time type information. For example, in C++, the spell checker would query
a component whether or not it is (inherits from) a TextAccessor; see Example
2.
The use of a dynamic cast is often suspect and can indicate a design flaw. In
this case, however, it is acceptable, since the dynamic cast is only used to
ask an object whether it supports a certain interface.
This pattern is less important in dynamic languages like Smalltalk. These
languages typically provide enough run-time information to allow asking an
object whether it responds to a specific request.
Known uses. Support for extension interfaces is common in Compound Document
architectures. The example from the Motivation discussion is based on OpenDoc.
In OpenDoc, the common base class ODObject provides the access to the
extension interface. 
Extension objects are related to Microsoft's Component Object Model (COM) and
its QueryInterface mechanism. QueryInterface enables a client to query an
object for an interface. In COM there is no extended object to start with, and
all interfaces of an object are accessed by QueryInterface.
Related patterns. One related pattern is Adapter which adapts an existing
interface. Extension Objects provide additional interfaces.
Figure 1 Class relationships using OMT notation.
Example 1: SimpleTextComponent::GetExtension(...) implementation.
Extension* SimpleTextComponent::GetExtension(char* extensionName)
{
    ...
    if (strcmp(extensionName, "TextAccessor") == 0)
        return new SimpleTextAccessor(this);
    else if (strcmp(extensionName, "AnotherExtension") == 0)
        ...
    else
        return 0;
}
Figure 2 Structure for the Extension Objects pattern.
Figure 3 Using multiple inheritance and run-time type identification.

Example 2: Querying a component.
Component* component;
TextAccessor* accessor = dynamic_cast<TextAccessor*>(component);
if (accessor != 0)
    // use the text accessor interface


























































SOFTWARE AND THE LAW


Software Development Contracts




Marc E. Brown


Marc is a patent attorney and shareholder of Poms, Smith, Lande & Rose, one of
the oldest intellectual-property law firms in Los Angeles. Marc specializes in
computer law and can be contacted at meb@delphi.com.


Developing software under contract is a risky business. Software is rarely
finished on time, cost overruns are common, and something invariably turns out
not to work.
A well-written development contract is essential to maintaining a good
business relationship while you sail these stormy waters and to avoiding the
expense and embarrassment of a lawsuit. Unfortunately, software developers and
their clients often do not give careful attention to their written contract
until after a dispute arises. By then, it is usually too late.
In this column, I'll cover many of the most important clauses to consider.
Review them now, and save yourself trouble later!


Identify the Parties


You would be amazed at the number of contracts that do not clearly identify
the parties to the contract.
This is more than a mere matter of form. Only parties to a contract have the
legal obligation to perform it and the legal right to receive its benefits.
When a payment is not made, you want to be absolutely sure you know who owes
that payment. Similarly, when the software fails or is not delivered on time,
you want to be sure that you personally are not held liable when it is your
company that is receiving the payment.
The legal capacity of each party to a contract should be specified, as well as
its exact legal name. Is it a corporation? Is it a partnership? Is it an
individual doing business under a fictitious business name? All personal
assets of an individual can usually be reached to satisfy a judgment against a
fictitious name under which the individual is doing business. If a party is a
member of a larger corporate structure (for instance, a subsidiary), the
contract should also make crystal clear which entity within that structure is
the contracting party. A parent is normally not liable for the debts of its
subsidiary, nor is a subsidiary normally liable for the debts of its parent.
When contracting with a business that has limited assets, consider asking for
one or more personal guarantees.


Scope of Work


You would never go to a dealer and buy a new car without carefully reviewing
its list of options. Yet, software development often proceeds without the
parties first having made sure that they each have the same understanding of
what will be done. Such a blind approach is a time bomb waiting to explode.
Don't let it happen to you.
Detail in writing the functions the software is expected to perform and the
results it is expected to achieve. Sufficiently detailed specifications
provide clear, objective standards by which the software can be judged. The
more expensive the contract, the greater the detail. Don't forget to specify
the operating environment, including hardware (processor type, RAM size,
storage speed/size, and so on), operating system(s), and necessary (and
prohibited) peripherals.
In many cases, the development of these specifications constitutes part of the
work for which the software developer has been hired. In these situations, the
specific details are not known at the time the contract is signed;
nevertheless, general guidelines should be specified in writing. The contract
should also contain a clause requiring the developer to later present the
detailed specifications in writing to the client for review and written
approval.
The contract should address whether the developer will be providing any
services or materials beyond the software itself. For example, will the
developer be converting existing data to run with the new software? Must he
install the software, train users, or provide other user support? Must he be
available to make upgrades?
One important area often not clearly delineated concerns documentation. Must
the developer provide user manuals? Must he provide documentation sufficient
to enable another developer to continue with the development, such as
source-code listings, descriptions of subroutines, data descriptions and
standards, and flowcharts? Clients often do not realize the importance of this
material until long after the contract is signed. Developers, on the other
hand, know that these materials are expensive to provide and sometimes use
silence as a vehicle to omit them. That is a mistake. The contract should
clearly specify the documentation the developer will provide and the
documentation he will not.


Modifications


Invariably, changes in the software will be requested during development, as
well as changes in the associated materials and services that the developer is
to provide. Unless documented in writing, misunderstandings concerning the
costs and deadlines for making these changes may arise.
The contract should include a clause requiring all changes to the contract to
be recited in writing and initialed by all parties. In addition to
specifying the exact nature of the change, the writing should set forth
whether the developer will receive any additional compensation for the change
or additional time for making it. For the protection of the developer, the
contract should provide that the developer is not obligated to make any change
not agreed upon in writing. To protect the client, the contract should provide
that the developer will not be paid or receive additional time for any change
not approved by the customer in writing.


Customer Responsibilities


Cooperation with the client is often required during the development of the
software. For example, the developer may need to meet with the client's staff
to determine the operational requirements for the software and to review the
developer's proposed specifications. The developer may also need access to the
customer's equipment.
The contract should specify each required area of client cooperation. The more
detailed the specification in terms of time and frequency, the better. Without
this clause, it may be difficult for the developer to later excuse a delay in
delivery caused by a lack of cooperation on the part of the customer.


Scheduling


Another fertile area for misunderstanding is scheduling. The client often has
one expectation, the developer another. A good contract should never allow
this to occur.
Deadlines for all stages of the software development should be specified. If
the developer is to design the specifications, when will they be presented to
the customer? How long does the client have to approve them? When will the
first version of the software be ready for testing? How long will it be
tested? When will the tested software be finished, fully operational, and
ready for delivery? How long will the developer have to correct bugs that are
discovered later?



Deliverables


A specification of what is to be delivered is just as important as a
specification of when delivery is required. The contract should clearly
describe exactly what the developer is to deliver. In addition to object code,
is he to deliver source code? Also, what form will the delivery take? Will the
software and documentation be on a floppy, CD-ROM, or tape? 
Some contracts provide that the developer need not deliver the source code to
the client. This approach forces the customer to use the developer for
upgrades to the software and, as a practical matter, to fully pay for the
software to get postdelivery help. For the protection of the customer, the
contract should require the developer to deposit a copy of the source code
with an escrow agent, with instructions to turn the source code over to the
customer if the developer breaches or is otherwise unable to continue his
performance.


Payment


Payment, of course, is usually the only topic given consideration by the
developer. Naturally, the contract should be clear about it.
The contract should specify the basis of the payment. If it is a fixed-price
contract, that price, of course, should be specified. If based on time, the
hourly rates of the various individuals (or classes of individuals) should be
specified. If reimbursement for expenses can be requested, the types of
reimbursable expenses should be itemized.
The deadlines for making payment should also be clearly specified. Most
agreements provide for a series of payments, including an initial payment when
the contract is signed. For substantial projects, it is often also useful to
require the developer to provide detailed reports and invoices as a condition
to each payment. This assures the customer that the developer is on schedule
and reduces the chance of the customer raising an objection to an allegedly
problematic matter that was clearly disclosed in an earlier report or invoice.


Validation, Verification, and Testing


Customers usually want the contract to provide that their final payment is
subject to some form of validation, verification, and testing. This is usually
a fair request which cannot easily be refused.
But safeguards can and should be built in. The contract should specify the
types of tests which will be done, the identity of the testers, who will pay
for the expense of the test, and how much time will be allowed for the test.
Most importantly, the contract should specify an objective standard by which
the success of the test can be measured.
Customers would be wise to insist upon the software being tested by someone
other than the software developer. In addition to having an obvious bias, the
developer often fails to operate the software in the sequences that cause a
problem.
After all, the developer usually designed the software to handle all of the
operational sequences that he or she can foresee. Someone other than the
developer may well operate the software in an unforeseen sequence, and it is
this type of unforeseeable sequence that needs to be tested the most.


Ownership


The agreement should clearly specify who owns the intellectual property that
relates to the development work, as well as the tangible materials that are
received or created in connection with the developer's work. This includes
rights to copyrights, patents, and trade secrets. A variety of specialized
concerns might need to be addressed.
One such concern arises when the developer uses independent subcontractors.
Under copyright law, title to a copyright created by an independent contractor
will usually not pass to either the developer or the customer unless the
independent contractor signs an agreement promising to assign that copyright.
If the customer wants to retain ownership of all copyrights in the developed
software, the contract should require the developer to procure such written
subcontracts with each independent subcontractor. Even in the absence of a
requirement for such subcontracts, the developer would be wise to nevertheless
procure them. Otherwise, he might be unable to deliver full title to the
customer and hence be in breach of his obligation to do so.
Developers often use in-house subroutines in their software. If the developer
wishes to retain the right to continue to use these subroutines in connection
with software for other customers, the agreement should provide that ownership
of these subroutines remain with the developer. Similarly, the agreement
should excuse the developer from delivering title to underlying commercial
products that the developer has chosen to incorporate into his design.
Sometimes, the customer may want to modify the software. The agreement should
specify whether the customer has this right and, if so, who owns the resulting
work.
In general, it is normal for the customer to obtain all rights in original
software purchased for resale. On the other hand, the customer usually
receives only a license (with the developer retaining title) to software
developed merely for use by the customer. When only a use license is granted,
the contract should specify whether the customer has the right to transfer his
license to another and whether the developer is then entitled to receive
additional compensation.


Confidentiality


Customers often provide developers with highly sensitive information during
development. Developers similarly often provide customers with equally
sensitive information, sometimes including the software itself. When this
occurs, the customer, developer, or both may often want that confidential
information to be protected.
The contract should require that the customer, developer, or both use
reasonable efforts to protect the confidentiality of this information. It
should further provide that the party providing the protection will not use or
disclose the confidential information to others, except to reasonably further
his performance under the contract. All confidential information that is to be
protected should be specifically identified.
A "confidentiality" clause should be included, even when the parties have a
high degree of confidence in each other's integrity and competency. The
absence of a "confidentiality" clause is often cited by courts as a reason for
refusing to protect what otherwise would be enforceable rights in trade
secrets.


Noncompetition


Another approach often used to protect the value of the software is to include
a clause prohibiting the developer from developing similar software for a
competitor or from directly competing with the customer. These clauses will be
enforced in many states if they are reasonable--that is, if they are limited
in duration, geographic area (when appropriate), and areas of competition.
However, in some states (California, for example), even reasonable restraints
on competition will usually not be enforced. If you have made a noncompetition
promise that you no longer wish to honor, legal research can determine whether
you are likely to be bound to it.


Developer Assurances


Normally, the contract provides that the software will conform to the
agreed-upon specifications, meaning, of course, that it will work. A clause is
sometimes included also making the developer liable if the software infringes
a patent, trademark, copyright, or trade secret owned by another. Liability
may also be imposed upon a developer who fails to deliver the software on
time.
In many instances, these liabilities will be imposed upon the developer, even
if he or she does not expressly promise to assume them in the contract.


Disclaimers


Read this section carefully!

The liability imposed upon the developer may often far exceed the amount of
money the developer is paid. 
What can developers do to protect themselves? The answer can be stated in
three words: Disclaim, disclaim, disclaim!
The contract can disclaim all liability for what is known as "consequential
damages"--damages caused by defective software or its late delivery, such as
lost profits, injury to reputation, and damage to data.
The contract might also provide that the developer is to correct any
deficiencies in the software within a stated number of days after they are
brought to the developer's attention, and that such correction is the sole and
exclusive remedy that the customer has for a defect in the software. The
contract might also limit the period of time following completion of the
software during which the developer shoulders this responsibility.
Courts do not always enforce disclaimers. Usually, a court will refuse to
enforce a disclaimer for one of three reasons: 
The disclaimer was not sufficiently conspicuous.
The disclaimer did not use the right language. 
The law of the state governing the contract provides that the disclaimer is
unenforceable. For example, most states will not enforce a disclaimer when its
effect would be to excuse liability for personal injury caused by a defect in
the software.


Insurance


Another method to protect against large liabilities is to require liability
insurance to be purchased. The customer can be required to make the purchase
and to name the developer as an "additional insured" or vice versa. Sometimes,
a customer's existing commercial general-liability policy can be inexpensively
amended to name the developer as an "additional insured." The type of
liability that must be insured should also be specified.


Breach


In the absence of express language, a material breach provides the aggrieved
party with the right to stop performance. Thus, if the customer misses a
payment, the developer can usually stop work. On the other hand, if the
software is behind schedule or is not working, the customer can often stop
making payments.
These legal rights can and often are modified by the contract. For example,
the contract can be written to give the developer a stated number of days
following written notice of a defect or untimely delivery to cure that problem
before the customer can start withholding payment. The contract can similarly
give the customer a stated number of additional days to make a late payment
after receiving a late-payment notice from the developer, before the developer
can stop work. 


Liquidated Damages


It is usually difficult for both parties to the contract to predict the
damages that might arise because of a breach in performance. This is
particularly true for the developer.
To limit liability, a "liquidated damages" provision is sometimes inserted.
Such a clause specifies an exact amount of money that the customer will
receive because of certain breaches by the developer. For example, it may
provide that the customer will receive $100.00 for each day the software is
late or $500.00 for each defect in the software. If both parties are
reasonable, the use of a "liquidated damages" provision often eliminates
disputes that otherwise might arise.


Arbitration and Attorney Fees


Development contracts sometimes provide that all disputes arising in
connection with the contract must be resolved by arbitration, rather than
court litigation. Arbitration is usually faster and far less expensive than
court litigation. 
The contract can also contain a clause providing the prevailing party with an
award of reasonable attorney fees, in addition to all other relief, whether
the dispute is resolved by arbitration or court litigation.
It is often difficult to predict whether either of these clauses will be
beneficial. Among the factors considered are the relative wealth of the
parties and the perceived likelihood of one side being more litigious than the
other.


Merger Clause


Most good contracts contain what is called a "merger clause." This clause
provides that the software-development contract contains all terms of the
contract between the parties and all representations that each is making. The
clause can also provide that any statements that may have been made in the
past by either party do not form a part of the contract and that no party is
relying upon any such past statement in entering into the contract, unless it
is expressed in the contract. Finally, the clause usually provides that no
modification to the contract will be effective unless the modification is in
writing and signed by both parties.
The value of this clause is obvious. Don't overlook it!


Conclusion


A carefully thought-out contract should lie at the foundation of every
independent software-development project. At all times, be comprehensive,
specific, and reasonable. If these three rules are followed, the contract will
nurture the business relationship like a marriage counselor and, if necessary,
will pave the way for a clean divorce without expensive and embarrassing
litigation.














EDITORIAL


GIFgate


It's too bad that analysis of Intel's Pentium debacle hasn't had time to make
it into the Business 101 textbooks. If it had, then CompuServe's MBAs might
have avoided "GIFgate," a fiasco involving CompuServe's GIF file format and
the Unisys-patented LZW compression algorithm. Although still unfolding, the
story has had more twists and turns than Newt Gingrich explaining his book
deal. 
To recap: During the post-Christmas holiday lull, CompuServe shocked
developers who use the Graphics Interchange Format (GIF) with the announcement
that they had to register their use by January 10, 1995, and begin paying
royalties of 1.5 percent or $.15/unit, whichever is greater. The demand was
based on an agreement hammered out between CompuServe and Unisys more than six
months earlier. GIF, which is copyrighted by CompuServe, is built upon
Unisys's LZW compression algorithm. 
According to a prepared statement, upon learning of GIF's use of LZW in 1992,
Unisys "immediately" began negotiations with CompuServe. A licensing agreement
was subsequently reached in June 1994, which required CompuServe to pay Unisys
a royalty of 1 percent of the average selling price (or about $.11/copy)
charged for the CompuServe Information Manager connection software.
Additionally, CompuServe had to pay a one-time fee of $125,000 for past use,
and, we've since been told, an ongoing $5000 monthly fee. As part of the
agreement, CompuServe also got the rights to relicense LZW technology to
commercial developers who use the GIF specification in software that connects
directly to the CompuServe information service. 
Money aside, one problem developers had with this announcement is that the GIF
specification has been publicly available for years--since 1987, in fact--and
CompuServe has encouraged its free use by developers. The only string attached
to developers' use of GIF was that source code implementing the GIF spec must
maintain CompuServe's copyright notice. Through CompuServe's encouragement,
GIF has been widely implemented, becoming the de facto file format for
graphics interchange on the Internet as well. Now, said the information
service, it's time to pay the piper.
Developers immediately raised a ruckus, along with a number of serious
questions needing answers. Feeling the heat of a coming firestorm, Unisys
quickly tried to put the onus on CompuServe. For instance, in response to the
charge that CompuServe was in violation of a six-month implementation
agreement, Unisys stated that CompuServe asked for, and was granted, a
one-month extension. Unisys also made it clear that the agreement did not
require CompuServe to relicense LZW technology--CompuServe did so at its own
discretion.
Still, a number of serious questions have yet to be adequately answered. For
instance, do the terms of CompuServe's developer agreement suggest that GIF
can only be used to support CompuServe-related software? (Maybe that's the
intent, since confining GIF to CompuServe would put the brakes on Internet
browsers such as Mosaic, thereby buying CompuServe time in combating what's
become its biggest rival.) 
The latest upshot, as of this writing anyway, is that CompuServe is proposing
"GIF24," a free-of-charge update to the current GIF89a specification. The good
news is that GIF24 is supposed to be based on Huffman encoding, not LZW. The
bad news is that the GIF24 spec won't be available until the end of the year. 
In the meantime, Unisys believes it has discovered a gold mine. The company is
actively going after online-related developers who write commercially
available LZW-based software.
In the long term, GIF as we know it today will likely go by the wayside, as
developers turn to nonpatented compression alternatives such as Huffman
encoding. Pat Clawson of TeleGrafix has proposed one such alternative for GIF
files. However, before the algorithm can be implemented for GIF, he'll need
CompuServe's permission. CompuServe's response will tell a lot about what kind
of company it really is.
Jonathan Erickson
Editor-in-chief












































Windows Apps and Exception Handlers


Enhancing an already powerful debugging tool




Joe Hlavaty


Joe is a programmer at a major hardware vendor. He is a graduate of Georgetown
University and currently lives in the Washington, D.C. area. Joe can be
contacted at jhlavaty@aol.com.


In my article, "Exception Handlers and Windows Applications" (Dr. Dobb's
Sourcebook of Windows Programming, Fall 1994), I discussed issues relative to
Windows exception handlers, including the System VM, DPMI, and numerous
protected-mode concepts such as selectors. In doing so, I presented TrapMan, a
Windows debugging tool for analyzing exceptions in Windows applications. In
this article, I'll enhance TrapMan by adding features to the trap handlers for
displaying exception registers, dumping the exception stack, identifying the
faulting application, and more.
One (usually) nonfatal exception is Interrupt 11 (Trap B), the Segment Not
Present fault. The Windows kernel processes this exception when demand loading
segments of Windows applications. TrapMan watches these faults because doing
so is useful when you need to wait until a segment is loaded before setting a
breakpoint for debugging. It also gives you some idea of how often Windows
environments process exceptions under the covers, even in small applications.
Note that Interrupt 11 is very different from Interrupt 14 (Page Fault): Page
Faults are handled by WIN386, while Segment Not Present faults are handled by
KRNL386.EXE and friends. I will not discuss Page Faults here.
TrapMan is a Windows application that should run in any protected-mode version
of the 16-bit Windows environment, including Win-OS/2 2.1 and 2.11. I've even
run it under NT's WOW, the multithreaded DOS-box subsystem for 16-bit apps.
Additionally, if you wish to debug a faulting application caught in one of
TrapMan's handlers, you will need to be running a debugger capable of
processing unowned Int 3hs in code, as TrapMan uses the Intel INT3
instruction to return control to a waiting debugger. I prefer Nu-Mega's
(Nashua, NH) Soft-Ice for Windows for debugging DOS-based versions of the
Windows environment and the OS/2 kernel debugger for debugging OS/2-based
versions (I use both on almost a daily basis). Unlike some debuggers, both of
these can handle an Int 3h instruction that they themselves did not place in
the code. 
Intel documentation often refers to exceptions as "interrupts," Windows refers
to them as "faults," and OS/2 calls them "traps." "Interrupt" is followed by a
decimal number, "trap" by a hex number, and "fault" is usually preceded by the
name of the exception. In other words, Interrupt 13, General Protection Fault,
and Trap D are all the same thing when discussing exceptions. For this
article, I'll generally use the Windows versions of these names to avoid
questions about the base of the numbers given.


TrapMan Background


I developed TrapMan with Microsoft C 6.x, the Windows 3.1 SDK, and a MASM
5.1-compatible assembler. It will run under Windows 3.0 but requires COMMDLG
for its SAVEAS and OPEN dialogs. TrapMan will show you how to use DPMI calls
to replace the default Windows and Win-OS/2 handlers in order to provide an
application-specific level of depth in debugging information while running
under retail Windows or Win-OS/2. All of the source code, including related
files and executables, is available electronically; see "Availability," page
3.
Fatal exceptions such as Stack Faults generally cause the operating system to
terminate the task producing the exception. A nonfatal exception permits the
task to continue at the instruction causing the fault after the operating
system has processed the fault so that the current instruction will no longer
cause an exception. An example would be a Segment Not Present fault. It is
possible to write an exception handler in C, as in Listing One. This handler
simply calls the Windows API DebugBreak() to interrupt to a waiting debugger
(if available), and then calls FatalExit() to exit the faulting task. Notice
that the handler does not attempt any access to program data. You can see why
by looking at the mixed listing file created by the compiler for the handler
in Listing Two. Notice that no segment registers are set in this routine.
While CS is set through the action of calling this handler, no other segment
registers are valid. These segment registers are set during C initialization
and are not normally changed during the "life" of a Windows program. If the
handler attempted to access C data, the handler would very likely GPFault (as
DS is a random, possibly completely invalid value), which would cause the
handler to be called repeatedly. (For more details, see Windows Internals, by
Matt Pietrek, Addison-Wesley, 1993.)
The HANDLER example in the Windows SDK shows one way to make sure DS is
valid: HANDLER guarantees that DS is accessible by exporting its interrupt
handler. You need to be sure that your C compiler generates correct code for
a Windows prolog so that the Windows loader will set DS correctly (or you may
need to call MakeProcInstance() yourself to force DS to be correct).
The easier way to make sure DS is valid is simply not to use DS! This is the
technique used in TrapMan's handlers, which store information that they need
to access at exception time inside the handler code segments. In other words,
TrapMan is self-modifying code (even though the changes are mostly data). As
code segments are shared across multiple instances, any modifications made by
the second or greater instances would modify the data for all instances,
including the first; therefore, only one instance of TrapMan is permitted.
In order to make TrapMan's exception handlers as flexible as possible, I wrote
the handlers in assembly language. The other modules (window functions and the
like, the scaffolding of TrapMan) are written in C to make that code as
uncomplicated as possible.
Lastly, TrapMan makes extensive use of 16-bit DPMI services to monitor
exceptions, and will only work in those systems that supply a 16-bit DPMI host
of at least the 0.90 level (as do all 16-bit versions of Windows and
Win-OS/2). Of course, such techniques should also work in protected-mode DOS
applications, provided a DPMI host meeting these requirements is available,
but such applications will not be discussed in this article.


Why Exception Handlers?


While Windows (and Win-OS/2) already supply their own exception handlers,
these handlers do not help you gather information on the fault. In many cases,
only the address of the faulting instruction is available. While helpful, a
raw address is not very useful when taken out of context. Wouldn't it be nice
to have registers, flags, or a stack dump--even without a debugger? TrapMan
will help you do this. Is it dangerous to set exception handlers directly from
an executable? The Windows 3.1 Guide to Programming states:
Because interrupts can occur at any time, not just during the execution of the
application that is using the device, device interrupt-handling code must be
in a fixed segment.
This also applies to exception handlers.
Likewise, the Windows 3.1 Multimedia Programmer's Reference notes that
interrupt (and so, exception) handlers "must reside in a DLL," and the handler
data and code segments "must be specified as FIXED." 
The danger is that the code or data needed for the exception handler might
have been discarded. While Windows 386 Enhanced mode and OS/2 both support
paging, their underlying Windows systems do not. Windows and Win-OS/2 use
segment-level linear memory; code and data are loaded on a per segment basis
(via Interrupt 11, Segment Not Present fault).
The Intel documentation specifically states that any two Contributory
Exceptions (Divide By Zero, Segment Not Present, Stack Fault, or GP Fault)
occurring back to back will generate a Double Fault, and that a further fault
while handling the Double Fault puts the processor into shutdown mode. (See
the i486 Processor Programmer's Reference Manual, Table 9-4.) In other words,
if an application GPFaults and the processor then faults while attempting to
demand load the handler segment, a Double Fault would result.
If it really worries you, put your exception handlers in a DLL with FIXED
segments as Microsoft requires. Another possibility is to page lock the memory
via GlobalPageLock(), but this function is only available in Windows Enhanced
mode and is not available in all versions of the Windows operating
environment. I have left TrapMan's handlers in an application instead of a
library for simplicity's sake, with the code PRELOAD and NONDISCARDABLE. (For
a very readable discussion on memory-segment attributes like PRELOAD, refer to
Pietrek.) You'll see sections of TrapMan's handlers where the handlers check
for code movement (because application code is always MOVEABLE), but this
isn't too time-consuming.
Should you use exception handlers all the time?
No. It is probably best to leave the current Windows handlers alone,
especially if you are shipping a retail version of your application. The
Windows handlers, while not useful for debugging, are generic and work well
with all Windows applications.
If you are writing a debugging tool such as TrapMan, you will probably want to
replace the Windows handlers. TrapMan will (at the user's option) either
replace the Windows handler or hook it. Replacing a Windows exception handler
means that only TrapMan's handler will be active. Hooking the Windows handler
means that TrapMan will call the original Windows handler after first
preprocessing the Windows exception. Of course, replacing a Windows or
Win-OS/2 handler does not remove the handler from memory; the DPMI host simply
calls us instead of them.
If you write a regular (nondebug) application, you must be careful to only
install your handlers by user choice. It should then be perfectly okay to ship
exception handlers in your application (in testing TrapMan, I've run into at
least one mainstream Windows application that did so). However, I would
certainly not leave them in by default.
I envision the following scenario: A user calls your support line to report a
problem; the support team has the customer turn on exception handling and
reproduce the problem. The customer ships you a file with exception
information. You fix the bug! Easy, huh? Exception handling is not something
you'd want to leave on all the time. If everybody replaced the Windows
handlers unnecessarily, who knows what would happen?


Setup for the Handlers


One of the first things TrapMan does at startup is set the default values for
this debugging session (see Listing Three). This routine sets up handlers for
the exceptions our user wishes to watch. These values are currently
hard-coded, but the code could easily be changed to read defaults from an .INI
file. The options are as follows:
Save trap settings. Exiting TrapMan will cause this session's settings to
become the default.
Nuke app. Faulting applications will be terminated. Don't turn this option
off unless you are calling the Windows handlers; otherwise a faulting
application will fault endlessly, since the trap handlers don't recover from
the trap.
Call PrevHandler. Calls the handler of a fault that was active when TrapMan
added its own handlers. This usually means the handlers in the Windows kernel
will be called. Note that this is post-processing. TrapMan's handlers will
have already processed the exception by the point at which we call the
previous handler.
Break on fault. TrapMan will attempt to break to a debugger via Int 3h at
fault time. Application fault registers are preserved except for CS:IP and
SS:SP (which are available on the DPMI exception frame).

Beep on fault. Beeps to let you know that a fault has occurred. This reminds
you to look at your debugger. If you're using the Break on Fault option,
TrapMan will be unable to paint the edit control with the debug information
until you've released your debugger with a GO command.
Intercept OutputDebugString(). Allows TrapMan to avoid the need for a serial
connection to write debug information with OutputDebugString()--no more CANNOT
WRITE TO DEVICE AUX messages! Note that this is a replacement and not a hook;
the original Windows kernel routine is not called (although the original call
will be restored if you uncheck this option from the menu).
Add CRLF to ODS() strings. Adding a carriage return/line feed to
OutputDebugString() arguments improves readability in the edit control. If the
arguments already have CRLFs, then this option is unnecessary and should not
be used.
DebugBreak() on <PrntScrn>. This would be a nice hook into a debugger, but I
haven't had time to implement it.
The handlers for the exceptions that TrapMan will watch are also installed at
this point: default handlers are set for GPFault, Stack Fault, Invalid Op
Code Fault, and Divide By Zero, and additional exceptions can be handled at
user request. All
program options are set in standard fashion with SendMessage() using the
appropriate WM_COMMAND for the option. TrapMan also sets two global variables
here that are handles to the Trap and Options menus for use during WM_COMMAND
processing, which requires access to TrapMan's menu to check and uncheck
options. This WM_COMMAND processing is a standard method of simulating user
menu input in Windows programs. It allows you to use the same logic to set
internal variables from within the program as you use to process external user
requests.
For debugging purposes, TrapMan also can launch a single application from the
command line. For convenience, TrapMan uses standard C argument processing to
extract these values (see Listing Four). This code may be specific to your C
library startup source code. Please check your compiler for more information.
The current implementation works with Microsoft C 6.x.
Using SendMessage() guarantees that our exception handlers are set before any
application the user gave on the command line is launched (thus, any faults
that occur on launch of the application are caught by TrapMan). You should do
something similar in your application to ensure that your handlers are
available as soon as needed. Remember that SendMessage() is processed through
an immediate call to your window procedure, while PostMessage() messages are
handled later.
As mentioned previously, handlers store information in their code segments
that is needed during exception processing. You'll find TrapMan's SetVars()
routine in Listing Five. Currently, TrapMan creates and stores a DS alias to
our HANDLER code segment and also stores the DS value for TrapMan. Both of
these values will be accessed via CS from within exception handlers.


Handlers for Fatal Exceptions


Fatal exceptions include GPFault, Stack Fault, Invalid Op Code fault, and
Divide by Zero. Note that the code in TrapMan's fatal exception handlers could
be rewritten to take up less space, if necessary. You could assign specific
entry points to each fatal exception and have them display exception-specific
information (a text string, for example). The specific entry points could then
jump to a generic handler for the rest of the exception. TrapMan's handlers
are small enough that I felt that optimizing them would make them harder to
understand, and only slightly more efficient.
Fatal exception handlers begin with a call to the TELLDEBUGGER macro (see
Listing Eight). This macro is responsible for the majority of information
output to the user. For the moment, I'll discuss the Invalid OpCode exception
handler found in Listing Six. The TELLDEBUGGER macro first saves the processor
flags and calls the SAVEREGS macro, which saves all general 16-bit registers
except the SP, IP, and CS registers. The SP register will be preserved through
the normal maintenance of the stack in the handler; the CS and IP registers
are not saved due to their nature--CS can only be changed through RET/
JMP/CALL instructions and the like. One reason for using the SAVEREGS macro
is to save the registers in a more understandable format than the PUSHA
instruction, which saves the general-purpose registers in the order AX, CX,
DX, BX, SP, BP, SI, DI (see the Intel i486 programmer's reference). The
SAVEREGS macro saves ten (decimal) words on the stack. The registers are
restored by a call to the UNSAVEREGS macro.
Next, the TELLDEBUGGER macro checks to see if TrapMan is to call the Windows
MessageBeep() function to notify the user of an error. If the wBeepOnTrap flag
is set, then MessageBeep() is called. After this, the trap message is
displayed in the edit control. (This is done by calling our replacement
procedure for the Windows OutputDebugString() API, which can be called
regardless of whether or not we are currently replacing the
OutputDebugString() API.) In this case, the message is "Trap 6!".
At this point, the TELLDEBUGGER macro isolates the exception to a task. This
is done through a call to GetCurrentTask(). The GetCurrentTask() API returns a
Task Data Base (TDB). We'll extract the Module Data Base (MDB) from the TDB
and then extract the fully qualified path to the executable, which is then
displayed in the edit control. (See Undocumented Windows, by Andrew Schulman
et al.)
Next, the TELLDEBUGGER macro displays the DPMI exception frame in the edit
control (through a call to _PrintOutFaultFrame()). Note that the argument to
this procedure is a near pointer to the beginning of the fault frame (which is
assumed to be on the stack and thus relative to SS).
At this point, the TELLDEBUGGER macro restores the application registers
through a call to UNSAVEREGS in preparation for dumping the registers to the
edit control. Of course, as the act of dumping the registers might destroy
some of them, we do an immediate SAVEREGS before calling
_PrintOutPointerData() to display the application stack. This procedure simply
takes a far pointer (which must be valid!) that will be displayed in the edit
control. We then call the UNSAVEREGS macro, restore the processor flags, and
the TELLDEBUGGER macro is done.
This is the most complicated portion of the handler; at this point only a few
things remain to be done. For starters, the macro BREAKIFUSERWANTS is called.
This macro will result in an Int 3h instruction (to break to a waiting
debugger) if the Break On Fault option is checked. Then the value of Call Prev
Handler is tested; if checked, the previous handler is called and our
processing of this exception ends. Finally, if we did not need to call the
previous handler, then TrapMan is responsible for bringing down the faulting
task. It does this by a call to the NUKEAPPCHECK macro, which checks the state
of the wNukeApp variable. The NUKEAPPCHECK macro will call FORCEAPPEXIT to
bring down the current task if the wNukeApp variable is set. It does this by
resetting the Faulting CS and Faulting IP fields of the DPMI exception frame
to point to the ThisAppIsHistory() procedure in TrapMan. The task will be
terminated when control is returned to DPMI. 
Be careful! TrapMan will permit you to disable faulting-application
termination without calling a previous handler (in other words, both the Call
Prev Handler and Nuke App settings are unchecked). If an application faults
while these settings are in effect, the faulting application will fault
continuously. Someone must always process the exception! GPFaults (and most
other exceptions) are restartable. Upon your handler's return to DPMI, Windows
will begin execution at the current CS:IP, which will be whatever instruction
is faulting, unless a handler resets it.


Handlers for Nonfatal Exceptions


Nonfatal exceptions, which are normal in the course of program execution, can
be thought of as "requests for work" by the operating system. Two normal
requests for work would be Interrupt 14 (Page Fault) and Interrupt 11 (Segment
Not Present). While 16-bit Windows doesn't really concern itself with page
faults (which are the responsibility of a ring 0 VxD under DOS Windows or the
OS/2 kernel under OS/2), Segment Not Present faults are frequent. Problems in
demand loading segments lead to the infamous "SEGMENT LOAD FAILURE" message
from the Windows kernel.
Nonfatal exceptions require different processing than fatal exceptions.
TrapMan does not process these nonfatal exceptions itself. The hooks are only
for informational purposes. The original (Windows or Win-OS/2) handler is
always called, and all registers (including flags) must be preserved in our
handlers for these exceptions.
As an example of how a nonfatal handler might be written, let's take a look at
TrapMan's Segment Not Present fault (Trap B) handler (see Listing Seven). The
first important section of code resets the base of the data alias to the
HANDLER code segment. You ensure that the alias points to the same address
(that is, has the same base address) as the HANDLER code segment by simply
setting the base of the alias to that of the code segment. (This is only
necessary because our code is in an executable and cannot be FIXED. If Windows
decides to move the location of the HANDLER segment after SetVars() allocates
the alias, then the alias will be out of sync with the original selector, and
nothing good will result.)
Unlike other (fatal) fault handlers, this nonfatal fault handler does not
begin with a call to the TELLDEBUGGER macro (Listing Eight). The TELLDEBUGGER
macro dumps out information to the user such as fault location, stack, and
DPMI-exception frame. In the case of a nonfatal exception, most of this
information is not necessary, and only part of the TELLDEBUGGER function is
used (in-line) in this procedure.
The most important portion of the handler copies the value of _Prev11 (a
variable in TrapMan's auto DS) to the variable MyFarProc in HANDLER's CS.
This is necessary because only CS will be valid when you call the previous
Windows handler to process the Segment Not Present fault; all other registers
will be those of the application causing the fault. None of this would be
necessary at fault time if _Prev11 and the other variables were CS variables
directly accessible to the handlers; see Listing Nine.
After all of this, TrapMan passes the nonfatal exception on to Windows by
jumping to the contents of MyFarProc (which contains the address of the
appropriate Windows or Win-OS/2 handler) with the jmp dword ptr cs:MyFarProc
instruction. 
There are two important points here: 
All registers at this point (except CS:IP, which will be set by the JMP
instruction) must be set at the values they contained when DPMI called
TrapMan.
The stack pointer (SP) must point to the beginning of the exception frame from
DPMI on entrance to the native handler (this is why you must JMP to the
previous handler; a CALL FAR PTR would have pushed the return address of the
handler onto the stack below the DPMI exception frame and the native handler
would have failed).


Where to Go from Here?


You'll probably notice that Windows parameter-validation faults are caught by
TrapMan as straight GPFaults. By default, debug information is taken, and the
application causing the parameter-validation fault is terminated. For now, if
you need to bypass parameter-validation errors (by passing them on to the
Windows kernels), simply make sure that Call Prev Handler is checked. Windows
or Win-OS/2 will then process the parameter validation normally. 


References


The DPMI Committee. DOS Protected Mode Interface (DPMI) Specification, ver.
0.9. Intel Corp., 1990.
Duncan, Ray. Power Programming with Microsoft Macro Assembler. Redmond, WA:
Microsoft Press, 1992.
Guide to Programming. Microsoft Corp., 1992.
i486 Processor Programmer's Reference Manual. Intel Corp., 1990.
Lafore, Robert. Assembly Language Primer for the IBM PC & XT. New York, NY:New
American Library, 1984.
Multimedia Programmer's Reference. Microsoft Corp., 1992.
Pietrek, Matt. Windows Internals. Reading, MA: Addison-Wesley, 1993.
Schulman, A., D. Maxey, and M. Pietrek. Undocumented Windows. Reading, MA:
Addison-Wesley, 1992.
Socha, John, and Peter Norton. Assembly Language for the PC, Third Edition.
Carmel, IN: Brady, 1992.
Thielen, David, and Bryan Woodruff. Writing Windows Virtual Device Drivers.
Reading, MA: Addison-Wesley, 1994.
Virtual Device Adaptation Guide. Microsoft Corp., 1992.

Listing One 

#include <windows.h>
void _far MyGPProc() // handler for Trap D (13 decimal)
{

 DebugBreak() ; // break to a waiting debugger
 // Equivalent to _asm int 3h
 FatalExit( 13 ) ; // exit the faulting task
}



Listing Two

;*** #include <windows.h>
; Line 1
;***
;*** void _far MyGPProc() // handler for Trap D (13 decimal)
;*** {
; Line 4
 PUBLIC _MyGPProc
_MyGPProc PROC FAR
;*** DebugBreak() ; // break to a waiting debugger
; Line 5
 *** 000000 9a 00 00 00 00 call FAR PTR DEBUGBREAK
;*** // Equivalent to _asm int 3h
;*** FatalExit( 13 ) ; // exit the faulting task

; Line 7
 *** 000005 b8 0d 00 mov ax,13
 *** 000008 50 push ax
 *** 000009 9a 00 00 00 00 call FAR PTR FATALEXIT
;*** }
; Line 8
 *** 00000e cb ret 
 *** 00000f 90 nop 
_MyGPProc ENDP



Listing Three

#define OPTIONMENU 2 // the THIRD pull down
#define TRAPMENU 1 // the SECOND pull down (0 is first)

hwndOptionMenu = GetSubMenu(GetMenu(hwnd), OPTIONMENU) ;
hwndTrapMenu = GetSubMenu (GetMenu(hwnd), TRAPMENU);

SetJumpToPrevHandler( 0 ) ; // DON'T jump to the previous fault handler

SetVars ( hInstance ) ;
SendMessage( hwnd, WM_COMMAND, IDM_NUKEAPP, 0L) ; 
 // DO nuke faulting app
SendMessage( hwnd, WM_COMMAND, IDM_BREAKONTRAP, 0L); 
 // DO break to debugger
SendMessage( hwnd, WM_COMMAND, IDM_BEEPONTRAP, 0L) ; 
 // DO beep on faults
SendMessage( hwnd, WM_COMMAND, IDM_DEFAULT, 0L) ; 
 // Watch default exceptions
SendMessage( hwnd, WM_COMMAND, IDM_ODSCRLF, 0L) ;



Listing Four


// The following is for standard C main() args using MSC 6. Please review
// your compiler startup source code to see what is the proper name for
// the argc/argv globals for your compiler
 #define argc __argc
 #define argv __argv
 extern int argc ;
 extern char **argv ;

 if (argc > 1) {
 int rc ;


 rc = WinExec(argv[1], SW_SHOW) ;
 if (rc < 0x20) {
 /* WARNING: As wsprintf is a vararg call, it cannot be
 ** fully prototyped. If you give an argument that should
 ** be a far pointer, you must cast it (as we use LPSTR below)
 ** to force the compiler to pass the argument as a far
 ** pointer. Otherwise you're likely to get garbage or trap
 ** instead of the text in szBuffer that you wanted
 */
 wsprintf( szBuffer, "Cannot load '%s', error code=%d",
 (LPSTR) argv[1], rc) ;
 MessageBox(NULL, szBuffer, "TrapMan", MB_ICONEXCLAMATION | MB_OK) ;
 }
 }



Listing Five

; RW = read/write, RE = read/execute

_SetVars proc far
 push es
 cmp cs:wHandlerDS, 0 ; if (0 == wHandlerDS)
 jnz @F
 CREATEALIASDESCRIPTOR cs ; then
 jc SV_done ; allocate a data (RW) selector to our
 ; code segment (RE) via DPMI call to
 mov es, ax ; CreateAliasDescriptor
 assume es:handler
 mov ES:wHandlerDS, ax ; save in CS variable wHandlerDS

 jmp SV_DSset
 @@: ; else
 mov ax, cs:wHandlerDS
 mov es, ax
 assume es:handler
SV_DSset: ; endif
 mov ax, ds
 mov es:wTrapManDS, ax ; save TrapMan DS in CS variable wTrapManDS
SV_done:
 assume es:nothing
 pop es
 ret
_SetVars endp




Listing Six

;int _far MyInvalidOpProc()
_MyInvalidOpProc proc far
 TELLDEBUGGER <cs>, <msgTrap6>

 BREAKIFUSERWANTS
 JmpPrevHandler <_GetJumpToPrevHandler>, <_Prev6>
 NUKEAPPCHECK
 ret
_MyInvalidOpProc endp



Listing Seven

;void _far MySegNotPresentProc()
_MySegNotPresentProc proc far
 pushf ; save flags
 SAVEREGS ; save registers of faulting process

; the following is a modified version of the TELLDEBUGGER macro
; As SegNotPresent is a non-fatal exception, register and stack
; dumps are not necessary and will not be done
 push ax
 mov ax, offset msgTrapB
 push cs
 push ax
 call far ptr MyODS ; inform user of SegNotPresent fault
 pop ax
 call far ptr GetCurrentTask

 ; AX now contains a task data block
 push ds
 push bx
 mov ds, ax
 mov bx, 1eh ; offset of NE header for the current module
 mov ax, word ptr ds:[bx]
 mov ds, ax
 mov ax, word ptr ds:[0ah] ; get offset to path
 add ax, 8h ; ... add 8 because we have to!
 push ds
 push ax
 call far ptr MyODS ; put in edit control

 pop bx
 pop ds
 mov ax, word ptr CS:wTrapManDS
 mov ds, ax ; assume ds: data
 mov ax, sp ; AX = current stack pointer
 add ax, 14h ; skip saved regs...
 add ax, 2h ; skip flags on stack
 push ax ; SS:AX == far pointer to DPMI exception frame
 call far ptr _PrintOutFaultFrame

 UNSAVEREGS ; we've now reset ALL REGS (except CS:IP/SS:SP)
 ; to their values at the time of the fault

 ; Flags are still on the stack

 push ds ; save DS register

 push bx
 mov bx, wTrapManDS ; get access to our data segment
 mov ds, bx
 assume ds: data
 pop bx

 push ax
 push bx
 push cx
 push dx

 mov bx, wHandlerDS ; is our CS alias set?
 cmp bx, 0 ; 0 = NO, so skip this!
 jz @F
 ; wHandlerDS is non-zero
 GETSEGMENTBASEADDRESS <cs> ; Get CS base address
 SETSEGMENTBASEADDRESS <bx>, <cx>, <dx> ; make sure our alias points
 ; to CS base in case CS has
 ; moved. Otherwise our data
 ; will be unaddressable.
 @@:

 mov bx, offset _Prev11 ; dword (Windows Interrupt B handler)
 mov ax, word ptr DS:[bx][2] ; Sel of Windows handler
 mov bx, word ptr DS:[bx] ; Offset of Windows handler
 ; AX:BX = windows handler


 mov cx, wHandlerDS
 mov ds, cx ; DS now our CS alias
 assume ds: handler

 push bx
 pop cx ; now AX:CX = Windows handler
 mov bx, offset MyFarProc ; DS:BX now points to MyFarProc
 mov word ptr DS:[bx][2], ax ; save Sel of Windows handler
 mov word ptr DS:[bx], cx ; save Offset of Windows handler

 pop dx
 pop cx
 pop bx
 pop ax

 pop ds
 popf

 jmp dword ptr cs:MyFarProc ; SP now points to the DPMI exception frame
 ; that we had on entry
 retf ; this retf will never get executed as
 ; the Windows kernel RETF at the end of
 ; the handler will return to DPMI for us
_MySegNotPresentProc endp




Listing Eight

; the major work of the handlers -- includes the following steps: a) save
; registers of faulting process, b) call messagebeep(), c) display type of
; fault message, d) get faulting module, e) dump DPMI exception frame,
; f) dump registers, g) dump stack
TELLDEBUGGER MACRO sel, var
local around, arounddata, datalabel
 pushf
 SAVEREGS ; save registers of faulting process
 push ax
 call far ptr _GetMessageBeep ; see if user wants us to beep
 cmp ax, 0
 pop ax
 jz around ; if 0, then don't beep
 push ax

 xor ax, ax
 push ax
 call far ptr MessageBeep
 pop ax
around:
;; Second, display message in edit control... remember to set DS
 push ax
 mov ax, offset var ; sel:var = far pointer of message to display
 push sel
 push ax
 call far ptr MyODS ; put in edit control...
 pop ax
;; Thirdly, find a module to blame
;;---------------------
 call far ptr GetCurrentTask
 ; AX now contains a task data block
 push ds
 push bx
 mov ds, ax
 mov bx, 1eh ; offset of NE header for the current module
 mov ax, word ptr ds:[bx]
 mov ds, ax
 mov ax, word ptr ds:[0ah] ; get offset to path
 add ax, 8h ; ... add 8 because we have to!
 push ds
 push ax
 call far ptr MyODS ; put in edit control

 pop bx
 pop ds
;;---------------------
;; Fourth, now dump the faulting frame...
;;---------------------
 mov ax, word ptr CS:wTrapManDS
 mov ds, ax ; assume ds: data
 mov ax, sp ; AX = current stack pointer
 add ax, 14h ; skip saved regs...
 add ax, 2h ; skip flags on stack
 push ax ; SS:AX == far pointer to DPMI exception frame
 call far ptr _PrintOutFaultFrame
;;---------------------
;; Fifth, print out the faulting regs...

 ;; the above call has destroyed app registers
 UNSAVEREGS ;; so unsave regs of faulting process
 SAVEREGS ;; note that we must save them again in case
 ;; the user wishes to break to a debugger
 call far ptr _PrintOutFaultRegs ;; CS:IP and SS:SP are bad; all other
 ;; registers are valid
;;---------------------
;; Sixth, dump stack of faulting app...
 nop
 mov bx, sp ; BX = stack pointer
 add bx, 14h ; skip saved regs (pushed by the SAVEREGS macro)
 add bx, 2h ; skip flags on stack
 ; SS:BX now points to DPMI fault frame

 mov ax, ss:[bx][0Eh] ; [bx][0eh] = segment of faulting app's stack
 push ax
 mov ax, ss:[bx][0Ch] ; [bx][0ch] = offset of faulting app's stack
 push ax
 call far ptr _PrintOutPointerData

;; end default (retail and debug) processing
;; Seventh, (DEBUG only), write to AUX device
ifdef DEBUG
 push ax
 mov ax, offset var
 push sel
 push ax
 call OutputDebugString
 pop ax
endif

 UNSAVEREGS ;; we've now reset ALL REGS (except CS:IP/SS:SP)
 ;; to their values at the time of the fault
 ;; in case our user wants to break
 popf ;; and reset the flags!
ENDM



Listing Nine

mov bx, offset _Prev11 ; dword (Windows Interrupt B handler)
mov ax, word ptr DS:[bx][2] ; Sel of Windows handler
mov bx, word ptr DS:[bx] ; Offset of Windows handler

mov cx, wHandlerDS
mov ds, cx ; DS now points to our CS alias
 ; (assume ds: handler for
 ; our HANDLER segment)
push bx
pop cx
mov bx, offset MyFarProc ; Update MyFarProc -- a CS DWORD
mov word ptr DS:[bx][2], ax ; Sel of Windows handler
mov word ptr DS:[bx], cx ; Offset of Windows handler








Simplifying Windows Development


Your own Windows C/C++ toolkit




Al Williams


Al is the author of several books, including OLE 2.0 and DDE Distilled and
Commando Windows Programming (both from Addison-Wesley). You can reach Al on
CompuServe at 72010,3574.


Because most Windows applications are so similar, programmers often start with
a skeletal Windows program and add to it. But since Windows is object
oriented, you may ask, why not encapsulate these common pieces of code? This
is the idea behind C++ class libraries like the Microsoft Foundation Classes
(MFC) and Borland's ObjectWindows Library (OWL). However, these class
libraries have limitations. For one thing, you must embrace the programming
philosophy behind the library, as well as use C++. Also, it's difficult to
adapt existing code to work with these frameworks. 
To deal with such issues, I've created a toolkit called "CoolWorx" that uses
the object-oriented nature of Windows to simplify application programming. You
can use the library from C or C++. CoolWorx allows you to automatically create
and customize window classes on the fly. The library includes an integrated
event loop suitable for nearly all applications, encapsulation of common
message-handling code, tool and status bars that automatically manage
themselves, and a text-editor software component. Figure 1 shows the main
screen of an editor written using CoolWorx. Even though the editor is less
than 200 lines of code, it still contains full clipboard support, file I/O, a
tool bar, and a status bar.


A Typical Windows Program


Figure 2 shows the basic flow of nearly every ordinary Windows program. The
application registers its window classes, creates some windows, and enters an
event loop. Each window class supplies a callback for its windows. These
callbacks determine the window's personality. The callback probably calls the
standard DefWindowProc() function to handle many Windows messages.
DefWindowProc(), for example, responds to nonclient mouse messages.
Several things prevent easy reuse of windows and code. First, each window of a
particular class shares the same window procedure. If you want to modify the
behavior of a window, you must create a new class, subclass an existing
window, or superclass a base class. Another problem with window procedures is
the amount of code each function duplicates. For example, nearly all windows
process the WM_CLOSE message in a similar way. Using child windows compounds
this problem. Instead of child windows managing themselves automatically, most
require the parent window to handle special cases. For example, when the
parent window resizes, it must resize all child windows.
The event loop is another common piece of code that most applications modify
slightly. Although its function is similar in all programs, you must make
special modifications to it if you use modeless dialogs, accelerators, or want
to do idle-time processing.


Common Actions


The simplest way to encapsulate some common functionality between windows is
to write a custom DefWindowProc() (using a different name, of course). The
custom procedure can handle any messages that you often process the same way
in many different windows. Your custom code can call DefWindowProc() to handle
most messages. 
Remember, you can define more than one custom default procedure. Any window
that needs a certain type of behavior can call the appropriate default
procedure. Your window code can still call DefWindowProc() in cases where it
does not want the specialized behavior.
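The delegation pattern above can be sketched in plain C. This is only a model of the mechanism; the message codes and function names here are hypothetical stand-ins, not real Windows constants or the WINEXTSD.C code:

```c
#include <assert.h>

/* Hypothetical message codes standing in for WM_CLOSE and so on. */
enum { MSG_CLOSE = 1, MSG_PAINT = 2 };

/* Stand-in for DefWindowProc(): the stock fallback handler. */
long def_window_proc(int msg)
{
    (void)msg;
    return 0;
}

/* A custom default procedure: handle the messages many windows
   process identically, delegate everything else to the stock one. */
long my_def_window_proc(int msg)
{
    switch (msg) {
    case MSG_CLOSE:
        /* e.g. prompt "save changes?" here, then allow the close */
        return 1;           /* handled */
    default:
        return def_window_proc(msg);
    }
}

/* A window procedure opts in per message: it can still call
   def_window_proc() directly when it wants the plain behavior. */
long window_proc(int msg)
{
    if (msg == MSG_PAINT)
        return def_window_proc(msg);  /* skip the specialized path */
    return my_def_window_proc(msg);
}
```

The point is that the custom procedure sits between the window procedure and the stock default, so shared behavior lives in exactly one place.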
WINEXTSD.C (available electronically, see "Availability," page 3), a typical
custom procedure, handles WM_CLOSE and some WM_COMMAND messages. The key to
processing WM_COMMAND messages this way is to agree on a set of menu IDs that
always mean the same thing. CWMENU.H (also available electronically) defines
menu IDs for Open, Exit, and other common commands. Later, you will see that
there are other techniques you can use if you have common menu IDs.


Creating a Universal Event Loop


Nearly all Windows programs have an event loop. Simple programs use a
GetMessage() and DispatchMessage() call. More complex programs contain
processing for modeless dialogs, accelerators, and idle-time processing. With
a little sleight of hand, you can create a nearly universal event loop; see
Listing One, page 19. Figure 3(a) is the prototype for the universal event
loop. The w parameter is the application's main window. This window receives
WM_COMMAND messages from accelerators and idle-processing messages. The mdi
parameter contains the MDI client window (for multiple-document-interface
applications). Ordinary applications can just set this window handle to NULL.
You can translate multiple accelerator tables by supplying an accelerator
array in the accary parameter and the length of the table in the nracc
parameter. If you have a single accelerator, just pass its address and set
nracc to 1. 
Figure 3(b) is a typical call to the event loop. If the idle flag is TRUE,
cw_Run() sends the main application window CW_IDLE messages whenever the queue
is empty but Windows still allows the application to run.
You can use the CW_IDLE message to update status bars or do other background
operations.
To handle modeless dialogs, cw_Run() calls the cw_DialogProcess() function.
This function uses window properties to mark and process modeless dialogs. By
using the cw_CreateDialog() or cw_ModelessRegister() function, each modeless
dialog receives the aCW_MODELESS property. The cw_DialogProcess() function
checks for this property on any window message. It also checks for the
property on the window's parent window. If the window or its parent has the
aCW_MODELESS property set, then the window is either a modeless dialog or a
control inside a modeless dialog. Either way, cw_DialogProcess() calls
IsDialogMessage() to process the message. If the message was WM_NCDESTROY,
which is the last message a window receives, cw_DialogProcess() removes the
property from the modeless dialog.
Properties are ideal for storing this type of information. You can add any
property to any window at any time. Contrast this with another common way to
associate data with a window--extra words. To use extra window words, you must
allocate them when you register the window class. By using properties, even
modeless dialogs you don't control (the common search dialog in COMMDLG.DLL,
for example) can work with the universal event loop. 
Many programmers believe that properties are inefficient. This is somewhat
true when you use strings as property keys. Windows must then search the
string table to locate the property. However, if you use atom keys, as
CoolWorx does, the calls are very fast since the 16-bit atom is a direct index
to the string table.
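The speed difference can be modeled in a few lines of portable C. In this toy version an atom is simply a 1-based index into a table, so resolving an atom key is a direct array access, while registering (or looking up) by string requires a search. This is only an illustration of the idea; real Windows atoms are 16-bit values managed by the system's atom tables:

```c
#include <assert.h>
#include <string.h>

#define MAX_ATOMS 16
static const char *atom_table[MAX_ATOMS];
static int atom_count;

typedef unsigned short ATOM16;

/* Registering searches the table once; a repeat registration
   returns the existing atom, as AddAtom() does. */
ATOM16 add_atom(const char *name)
{
    int i;
    for (i = 0; i < atom_count; i++)
        if (strcmp(atom_table[i], name) == 0)
            return (ATOM16)(i + 1);
    atom_table[atom_count++] = name;
    return (ATOM16)atom_count;     /* 1-based; 0 means "not found" */
}

/* Every later lookup by atom is a direct index -- no string compare. */
const char *atom_name(ATOM16 a)
{
    return (a && a <= atom_count) ? atom_table[a - 1] : 0;
}
```

Once the atom exists, the string search is paid only at registration time, which is why atom-keyed property calls stay fast.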
If you roll your own event loop, you can still call cw_DialogProcess(). If the
function returns TRUE, you simply go on to the next event. If the function
returns FALSE, you must continue processing the nondialog message. You can use
cw_Run() as a starting point for your own event loops.


Creating Window-Class Templates


Many Windows applications need windows that use similar class definitions.
However, since each application wants private windows with a different window
procedure and custom icons, each registers its own classes. Occasionally, a useful
window class will reside in a DLL with the CS_GLOBALCLASS style set. This
makes the window available to all apps but gives it a fixed icon, window
procedure, and so on.
When a DLL makes a window class available, it typically calls RegisterClass()
as part of its initialization code (in LibMain() for Windows 3.1 or
DLLEntryPoint(), by convention, for Windows NT). However, you can defer class
registration until a later time. Consider the cw_BeginIcons() function in
Listing One. Each application that wants to use CoolWorx calls this function,
passing some icon handles and the program's instance handle. The CoolWorx DLL
then registers windows on the program's behalf. This allows CoolWorx to hide
the bulk of the class registration, while still allowing each application to
have custom icons and private window classes. I call this type of window class
a "class template." 


Self-Subclassing


Even with class templates, you must supply a callback routine for the class.
The application program could supply the callback, but that would force it to
use the class for only one type of window. A better answer is
self-subclassing. With self-subclassing, you supply a dummy callback for the
window (see self_subclass() in Listing One). This callback uses the 32-bit
create parameter you supply when creating a window to set the callback address
during the WM_CREATE message. It then delegates all future messages to this
callback. Instead of using property values, the self-subclass routine stores
the callback address in an extra window word. That means that classes that use
self-subclassing must set the cbWndExtra field in the WndClass structure to at
least four bytes and reserve the first four bytes for the use of the
self-subclass routine. This is reasonable since you must choose to use the
self-subclass technique when registering the class. This technique allows one
class to handle multiple types of windows. By directly manipulating the window
words, you could even dynamically change the callback a window uses.
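The self-subclassing handoff can be modeled in plain C. The registered class callback is a dummy that, on the create message, stores the real per-window callback (passed as the create parameter) in the window's extra storage, then forwards every later message to it. All names here are illustrative stand-ins, not the CoolWorx API:

```c
#include <assert.h>

enum { MSG_CREATE = 1, MSG_OTHER = 2 };

struct win;
typedef long (*WNDPROC16)(struct win *w, int msg);

struct win {
    WNDPROC16 extra;   /* stands in for the first extra window word */
};

/* The dummy class callback: latch the real callback on create,
   then delegate everything to it. */
long self_subclass(struct win *w, int msg, WNDPROC16 create_param)
{
    if (msg == MSG_CREATE)
        w->extra = create_param;   /* stash the real callback once */
    return w->extra(w, msg);       /* delegate to the stored callback */
}

/* One of many possible per-window callbacks sharing the same class. */
long my_callback(struct win *w, int msg)
{
    (void)w;
    return msg == MSG_CREATE ? 100 : 200;
}
```

Because the callback pointer lives per window rather than per class, two windows of the same class can behave entirely differently.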



The Edit Component


Although Windows provides an edit control, it falls short of being a true
software component. To use a multiline edit control as a stand-alone text
editor, you need to write a bit of support code. The code in WINEXTED.C
provides a method of encapsulating an edit control inside a component window.
The edit component, CWEditClass, can load and store files, handle common menu
commands, and stand as a main window. Again, the standard menu IDs from
CWMENU.H are useful. The component window responds to messages like
CM_EDITUNDO, CM_FILESAVE, and others.
The edit component delegates many messages to the underlying edit control.
(The edit control's handle is in the first window word of the component.) It
also provides calls to save and load files, place the status in a static
control, and a few other miscellaneous functions. 


The Ideal Toolbar


Many programs today use tool bars and status bars (ribbons). These ribbons are
small, strip-like windows that cling to the edge of another window and contain
controls. 
Word for Windows, for example, uses small bitmapped buttons, combo boxes, and
static controls in its ribbons. Since these ribbons contain controls, it is
natural to think of using modeless dialogs to implement them. This allows you
to simply construct tool and status bars using any dialog editor. Using
unadorned dialogs is unwieldy, however. Each time the parent window resizes,
it must resize the ribbon dialogs. Also, the window must deduct the size of
all visible windows from its client area.
To solve these problems, CoolWorx uses a private dialog class for ribbons.
Ordinary dialogs use a special form of callback that you supply when you
create the dialog box. Private dialog classes use this callback, but they also
allow you to specify a normal window procedure that gets control before the
ordinary dialog callback. This window procedure is ordinary in all respects
except that it calls DefDlgProc() instead of DefWindowProc() to handle
unprocessed messages. Using a private dialog class also allows you to add
extra words to the dialog, change the dialog's icon, and make other
modifications. To specify a private dialog class, simply use the CLASS
statement in the RC file. Of course, you must register the class before
creating any dialogs that use it. CoolWorx automatically registers the
RibbonClass class for you.
RibbonClass uses the ribbonproc() function as a callback. The window
procedure forces the focus away from the ribbon. This prevents the ribbon from
getting the keyboard focus. In addition, the new class defines several extra
words (above the DLGWINDOWEXTRA bytes that all dialogs require). The
ribbondp() function is an ordinary dialog callback that ribbons use. It has
four main functions. First, it intercepts WM_COMMAND messages from its
children and passes them to its parent (the main window). Also, unless you
create the ribbon with the CWRIBBON_FOCUSOK style, the WM_COMMAND message
forces the focus to the main window. This prevents a button or other control
from retaining the focus after the user presses it. When the ribbon is enabled
or disabled (using EnableWindow()) it, in turn, enables or disables all of its
controls. This forces a disabled ribbon's controls to appear gray.
When you create a ribbon, ribbondp()'s WM_INITDIALOG handler takes two actions.
The simplest of these is that it calls Ctl3dSubclassDlgEx(). This is a call to
CTL3DV2.DLL, the standard Microsoft 3-D control DLL. It can automatically
subclass standard dialogs, but not those that use private dialog classes. By
adding this call to the dialog's initialization, CTL3DV2 still appears to work
automatically.


Auto-Subclassing 


The more complex action that the WM_INITDIALOG handler takes is to set up a
linked list of ribbons that belong to the main window. The first ribbon reads
the main window's window procedure and places it in the ribbon's extra
storage. It then sets its window handle in the aCW_RIBBON property of the main
window, and installs a new window handler, ribbonfilt(). Subsequent ribbons
simply add themselves to the end of the chain that starts with the window in
the aCW_RIBBON property.
Technically, this is window subclassing, but I call it "auto-subclassing."
Usually, subclassing augments or restricts a window's functionality. For
example, you often subclass edit controls to restrict their input to numeric
digits. In this case, ribbonfilt() does not add or take away functions from
the parent window; the parent window never knows the subclass is in place. All
ribbonfilt() needs to process is the WM_SIZE and WM_DESTROY messages. When a
WM_SIZE occurs, all ribbons adjust their sizes automatically. When the
WM_DESTROY message comes through, ribbonfilt() removes itself without a trace.
This is a very powerful concept. A ribbon can automatically manage itself with
no code in the parent window. The only drawback is if you use multiple types
of auto-subclassing windows, it may be difficult to remove them. This is
similar to hooking interrupt vectors. If the parent's current window procedure
is not your filter procedure, how do you remove yourself from the chain? In
the case of ribbons, this is not a problem. Once you create a ribbon, it
remains until you destroy its parent window. You can make a ribbon invisible,
but you can't destroy it. Since ribbons are aware of other ribbons that the
parent window uses, they can account for them when resizing. For example,
suppose you create a ribbon at the top of a window and then add a ribbon to
the left side of the same window. The left ribbon will shrink to make room for
the top ribbon.
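The chain bookkeeping described above can be sketched in portable C: the first ribbon anchors the list (in the real code, via the aCW_RIBBON property on the parent), later ribbons append themselves at the tail, and the parent's filter walks the whole chain on a resize. The field names are illustrative, not the CoolWorx window-word layout:

```c
#include <assert.h>

struct ribbon {
    struct ribbon *next;   /* plays the role of NEXT_LONG */
    int resized;           /* counts how often this ribbon adjusted */
};

static struct ribbon *chain;   /* stands in for the aCW_RIBBON property */

/* Append at the tail, as ribbondp()'s WM_INITDIALOG handler does. */
void add_ribbon(struct ribbon *r)
{
    struct ribbon **p = &chain;
    r->next = 0;
    r->resized = 0;
    while (*p)
        p = &(*p)->next;
    *p = r;
}

/* Stands in for ribbonfilt()'s WM_SIZE handling: every ribbon in
   the chain repositions itself. */
void parent_resized(void)
{
    struct ribbon *r;
    for (r = chain; r; r = r->next)
        r->resized++;
}
```

The parent contributes nothing but the single list head; everything else is driven from inside the chain, which is what makes the subclass invisible to it.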


Ribbon-Support Functions


To create a ribbon, use the cw_Ribbon() function. You supply a resource ID, a
parent window, and one of the CWRIBBON_* constants in CW.H (available
electronically). The ID must refer to a dialog template in the same module as
the parent window. When you want to know the client area of the parent window,
call cw_GetClientRect() instead of GetClientRect(). This special function
computes the client area after taking the visible ribbons into account. Unlike
GetClientRect(), cw_GetClientRect() may return nonzero values for the
upper-left coordinates of the rectangle. Many programs assume that these
values are always zero. Consider, for example, the code in Figure 4(a).
Although this works, it is technically incorrect. It only works because r.left
and r.top are always 0. When using cw_GetClientRect(), you should use the
code shown in Figure 4(b). 
If you are making many GDI calls, it is simple to adjust the viewport origin
and clipping region to account for the ribbons. However, if your main window
specifies the WS_CLIPCHILDREN style, the clipping region will already exclude
the ribbons. Another solution is to create a child window in the client area
and use that for your drawing. The standard MDI client window uses this
technique.
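The adjustment cw_GetClientRect() performs amounts to deducting each visible ribbon strip from the raw client rectangle. A minimal sketch of that rect math, with plain-C stand-ins for the Windows types:

```c
#include <assert.h>

struct rect { int left, top, right, bottom; };

enum edge { TOP_EDGE, BOTTOM_EDGE, LEFT_EDGE, RIGHT_EDGE };

/* Shrink the client rectangle by one ribbon of the given thickness
   clinging to the given edge. */
void deduct_ribbon(struct rect *r, enum edge e, int thickness)
{
    switch (e) {
    case TOP_EDGE:    r->top    += thickness; break;
    case BOTTOM_EDGE: r->bottom -= thickness; break;
    case LEFT_EDGE:   r->left   += thickness; break;
    case RIGHT_EDGE:  r->right  -= thickness; break;
    }
}
```

After the deductions, left and top are generally nonzero, which is exactly why the width and height must be computed by subtraction as in Figure 4(b).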


Putting it Together


I've compiled CoolWorx with both Borland and Microsoft C. However, it should
compile and run with any Windows C or C++ compiler. CoolWorx resides in a DLL
for easy access by application programs. To run a CoolWorx program, you need
COOLWORX.DLL and CTL3DV2.DLL (available from Microsoft). The complete CoolWorx
API is shown in Table 1. There is also a Windows help file with the online
listings.
Each application that uses CoolWorx must call the cw_Begin() or
cw_BeginIcons() function before making other CoolWorx calls. Before the
program terminates, it should call cw_End(). When your program calls
cw_Begin(), CoolWorx creates window classes (using the class templates) and
sets up CTL3DV2 to automatically give your dialogs a 3-D look. The cw_End()
function unhooks the CTL3DV2 library, but it doesn't call UnregisterClass(),
as you might expect from a DLL that registers classes. Since the DLL creates
the window classes on behalf of your application, multiple instances of the
application use the same classes. When one program terminates, other instances
of the program may still be in use. Therefore, you must not unregister the
window classes. Since the classes are private, Windows will free them when all
instances terminate.
Since CoolWorx is a DLL, it must not use global variables unless they are
truly global across all applications. When CoolWorx starts, it creates several
string atoms. All programs use the same values for these atoms. All other data
must be on a per-instance basis.


An Example Program


Also available electronically is XEDITOR.C, a simple editor (see Figure 1)
built with CoolWorx. The WinMain() function is similar to an ordinary
program's main function. The only difference is that XEDITOR calls cw_Begin(),
cw_Run(), and cw_End() to do most of the work. The init() function only needs
to create a window using cw_SDICreate(). 
The main window procedure handles seven ordinary Windows messages and two that
are CoolWorx specific. Table 2 shows the messages and their corresponding
actions. Note that CoolWorx windows don't process WM_CREATE. Instead, they
process CW_SDICREATE. Also, notice that many menu commands don't have
handlers. This is because the editor component processes many of them
directly. The main window routine passes the WM_COMMAND message data to the
editor with a CW_STDMENU message. If this call returns TRUE, the editor
processed the command, and the window procedure is free to return. If the
return value is FALSE, the editor did not know how to process the command. The
editor does not handle CM_FILENEW and CM_FILEOPEN, since some MDI programs
will create new editors to handle those messages. XEDITOR, on the other hand,
simply delegates to the editor component for these commands.
When the component sees a WM_QUERYENDSESSION message, it prompts the user to
save the file (if necessary). The cw_DefWindowProc() function converts
WM_CLOSE messages to WM_QUERYENDSESSION. This allows the same behavior when
Windows shuts down, or when the user closes the application.
The other WM_COMMAND messages XEDITOR processes are specific to this program.
For example, VIEW_FONT, VIEW_TB, and VIEW_SB are XEDITOR specific. The
NUM_BUTTON and CAP_BUTTON commands originate from the buttons on the status
bar. 


Future Directions


The version of CoolWorx in this article only handles SDI applications. A demo
of a larger version of CoolWorx with the code for this article is available
electronically; see "Availability," page 3. This version of CoolWorx handles
MDI, graphical buttons, progress bars, and much more. 
However, you don't need to adopt CoolWorx to use the techniques presented
here. You can easily make better use of the code you already write by using
these methods. When you write window procedures, ask yourself how much code
you can either factor out with a custom default procedure or combine with a
similar window. Look for ways to use window properties and extra storage words
to make universal routines like cw_Run(). Think about self-subclassing and
auto-subclassing when designing new child windows. The more code you can
reuse, the quicker you can write future applications.
Figure 1 Main screen of an editor program written in approximately 200 lines
of code using CoolWorx.
Figure 2 Basic flow of a typical Windows program.
Figure 3: (a) Prototype for the universal event loop; (b) initiating the event
loop.
(a) WORD WINAPI cw_Run(HWND w, HWND mdi, int nracc,
                       LPHANDLE accary, BOOL idle);

(b) cw_Run(w,NULL,1,&acc,TRUE);
Figure 4: (a) This code works only because the application assumes that r.left
and r.top are always 0; (b) proper coding technique when using
cw_GetClientRect().
(a)
int wid,hi;
RECT r;
GetClientRect(w,&r);
wid=r.right;
hi=r.bottom;

(b)
int wid,hi;
RECT r;
cw_GetClientRect(w,&r);
wid=r.right-r.left;
hi=r.bottom-r.top;
Table 1: CoolWorx API (* indicates seldom-used functions).
Function/Message Description 
cw_Begin Start CoolWorx with default icons.
cw_BeginIcons Start CoolWorx with custom icons.
cw_End End CoolWorx.
cw_ModelessRegister* Mark a modeless dialog for processing
 by the universal event loop (cw_Run()).
cw_DialogProcess* Called by event loops to process modeless dialogs.
cw_Run Universal event loop.
cw_Ribbon Create ribbon.
cw_SetRibbon Set text in ribbon control.
cw_RibbonAdj* Adjust client rectangle to account for visible ribbons.
cw_RibbonInvAdj* Adjust client rectangle to account for all ribbons.
cw_GetClientRect Get adjusted client rectangle.
cw_GetRibbon Return ribbon handle.
cw_EnableCommand Enable or disable a menu and associated ribbon controls.
cw_SDICreate Create a single document-interface window.
cw_DefWindowProc CoolWorx default window procedure (replaces DefWindowProc()).
cw_EditSaveFile Saves editor file.
cw_EditStatus Display editor status in ribbon.
cw_EditNew Clear editor file.
cw_EditOpen Open an editor file by name.
cw_EditOpenFile Open an editor file with dialog.
cw_EditGetSel Get editor's selected text.
cw_EditSetFont Set editor's display font.
cw_StatusKeys Get key status (for example, Num Lock).
cw_ToggleKeyState Toggle key status.
cw_StatusTime Display current time.
cw_StatusHelp Display menu help.
cw_GetFilename Get filename using common dialog.
cw_CreateDialog Create modeless dialog
 (also cw_CreateDialogIndirect, cw_CreateDialogParam, and so on).
CW_SDICREATE Process instead of WM_CREATE.
CW_IDLE Message delivered when idle time is available.
CW_INITTOOL Cause edit component to set up toolbar.
CW_STDMENU Send menu command to edit component.
Table 2: XEDITOR message processing.
Message Action 
WM_SETFOCUS Pass focus to edit component.
WM_SIZE Resize edit component to fill window.
CW_SDICREATE Make edit component.
WM_INITMENU Delegate to edit component, which
 sets state of menu items
 (Undo, Cut, Paste, and so on).
WM_MENUSELECT Place help on status bar.
CW_IDLE Update status bar and allow
 edit component to update toolbar.
WM_COMMAND Delegate to edit component;
 if edit component doesn't process
 this message, then XEDITOR checks
 for local commands.
WM_QUERYENDSESSION Delegate to edit component.
WM_DESTROY End application.

Listing One 

/* COOLWORX.C -- Al Williams */

#include <windows.h>
#include <windowsx.h>
#include "coolworx.h"
#include "cwh.h"
#include <ctl3d.h>

/* Get HWND of WM_COMMAND message */
#ifndef GET_WM_COMMAND_HWND
#ifndef WIN32
#define GET_WM_COMMAND_HWND(wp, lp) (HWND)LOWORD(lp)
#else
#define GET_WM_COMMAND_HWND(wp, lp) (HWND)(lp)
#endif
#endif

ATOM aCW_RIBBONv; // atom for Ribbon prop
ATOM aCW_MODELESSv; // atom for modeless prop
/* Class names */
char cw_RibbonClass[]="RibbonClass";
char cw_EditClass[]="CWEditClass";
char cw_SdiClass[]="CWSDIClass";
HANDLE cw_hInst;
HANDLE dllInst;

/* Avoid warnings */
#define aCW_RIBBON MAKEINTATOM(aCW_RIBBONv)
#define aCW_MODELESS MAKEINTATOM(aCW_MODELESSv)

static void size_ribbon(HWND parent,HWND hDlg,
 LONG style,BOOL f);

/* Process possible modeless dialog. cw_Run() calls this so you only need it 
if you are rolling your own message loop */
BOOL WINAPI _export cw_DialogProcess(LPMSG m)
 {
 HWND w=NULL,p=NULL;
 if (!m->hwnd) return FALSE;
/* If window is modeless... */
 if (GetProp(m->hwnd,aCW_MODELESS)) w=m->hwnd;
 else p=GetParent(m->hwnd);
/* ... or parent is modeless... */
 if (p&&GetProp(p,aCW_MODELESS))
 w=GetParent(m->hwnd);

/* ... then process */
 if (w)
 {
/* clean up at end */
 if (m->message==WM_NCDESTROY&&w==m->hwnd)
 RemoveProp(m->hwnd,aCW_MODELESS);
/* Do it */
 return IsDialogMessage(w,m);
 }
 return FALSE;
 }
/* Register modeless dialog--no need to call this if you use cw_CreateDialog */
BOOL WINAPI _export cw_ModelessRegister(HWND w,BOOL f)
 {
 if (f)
 SetProp(w,aCW_MODELESS,1);
 else
 RemoveProp(w,aCW_MODELESS);
 return TRUE;
 }
/* Ribbon filter -- installed on ribbon's parent window */
long WINAPI _export ribbonfilt(HWND hWnd,UINT message,UINT wParam,LONG lParam)
 {
 FARPROC chain;
 HWND ribbon1=(HWND)GetProp(hWnd,aCW_RIBBON);
 chain=(FARPROC)GetWindowLong(ribbon1,CHAIN_LONG);
 switch (message)
 {
 case WM_SIZE:
/* Relocate all ribbons */
 {
 while (ribbon1)
 {
 size_ribbon(hWnd,ribbon1,
 GetWindowLong(ribbon1,STYLE_LONG),TRUE);
 ribbon1=(HWND)GetWindowLong(ribbon1,NEXT_LONG);
 }
 }
 break;
/* Clean up */
 case WM_DESTROY:
 SetWindowLong(hWnd,GWL_WNDPROC,(DWORD)chain);
 RemoveProp(hWnd,aCW_RIBBON);
 while (ribbon1)
 {
 HWND nxt=(HWND)GetWindowLong(ribbon1,NEXT_LONG);
 DestroyWindow(ribbon1);
 ribbon1=nxt;
 }
 break;
 }
 return CallWindowProc((FARPROC)chain,hWnd,message,
 wParam,lParam);
 }
/* Adjust ribbons to get out of each other's way */
static void adj_ribbon(LONG style,int *x,int *y,int *len,
 HWND head,HWND rib)
 {
 HWND w=GetParent(rib);

 RECT base,r,bar;
 GetClientRect(w,&base);
 GetClientRect(w,&r);
 GetWindowRect(rib,&bar);
 cw_RibbonInvAdj(w,&r,TRUE,rib);
 switch (style)
 {
 case CWRIBBON_TOP:
 *x=r.left;
 *y=0;
 *len=r.right-r.left;
 break;
 case CWRIBBON_BOTTOM:
 *x=r.left;
 *y=base.bottom-(bar.bottom-bar.top);
 *len=r.right-r.left;
 break;
 case CWRIBBON_RIGHT:
 *x=base.right-(bar.right-bar.left);
 *y=r.top;
 *len=r.bottom-r.top;
 break;
 case CWRIBBON_LEFT:
 *x=0;
 *y=r.top;
 *len=r.bottom-r.top;
 break;
 }
 }
/* Compute correct size for ribbon -- account for existing
 ribbons (even if invisible) */
static void size_ribbon(HWND parent,HWND hDlg,LONG style,
 BOOL f)
 {
 RECT r,pr;
 int x,y,len=0;
 HWND head=GetProp(parent,aCW_RIBBON);
 style&=CWRIBBON_LEFT; /* includes all position bits */
 GetWindowRect(hDlg,&r);
 GetClientRect(parent,&pr);
 if (style<CWRIBBON_RIGHT)
 {
 x=0;
 len=pr.right;
 if (style==CWRIBBON_TOP)
 y=0;
 else
 y=pr.bottom-(r.bottom-r.top);
 adj_ribbon(style,&x,&y,&len,head,hDlg);
 MoveWindow(hDlg,x,y,len?len:GetSystemMetrics(SM_CXSCREEN),
 r.bottom-r.top,f);
 }
 else
 {
 y=0;
 len=pr.bottom;
 if (style==CWRIBBON_LEFT)
 x=0;
 else

 x=pr.right-(r.right-r.left);
 adj_ribbon(style,&x,&y,&len,head,hDlg);
 MoveWindow(hDlg,x,y,r.right-r.left,
 len?len:GetSystemMetrics(SM_CXSCREEN),f);
 }
 }
/* Ordinary dialog procedure for ribbon */
BOOL WINAPI _export ribbondp(HWND hDlg,UINT message,
 UINT wParam,LONG lParam)
 {
 switch (message)
 {
 case WM_INITDIALOG:
 {
 HWND parent,pdlg;
 parent=GetParent(hDlg);
 SetWindowLong(hDlg,STYLE_LONG,lParam);
 if (!(pdlg=GetProp(parent,aCW_RIBBON)))
 {
 LONG val;
 /* we are #1 ribbon -- start chain */
 val=GetWindowLong(parent,GWL_WNDPROC);
 SetWindowLong(hDlg,CHAIN_LONG,val);
 SetProp(parent,aCW_RIBBON,hDlg);
 SetWindowLong(parent,GWL_WNDPROC,
 (DWORD)ribbonfilt);
 }
 else
 {
 HWND ndlg;
/* Add yourself to existing chain */
 while (ndlg=(HWND)GetWindowLong(pdlg,NEXT_LONG))
 pdlg=ndlg;
 SetWindowLong(pdlg,NEXT_LONG,hDlg);
 }
 SetWindowLong(hDlg,NEXT_LONG,0L);
 size_ribbon(parent,hDlg,lParam,FALSE);
/* Make Dialog 3D */
 Ctl3dSubclassDlgEx(hDlg,CTL3D_ALL);
 SetFocus(parent);
 }
 return FALSE;
 case WM_COMMAND: // pass commands to parent
 {
 HWND parent;
 LONG style;
 SendMessage(parent=GetParent(hDlg),message,wParam,
 lParam);
 style=GetWindowLong(hDlg,STYLE_LONG);
 if (!(style&CWRIBBON_FOCUSOK))
 {
 HWND fw=GetFocus();
 if (fw==hDlg ||
 fw==GET_WM_COMMAND_HWND(wParam,lParam))
 SetFocus(parent);
 }
 }
 return 0;
/* Enable/disable all controls */

 case WM_ENABLE:
 {
 HWND ctl=GetWindow(hDlg,GW_CHILD);
 while (ctl)
 {
 EnableWindow(ctl,wParam);
 ctl=GetWindow(ctl,GW_HWNDNEXT);
 }
 }
 break;
 }
 return 0;
 }
/* Ribbon private dialog class */
long WINAPI _export ribbonproc(HWND hWnd,
 UINT message,UINT wParam, LONG lParam)
 {
 if (message==WM_NCDESTROY)
 {
 RemoveProp(hWnd,aCW_MODELESS); // clean up
 }
 else if (message==WM_SETFOCUS) // pass focus to parent
 {
 SetFocus((wParam&&!IsChild(hWnd,wParam))?
 wParam:GetParent(hWnd));
 return 0;
 }
 return DefDlgProc(hWnd,message,wParam,lParam);
 }
/* Start CoolWorx */
BOOL WINAPI _export cw_Begin(HANDLE hInst)
 {
 return cw_BeginIcons(hInst,
 LoadIcon(NULL,IDI_APPLICATION),
 LoadIcon(NULL,IDI_APPLICATION));
 }
/* Register our classes */
static BOOL reg_classes(HANDLE hInst,HICON sdiicon,
 HICON eicon)
 {
 WNDCLASS wc;
/* Ribbon */
 wc.style=0;
 wc.lpfnWndProc=(void FAR *)ribbonproc;
 wc.cbClsExtra=0;
 wc.cbWndExtra=DLGWINDOWEXTRA+12;
 wc.hInstance=hInst;
 wc.hIcon=NULL;
 wc.hCursor=LoadCursor(NULL,IDC_ARROW);
 wc.hbrBackground=COLOR_BTNFACE+1;
 wc.lpszMenuName=NULL;
 wc.lpszClassName=cw_RibbonClass;
 if (!RegisterClass(&wc)) return FALSE;
/* Editor class */
 wc.style=CS_HREDRAW|CS_VREDRAW;
 wc.lpfnWndProc=(void FAR *)cwed_proc;
 wc.cbClsExtra=0;
 wc.cbWndExtra=10;
 wc.hInstance=hInst;
 wc.hIcon=eicon;
 wc.hCursor=LoadCursor(NULL, IDC_ARROW);
 wc.hbrBackground=COLOR_WINDOW+1;
 wc.lpszMenuName=NULL;
 wc.lpszClassName=cw_EditClass;
 if (!RegisterClass(&wc)) return FALSE;
/* SDI window */
 wc.style=CS_HREDRAW|CS_VREDRAW;
 wc.lpfnWndProc=(void FAR *)self_subclass;
 wc.cbClsExtra=0;
 wc.cbWndExtra=4;
 wc.hInstance=hInst;
 wc.hIcon=sdiicon;
 wc.hCursor=LoadCursor(NULL,IDC_ARROW);
 wc.hbrBackground=COLOR_WINDOW+1;
 wc.lpszMenuName=NULL;
 wc.lpszClassName=cw_SdiClass;
 if (!RegisterClass(&wc)) return FALSE;
 return TRUE;
 }
/* Start CoolWorx & set icons */
BOOL WINAPI _export cw_BeginIcons(HANDLE hInst,
 HICON sdiicon,HICON eicon)
 {
 WNDCLASS wc;
/* Fire up CTL3DV2 */
 Ctl3dRegister(hInst);
 Ctl3dAutoSubclass(hInst);
/* If ribbons exist -- we are already going */
 if (!GetClassInfo(hInst,cw_RibbonClass,&wc))
 if (!reg_classes(hInst,sdiicon,eicon)) return FALSE;
 cw_hInst=dllInst;
 return TRUE;
 }
/* End CoolWorx */
void WINAPI _export cw_End(HANDLE hInst)
 {
/* Can't unregister classes -- other instances might
 be using them. Let Windows clean it up */
 Ctl3dUnregister(hInst);
 }
/* Create a ribbon */
HWND WINAPI _export cw_Ribbon(LPCSTR id,HWND par,
 LONG param)
 {
#ifdef WIN32
 return cw_CreateDialogParam(
 GetWindowLong(par,GWL_HINSTANCE),
 id,par,ribbondp,param);
#else
 return cw_CreateDialogParam(
 GetWindowWord(par,GWW_HINSTANCE),
 id,par,ribbondp,param);
#endif
 }
/* Silly helper function */
void WINAPI _export cw_SetRibbon(HWND w,UINT id,LPCSTR s)
 {
 SendDlgItemMessage(w,id,WM_SETTEXT,0,(DWORD)s);

 }
/* General purpose message loop */
WORD WINAPI _export cw_Run(HWND w,HWND mdi,int nracc,
 LPHANDLE ary,BOOL idle)
 {
 MSG msg;
 int i;
 while (1)
 {
/* Check for message */
 if (PeekMessage(&msg,NULL,0,0,PM_NOREMOVE))
 {
/* Got one */
 BOOL accproc=FALSE; // no accel processed (yet)
 if (!GetMessage(&msg,NULL,0,0)) break;
/* Translate MDI accels for MDI apps */
 if (mdi)
 if (TranslateMDISysAccel(mdi,&msg)) continue;
/* Scan accel list */
 for (i=0;i<nracc;i++)
 {
 HANDLE acc=ary[i];
 if (acc&&TranslateAccelerator(w,acc,&msg))
 {
 accproc=TRUE;
 break;
 }
 }
/* If no accel processed, keep going */
 if (!accproc)
 {
/* try modeless dialog */
 if (cw_DialogProcess(&msg))
 continue;
/* Translate & dispatch */
 TranslateMessage(&msg);
 DispatchMessage(&msg);
 }
 } /* No message -- do idle processing */
 else if (w&&idle&&IsWindow(w))
 SendMessage(w,CW_IDLE,0,0);
 }
 return msg.wParam;
 }

/* Create modeless dialog and set property for dialog manager */
HWND WINAPI _export cw_CreateDialogIndirectParam
 (HANDLE hInst,const void FAR *tname,
 HWND parent,DLGPROC cb,
 LPARAM lParam)
 {
 HWND rc;
 rc=CreateDialogIndirectParam(hInst,tname,
 parent,cb,lParam);
 if (rc) SetProp(rc,aCW_MODELESS,(HANDLE)1);
 return rc;
 }
/* Create modeless dialog and set property for dialog manager */
HWND WINAPI _export cw_CreateDialogParam(HANDLE hInst,
 LPCSTR tname,HWND parent,DLGPROC cb,LPARAM lParam)
 {
 HWND rc;
 rc=CreateDialogParam(hInst,tname,parent,cb,lParam);
 if (rc) SetProp(rc,aCW_MODELESS,(HANDLE)1);
 return rc;
 }
/* Adjust rectangle to account for ribbon */
BOOL WINAPI _export cw_RibbonInvAdj(HWND w,LPRECT r,
 BOOL vis,HWND stop)
 {
 BOOL rv=FALSE;
 int adj[4],i;
 LONG style;
 HWND head=(HWND)GetProp(w,aCW_RIBBON);
 RECT bar;
 adj[0]=adj[1]=adj[2]=adj[3]=0;
 while (head&&head!=stop)
 {
 if (vis&&!IsWindowVisible(head)) // skip invisible
 {
 head=(HWND)GetWindowLong(head,NEXT_LONG);
 continue;
 }
 GetWindowRect(head,&bar);
 style=GetWindowLong(head,STYLE_LONG)&3;
 switch (style)
 {
 case CWRIBBON_TOP:
 adj[CWRIBBON_TOP]=max(adj[CWRIBBON_TOP],
 bar.bottom-bar.top);
 break;
 case CWRIBBON_BOTTOM:
 adj[CWRIBBON_BOTTOM]=max(adj[CWRIBBON_BOTTOM],
 bar.bottom-bar.top);
 break;
 case CWRIBBON_RIGHT:
 adj[CWRIBBON_RIGHT]=max(adj[CWRIBBON_RIGHT],
 bar.right-bar.left);
 break;
 case CWRIBBON_LEFT:
 adj[CWRIBBON_LEFT]=max(adj[CWRIBBON_LEFT],
 bar.right-bar.left);
 break;
 }
 head=(HWND)GetWindowLong(head,NEXT_LONG);
 }
 for (i=0;i<4;i++)
 if (adj[i])
 {
 rv=TRUE;
 switch (i)
 {
 case CWRIBBON_TOP:
 r->top+=adj[i];
 break;
 case CWRIBBON_BOTTOM:
 r->bottom-=adj[i];
 break;

 case CWRIBBON_RIGHT:
 r->right-=adj[i];
 break;
 case CWRIBBON_LEFT:
 r->left+=adj[i];
 break;
 }
 }
 return rv;
 }
BOOL WINAPI _export cw_RibbonAdj(HWND w,LPRECT r)
 {
 return cw_RibbonInvAdj(w,r,TRUE,NULL);
 }
/* Get client rectangle taking ribbons into account
 NOTE: r.top and r.left may not be zero when this call returns. */
void WINAPI _export cw_GetClientRect(HWND w,LPRECT r)
 {
 GetClientRect(w,r);
 cw_RibbonAdj(w,r);
 }
/* Window proc for self-subclassing windows */
LONG WINAPI _export self_subclass(HWND w,UINT message,
 WPARAM wParam, LPARAM lParam)
 {
 FARPROC p;
 if (message==WM_CREATE)
 {
/* Get createstruct to fetch callback address */
 LPCREATESTRUCT cs=(LPCREATESTRUCT)lParam;
 SetWindowLong(w,0,(LONG)cs->lpCreateParams);
 }
 p=(FARPROC)GetWindowLong(w,0);
 return p?
 CallWindowProc(p,w,message,wParam,lParam):
 DefWindowProc(w,message,wParam,lParam);
 }
/* Get ribbon of specified type */
HWND WINAPI _export cw_GetRibbon(HWND w,LONG type)
 {
 for (w=(HWND)GetProp(w,aCW_RIBBON);w;
 w=(HWND)GetWindowLong(w,NEXT_LONG))
 {
 if (IsWindowVisible(w)&&
 ((GetWindowLong(w,STYLE_LONG)&3)==type))
 return w;
 }
 return NULL;
 }
/* End of the (DLL) world */
#ifdef __BORLANDC__
int FAR PASCAL WEP ( int bSystemExit )
#else
int FAR PASCAL _WEP ( int bSystemExit )
#endif
 {
 DeleteAtom(aCW_RIBBONv);
 DeleteAtom(aCW_MODELESSv);
 return 1;

 }
/* DLL startup */
int FAR PASCAL LibMain( HINSTANCE hModule, WORD wDataSeg,
 WORD cbHeapSize, LPSTR lpszCmdLine )
 {
// Save module handle to use as instance later
 dllInst = hModule;
 if (cbHeapSize>0) UnlockData(0);
 aCW_RIBBONv=AddAtom("CW_RIBBON");
 aCW_MODELESSv=AddAtom("CW_MODELESS");
 return TRUE;
 }
/* Enable command (menu & toolbar) */
void WINAPI _export cw_EnableCommand(HMENU m,HWND w,
 int cmd,BOOL f,BOOL vis)
 {
 if (m)
 {
 EnableMenuItem(m,cmd,
 (f?MF_ENABLED:(MF_DISABLED|MF_GRAYED))|
 MF_BYCOMMAND);
 }
 if (w)
 {
 HWND ctl,bar;
 bar=(HWND)GetProp(w,aCW_RIBBON);
 while (bar)
 {
 if (!vis||IsWindowVisible(bar))
 {
 ctl=GetDlgItem(bar,cmd);
 if (ctl) EnableWindow(ctl,f);
 }
 bar=(HWND)GetWindowLong(bar,NEXT_LONG);
 }
 }
 }


























A Visual Basic Form Generator


Speeding up the Windows development process




Wei Xiao


Wei is currently a research assistant in the computer-science department at
the University of Wisconsin-Madison and can be reached at wei@cs.wisc.edu.


At the commercial auto-insurance company where I work, I was assigned the task
of developing a Windows-based insurance-policy system using Visual Basic (VB).
Because of the wide range of options available on the insurance policies,
hundreds of pages of documents had to be converted to VB forms. It took hours
to draw one form using the interface designer that came with Visual Basic,
because there were usually hundreds of labels and text boxes to draw. Changing
the font or character size for all the controls also took a lot of effort.
Realizing that end users can easily enter the form using a text editor (with
certain special symbols to specify fill-in blanks and check boxes), I wrote a
program to parse a text file and generate a VB form that can be loaded into
the project file directly. An end user can also scan the paper form and use an
OCR program to generate the text file. With this approach, it takes less than
ten minutes to edit the form's text file to make sure the OCR gets it right.
My form-generator program handles text boxes, combo boxes, check boxes, and
form attributes such as font, margin, and color.
This article describes how I designed the form-description-file format and the
form-generator program. The format of the form-description file is simple
enough for end users to edit, yet powerful enough for professional programmers
who need to quickly generate GUIs with end-user involvement. I'll also discuss
how to combine the benefits of both the WYSIWYG interface designer and the
form generator.
The form generator is a C program that converts a text file into a VB form
file (.FRM file). The program converts a text string in the text file into a
label in the .FRM file and converts a series of underscores (often used as
fill-in blanks) into a text box. If there are hundreds of labels and text
boxes in the form, it is easier and faster to enter or modify the text file
rather than to draw the form using the Visual Basic interface designer. The
end user can edit the text file or scan the paper forms. When I converted the
insurance-policy forms into VB forms, using the form generator was ten times
faster than drawing the forms manually.
A form can be as simple as Figure 1, which has three fill-in blanks and three
words describing the information requested. When this form is converted to a
VB .FRM file, there will be three labels and three text boxes. For the format
of the .FRM file, see the Visual Basic Programmer's Reference Manual. The
form-generator program reads the text file in Figure 1 (the "form-description
file") and generates the VB .FRM file; see Figure 2. The .FRM file can be
loaded into a Visual Basic project directly.
Check boxes and combo boxes are often used in forms. The form-description file
requires a special symbol to tell the form-generator program where those
special boxes are. I use an asterisk (*) to denote a check box. For example,
the line in Example 1(a) in a form-description file becomes the line in
Example 1(b) in the generated VB Form.
Combo boxes need special symbols to specify their locations and default
options. In the beginning of the form-description file, a command section is
needed which includes the command that defines the options of a combo box. The
command section is enclosed in a pair of braces, { }. The command I use is
ComboSet. For example, Figure 3 defines two combo-box types: "car-make" with
the options "Ford," "Honda," and "Chevy;" and "fruit-type" with the options
"apple," "orange," and "grape." The ComboSet command takes one argument. If
the argument starts with an @ character, the string after the @ character is
the type name of a combo box. Otherwise, it is one of the options of a combo
box. The type name of the combo box always immediately follows the box's
options. This syntax simplifies the form-generator program. In the
form-description file, a combo box is denoted by *@, followed by the type
name. The form defined by the file in Figure 3 has two questions, with combo
boxes providing default answers.
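In C, the @ convention means the parser only has to inspect the first character of each ComboSet argument to tell a type name from an option. A minimal sketch of that test (the helper names here are illustrative, not functions from Listing One):

```c
#include <stddef.h>

/* Nonzero if the ComboSet argument names a combo-box type
   (starts with '@') rather than one of its options. */
static int is_type_name(const char *arg)
{
    return arg[0] == '@';
}

/* Bare type name with the '@' stripped, or NULL for an option. */
static const char *type_name(const char *arg)
{
    return is_type_name(arg) ? arg + 1 : NULL;
}
```

Listing One applies the same first-character test when it walks the stack of ComboSet arguments, comparing against the string at offset 1 to skip the @.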
The command section contains a few other commands: FormName and FormCaption
specify the form's name and caption, respectively. FormSet and ControlSet
specify some of the form's properties and all of its controls. In Figure 4,
the form is set to be an MDIChild, with a border style of 0 (NONE), and with
all the controls on the form in bold fonts. The LeftMargin command specifies
how much space is on the left side of the form. Notice that all these commands
are optional. Even the command section is optional. Users do not have to learn
the commands until they need to use them. The last command in Figure 4,
Summary, puts appropriate values for five variables in the Visual Basic
Form_Load() procedure: nCheckBox, nTextBox, nComboBox, nFormWidth, and
nFormHeight, so that programmers can write some routines using these
properties of the generated form.
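When Summary is present, the generator simply writes five assignment statements into the body of Form_Load(). A hedged sketch of that emission step (emit_summary is an illustrative helper, not a function in Listing One):

```c
#include <stdio.h>

/* Write the Summary assignments into an open .FRM output stream. */
static void emit_summary(FILE *out, int nCheckBox, int nTextBox,
                         int nComboBox, long nFormWidth, long nFormHeight)
{
    fprintf(out, "nTextBox = %d\n", nTextBox);
    fprintf(out, "nComboBox = %d\n", nComboBox);
    fprintf(out, "nCheckBox = %d\n", nCheckBox);
    fprintf(out, "nFormHeight = %ld\n", nFormHeight);
    fprintf(out, "nFormWidth = %ld\n", nFormWidth);
}
```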


The Program


Listing One is a C implementation of the form-generator program. It scans the
form-description file twice. (An executable version of the program, plus
associated forms, is available electronically; see "Availability," page 3.)
The first pass counts the number of lines and maximum line width in the
form-description file in order to calculate the form width and height, because
these two values are needed in the beginning of the VB .FRM file. The second
pass generates the .FRM file as it scans the form-description file, following
these three steps:
1. The command section is processed, and the header of the form is generated.
Three stacks are used to store arguments of the FormSet, ControlSet, and
ComboSet commands. Some flags and values are also stored in this step. 
2. The rest of the form-description file is scanned. For underscores, text
boxes are generated; for strings that start with an asterisk (*), check boxes
are generated; for strings that start with *@, combo boxes are generated. The
rest of the text is converted to labels. For each control, the stack that
stores the arguments of ControlSet is dumped as part of the attribute list for
that control. Thus, ControlSet sets attributes for all the controls on the
form. 
3. The Form_Load() function is written to the end of the .FRM file. This
function includes initialization statements for the summary variables if the
Summary command is in the command section. It also includes the initialization
code for combo boxes if they are used in the form.
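The heart of step 2 is run-length scanning with the standard strspn/strcspn functions: a run of underscores becomes a text box, and the text up to the next special character becomes a label. A stripped-down sketch of that classification, assuming the same special-character set as Listing One:

```c
#include <string.h>

/* Length of a fill-in-blank run ("____") starting at p; 0 if none. */
static size_t blank_run(const char *p)
{
    return strspn(p, "_");
}

/* Length of the label text before the next special character. */
static size_t label_run(const char *p)
{
    return strcspn(p, "_*~");
}
```

For the line "Name ____", label_run reports five characters of label text ("Name "), after which blank_run reports a four-character run that becomes a text box.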
The syntax of the form-description file can be extended into a
form-description language in which you can define new objects and write
application code. Based on the same form-description file, the form generator
may also generate forms and code for other GUI development environments.
Visual GUI designers (like the one that comes with Visual Basic) are great for
beginners, but they may not fully satisfy the needs of professional
programmers. The best solution may be to use the visual GUI designer for
designs that need immediate interaction, and to use a form generator to
eliminate tedious drawing jobs.
Figure 1: Simple form-description file.
Name _______________
Age ______________
Address ____________
Figure 2: The beginning of the .FRM file for the file in Figure 1.
VERSION 2.00
Begin Form form1
 Caption = "form1"
 Width = 2100
 Height = 1275
 Top = 100
 Left = 100
 Begin Label Label1
 Index = 0
 Caption = "Name"
 FontUnderline = 0 'False
 FontBold = 0 'False
 FontItalic = 0 'False
 FontName = "Courier New"
Figure 3: Form-description file with combo boxes.
{
 ComboSet: apple
 ComboSet: orange
 ComboSet: grape
 ComboSet: @fruit-type
 ComboSet: Ford

 ComboSet: Honda
 ComboSet: Chevy
 ComboSet: @car-make
}
What kind of fruit do you like? *@fruit-type
What kind of car do you drive? *@car-make
Figure 4: Sample command section.
{
FormName: MF1
 FormCaption: Optional Coverage
 FormSet: MDIChild = -1
 FormSet: BorderStyle = 0
 ControlSet: FontBold = -1
 LeftMargin: 100
 Summary
}
Example 1 (a) Form-description file; (b) generated Visual Basic form.

Listing One 

/* Visual Basic Form Generator -- Wei Xiao -- 1994 COPYRIGHT -- MS C700 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>

#define DEF_ClientWidth 8400 /* default window width
 80 column, 8.25 Courier New,
 105 twips per char*/
#define DEF_ClientHeight 6435 /* default window height*/
#define DEF_ControlHeight 255 /* default control height*/
#define LINE_LEN_MAX 200 /* maximum line length*/
#define DEF_LINE_LEN 80
#define DEF_PAGE_LEN 25
#define FORM_NAME_LEN 30 /* maximum form name length*/
#define FORM_CAPTION_LEN 80 /* " form caption "*/
#define DEF_FontBold 0 /* 0 = false */
#define DEF_FontItalic 0 
#define DEF_FontName "Courier New"
#define DEF_FontSize 8.25
#define DEF_FontStrikethru 0 
#define DEF_FontUnderline 0 
#define DEF_CHECK_BOX_ADJ 200

int nCheckBox=0, /*check box count*/
 nTextBox=0, /*text box count*/
 nLabel=0, /*label count*/
 nLine=0, /*line count */
 nPerChar, /*average char width in twips*/
 nPerLine, /*Line Height in twips */
 nComboBox=0, /*number of combo boxes */
 fSummary=0, /* 1 if Summary command appears */
 nLeftMargin=0, /*leftmargin in twips*/
 fFormNameSpec = 0, /* 1 if form name is on command line */
 firstLine=1, /* 1 if the first line has not been read*/
 fHeaderNotWritten=1, /* assigned 0 in write_header() */
 nNumOfLines=DEF_PAGE_LEN, /* total # of lines in input file */
 nNumOfCmdLines=0, /* total # of lines in command part */
 nMaxLineLength=DEF_LINE_LEN;/* maximum line length in input file */

long nFormWidth,
 nFormHeight; /*dimension of the VB form */

char sFormName[FORM_NAME_LEN]= "form1";
char sFormCaption[FORM_CAPTION_LEN] = "form1";
char buf[LINE_LEN_MAX];
char buf2[LINE_LEN_MAX]; 

char sFrmName[80]; /* form name */
FILE *fIn, *fOut; 

struct stack {
 char * pStr;
 struct stack *next;
} *stForm = 0, /* form settings */ 
 *stCntl = 0, /* control settings */
 *stCobDef=0, /* combo box type definitions; items for the
 same type of combo are pushed into the
 stack followed by the type name. Different
 types of combo are pushed into the stack
 one after another. See form.txt for a
 sample definition */
 *stCobList=0; /* combo box control types on the form; the
 combo boxes on the form are indexed by
 0, 1, 2, ... Their types are stored in this
 order in a stack, with last combo on top */

push_stack(char *buf, struct stack**pst) /* & *st is better :-) */
{
 char *p;
 struct stack *st1;
 if((p=strdup(buf)) &&
 (st1=(struct stack *)malloc(sizeof(struct stack)))) {
 st1->pStr = p;
 st1->next = *pst;
 *pst = st1;
 }else
 printf("heap space out:%s not processed",buf);
}

dump_stack(struct stack*st) /*for control setting or form 
 setting only */
{
while(st){
 fprintf(fOut," %s\n",st->pStr);
 st = st->next;
 }
}

dump_combos(struct stack *st) /* for combos */
{
 int n;

 n=nComboBox;
 while(st) {
 fprintf(fOut," ' %s\n",st->pStr);
 dump_combo(stCobDef,st,--n);
 st = st->next;
 }
}


dump_combo(struct stack *stCobDef, struct stack *st,int n)
{
 while (stCobDef && (strcmp(stCobDef->pStr+1,st->pStr))) 
 stCobDef = stCobDef->next; 
 if (stCobDef) 
 stCobDef = stCobDef->next; 
 while (stCobDef && (stCobDef->pStr[0]!='@' )) {
 fprintf(fOut, "Combo1(%d).AddItem \"%s\"\n",n, stCobDef->pStr);
 stCobDef= stCobDef->next;
 }
}
char * trail_sp(char *p) /* take out spaces at the end of string*/
{
 int n;
 n=strlen(p)-1;
 while((n>=0) && isspace(p[n]))
 p[n--]='\0';
 return p;
}
 
print_pos(char *p, int left,int wAdj) /* p: the string left: starting pos
 wAdj: adjustment for strlen(p) */
{

 fprintf(fOut," FontBold = 0 'False\n");
 fprintf(fOut," FontItalic = 0 'False\n");
 fprintf(fOut," FontName = \"Courier New\"\n");
 fprintf(fOut," FontSize = 8.25\n");
 fprintf(fOut," FontStrikethru = 0 'False\n");
 fprintf(fOut," Width = %d\n", strlen(p)*nPerChar+wAdj);
 fprintf(fOut," Top = %d\n", (nLine)*nPerLine);
 fprintf(fOut," Height = %d\n", DEF_ControlHeight);
 fprintf(fOut," Left = %d\n", left*nPerChar+nLeftMargin);
 dump_stack(stCntl);
 fprintf(fOut," End\n");

}

proc_form(char *buf) /* processing text part*/
{
 char *p,*p1;
 int n;

 p = buf;
 while(*p) {
 p1=p;
 switch (*p) {
 case ' ':
 p+=strspn(p," ");
 break;
 case '_':
 n=strspn(p,"_");
 memcpy(buf2,p,n);
 buf2[n]='\0';
 p+=n;
 fprintf(fOut," Begin TextBox Text1\n");
 fprintf(fOut," BorderStyle = 0 \n");
 fprintf(fOut," Index = %d\n",nTextBox++);

 fprintf(fOut," FontUnderline = -1 \n");
 fprintf(fOut," Text = \"%*s\"\n",(int)strlen(buf2)," ");
 print_pos(buf2,p1-buf,0);
 break;

 case '*':
 n = strcspn(p+1, "~_*");
 n++;
 memcpy(buf2,p,n);
 if (*(p+n)=='~') p++;
 buf2[n]='\0';
 p+=n;
 if ((n>2) && (buf2[1]=='@')) {
 trail_sp(buf2+2);
 fprintf(fOut," Begin ComboBox Combo1\n");
 push_stack(buf2+2,&stCobList);
 fprintf(fOut," Text = \"%s\"\n", buf2+2);
 fprintf(fOut," Index = %d\n",nComboBox++);
 print_pos(buf2,p1-buf,DEF_CHECK_BOX_ADJ); 
 break;
 }else {
 trail_sp(buf2+1);
 fprintf(fOut," Begin CheckBox CheckBox1\n");
 fprintf(fOut," Caption = \"%s\"\n", buf2+1);
 fprintf(fOut," Index = %d\n",nCheckBox++);
 fprintf(fOut," FontUnderline = 0 'False\n");
 print_pos(buf2,p1-buf,DEF_CHECK_BOX_ADJ); 
 break;
 }

 default:
 n = strcspn(p, "_*~"); 
 memcpy(buf2,p,n);
 if (*(p+n)=='~') p++;
 p+=n;
 buf2[n]='\0';
 if (n-- >0) 
 while(n>=0 && (buf2[n] == ' '))
 buf2[n--]='\0'; 
 fprintf(fOut," Begin Label Label1\n");
 fprintf(fOut," Index = %d\n",nLabel++);
 fprintf(fOut," Caption = \"%s\"\n", buf2);
 fprintf(fOut," FontUnderline = 0 'False\n");
 print_pos(buf2,p1-buf,0);
 break;
 }
 }
 nLine++;
}

write_header()
{ 
 fHeaderNotWritten =0;

 printf("%d Lines, %d Command lines, %d Chars per line max\n",
 nNumOfLines, nNumOfCmdLines, nMaxLineLength); 

 fprintf(fOut, "Begin Form %s\n",sFormName);
 fprintf(fOut, " Caption = \"%s\"\n",sFormCaption);

 nFormWidth= (long) DEF_ClientWidth* (long) nMaxLineLength/DEF_LINE_LEN;
 fprintf(fOut, " Width = %ld\n",nFormWidth);

 nFormHeight= (long) DEF_ControlHeight* 
 (long)(nNumOfLines-nNumOfCmdLines + 2); 

 fprintf(fOut, " Height = %ld\n",nFormHeight);

 fprintf(fOut, " Top = 100\n");
 fprintf(fOut, " Left = 100\n");
 dump_stack(stForm);
}

int proc_command(char *buf) /* command part*/
{
 char *vars, *value;

 if (!buf)
 return 0;
 
 nNumOfCmdLines++;

 value = buf + strcspn(buf, ":") + 1;

 trail_sp(value);
 
 vars= strtok(buf, " :");

 
 if (vars){
 if (!strcmp(vars,"}")){
 sFormName[FORM_NAME_LEN -1 ] ='\0';
 sFormCaption[FORM_CAPTION_LEN -1] = '\0';
 write_header();
 return 1;
 } 

 if (!fFormNameSpec) {
 if (!_stricmp(vars, "FormName"))
 strncpy(sFormName,value,FORM_NAME_LEN);
 else if (!_stricmp(vars, "FormCaption"))
 strncpy(sFormCaption, value, FORM_CAPTION_LEN);
 }
 if(!_stricmp(vars, "FormSet")) 
 push_stack(value, &stForm);
 else if (!_stricmp(vars,"ControlSet"))
 push_stack(value, &stCntl);
 else if (!_stricmp(vars,"ComboSet"))
 push_stack(value + strspn(value," "), &stCobDef);
 else if (!_stricmp(vars,"Summary"))
 fSummary =1;
 else if (!_stricmp(vars,"LeftMargin")) 
 nLeftMargin=atoi(value);
 else if (!_stricmp(vars,"CharWidth"))
 nPerChar = atoi(value);
 else if (!_stricmp(vars,"LineHeight"))
 nPerLine = atoi(value);
 } 
 return 0;

}
 

fatal(char *msg) 
{
 perror(msg);
 exit(1);
}

count_lines()
{

 fgets(buf, LINE_LEN_MAX-1,fIn);

 nNumOfLines =0;
 nMaxLineLength = 0;

 while (! feof(fIn)) {
 nNumOfLines++;
 buf[LINE_LEN_MAX -1]='\0';
 if (nMaxLineLength < strlen(buf))
 nMaxLineLength = strlen(buf);
 fgets(buf, LINE_LEN_MAX-1,fIn);
 }

}

main(int argc, char *argv[])
{
 char *rest;
 int fFormPart = 0;

 nPerChar = DEF_ClientWidth/DEF_LINE_LEN;
 nPerLine = DEF_ControlHeight;

 if (argc<3){
 printf("Usage: %s <form description file> <VB form file name> [options]\n",
 argv[0]);
 printf("Options: <form name>, -f fast mode, ");
 exit(1);
 }

 if (argc==4){
 strncpy(sFormName, argv[3], FORM_NAME_LEN);
 sFormName[FORM_NAME_LEN -1] = '\0';
 strncpy(sFormCaption, argv[3], FORM_CAPTION_LEN);
 sFormCaption[FORM_CAPTION_LEN -1] = '\0';
 fFormNameSpec = 1;
 }
 
 if ((fIn=fopen(argv[1],"rt")) == NULL) 
 fatal(argv[1]);
 
 count_lines();

 if (fseek(fIn, 0L, SEEK_SET))
 fatal("fseek Input");
 
 if ((fOut=fopen(argv[2],"wt")) == NULL) 
 fatal(argv[2]);


 fprintf(fOut,"VERSION 2.00\n");

 fgets(buf, LINE_LEN_MAX-1,fIn);
 while (! feof(fIn)) {
 
 if(rest=strchr(buf,'\n'))
 *rest = '\0'; 
 if (firstLine && (strcmp(buf,"{"))) {
 fFormPart = 1;
 write_header();
 }
 firstLine = 0;
 if (! fFormPart) {
 if (proc_command(buf))
 fFormPart = 1; /* form started */
 }else
 proc_form(buf); 

 fgets(buf, LINE_LEN_MAX-1,fIn); 
 }
 
 if (fHeaderNotWritten)
 write_header();

 fprintf(fOut, "End\n");

 fprintf(fOut, "Sub Form_Load()\n");
 if (fSummary){
 fprintf(fOut, "nTextBox = %d\n",nTextBox);
 fprintf(fOut, "nComboBox = %d\n",nComboBox);
 fprintf(fOut, "nCheckBox = %d\n",nCheckBox);
 fprintf(fOut, "nFormHeight = %ld\n",nFormHeight);
 fprintf(fOut, "nFormWidth = %ld\n",nFormWidth);
 }
 if (nComboBox) 
 dump_combos(stCobList);
 fprintf(fOut, "End Sub\n"); 

 fclose(fIn);
 fclose(fOut);
 printf("%d Labels, %d TextBoxes, %d Checkboxes, %d Combos\n",
 nLabel,nTextBox,nCheckBox, nComboBox);
 return 0;
}
End Listing

















Adding Auxiliary Views for Windows Apps


Taking advantage of MFC's document/view support




Robert Rosenberg


Bob is a systems engineer and MFC/ C++ instructor for STEP Technology, in
Portland, Oregon.


Many applications require, or would benefit from, multiple types of views of
the application data. Coincidentally, one strength of the Microsoft Foundation
Classes (MFC) is its support of the document/view architecture. In this
article, I'll examine how MFC supports the multiple-view facility, presenting
code for implementing three types of views in an application. I'll also
address the topic of customizing the titles that appear in MDI (Multiple
Document Interface) frame windows, one of the first obstacles you'll encounter
when building multiple-view applications.


A New MDI Child: Theory 


The document template is the unifying construct for the MFC multiple-view
architecture--specifically, the class CMultiDocTemplate represents the
blueprint used to create a new frame-view-document-menu-icon combination
called an "MDI Child."
When a user running a Windows program invokes File/New or File/Open, a new
window pops up on the screen. In an MDI MFC application, this new window is a
"child frame." Inside the frame is a view. Associated with the view is a
document, which manages the data being presented in the view. Additionally,
the main menu may change to a menu specifically associated with the active
view. When the frame is minimized, the associated icon is used for displaying
the minimized window. These associations (the MFC "plumbing") and the dynamic
creation of the frame and view objects are handled through the document
template. The CMultiDocTemplate::CreateNewFrame function is the high-level
call you use from your application. The document template keeps a list of all
documents created according to its blueprint. Each document in turn keeps a
list of its views. Each view, by virtue of its participation in the Windows
system hierarchy of parent and child windows, can find its way back to its
encompassing frame. Up at the top, the application object keeps a list of
document templates and a pointer to its main frame window.
When a user selects Window/New from the menu, a new MDI Child also pops up,
but this time, it's an additional view associated with an existing document.
Which document? The active document. What's an active document? The document
associated with the active view. What's an active view? The view inside of the
active frame, with the highlighted title bar. Again,
CMultiDocTemplate::CreateNewFrame is the function to call. This time, however,
its first parameter, a document pointer, should point to the document
associated with the active view. Then you'll get another view on the same
document.
To get a different type of view (graphical view, data-entry-form view,
text-editor view, data-cell view, or the like) of the same data or document,
just use a unique document template. How? Since the CreateNewFrame function is
a member function of the CMultiDocTemplate class, you call it through a
template pointer. When created, the template object stores the information
necessary to create a specific type of MDI child frame and a specific type of
view (that is, a C++ class) associated with a specific type of document. When
creating a new-document template instance, you pass the necessary information
about these classes to the CMultiDocTemplate constructor, along with a
resource ID that identifies a menu, an icon, and a string resource. The string
resource, in turn, is a string composed of seven substrings, delimited by \n,
which supply additional information for the framework. 
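For illustration only (the names here are hypothetical, not taken from the Seminar sources), such a string resource might look like this in the .RC file, with the first substring left empty as is usual for MDI templates and the third substring empty so the template is not offered in File/New:

```
STRINGTABLE
BEGIN
    IDR_STUDENTTYPE "\nStudent\n\nStudent Files (*.stu)\n.stu\nStudent.Document\nStudent Document"
END
```

The seven fields are, in order: the default window title, the document name, the File/New name, the file-filter description, the default extension, the registry file-type ID, and the registry file-type name.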


A New MDI Child: Practice


The accompanying listings present the source code for the C++ classes in a
sample application called "Seminar." As Figure 1 illustrates, Seminar is a
program that displays a view similar to a seating chart for a classroom of ten
students. One auxiliary view displays a data-entry form that contains
information about a single student. Another auxiliary view provides a simple
text editor for keeping notes on that student.
Seminar is a typical Visual C++ MFC application initially generated with
AppWizard. AppWizard generates other large portions of the code. In the
listings, comments starting with //*** identify code that the human programmer
would write. Table 1(a) lists the C++ classes generated by AppWizard, while
Table 1(b) lists the classes generated by ClassWizard or coded by hand. The
complete Seminar sources (available electronically; see "Availability," page
3) include additional files, such as resources. For this program, I have used
the default resources created by AppWizard, changing only the menus and
string-table entries for the new templates.
While Seminar's source code demonstrates a number of techniques for
communicating between document and views, my discussion here focuses only on
those programming techniques that directly apply to adding an auxiliary MDI
Child to an application.


Creating the New Auxiliary-View Classes 


To start with, you need a new CView-derived class. In Seminar, there are two
auxiliary-view classes, CStudentView and CNotesView. CStudentView is derived
from CFormView using the AppStudio and ClassWizard tools. The results are
studview.h and studview.cpp (see Listings Eleven and Twelve, pages 37 and 38,
respectively). For CNotesView, derive the class from CEditView by generating a
CView-derived class with ClassWizard and changing the base class from CView to
CEditView. Add a GetDocument override to return a pointer to the document cast
to a CSeminarDoc pointer. Add code to support the transfer of data between
document and view and for updating the view; see Listings Thirteen and
Fourteen (pages 38 and 39, respectively).


Adding Resources for the New Templates


To create a new document template, you need a frame class, document class,
view class, and resources. You can use the standard CMDIChildWnd class for the
frame, CSeminarDoc for the document, and the new view classes described in
Step 1. As for the resources, you need a menu, icon, and string resource, each
identified by the same resource ID. These can be created using AppStudio by
cutting and pasting the corresponding resources generated by AppWizard. For
example, use copy and paste to duplicate the menu resource identified as
IDR_CLASSTYPE, and rename the ID to IDR_STUDENTTYPE. Do the same for the
IDR_CLASSTYPE icon, and the IDR_CLASSTYPE string resource. In the string
resource, remove the substring between the second and third \n delimiters; see
Example 1. This will prevent the framework from treating this template as the
source of a possible primary view. Otherwise, for File/New or File/Open, the
user will be presented with a dialog listing the possible primary views. This
third substring in the document template string is the text that will be
displayed to the user to identify this MDI Child. 


Creating and Initializing the New Templates


You now have all the pieces for a document template. Since you'll want access
to a particular template while the application is running, keep a data member
to hold the document-template pointer when you've created it. The application
class is an appropriate place for this data member (see Listings One and Two).
Create the document template in the InitInstance function and provide a
public-access function to retrieve the pointer from another class; see Example
2.
Template instances that are added to the application's template list with a
call to AddDocTemplate are deleted in the application's destructor, so an
explicit call to delete m_student_view_template should not be added to the
ExitInstance function. If you created a template object and did not call
AddDocTemplate, then you should delete the template object in ExitInstance.
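As a sketch of that cleanup rule (the m_private_template member here is hypothetical, standing in for any template you construct but never pass to AddDocTemplate):

```cpp
int CSeminarApp::ExitInstance()
{
    // Templates passed to AddDocTemplate are destroyed by the
    // framework; deleting them again here would be an error.
    // Only delete a template the framework doesn't know about.
    delete m_private_template;   // hypothetical unregistered template
    return CWinApp::ExitInstance();
}
```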
In Seminar, CRoomView is derived from the CFormView class and functions as the
primary view. To make this work, just find the creation of the document
template that AppWizard put into the InitInstance function and change
CSeminarView to CRoomView.


Creating and Opening Auxiliary Views



Now go to the document class, where you add a function to create the auxiliary
view (see Listings Five and Six). In this example, you create the
CSeminarDoc::ViewStudent function. The idea is to retrieve the appropriate
template pointer and then call the necessary functions in the template class.
Since the application object is accessible anywhere through the global
function ::AfxGetApp, you can retrieve the document template for the student
view. Use that template pointer to call CreateNewFrame(this, NULL). The first
parameter is a pointer to a document. Since the ViewStudent function is in the
document class, you use the this pointer. If not NULL, the second parameter is
a pointer to an existing frame on which the new frame is based. This is used
by the MFC framework in implementing the Window/New command. That parameter
doesn't matter here, so just use NULL. The other required mumbo jumbo is to
invoke InitialUpdateFrame through the template object, passing the new frame
and the document as parameters: doc_template->InitialUpdateFrame(
m_student_frame, this );.


User Interface Issues


To support the user interface, you need something that calls ViewStudent.
Typically, this would be in a handler for a user event, like a menu-item
selection in the View menu, or a mouse click on an item about which more
detail is desired. In the Seminar application, ViewStudent is invoked
automatically whenever a new CRoomView is created; this is accomplished by
overriding CRoomView::OnInitialUpdate. If the user has closed (or minimized)
an auxiliary view, it reappears when the user clicks a student button in the
room view. These details are repeated for CNotesView to provide the Seminar
application with two auxiliary views.
In addition to the previously described incantations for creating a new MDI
Child, there are a few other practical details to consider. First, in the case
of auxiliary views, some user action represents a request to bring up that
view. In the Seminar example, the user clicks on a student button in the room
view. The auxiliary views are created, restored, or brought to the top, as
appropriate. The Seminar code handles these situations in a single function
for each view. The functions CSeminarDoc::ViewStudent and
CSeminarDoc::ViewNotes demonstrate this. A data member in the document stores
the pointer to the MDI Child frame. Zero implies that the frame does not exist
and needs to be created. Nonzero implies the frame already exists; however, it
may be minimized. A call to IsIconic() checks this. If it is minimized, then a
call to ShowWindow( SW_RESTORE ) will bring the window back up and activate
it. If the frame is not minimized, it may be "behind" other windows. In this
case, a call to MDIActivate() will bring the window to the front and activate
it. If you want to bring the window to the front without activating it, use
the SetWindowPos function (see Example 3).
Now you have your bases covered. Users get what they probably expect, and you
have a single function to call that can be invoked from handlers for various
user events. Often, you will want to supply the user with more than one way to
bring up auxiliary views, such as mouse clicks, mouse double clicks, or
menu-item selections. You may want to add an item to the View menu to let the
user control the appearance and disappearance of an auxiliary view.
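A View-menu handler along those lines can simply delegate to the document. This is a sketch, not code from the listings: the ID_VIEW_STUDENT command ID and its message-map entry are hypothetical additions you would make with ClassWizard or AppStudio:

```cpp
// Hypothetical handler for a View/Student menu item, wired up with
// ON_COMMAND(ID_VIEW_STUDENT, OnViewStudent) in CRoomView's message map.
void CRoomView::OnViewStudent()
{
    CSeminarDoc *pDoc = GetDocument();
    if ( pDoc != 0 )
    {
        // Create, restore, or activate the auxiliary view for the
        // currently active student; TRUE requests activation.
        pDoc->ViewStudent( m_active_index, TRUE );
    }
}
```

Because ViewStudent already handles the created/minimized/buried cases, every such handler stays a one-line delegation.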


Custom Framing


A typical document template uses the standard CMDIChildWnd class as the type
of frame. The CMDIChildWnd class supports automatic titling for the frames.
You may have noticed titles such as "Class1.sem:1", "Class1.sem:2" in the
title bars for MDI Children. To customize titling of your child frames,
override the OnUpdateFrameTitle function, derive your own class from
CMDIChildWnd, and supply the OnUpdateFrameTitle override. Now use your own
frame class in place of CMDIChildWnd in the constructor of the appropriate
document template. The Seminar application uses the class CNotesFrame to
display the student name as the frame title. In the CSeminarApp::InitInstance
function, CNotesFrame is used as the frame-class parameter when constructing
the document template; see Example 4.
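The override itself can be sketched as follows (the full CNotesFrame class is in Listings Fifteen and Sixteen; the m_title member here is a hypothetical placeholder for however the application stores the student name):

```cpp
// Sketch: a custom MDI child frame that supplies its own title text
// instead of the framework's default "doc.sem:n" numbering.
class CNotesFrame : public CMDIChildWnd
{
    DECLARE_DYNCREATE(CNotesFrame)
public:
    // Called by the framework whenever the frame title needs refreshing.
    virtual void OnUpdateFrameTitle(BOOL bAddToTitle);
protected:
    CString m_title;   // hypothetical: holds the current student's name
};

void CNotesFrame::OnUpdateFrameTitle(BOOL /* bAddToTitle */)
{
    // Ignore the default titling logic and display our own text.
    SetWindowText( m_title );
}
```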
Figure 1: The Seminar application depicts three views of a single document. The
primary view is similar to a seating chart, the first auxiliary view shows
information on a selected student, and the second auxiliary view is a
free-form notepad for additional comments about the same student.
Table 1: (a) C++ classes generated by AppWizard; (b) classes either generated
by ClassWizard or coded by hand.
(a)
SEMINAR.H (listing 1) The application-framework class.
SEMINAR.CPP (listing 2)
MAINFRM.H (listing 3) The main MDI frame window with menu.
MAINFRM.CPP (listing 4)
SEMDOC.H (listing 5) The Document class.
SEMDOC.CPP (listing 6)
ROOMVIEW.H (listing 7) The primary View class.
ROOMVIEW.CPP (listing 8)

(b)
STUDENT.H (listing 9) The simple CStudent data object.
STUDENT.CPP (listing 10)
STUDVIEW.H (listing 11) The first aux View class (form view).
STUDVIEW.CPP (listing 12)
NOTEVIEW.H (listing 13) The second aux View class (notes view).
NOTEVIEW.CPP (listing 14)
NOTESFRM.H (listing 15) Customized Frame class (notes view).
NOTESFRM.CPP (listing 16)
Example 1: (a) Removing the substring between the second and third \n
delimiters; (b) the result.
(a) \nClass\nCLASS Document\nCLASS Files(*.sem)\n.sem\nClassFileType\nCLASS File Type

(b) \nClass\n\nCLASS Files(*.sem)\n.sem\nClassFileType\nCLASS File Type
Example 2: Creating the document template in the InitInstance function and
providing a public-access function to retrieve the pointer from another class.
class CSeminarApp : public CWinApp
{
 ...
protected:
 CMultiDocTemplate *m_student_view_template;
public:
 CMultiDocTemplate *GetTemplateForStudentView()
 { return m_student_view_template; }
};
BOOL CSeminarApp::InitInstance()
{
 ...
m_student_view_template = new CMultiDocTemplate(IDR_STUDENTTYPE,
 RUNTIME_CLASS(CSeminarDoc),
 RUNTIME_CLASS(CMDIChildWnd),
 RUNTIME_CLASS(CStudentView));
AddDocTemplate( m_student_view_template );

 ...
}
Example 3: Using the SetWindowPos function.
m_student_frame->SetWindowPos( &CWnd::wndTop, 0, 0, 0, 0,
 SWP_NOMOVE | SWP_NOSIZE | SWP_NOACTIVATE );
Example 4: CNotesFrame is used as the frame-class parameter when constructing
the document template.
m_notes_template = new CMultiDocTemplate(IDR_NOTESTYPE,
 RUNTIME_CLASS(CSeminarDoc),
 RUNTIME_CLASS(CNotesFrame),
 RUNTIME_CLASS(CNotesView));

Listing One

//////////////////////////////////////////////////////////////////////
// seminar.h : main header file for the Seminar application
// and interface for the CSeminarApp class

#ifndef __AFXWIN_H__
 #error include 'stdafx.h' before including this file for PCH
#endif

#include "resource.h" // main symbols

class CSeminarApp : public CWinApp
{
public:
 CSeminarApp();

// Attributes
protected:
 //*** Document templates for the auxiliary views
 CMultiDocTemplate *m_student_view_template;
 CMultiDocTemplate *m_notes_template;
 
public:
 //*** Access functions for the auxiliary document templates.
 CMultiDocTemplate *GetTemplateForStudentView();
 CMultiDocTemplate *GetTemplateForNotesView();
// Overrides
 virtual BOOL InitInstance();
// Implementation
 //{{AFX_MSG(CSeminarApp)
 afx_msg void OnAppAbout();
 // NOTE - the ClassWizard will add and remove functions here.
 // DO NOT EDIT what you see in these blocks of generated code!
 //}}AFX_MSG
 DECLARE_MESSAGE_MAP()
};
//*** Inline implementations for access functions.
inline CMultiDocTemplate *CSeminarApp::GetTemplateForStudentView()
 { return m_student_view_template; }
inline CMultiDocTemplate *CSeminarApp::GetTemplateForNotesView()
 { return m_notes_template; }



Listing Two

//////////////////////////////////////////////////////////////////////
// seminar.cpp : implementation of the CSeminarApp class and of the

// CAboutDlg class.

#include "stdafx.h"
#include "seminar.h"
#include "mainfrm.h" 
#include "semdoc.h"
#include "roomview.h"
#include "studview.h"
#include "notesfrm.h"
#include "noteview.h"

#ifdef _DEBUG
#undef THIS_FILE
static char BASED_CODE THIS_FILE[] = __FILE__;
#endif

BEGIN_MESSAGE_MAP(CSeminarApp, CWinApp)
 //{{AFX_MSG_MAP(CSeminarApp)
 ON_COMMAND(ID_APP_ABOUT, OnAppAbout)
 // NOTE - ClassWizard will add and remove mapping macros here.
 // DO NOT EDIT what you see in these blocks of generated code!
 //}}AFX_MSG_MAP
 // Standard file based document commands
 ON_COMMAND(ID_FILE_NEW, CWinApp::OnFileNew)
 ON_COMMAND(ID_FILE_OPEN, CWinApp::OnFileOpen)
END_MESSAGE_MAP()

//////////////////////////////////////////////////////////////////////
// CSeminarApp construction

CSeminarApp::CSeminarApp()
{
}
//////////////////////////////////////////////////////////////////////
// The one and only CSeminarApp object

CSeminarApp NEAR theApp;

//////////////////////////////////////////////////////////////////////
// CSeminarApp initialization

BOOL CSeminarApp::InitInstance()
{
 // Standard initialization
 SetDialogBkColor(); // set dialog background color to gray
 LoadStdProfileSettings(); // Load standard INI file options

 // Register the application's document templates. Document
 // templates serve as the connection between documents,
 // frame windows and views.

 //*** Modify creation of primary document template to use
 //*** the CRoomView class (as a primary form view).
 AddDocTemplate(new CMultiDocTemplate(IDR_CLASSTYPE,
 RUNTIME_CLASS(CSeminarDoc),
 RUNTIME_CLASS(CMDIChildWnd),
 RUNTIME_CLASS(CRoomView)));

 //*** Create document templates for auxiliary views 

 m_student_view_template = new CMultiDocTemplate(IDR_STUDENTTYPE,
 RUNTIME_CLASS(CSeminarDoc),
 RUNTIME_CLASS(CMDIChildWnd),
 RUNTIME_CLASS(CStudentView));
 AddDocTemplate( m_student_view_template ); 
 m_notes_template = new CMultiDocTemplate(IDR_NOTESTYPE,
 RUNTIME_CLASS(CSeminarDoc),
 RUNTIME_CLASS(CNotesFrame),
 RUNTIME_CLASS(CNotesView));
 AddDocTemplate( m_notes_template ); 

 // create main MDI Frame window
 CMainFrame* pMainFrame = new CMainFrame;
 if (!pMainFrame->LoadFrame(IDR_MAINFRAME))
 return FALSE;
 pMainFrame->ShowWindow(m_nCmdShow);
 pMainFrame->UpdateWindow();
 m_pMainWnd = pMainFrame;

 // enable file manager drag/drop and DDE Execute open
 m_pMainWnd->DragAcceptFiles();
 EnableShellOpen();
 RegisterShellFileTypes();

 // simple command line parsing
 if (m_lpCmdLine[0] == '\0')
 {
 // create a new (empty) document
 OnFileNew();
 }
 else if ((m_lpCmdLine[0] == '-' || m_lpCmdLine[0] == '/') &&
 (m_lpCmdLine[1] == 'e' || m_lpCmdLine[1] == 'E'))
 {
 // program launched embedded - wait for DDE or OLE open
 }
 else
 {
 // open an existing document
 OpenDocumentFile(m_lpCmdLine);
 }

 return TRUE;
}

//////////////////////////////////////////////////////////////////////
// CAboutDlg dialog used for App About

class CAboutDlg : public CDialog
{
public:
 CAboutDlg();

// Dialog Data
 //{{AFX_DATA(CAboutDlg)
 enum { IDD = IDD_ABOUTBOX };
 //}}AFX_DATA

// Implementation
protected:

 virtual void DoDataExchange(CDataExchange* pDX);
 //{{AFX_MSG(CAboutDlg)
 // No message handlers
 //}}AFX_MSG
 DECLARE_MESSAGE_MAP()
};

CAboutDlg::CAboutDlg() : CDialog(CAboutDlg::IDD)
{
 //{{AFX_DATA_INIT(CAboutDlg)
 //}}AFX_DATA_INIT
}

void CAboutDlg::DoDataExchange(CDataExchange* pDX)
{
 CDialog::DoDataExchange(pDX);
 //{{AFX_DATA_MAP(CAboutDlg)
 //}}AFX_DATA_MAP
}

BEGIN_MESSAGE_MAP(CAboutDlg, CDialog)
 //{{AFX_MSG_MAP(CAboutDlg)
 // No message handlers
 //}}AFX_MSG_MAP
END_MESSAGE_MAP()

// App command to run the dialog
void CSeminarApp::OnAppAbout()
{
 CAboutDlg aboutDlg;
 aboutDlg.DoModal();
}



Listing Three

//////////////////////////////////////////////////////////////////////
// mainfrm.h : interface for the CMainFrame class

class CMainFrame : public CMDIFrameWnd
{
 DECLARE_DYNAMIC(CMainFrame)
public:
 CMainFrame();

// Attributes
public:

// Operations
public:

// Implementation
public:
 virtual ~CMainFrame();
#ifdef _DEBUG
 virtual void AssertValid() const;
 virtual void Dump(CDumpContext& dc) const;
#endif


// Generated message map functions
protected:
 //{{AFX_MSG(CMainFrame)
 afx_msg int OnCreate(LPCREATESTRUCT lpCreateStruct);
 afx_msg void OnDestroy();
 //}}AFX_MSG
 DECLARE_MESSAGE_MAP()
};



Listing Four

//////////////////////////////////////////////////////////////////////
// mainfrm.cpp : implementation of the CMainFrame class

#include "stdafx.h"
#include "seminar.h"
#include "mainfrm.h"

#ifdef _DEBUG
#undef THIS_FILE
static char BASED_CODE THIS_FILE[] = __FILE__;
#endif
 
//*** Constants for saving/restoring window size in .ini file
static LPCSTR frame_size = "Frame Size";
static LPCSTR top = "Top";
static LPCSTR left = "Left";
static LPCSTR bottom = "Bottom";
static LPCSTR right = "Right"; 
#define FRAME_LEFT 0
#define FRAME_TOP 0
#define FRAME_RIGHT 625
#define FRAME_BOTTOM 375
 
//////////////////////////////////////////////////////////////////////
// CMainFrame

IMPLEMENT_DYNAMIC(CMainFrame, CMDIFrameWnd)

BEGIN_MESSAGE_MAP(CMainFrame, CMDIFrameWnd)
 //{{AFX_MSG_MAP(CMainFrame)
 ON_WM_CREATE()
 ON_WM_DESTROY()
 //}}AFX_MSG_MAP
END_MESSAGE_MAP()

//////////////////////////////////////////////////////////////////////
// CMainFrame construction/destruction

CMainFrame::CMainFrame()
{
}
CMainFrame::~CMainFrame()
{
}
//////////////////////////////////////////////////////////////////////

// CMainFrame diagnostics

#ifdef _DEBUG
void CMainFrame::AssertValid() const
{
 CMDIFrameWnd::AssertValid();
}
void CMainFrame::Dump(CDumpContext& dc) const
{
 CMDIFrameWnd::Dump(dc);
}
#endif //_DEBUG

/////////////////////////////////////////////////////////////////////////////
// CMainFrame message handlers

int CMainFrame::OnCreate(LPCREATESTRUCT lpCreateStruct)
{
 if (CMDIFrameWnd::OnCreate(lpCreateStruct) == -1)
 return -1;
 
 //*** Retrieve size and position of main window from .ini file
 CRect r;
 GetWindowRect( r );
 CWinApp *app = ::AfxGetApp();
 if ( app != 0 )
 {
 r.left = app->GetProfileInt( frame_size, left, FRAME_LEFT );
 r.top = app->GetProfileInt( frame_size, top, FRAME_TOP );
 r.right = app->GetProfileInt( frame_size, right, FRAME_RIGHT );
 r.bottom = app->GetProfileInt( frame_size, bottom,
 FRAME_BOTTOM );
 MoveWindow( r );
 }
 return 0;
}
void CMainFrame::OnDestroy()
{
 CMDIFrameWnd::OnDestroy();

 //*** Save size and position of main frame window in .ini file
 CRect r;
 GetWindowRect( r );
 CWinApp *app = ::AfxGetApp();
 if ( app != 0 )
 {
 app->WriteProfileInt( frame_size, left, r.left );
 app->WriteProfileInt( frame_size, top, r.top );
 app->WriteProfileInt( frame_size, right, r.right );
 app->WriteProfileInt( frame_size, bottom, r.bottom );
 }
}



Listing Five

//////////////////////////////////////////////////////////////////////
// semdoc.h : interface for the CSeminarDoc class

 
//*** constants used by document and its associated views
#define MAX_STUDENTS 10

#define ROOM_X 5
#define ROOM_Y 5
#define ROOM_WIDTH 505
#define ROOM_HEIGHT 155 
#define STUDENT_X (ROOM_X)
#define STUDENT_Y (ROOM_Y+ROOM_HEIGHT+5)
#define STUDENT_WIDTH 300
#define STUDENT_HEIGHT 160
#define NOTES_X (STUDENT_X+STUDENT_WIDTH+5)
#define NOTES_Y (STUDENT_Y)
#define NOTES_WIDTH 300
#define NOTES_HEIGHT 160

enum Action { ACTION_ACTIVATE = 1, ACTION_CLEAR, ACTION_CHANGENAME };

//*** forward references
class CStudent;
class CNotesFrame;

// class declaration
class CSeminarDoc : public CDocument
{
protected: // create from serialization only
 CSeminarDoc();
 DECLARE_DYNCREATE(CSeminarDoc)
// Attributes
protected:
 //*** data members for keeping track of open views.
 CMDIChildWnd *m_student_frame;
 CNotesFrame *m_notes_frame;
 CMDIChildWnd *m_room_frame;

 //*** data member for holding pointers to student objects
 CObArray m_students;
// Operations
public:
 //*** services provided by the document class
 void ViewStudent( int nIndex, BOOL bActivate );
 void ViewNotes( int nIndex, BOOL bActivate );
 CMultiDocTemplate *GetTemplateForStudentView();
 CMultiDocTemplate *GetTemplateForNotesView();
 void StudentViewClosed();
 void NotesViewClosed();
 void SetRoomFrame( CMDIChildWnd *frame );
 void CloseStudentView();
 void CloseNotesView();
 void RetrieveDataFromViews();
 CStudent *GetStudent( int nIndex );
// Implementation
public:
 virtual ~CSeminarDoc();
 virtual void Serialize(CArchive& ar); // override for document i/o
#ifdef _DEBUG
 virtual void AssertValid() const;
 virtual void Dump(CDumpContext& dc) const;

#endif
protected:
 virtual BOOL OnNewDocument();
// Generated message map functions
protected:
 //{{AFX_MSG(CSeminarDoc)
 // NOTE - the ClassWizard will add and remove functions here.
 // DO NOT EDIT what you see in these blocks of generated code!
 //}}AFX_MSG
 DECLARE_MESSAGE_MAP()
};
//*** Inline implementations for access functions.
inline void CSeminarDoc::SetRoomFrame( CMDIChildWnd *frame )
 { m_room_frame = frame; }
inline CStudent *CSeminarDoc::GetStudent( int nIndex )
 { return (CStudent *)m_students.GetAt( nIndex ); }



Listing Six

//////////////////////////////////////////////////////////////////////
// semdoc.cpp : implementation of the CSeminarDoc class

#include "stdafx.h"
#include "seminar.h"
#include "student.h"
#include "semdoc.h"
#include "studview.h"
#include "notesfrm.h"
#include "noteview.h"

#ifdef _DEBUG
#undef THIS_FILE
static char BASED_CODE THIS_FILE[] = __FILE__;
#endif

IMPLEMENT_DYNCREATE(CSeminarDoc, CDocument)

BEGIN_MESSAGE_MAP(CSeminarDoc, CDocument)
 //{{AFX_MSG_MAP(CSeminarDoc)
 // NOTE - ClassWizard will add and remove mapping macros here.
 // DO NOT EDIT what you see in these blocks of generated code!
 //}}AFX_MSG_MAP
END_MESSAGE_MAP()

//////////////////////////////////////////////////////////////////////
// CSeminarDoc construction/destruction

CSeminarDoc::CSeminarDoc()
{
 //*** Initialize data members.
 m_student_frame = 0;
 m_notes_frame = 0;
 m_room_frame = 0;

 //** Create collection of 'blank' student objects.
 for ( int i = 0; i < MAX_STUDENTS; i++ )
 {

 m_students.Add( new CStudent() );
 }
}
CSeminarDoc::~CSeminarDoc()
{
 //*** Destroy collection of student objects.
 CStudent *student;
 for ( int i = 0; i < MAX_STUDENTS; i++ )
 {
 student = (CStudent *)m_students.GetAt( i );
 if ( student != 0 )
 {
 delete student;
 }
 }
 m_students.RemoveAll();
}
BOOL CSeminarDoc::OnNewDocument()
{
 if (!CDocument::OnNewDocument())
 return FALSE;
 // TODO: add reinitialization code here
 // (SDI documents will reuse this document)
 return TRUE;
}
//////////////////////////////////////////////////////////////////////
// CSeminarDoc serialization

void CSeminarDoc::Serialize(CArchive& ar)
{ 
 //*** If storing, retrieve data for currently selected student
 if ( ar.IsStoring() )
 { 
 RetrieveDataFromViews();
 }
 //*** Serialize the collection of students
 m_students.Serialize( ar );
}
//////////////////////////////////////////////////////////////////////
// CSeminarDoc diagnostics

#ifdef _DEBUG
void CSeminarDoc::AssertValid() const
{
 CDocument::AssertValid();
}
void CSeminarDoc::Dump(CDumpContext& dc) const
{
 CDocument::Dump(dc);
}
#endif //_DEBUG

//*** Provide access to document template pointer stored in
//*** application class.
CMultiDocTemplate *CSeminarDoc::GetTemplateForStudentView()
{ 
 CSeminarApp *app = (CSeminarApp *)::AfxGetApp();
 ASSERT( app != 0 );
 return app->GetTemplateForStudentView();

}
//*** Provide access to document template pointer stored in
//*** application class.
CMultiDocTemplate *CSeminarDoc::GetTemplateForNotesView()
{ 
 CSeminarApp *app = (CSeminarApp *)::AfxGetApp();
 ASSERT( app != 0 );
 return app->GetTemplateForNotesView();
}
//*** Bring up the student view, creating it if necessary
void CSeminarDoc::ViewStudent( int nIndex, BOOL bActivate )
{ 
 if ( m_student_frame != 0 ) 
 {
 //*** If the frame already exists, then check to see if the
 //*** frame window is minimized. If so, restoring it will
 //*** open it back up and activate it.
 if ( m_student_frame->IsIconic() )
 {
 m_student_frame->ShowWindow( SW_RESTORE );
 }
 else if ( bActivate )
 { 
 //*** If the frame already exists, but is not minimized,
 //*** then just call MDIActivate(), which will bring it to
 //*** the front and activate it.
 m_student_frame->MDIActivate();
 }
 else
 {
 //*** Bring frame to the front, but don't activate it.
 m_student_frame->SetWindowPos( &CWnd::wndTop, 0, 0, 0, 0,
 SWP_NOMOVE | SWP_NOSIZE | SWP_NOACTIVATE );
 }
 }
 else
 { 
 //*** If the frame has not yet been created, create it here by
 //*** using the appropriate document template pointer.
 CMultiDocTemplate *doc_template = GetTemplateForStudentView();
 if ( doc_template != 0 )
 {
 m_student_frame = 
 (CMDIChildWnd *)doc_template->CreateNewFrame(this, NULL);
 if( m_student_frame != 0 )
 { 
 if ( m_room_frame != 0 )
 {
 //*** Initialize size and position of the frame.
 CRect roomRect;
 m_room_frame->GetWindowRect( roomRect );
 m_student_frame->SetWindowPos( 0, STUDENT_X, 
 STUDENT_Y, STUDENT_WIDTH, STUDENT_HEIGHT,
 SWP_NOZORDER | SWP_NOACTIVATE );
 }
 m_student_frame->ShowWindow( SW_SHOW );
 m_student_frame->UpdateWindow();
 doc_template->InitialUpdateFrame(m_student_frame, this );
 }

 }
 } 
 if ( m_student_frame != 0 )
 {
 //*** Initialize view with appropriate student data.
 CStudentView *view = 
 (CStudentView *)m_student_frame->GetActiveView();
 if(view != 0)
 {
 view->SetStudent( (CStudent *)m_students.GetAt( nIndex ) );
 }
 } 
}
//*** Bring up the notes view, creating it if necessary.
//*** This code is similar to the preceding ViewStudent function,
//*** and the comments in ViewStudent apply as appropriate here.
void CSeminarDoc::ViewNotes( int nIndex, BOOL bActivate )
{ 
 if ( m_notes_frame != 0 ) 
 {
 if ( m_notes_frame->IsIconic() )
 {
 m_notes_frame->ShowWindow( SW_RESTORE );
 }
 else if ( bActivate )
 { 
 m_notes_frame->MDIActivate();
 }
 else
 {
 m_notes_frame->SetWindowPos( &CWnd::wndTop, 0, 0, 0, 0,
 SWP_NOMOVE | SWP_NOSIZE | SWP_NOACTIVATE );
 }
 }
 else
 { 
 CMultiDocTemplate *doc_template = GetTemplateForNotesView(); 
 if ( doc_template != 0 )
 {
 m_notes_frame = 
 (CNotesFrame *)doc_template->CreateNewFrame(this, NULL);
 if(m_notes_frame != 0)
 {
 m_notes_frame->SetWindowPos( 0, NOTES_X, NOTES_Y, 
 NOTES_WIDTH, NOTES_HEIGHT,
 SWP_NOZORDER | SWP_NOACTIVATE );
 doc_template->InitialUpdateFrame(m_notes_frame, this);
 }
 }
 }
 if ( m_notes_frame != 0 )
 {
 CNotesView *view = 
 (CNotesView *)m_notes_frame->GetActiveView();
 if(view != 0)
 {
 view->SetStudent( (CStudent *)m_students.GetAt( nIndex ) );
 }
 m_notes_frame->OnUpdateFrameTitle( FALSE );

 } 
}
//*** Remember that the student view is gone, so that the ViewStudent
//*** function will know whether it needs to create it.
void CSeminarDoc::StudentViewClosed()
{
 m_student_frame = 0;
}
//*** Remember that the notes view is gone, so that the ViewNotes
//*** function will know whether it needs to create it.
void CSeminarDoc::NotesViewClosed()
{
 m_notes_frame = 0;
}
//*** Force closing and destruction of the student view and frame
void CSeminarDoc::CloseStudentView()
{
 if ( m_student_frame != 0 )
 {
 m_student_frame->PostMessage( WM_CLOSE );
 }
}
//*** Force closing and destruction of the notes view and frame
void CSeminarDoc::CloseNotesView()
{
 if ( m_notes_frame != 0 )
 {
 m_notes_frame->PostMessage( WM_CLOSE );
 }
}
//*** Transfer data from the views to the document
void CSeminarDoc::RetrieveDataFromViews()
{ 
 CStudentView *student_view;
 CNotesView *notes_view;
 
 if ( m_student_frame != 0 )
 {
 student_view = (CStudentView *)m_student_frame->GetActiveView();
 if ( student_view != 0 )
 {
 student_view->UpdateDocument();
 }
 }
 if ( m_notes_frame != 0 )
 {
 notes_view = (CNotesView *)m_notes_frame->GetActiveView();
 if ( notes_view != 0 )
 {
 notes_view->UpdateDocument();
 }
 }
} 



Listing Seven

//////////////////////////////////////////////////////////////////////

// roomview.h : interface for the CRoomView class

#ifndef __AFXEXT_H__
#include <afxext.h>
#endif
 
//*** forward references
class CSeminarDoc;

// class declaration 
class CRoomView : public CFormView
{
 DECLARE_DYNCREATE(CRoomView)
protected:
 CRoomView(); // protected constructor used by dynamic creation
// Form Data
public:
 //{{AFX_DATA(CRoomView)
 enum { IDD = IDD_CLASSROOM };
 // NOTE: the ClassWizard will add data members here
 //}}AFX_DATA

// Attributes
public:
 CSeminarDoc* GetDocument();
protected:

 //*** Data members for keeping track of active student.
 int m_button_id[MAX_STUDENTS];
 int m_active_index; 

// Operations
public:

 //*** Make the given student the active student, updating
 //*** other views.
 void ActivateStudent( int nIndex );

// Implementation
protected:
 virtual ~CRoomView();
 virtual void DoDataExchange(CDataExchange* pDX);

 //*** Provide one-time initialization for the view.
 virtual void OnInitialUpdate();

 //*** Override to update view with changes in data
 virtual void OnUpdate(CView* pSender, LPARAM lHint,
 CObject* pHint);

 // Generated message map functions
 //{{AFX_MSG(CRoomView)
 afx_msg void OnClickedButton1();
 afx_msg void OnClickedButton2();
 afx_msg void OnClickedButton3();
 afx_msg void OnClickedButton4();
 afx_msg void OnClickedButton5();
 afx_msg void OnClickedButton6();
 afx_msg void OnClickedButton7();

 afx_msg void OnClickedButton8();
 afx_msg void OnClickedButton9();
 afx_msg void OnClickedButton10();
 afx_msg void OnDestroy();
 //}}AFX_MSG
 DECLARE_MESSAGE_MAP()
};
//*** Inline implementations for access functions.
inline CSeminarDoc* CRoomView::GetDocument()
 { return (CSeminarDoc*) m_pDocument; }



Listing Eight

//////////////////////////////////////////////////////////////////////
// roomview.cpp : implementation of the CRoomView class

#include "stdafx.h"
#include "seminar.h"
#include "student.h"
#include "semdoc.h"
#include "roomview.h"

#ifdef _DEBUG
#undef THIS_FILE
static char BASED_CODE THIS_FILE[] = __FILE__;
#endif

IMPLEMENT_DYNCREATE(CRoomView, CFormView)

CRoomView::CRoomView()
 : CFormView(CRoomView::IDD)
{
 //{{AFX_DATA_INIT(CRoomView)
 // NOTE: the ClassWizard will add member initialization here
 //}}AFX_DATA_INIT
 m_button_id[0] = IDC_BUTTON1;
 m_button_id[1] = IDC_BUTTON2;
 m_button_id[2] = IDC_BUTTON3;
 m_button_id[3] = IDC_BUTTON4;
 m_button_id[4] = IDC_BUTTON5;
 m_button_id[5] = IDC_BUTTON6;
 m_button_id[6] = IDC_BUTTON7;
 m_button_id[7] = IDC_BUTTON8;
 m_button_id[8] = IDC_BUTTON9;
 m_button_id[9] = IDC_BUTTON10;
 
 m_active_index = 0;
}
CRoomView::~CRoomView()
{
}
void CRoomView::DoDataExchange(CDataExchange* pDX)
{
 CFormView::DoDataExchange(pDX);
 //{{AFX_DATA_MAP(CRoomView)
 // NOTE: the ClassWizard will add DDX and DDV calls here
 //}}AFX_DATA_MAP

}
BEGIN_MESSAGE_MAP(CRoomView, CFormView)
 //{{AFX_MSG_MAP(CRoomView)
 ON_BN_CLICKED(IDC_BUTTON1, OnClickedButton1)
 ON_BN_CLICKED(IDC_BUTTON2, OnClickedButton2)
 ON_BN_CLICKED(IDC_BUTTON3, OnClickedButton3)
 ON_BN_CLICKED(IDC_BUTTON4, OnClickedButton4)
 ON_BN_CLICKED(IDC_BUTTON5, OnClickedButton5)
 ON_BN_CLICKED(IDC_BUTTON6, OnClickedButton6)
 ON_BN_CLICKED(IDC_BUTTON7, OnClickedButton7)
 ON_BN_CLICKED(IDC_BUTTON8, OnClickedButton8)
 ON_BN_CLICKED(IDC_BUTTON9, OnClickedButton9)
 ON_BN_CLICKED(IDC_BUTTON10, OnClickedButton10)
 ON_WM_DESTROY()
 //}}AFX_MSG_MAP
END_MESSAGE_MAP()

//*** Override to initialize size and position of the window,
//*** to initialize the association with the document, and to
//*** activate the default student.
void CRoomView::OnInitialUpdate()
{
 CFormView::OnInitialUpdate();
 
 CMDIChildWnd *frame = (CMDIChildWnd *)GetParentFrame();
 if ( frame != 0 )
 {
 frame->SetWindowPos( 0, ROOM_X, ROOM_Y, ROOM_WIDTH,
 ROOM_HEIGHT, SWP_NOZORDER | SWP_NOACTIVATE );
 }
 CSeminarDoc *pDoc = GetDocument();
 if ( pDoc != 0 )
 {
 pDoc->SetRoomFrame( frame );
 }
 ActivateStudent( 0 );
}
//*** Override to update view with changes in data
void CRoomView::OnUpdate(CView* pSender, LPARAM lHint, CObject* pHint)
{ 
 CButton *button;
 CStudent *student;
 CSeminarDoc *pDoc = GetDocument();

 if ( pDoc != 0 )
 {
 switch ( lHint )
 {
 case ACTION_CHANGENAME:
 //*** Update the button text for the active student.
 button =
 (CButton *)GetDlgItem( m_button_id[m_active_index] );
 student = pDoc->GetStudent( m_active_index );
 if ( button != 0 && student != 0 )
 {
 button->SetWindowText( student->GetName() );
 }
 break; 
 default:

 //*** If any of the student names have changed, update
 //*** the text for the corresponding button.
 for ( int i = 0; i < MAX_STUDENTS; i++ )
 { 
 CButton *button = (CButton *)GetDlgItem( m_button_id[i] );
 if ( button != 0 )
 { 
 CString button_name;
 button->GetWindowText( button_name );
 CStudent *student = pDoc->GetStudent(i);
 if ( student != 0
 && button_name != student->GetName() )
 { 
 button->SetWindowText( student->GetName() );
 } 
 } 
 } 
 }
 }
}
//*** Make the given student the active student,
//*** updating other views.
void CRoomView::ActivateStudent( int nIndex )
{
 CSeminarDoc *pDoc = GetDocument();
 if ( pDoc != 0 )
 {
 pDoc->RetrieveDataFromViews();
 m_active_index = nIndex;
 pDoc->ViewStudent( nIndex, FALSE );
 pDoc->ViewNotes( nIndex, FALSE );
 pDoc->UpdateAllViews( NULL, ACTION_ACTIVATE, 
 pDoc->GetStudent( nIndex ) );
 }
} 
 
//////////////////////////////////////////////////////////////////////
// CRoomView message handlers

void CRoomView::OnClickedButton1()
{
 //*** Activate the indicated student
 ActivateStudent( 0 ); 
}
void CRoomView::OnClickedButton2()
{
 //*** Activate the indicated student
 ActivateStudent( 1 ); 
}
void CRoomView::OnClickedButton3()
{
 //*** Activate the indicated student
 ActivateStudent( 2 ); 
}
void CRoomView::OnClickedButton4()
{
 //*** Activate the indicated student
 ActivateStudent( 3 ); 
}

void CRoomView::OnClickedButton5()
{
 //*** Activate the indicated student
 ActivateStudent( 4 ); 
}
void CRoomView::OnClickedButton6()
{
 //*** Activate the indicated student
 ActivateStudent( 5 ); 
}
void CRoomView::OnClickedButton7()
{
 //*** Activate the indicated student
 ActivateStudent( 6 ); 
}
void CRoomView::OnClickedButton8()
{
 //*** Activate the indicated student
 ActivateStudent( 7 ); 
}
void CRoomView::OnClickedButton9()
{
 //*** Activate the indicated student
 ActivateStudent( 8 ); 
}
void CRoomView::OnClickedButton10()
{
 //*** Activate the indicated student
 ActivateStudent( 9 ); 
}
//*** Notify the document that this primary view is being closed,
//*** and force closing of any auxiliary views that are open.
void CRoomView::OnDestroy()
{
 CFormView::OnDestroy();
 
 // TODO: Add your message handler code here
 CSeminarDoc *pDoc = GetDocument();
 if ( pDoc != 0 )
 {
 pDoc->SetRoomFrame( 0 );
 pDoc->CloseStudentView();
 pDoc->CloseNotesView();
 }
}



Listing Nine

//////////////////////////////////////////////////////////////////////
//*** student.h - interface for the CStudent class

class CStudent : public CObject
{
public:
 CStudent();

 //*** Support serialization of the student object

 DECLARE_SERIAL( CStudent )
 void Serialize( CArchive& ar );

 //*** Access functions for data members.
 void SetName( const CString& name ) {m_name = name;}
 const CString& GetName() {return m_name;}
 void SetFullname( const CString& fullname ){m_fullname = fullname;}
 const CString& GetFullname() {return m_fullname;}
 void SetTitle( const CString& title ) {m_title = title;}
 const CString& GetTitle() {return m_title;}
 void SetCompany( const CString& company ) {m_company = company;}
 const CString& GetCompany() {return m_company;}
 void SetPhone( const CString& phone ) {m_phone = phone;}
 const CString& GetPhone() {return m_phone;}
 void SetInfo( const CString& info ) {m_info = info;}
 const CString& GetInfo() {return m_info;}
protected:

 //*** Data members representing information describing a student.
 CString m_name;
 CString m_fullname;
 CString m_title;
 CString m_company;
 CString m_phone;
 CString m_info;
};



Listing Ten

//////////////////////////////////////////////////////////////////////
// student.cpp - implementation of the CStudent class

#include "stdafx.h"
#include "student.h"

//*** Support serialization of the student object.
IMPLEMENT_SERIAL( CStudent, CObject, 1 ) 
 
CStudent::CStudent()
{
 //*** Initialize data members.
 m_name = "";
 m_fullname = "";
 m_title = "";
 m_company = "";
 m_phone = "";
 m_info = "";
}
//*** Serialize the data members representing information about
//*** a student.
void
CStudent::Serialize( CArchive& ar )
{
  CObject::Serialize(ar);
  if(ar.IsStoring())
  {
    ar << m_name << m_fullname << m_title
       << m_company << m_phone << m_info;
  }
  else
  {
    ar >> m_name >> m_fullname >> m_title
       >> m_company >> m_phone >> m_info;
  }
}



Listing Eleven

//////////////////////////////////////////////////////////////////////
// studview.h : interface for the CStudentView class

#ifndef __AFXEXT_H__
#include <afxext.h>
#endif

// forward references
class CSeminarDoc;
class CStudent;

// class declaration
class CStudentView : public CFormView
{
 DECLARE_DYNCREATE(CStudentView)
protected:
 CStudentView(); // protected constructor used by dynamic creation

// Form Data
public:
 //{{AFX_DATA(CStudentView)
 enum { IDD = IDD_STUDENT };
 CString m_company;
 CString m_fullname;
 CString m_name;
 CString m_phone;
 CString m_title;
 //}}AFX_DATA

// Attributes
public:

 //*** Access functions for data members.
 CSeminarDoc* GetDocument();
 void SetStudent( CStudent *student );

protected:

 //*** Data member to keep track of active student.
 CStudent *m_student; 

// Operations
public:

 //*** Provide service for transferring data from view to document
 void UpdateDocument();

 
// Implementation
protected:
 virtual ~CStudentView();
 virtual void DoDataExchange(CDataExchange* pDX);

 //*** Override to update view with changes in data
 virtual void OnUpdate(CView* pSender, LPARAM lHint,
 CObject* pHint);

 //*** Reinitialize all data members holding student data.
 void ClearData();
 
 // Generated message map functions
 //{{AFX_MSG(CStudentView)
 afx_msg void OnDestroy();
 afx_msg void OnChangeName();
 afx_msg void OnChangeFullname();
 afx_msg void OnChangeTitle();
 afx_msg void OnChangeCompany();
 afx_msg void OnChangePhone();
 //}}AFX_MSG
 DECLARE_MESSAGE_MAP()
};
//*** Inline implementations for access functions.
inline CSeminarDoc* CStudentView::GetDocument()
 { return (CSeminarDoc*) m_pDocument; }
inline void CStudentView::SetStudent( CStudent *student )
 { m_student = student; } 



Listing Twelve

//////////////////////////////////////////////////////////////////////
// studview.cpp : implementation of the CStudentView class

#include "stdafx.h"
#include "seminar.h"
#include "student.h"
#include "semdoc.h"
#include "studview.h"

#ifdef _DEBUG
#undef THIS_FILE
static char BASED_CODE THIS_FILE[] = __FILE__;
#endif

IMPLEMENT_DYNCREATE(CStudentView, CFormView)

CStudentView::CStudentView()
 : CFormView(CStudentView::IDD)
{ 
 //{{AFX_DATA_INIT(CStudentView)
 m_company = "";
 m_fullname = "";
 m_name = "";
 m_phone = "";
 m_title = "";

 //}}AFX_DATA_INIT
 
 m_student = 0;
}
CStudentView::~CStudentView()
{
}
void CStudentView::DoDataExchange(CDataExchange* pDX)
{
 CFormView::DoDataExchange(pDX);
 //{{AFX_DATA_MAP(CStudentView)
 DDX_Text(pDX, IDC_COMPANY, m_company);
 DDX_Text(pDX, IDC_FULLNAME, m_fullname);
 DDX_Text(pDX, IDC_NAME, m_name);
 DDX_Text(pDX, IDC_PHONE, m_phone);
 DDX_Text(pDX, IDC_TITLE, m_title);
 //}}AFX_DATA_MAP
}
BEGIN_MESSAGE_MAP(CStudentView, CFormView)
 //{{AFX_MSG_MAP(CStudentView)
 ON_WM_DESTROY()
 ON_EN_CHANGE(IDC_NAME, OnChangeName)
 ON_EN_CHANGE(IDC_FULLNAME, OnChangeFullname)
 ON_EN_CHANGE(IDC_TITLE, OnChangeTitle)
 ON_EN_CHANGE(IDC_COMPANY, OnChangeCompany)
 ON_EN_CHANGE(IDC_PHONE, OnChangePhone)
 //}}AFX_MSG_MAP
END_MESSAGE_MAP()

void CStudentView::ClearData()
{
 //*** Clear all data members holding student data.
 m_company = "";
 m_fullname = "";
 m_name = "";
 m_phone = "";
 m_title = "";
}
//*** Override to update view with changes in data
void CStudentView::OnUpdate(CView* pSender, LPARAM lHint,
  CObject* pHint)
{
  if ( m_student == 0 )
  {
    ClearData();
  }
  else
  {
    m_name = m_student->GetName();
    m_fullname = m_student->GetFullname();
    m_title = m_student->GetTitle();
    m_company = m_student->GetCompany();
    m_phone = m_student->GetPhone();
  }
  UpdateData( FALSE );
}
//*** Transfer data from view to document
void CStudentView::UpdateDocument()
{
  UpdateData( TRUE );
  if ( m_student != 0 )
  {
    m_student->SetName( m_name );
    m_student->SetFullname( m_fullname );
    m_student->SetTitle( m_title );
    m_student->SetCompany( m_company );
    m_student->SetPhone( m_phone );
  }
}
//////////////////////////////////////////////////////////////////////
// CStudentView message handlers

void CStudentView::OnDestroy()
{
  //*** Transfer student data to document and notify document of
  //*** view closing.
  UpdateDocument();
  CFormView::OnDestroy();
  CSeminarDoc *pDoc = GetDocument();
  if ( pDoc != 0 )
  {
    pDoc->StudentViewClosed();
  }
}
void CStudentView::OnChangeName()
{
  //*** Retrieve student name after every keystroke and notify
  //*** other views of the changes.
  CSeminarDoc *pDoc = GetDocument();
  if ( pDoc != 0 )
  {
    CString name;
    CWnd *control = GetDlgItem( IDC_NAME );
    if ( control != 0 && m_student != 0 )
    {
      control->GetWindowText( name );
      m_student->SetName( name );
      pDoc->UpdateAllViews( this, ACTION_CHANGENAME, m_student );
    }
    pDoc->SetModifiedFlag();
  }
}
void CStudentView::OnChangeFullname()
{
  //*** Mark the document as 'dirty' to force a prompt for saving
  //*** the document if the user exits the program, closes the
  //*** document, or causes destruction of the document object by
  //*** closing the primary view (which in this application forces
  //*** closing of all auxiliary views).
  GetDocument()->SetModifiedFlag();
}
void CStudentView::OnChangeTitle()
{
 //*** Mark document as 'dirty'
 GetDocument()->SetModifiedFlag();
}
void CStudentView::OnChangeCompany()
{

 //*** Mark document as 'dirty'
 GetDocument()->SetModifiedFlag();
}
void CStudentView::OnChangePhone()
{
 //*** Mark document as 'dirty'
 GetDocument()->SetModifiedFlag();
}



Listing Thirteen

//////////////////////////////////////////////////////////////////////
// noteview.h : interface for the CNotesView class

//*** forward references
class CSeminarDoc;

// class declaration
class CNotesView : public CEditView
{
 DECLARE_DYNCREATE(CNotesView)
protected:
 CNotesView(); // protected constructor used by dynamic creation

// Attributes
public:
 CSeminarDoc* GetDocument();

 //*** Maintain pointer to current student and provide
 //*** access functions.
 CStudent *GetStudent();
 void SetStudent( CStudent *student );
protected:
 CStudent *m_student; 

// Operations
public:

 //*** Provide service for transferring data from view to document
 void UpdateDocument();

// Implementation
protected:
 virtual ~CNotesView();
 virtual void OnDraw(CDC* pDC); // overridden to draw this view

 //*** Override to update view with changes in data
 virtual void OnUpdate(CView* pSender, LPARAM lHint,
 CObject* pHint);

 // Generated message map functions
protected:
 //{{AFX_MSG(CNotesView)
 afx_msg void OnDestroy();
 //}}AFX_MSG
 DECLARE_MESSAGE_MAP()
};

//*** Inline implementations for access functions.
inline CSeminarDoc* CNotesView::GetDocument()
 { return (CSeminarDoc*) m_pDocument; }
inline void CNotesView::SetStudent( CStudent *student )
 { m_student = student; }
inline CStudent *CNotesView::GetStudent()
 { return m_student; }
 


Listing Fourteen

//////////////////////////////////////////////////////////////////////
// noteview.cpp : implementation of the CNotesView class

#include "stdafx.h"
#include "seminar.h"
#include "student.h"
#include "semdoc.h"
#include "noteview.h"
#include "notesfrm.h"

#ifdef _DEBUG
#undef THIS_FILE
static char BASED_CODE THIS_FILE[] = __FILE__;
#endif

IMPLEMENT_DYNCREATE(CNotesView, CEditView)

CNotesView::CNotesView()
{
 m_student = 0;
}
CNotesView::~CNotesView()
{
}
BEGIN_MESSAGE_MAP(CNotesView, CEditView)
 //{{AFX_MSG_MAP(CNotesView)
 ON_WM_DESTROY()
 //}}AFX_MSG_MAP
END_MESSAGE_MAP()

//////////////////////////////////////////////////////////////////////
// CNotesView drawing

void CNotesView::OnDraw(CDC* pDC)
{
}
//*** Override to update view with changes in data
void CNotesView::OnUpdate(CView* pSender, LPARAM lHint,
  CObject* pHint)
{
  CNotesFrame *frame = (CNotesFrame *)GetParentFrame();
  switch ( lHint )
  {
  case ACTION_CHANGENAME:
    if ( frame != 0 &&
         frame->IsKindOf( RUNTIME_CLASS(CNotesFrame) ) )
    {
      frame->OnUpdateFrameTitle( FALSE );
    }
    break;
  default:
    CEdit& notes_control = GetEditCtrl();
    if ( m_student != 0 )
    {
      const CString& info = m_student->GetInfo();
      notes_control.SetWindowText( info );
      notes_control.SetModify( FALSE );
    }
  }
}
//*** Transfer data from view to document
void CNotesView::UpdateDocument()
{
  CEdit& notes_control = GetEditCtrl();
  if ( notes_control.GetModify() && m_student != 0 )
  {
    // Transfer info from the edit control to the student data
    CString notes;
    notes_control.GetWindowText( notes );
    m_student->SetInfo( notes );
    notes_control.SetModify( FALSE );
  }
}
//////////////////////////////////////////////////////////////////////
// CNotesView message handlers

void CNotesView::OnDestroy()
{
 //*** Transfer notes to document and notify document of
 //*** view closing.
 UpdateDocument();
 CEditView::OnDestroy();
 GetDocument()->NotesViewClosed();
}



Listing Fifteen

//////////////////////////////////////////////////////////////////////
// notesfrm.h : interface for the CNotesFrame class

class CNotesFrame : public CMDIChildWnd
{
 DECLARE_DYNCREATE(CNotesFrame)
protected:
 CNotesFrame(); // protected constructor used by dynamic creation

// Attributes
public:

// Operations
public:
 //*** Override to customize frame title
 virtual void OnUpdateFrameTitle( BOOL bAddToTitle );


// Implementation
protected:
 virtual ~CNotesFrame(); 
 
 // Generated message map functions
 //{{AFX_MSG(CNotesFrame)
 // NOTE - the ClassWizard will add and remove functions here.
 //}}AFX_MSG
 DECLARE_MESSAGE_MAP()
};



Listing Sixteen

//////////////////////////////////////////////////////////////////////
// notesfrm.cpp : implementation of the CNotesFrame class

#include "stdafx.h"
#include "seminar.h" 
#include "student.h"
#include "notesfrm.h"
#include "noteview.h"

#ifdef _DEBUG
#undef THIS_FILE
static char BASED_CODE THIS_FILE[] = __FILE__;
#endif

IMPLEMENT_DYNCREATE(CNotesFrame, CMDIChildWnd)

CNotesFrame::CNotesFrame()
{
}
CNotesFrame::~CNotesFrame()
{
}
BEGIN_MESSAGE_MAP(CNotesFrame, CMDIChildWnd)
 //{{AFX_MSG_MAP(CNotesFrame)
 // NOTE - the ClassWizard will add and remove mapping macros here.
 //}}AFX_MSG_MAP
END_MESSAGE_MAP()

//*** Override to customize frame title
void CNotesFrame::OnUpdateFrameTitle( BOOL bAddToTitle )
{
  CStudent *student =
    ((CNotesView *)GetActiveView())->GetStudent();
  CString student_name = "< nameless student >";
  if ( student != 0 && student->GetName() != "" )
  {
    student_name = student->GetName();
  }
  CString title = "Notes on " + student_name;
  SetWindowText( title );
}







Creating Special-Effect Bitmaps


Skipping is one way to achieve great bitmap effects




Saurabh Dixit


Saurabh is a contract programmer. He can be contacted at 12703 Jones Rd., Apt.
1014, Houston, TX 77070.


Recently, I needed special-effects bitmaps for a Windows project my group was
working on. While commercial software is available for creating special
effects, none filled our specific needs. Consequently, I was "appointed" to
develop a set of custom special effects. In this article, I'll present details
for a set of procedures I call "skipping" algorithms, which are the basis for
the special-effect bitmap routines in Table 1.
Most effects can be achieved by combining several rectangular "blits"
(block-bit transfers) from the bitmapped image and displaying them on the
screen. The underlying logic that decides the order in which these block
transfers take place defines the effect. The special-effects routines I
present here assume that your GUI API has capabilities similar to those
provided by the Windows BitBlt and associated graphics device interface (GDI)
functions. (For information on BitBlt, see the accompanying text box entitled,
"The BitBlt Function.")
There are several different types of special effects:
Exploding and imploding, where the bitmap comes into view from the center in a
rectangular or circular fashion, either from the edges towards the center or
vice versa.
Inward and outward spiraling, where the bitmap is rendered in small
rectangular blocks, either from a corner or from the center.
Horizontal and vertical curtaining, where the bitmap is rendered through
adjacent rows or columns moving towards each other from opposite ends of the
bitmap and meeting at the center or vice versa.
Sliding, where the bitmap gradually slides into view along the left, right,
top, or bottom edges.
Horizontal and vertical blinding, where the bitmap comes into view as strips
in a venetian blind, either opening or closing.
Most effects are referred to as "continuous," because part of the bitmap is
gradually rendered onto the screen from or toward an edge without
interruption. When "skipping," you deliberately delay rendering some areas of
the bitmap. Skipping achieves great bitmap effects: Rather than slowly
building the bitmap from scratch, as most effects do, skipping appears to
slowly merge the bitmap with the background, because the background remains
visible for a larger portion of the time it takes to render the bitmap
completely.
The horizontal-skipping algorithm is a simple modification of the
horizontal-curtain algorithm. Instead of traversing the bitmap continuously,
alternate columns are skipped, leaving them temporarily unchanged. Blitting
does not stop when the two passes meet at the center--it continues to the edge
opposite the one each pass started from. Thus, columns missed on the pass from
one edge are filled on the pass from the other.
Figure 1(a) shows the order in which the bitmap is blitted on screen to
achieve this effect. The numbers in the vertical columns indicate the exact
order in which the bitmap is realized. Note that when the bitmap width is an
even multiple of the column width, the second pass begins at the last column.
On the other hand, if the number of resulting columns is odd, the second pass
begins one column before the last. Reducing the column width to 1 renders the
bitmap one pixel-wide "line" at a time; a column width larger than the bitmap
width proves to be self-defeating.
Once you understand how horizontal skipping works, implementing a
vertical-skipping algorithm is straightforward: Instead of skipping columns,
you skip rows, applying similar considerations; see Figure 1(b). 
You can also implement a skipping algorithm along the diagonal, which would
generate the bitmap on screen in diagonal lines moving toward each other from
the opposite ends of the diagonal of choice. This can be painstaking, as the
diagonal line itself has to be generated using multiple block (square or
rectangular) blits. The order in which you would blit the various bitmap
blocks is given in Figure 2(a).
The principle behind the diagonal-skipping algorithm is the same as that
for the horizontal- and vertical-skipping algorithms. The block size can be as
small as 1x1 for a smooth diagonal line. Figure 2(b) shows how the bitmap has
to be rendered for diagonal skipping.
How about skipping other shapes? A hollow-rectangle skipping algorithm can be
achieved by blitting "hollow" rectangles from the bitmap onto the screen. The
edges of the rectangles are portions of the bitmap. You blit hollow rectangles
alternately, moving simultaneously from the bitmap borders inward and from
the bitmap center outward; see Figure 3. Both inward and outward passes leave
rectangular portions of the background unchanged, to be "filled" by passes
moving in the opposite direction. Note that each rectangular blit is achieved
through four BitBlt()s, one for each edge of the rectangle. In my experience,
the effect looks best if the edge thickness is fixed at 1.


Implementation


Listings One through Five provide a Windows implementation of the algorithms
presented in this article, together with a demo program to view these effects.
Listing One is the C source for the bitmap special-effects algorithms. I used
the Borland C/C++ compiler in my efforts, although the implementation should
be compiler independent. Besides the normally required arguments (source and
destination device contexts, bitmap dimensions), the function
FXSkipHorizontal() takes the width of each vertical column that the effect
will use. An additional delay parameter is also passed to slow or accelerate
the speed of the effect. The implementation takes care of whether the
requested column's width is a factor of the bitmap width, and works as long as
the column size is not specified to be 0. Similarly, FXSkipVertical() takes
the height of each horizontal row and the delay. The FXSkipDiagonalSquares()
and FXSkipDiagonals() implementations assume that the bitmap width and height
are multiples of the element size. Finally, FXSkipRectangles() does not have
room for a user-specified edge thickness for the rectangles. 
Listing Two is the demo program. Listings Three through Five make up the
remaining files in the project. There is no make-file listing because I used
the Borland C++ IDE, which works with project (.PRJ) files rather than make
(.MAK) files. I used the small memory model for my executable.
The demo program uses two bitmaps, "bitmap1" and "bitmap2," which reside in
the executable. When you compile the sources, make sure to qualify two bitmaps
with these names in your resource (.RC) file. Double clicking the left mouse
button in the application client-window area cycles through the different
effects. Double clicking the right mouse button alternates the bitmaps. The RC
files, executables, and additional files are available electronically; see
"Availability," page 3.
The bitmaps accompanying the source code for this article are in the standard
Windows .BMP format and use 16 colors. If you are going to attempt any kind of
scaling before you use the effects, or plan to use 256-color, 64K-color, or
deeper bitmaps, then you will have to work within the capabilities of your GUI
library and, to some extent, your video hardware.
To use bitmaps with colors not found in the standard system palette, you will
first have to extract the palette information from the bitmap and select it
into the appropriate device contexts using SelectPalette() and
RealizePalette(). For stretching in memory contexts, remember to use
SetStretchBltMode() with the STRETCH_DELETESCANS parameter, if you wish to
preserve the colors on your bitmap.
I compiled and executed the source on a 66-MHz 486 PC with 8 Mbytes of RAM.
The delay (in milliseconds) used by the routines is enforced by busy-waiting
on the system tick count, so its resolution depends on the granularity of the
system timer. The actual speed of an effect will also depend on the blitting
performance of your target machine.


Conclusion


I have used BitBlt() in all my implementations except FXSkipDiagonals(), where
I had to use regions along with clipping to get the true diagonal-skipping
effect. However, region operations are often slow, especially as the regions
grow in complexity. Also, regions use up quite a bit of memory.
My implementation deals only with the BitBlt() SRCCOPY raster operation. You
might want to experiment with other values for the rop parameter to BitBlt(). 
The code is yours to modify and use as you deem fit as long as you don't wipe
out the headers at the beginning of each source file. If you have had similar
experiences, I would appreciate hearing about them. Any effect is most useful
when you have better control over the time it takes to prepare and render it.
Please send me a copy of your efforts if you manage to make my algorithm
implementations more efficient.
The BitBlt() Function
The BitBlt() function copies a bitmap from a source device context to a
destination device context. BitBlt() can display both monochrome and color
bitmaps, and no special steps are required to display bitmaps of different
formats: BitBlt() converts the bitmap if its color format is not the same as
that of the destination device. For example, when displaying a color bitmap on
a monochrome screen, BitBlt() converts pixels having the current background
color to white and all other pixels to black. The parameters to BitBlt() are
shown in Example 1. rop specifies the raster operation to be performed.
Raster-operation codes define how the graphics device interface (GDI) combines
colors in output operations that involve a current brush, a possible source
bitmap, and a destination bitmap. rop values are listed in Table 2. The return
value is nonzero if the function succeeds; otherwise, it is zero.
----S.D.
Example 1 BitBlt() and its parameters.
Table 1 Skipping-algorithm implementations.
Table 2 The BitBlt function's rop parameter options.
Figure 1 (a) Skipping columns; (b) skipping rows.
Figure 2 (a) Skipping squares; (b) skipping diagonal lines.
Figure 3 Hollow-rectangle skipping.


Listing One

// File: FX.c
// Author: Saurabh Dixit
// Purpose: Implementations for various bitmap effects algorithms

#include <windowsx.h>
#include <math.h>
#include <time.h>

BOOL FXSkipHorizontal (HDC hdc, // screen device context
 HDC hdcMem, // memory device context
 int bmpWidth, // bitmap width
 int bmpHeight, // bitmap height
 int colwidth, // skip width
 int delay // delay in milliseconds
 )
{
  int l, r, m, d;
  int left, right;
  DWORD nexttime;
  BOOL success;

  // initialize
  m = bmpWidth % colwidth;
  d = bmpWidth / colwidth;
  if (m) {        // bitmap width not an exact multiple of column width
    if (d % 2)    // odd number of elements
      r = d * colwidth;
    else          // even number of elements
      r = (d - 1) * colwidth;
  }
  else {          // bitmap width is an exact multiple of column width
    if (d % 2)    // odd number of elements
      r = bmpWidth - (colwidth << 1);
    else          // even number of elements
      r = bmpWidth - colwidth;
  }
  l = 0;

  left = l;
  right = bmpWidth;

  nexttime = GetTickCount () + delay;

  while (r > left || l < right) {
    // blit from the left
    success = BitBlt (hdc, l, 0, colwidth, bmpHeight,
                      hdcMem, l, 0,
                      SRCCOPY);
    if (!success)
      break;
    // blit from the right
    success = BitBlt (hdc, r, 0, colwidth, bmpHeight,
                      hdcMem, r, 0,
                      SRCCOPY);
    if (!success)
      break;
    l += (colwidth << 1);  // skip a column to the right
    r -= (colwidth << 1);  // skip a column to the left

    // wait a little
    while (GetTickCount () < nexttime)
      ;
    nexttime += delay;
  }

  return success;
}

BOOL FXSkipVertical (HDC hdc, // screen device context
 HDC hdcMem, // memory device context
 int bmpWidth, // bitmap width
 int bmpHeight, // bitmap height
 int rowheight, // skip height
 int delay // delay in milliseconds
 )
{
  int t, b, m, d;
  int top, bottom;
  DWORD nexttime;
  BOOL success;

  // initialize
  m = bmpHeight % rowheight;
  d = bmpHeight / rowheight;
  if (m) {        // bitmap height not an exact multiple of row height
    if (d % 2)    // odd number of elements
      b = d * rowheight;
    else          // even number of elements
      b = (d - 1) * rowheight;
  }
  else {          // bitmap height is an exact multiple of row height
    if (d % 2)    // odd number of elements
      b = bmpHeight - (rowheight << 1);
    else          // even number of elements
      b = bmpHeight - rowheight;
  }
  t = 0;

  top = t;
  bottom = bmpHeight;

  nexttime = GetTickCount () + delay;

  while (b > top || t < bottom) {
    // blit from the top
    success = BitBlt (hdc, 0, t, bmpWidth, rowheight,
                      hdcMem, 0, t,
                      SRCCOPY);
    if (!success)
      break;
    // blit from the bottom
    success = BitBlt (hdc, 0, b, bmpWidth, rowheight,
                      hdcMem, 0, b,
                      SRCCOPY);
    if (!success)
      break;
    t += (rowheight << 1);  // skip a row down
    b -= (rowheight << 1);  // skip a row up

    // wait a sec!...
    while (GetTickCount () < nexttime)
      ;
    nexttime += delay;
  }

  return success;
}

BOOL FXSkipDiagonalSquares (HDC hdc, // screen device context
 HDC hdcMem, // memory device context
 int bmpWidth, // bitmap width
 int bmpHeight, // bitmap height
 int size, // square element size
 int delay // delay in milliseconds
 )
{
 int id, l, t, nlines;
 DWORD nexttime;
 BOOL success;

 // formula for calculating number of diagonal 'lines' in which
 // (size X size) squares are blitted
 nlines = (bmpWidth + bmpHeight) / size - 1;

 nexttime = GetTickCount () + delay;

 for (id = 1; id <= nlines; id ++) {
 if (id % 2) { // odd line id
 l = id * size;

 if (l > bmpWidth)
 l = bmpWidth;
 t = (id - l / size + 1) * size;

 // blit all squares in line
 while (t <= bmpHeight) {
 success = BitBlt (hdc, l - size, t - size, size, size,
 hdcMem, l - size, t - size,
 SRCCOPY);
 if (!success)
 break;
 l -= size;
 t += size;
 }
 if (!success)
 break;
 }
 else { // even line id
 // check here for starting position of second pass
 if ((bmpWidth + bmpHeight) / size % 2)
 l = bmpWidth + (2 - id) * size;
 else
 l = bmpWidth + (1 - id) * size;
 t = bmpHeight;


 // blit all squares in line
 while (l <= bmpWidth) {
 success = BitBlt (hdc, l - size, t - size, size, size,
 hdcMem, l - size, t - size,
 SRCCOPY);
 if (!success)
 break;
 l += size;
 t -= size;
 }
 if (!success)
 break;
 }

 // wait a sec!...
 while (GetTickCount () < nexttime);
 nexttime += delay;
 }

 return success;
}

BOOL FXSkipDiagonals (HDC hdc, // screen device context
 HDC hdcMem, // memory device context
 int bmpWidth, // bitmap width
 int bmpHeight, // bitmap height
 int thickness, // 'drawing brush' thickness
 int delay // delay in milliseconds
 )
{
 int id, l, t, ndiags;
 POINT p[4];
 DWORD nexttime;
 BOOL success;
 HRGN hrgn;

 // formula for calculating the number of diagonal portions
 ndiags = (bmpWidth + bmpHeight) / thickness;

 nexttime = GetTickCount () + delay;

 for (id = 1; id <= ndiags; id ++) {
 switch (id) {
 case 1: // always a triangle
 p[0].x = p[0].y = 0;
 p[1].x = thickness; p[1].y = 0;
 p[2].x = 0; p[2].y = thickness;
 hrgn = CreatePolygonRgn (p, 3, ALTERNATE);
 break;
 case 2:
 if (!(ndiags % 2)) { // a triangle
 p[0].x = bmpWidth; p[0].y = bmpHeight;
 p[1].x = bmpWidth - thickness; p[1].y = bmpHeight;
 p[2].x = bmpWidth; p[2].y = bmpHeight - thickness;
 hrgn = CreatePolygonRgn (p, 3, ALTERNATE);
 break;
 }

 // else - not a triangle so fall thru
 default:
 if (id % 2) {
 p[0].x = (id - 1) * thickness;
 p[1].x = p[0].x + thickness;
 if (p[0].x > bmpWidth) {
 p[0].y = p[0].x - bmpWidth;
 p[0].x = bmpWidth;
 }
 else
 p[0].y = 0;
 if (p[1].x > bmpWidth) {
 p[1].y = p[1].x - bmpWidth;
 p[1].x = bmpWidth;
 }
 else
 p[1].y = 0;

 p[2].y = id * thickness;
 p[3].y = p[2].y - thickness;
 if (p[2].y > bmpHeight) {
 p[2].x = p[2].y - bmpHeight;
 p[2].y = bmpHeight;

 }
 else
 p[2].x = 0;
 if (p[3].y > bmpHeight) {
 p[3].x = p[3].y - bmpHeight;
 p[3].y = bmpHeight;
 }
 else
 p[3].x = 0;
 }
 else {
 if (ndiags % 2)
 p[0].x = bmpWidth - (id - 1) * thickness;
 else
 p[0].x = bmpWidth - (id - 2) * thickness;
 p[1].x = p[0].x - thickness;
 if (p[0].x < 0) {
 p[0].y = bmpHeight + p[0].x;
 p[0].x = 0;
 }
 else
 p[0].y = bmpHeight;
 if (p[1].x < 0) {
 p[1].y = bmpHeight + p[1].x;
 p[1].x = 0;
 }
 else
 p[1].y = bmpHeight;

 if (ndiags % 2)
 p[2].y = bmpHeight - id * thickness;
 else
 p[2].y = bmpHeight - (id - 1) * thickness;
 p[3].y = p[2].y + thickness;
 if (p[2].y < 0) {

 p[2].x = bmpWidth + p[2].y;
 p[2].y = 0;
 }
 else
 p[2].x = bmpWidth;
 if (p[3].y < 0) {
 p[3].x = bmpWidth + p[3].y;
 p[3].y = 0;
 }
 else
 p[3].x = bmpWidth;
 }
 hrgn = CreatePolygonRgn (p, 4, ALTERNATE);
 break;
 }
 // Select appropriate clipping region in device context
 // for this line and blit
 if (hrgn) {

 SelectClipRgn (hdc, hrgn);
 success = BitBlt (hdc, 0, 0, bmpWidth, bmpHeight,
 hdcMem, 0, 0,
 SRCCOPY);
 SelectClipRgn (hdc, 0);
 DeleteRgn (hrgn);
 }
 else
 success = FALSE;
 if (!success)
 break;

 // wait a little
 while (GetTickCount () < nexttime);
 nexttime += delay;
 }

 return success;
}

BOOL FXSkipRectangles (HDC hdc, // screen device context
 HDC hdcMem, // memory device context
 int bmpWidth, // bitmap width
 int bmpHeight, // bitmap height
 int delay // delay in milliseconds
 )
{
 int xl, yl, halfsize, minsize, xr, yr, temp;
 int inwidth, inheight, outwidth, outheight;
 BOOL stilldrawing, success;
 DWORD nexttime;

 // initialize
 xl = 0;
 yl = 0;
 minsize = (bmpWidth > bmpHeight ? bmpHeight : bmpWidth);
 xr = minsize % 2 ? (minsize >> 1) : (minsize >> 1) + 1;
 temp = xl;
 while (temp <= xr)
 temp += 2;

 if (temp == xr + 2)
 xr ++;

 yr = xr;

 // store approximate center for terminating inward pass
 halfsize = xr;

 nexttime = GetTickCount () + delay;

 while (xl <= halfsize || yl <= halfsize) {
 stilldrawing = FALSE;

 // draw the rectangle going inward
 if (xl <= halfsize && yl <= halfsize) {

 inwidth = bmpWidth - (xl << 1);
 inheight = bmpHeight - (yl << 1);

 // top edge
 success = BitBlt (hdc, xl, yl, inwidth, 1,
 hdcMem, xl, yl,
 SRCCOPY);
 if (!success)
 break;

 // left edge
 success = BitBlt (hdc, xl, yl, 1, inheight,
 hdcMem, xl, yl,
 SRCCOPY);
 if (!success)
 break;

 // bottom edge
 success = BitBlt (hdc, xl, yl + inheight, inwidth + 1, 1,
 hdcMem, xl, yl + inheight,
 SRCCOPY);
 if (!success)
 break;

 // right edge
 success = BitBlt (hdc, xl + inwidth, yl, 1, inheight + 1,
 hdcMem, xl + inwidth, yl,
 SRCCOPY);
 if (!success)
 break;
 stilldrawing = TRUE;
 }

 // draw the rectangle going outward
 if (xr >= 0 && yr >= 0) {
 outwidth = bmpWidth - (xr << 1);
 outheight = bmpHeight - (yr << 1);

 // top edge
 success = BitBlt (hdc, xr, yr, outwidth, 1,
 hdcMem, xr, yr,
 SRCCOPY);
 if (!success)
 break;

 // left edge
 success = BitBlt (hdc, xr, yr, 1, outheight,
 hdcMem, xr, yr,
 SRCCOPY);
 if (!success)
 break;

 // bottom edge

 success = BitBlt (hdc, xr, yr + outheight, outwidth + 1, 1,
 hdcMem, xr, yr + outheight,
 SRCCOPY);
 if (!success)
 break;

 // right edge
 success = BitBlt (hdc, xr + outwidth, yr, 1, outheight + 1,
 hdcMem, xr + outwidth, yr,
 SRCCOPY);
 if (!success)
 break;
 stilldrawing = TRUE;
 }
 if (stilldrawing) {
 // skip a rectangle moving inward
 xl += 2; yl += 2;

 // skip a rectangle moving outward
 xr -= 2; yr -= 2;
 }
 else
 break;

 // wait a little
 while (GetTickCount () < nexttime);
 nexttime += delay;
 }

 return success;
}



Listing Two 

// File: FXDemo.c
// Author: Saurabh Dixit
// Purpose: Demonstrating Bitmap Special Effects

#include <windowsx.h>
#include "fx.h"

#define WIDTH 640
#define HEIGHT 480

#define DELAY 25
#define COLWIDTH 4

#define ROWHEIGHT 2
#define SIZESQUARE 5
#define SIZEWIDTH 10

#define FX_SKIPHORIZONTAL 1

#define FX_SKIPVERTICAL 2
#define FX_SKIPDIAGONALSQUARES 3
#define FX_SKIPDIAGONALS 4
#define FX_SKIPRECTANGLES 5

LRESULT CALLBACK WndProc (HWND, UINT, WPARAM, LPARAM);
BOOL RenderEffect (HDC, HBITMAP, int);

int PASCAL WinMain (HANDLE hInstance, HANDLE hPrevInstance,
 LPSTR lpszCmdParam, int nCmdShow)
{
 char ClassName[] = "WHO-CARES";
 char AppName[] = "FXDemo by Saurabh Dixit";
 HWND hwnd;
 MSG msg;
 WNDCLASS wc;

 if (hPrevInstance) {
 MessageBox (NULL,
 "Only one instance allowed!",
 AppName,
 MB_ICONINFORMATION | MB_TASKMODAL);
 MessageBeep (MB_ICONINFORMATION);
 return FALSE;
 }

 wc.style = CS_HREDRAW | CS_VREDRAW | CS_DBLCLKS;
 wc.lpfnWndProc = WndProc;
 wc.cbClsExtra = 0;
 wc.cbWndExtra = 0;
 wc.hInstance = hInstance;
 wc.hIcon = LoadIcon (NULL, IDI_APPLICATION);
 wc.hCursor = LoadCursor (NULL, IDC_ARROW);
 wc.hbrBackground = GetStockObject (LTGRAY_BRUSH);
 wc.lpszMenuName = NULL;
 wc.lpszClassName = ClassName;

 RegisterClass (&wc) ;

 hwnd = CreateWindow (ClassName,
 AppName,
 WS_OVERLAPPED | WS_CAPTION | WS_SYSMENU |
 WS_MINIMIZEBOX | WS_BORDER,
 (GetSystemMetrics (SM_CXSCREEN) - WIDTH) >> 1,
 (GetSystemMetrics (SM_CYSCREEN) - HEIGHT) >> 1,
 WIDTH,
 HEIGHT,
 NULL,
 NULL,
 hInstance,
 NULL);

 ShowWindow (hwnd, nCmdShow);


 UpdateWindow (hwnd);

 while (GetMessage (&msg, NULL, 0, 0)) {
 TranslateMessage (&msg);
 DispatchMessage (&msg);
 }
 return msg.wParam;
}

LRESULT CALLBACK WndProc (HWND hwnd, UINT message,
 WPARAM wParam, LPARAM lParam)
{
 PAINTSTRUCT ps;
 LRESULT ret;
 HBITMAP hbitmap;
 HCURSOR hcursor;
 HINSTANCE hinst;

 static char bmpname[8] = {'b', 'i', 't', 'm', 'a', 'p', '1', 0};
 static int effect = FX_SKIPHORIZONTAL;

 switch (message) {
 case WM_NCHITTEST:
 ret = DefWindowProc (hwnd, message, wParam, lParam);
 if (ret == HTCAPTION)
 // so window can't be moved, just for kicks
 ret = HTCLIENT;
 return ret;

 case WM_PAINT:
 BeginPaint (hwnd, &ps) ;
 hinst = GetWindowInstance (hwnd);
 hbitmap = LoadBitmap (hinst, bmpname);
 if (hbitmap) {
 if (!RenderEffect (ps.hdc, hbitmap, effect))
 MessageBeep (MB_ICONSTOP);
 DeleteBitmap (hbitmap);
 }
 else
 MessageBeep (MB_ICONSTOP);
 EndPaint (hwnd, &ps) ;
 return FALSE;

 case WM_RBUTTONDBLCLK:
 // switch the bitmaps
 switch (bmpname[6]) {
 case '1': bmpname[6] = '2'; break;
 case '2': bmpname[6] = '1'; break;
 }
 // don't erase the background
 InvalidateRect (hwnd, NULL, FALSE);
 UpdateWindow (hwnd);
 return FALSE;


 case WM_LBUTTONDBLCLK:
 // cycle through the effects
 effect ++;

 if (effect > FX_SKIPRECTANGLES)
 effect = FX_SKIPHORIZONTAL;
 // erase the background
 InvalidateRect (hwnd, NULL, TRUE);
 UpdateWindow (hwnd);
 return FALSE;


 case WM_DESTROY:
 PostQuitMessage (0);
 return FALSE;
 }

 return DefWindowProc (hwnd, message, wParam, lParam);
}

BOOL RenderEffect (HDC hdc, HBITMAP hbitmap, int effect)
{
 BITMAP bm;
 HDC hdcMem;
 DWORD dwSize;

 BOOL success;

 hdcMem = CreateCompatibleDC (hdc);
 SelectBitmap (hdcMem, hbitmap);
 SetMapMode (hdcMem, GetMapMode (hdc));

 GetObject (hbitmap, sizeof (BITMAP), (LPSTR) &bm);

 switch (effect) {
 case FX_SKIPHORIZONTAL:
 success = FXSkipHorizontal (hdc, hdcMem,
 bm.bmWidth, bm.bmHeight,
 COLWIDTH, DELAY);
 break;
 case FX_SKIPVERTICAL:
 success = FXSkipVertical (hdc, hdcMem,
 bm.bmWidth, bm.bmHeight,
 ROWHEIGHT, DELAY);
 break;
 case FX_SKIPDIAGONALSQUARES:
 success = FXSkipDiagonalSquares (hdc, hdcMem,
 bm.bmWidth, bm.bmHeight,
 SIZESQUARE, DELAY);
 break;
 case FX_SKIPDIAGONALS:
 success = FXSkipDiagonals (hdc, hdcMem,
 bm.bmWidth, bm.bmHeight,
 SIZEWIDTH, DELAY);
 break;
 case FX_SKIPRECTANGLES:
 success = FXSkipRectangles (hdc, hdcMem,
 bm.bmWidth, bm.bmHeight,
 DELAY);
 break;
 default: // just in case
 success = BitBlt (hdc, 0, 0, bm.bmWidth, bm.bmHeight,
 hdcMem, 0, 0,
 SRCCOPY);
 }

 DeleteDC (hdcMem);
 return success;
}





Listing Three

// File: FX.h
// Author: Saurabh Dixit
// Purpose: Prototypes for Bitmap Effects

BOOL FXSkipHorizontal (HDC, HDC, int, int, int, int);
BOOL FXSkipVertical (HDC, HDC, int, int, int, int);
BOOL FXSkipDiagonalSquares (HDC, HDC, int, int, int, int);
BOOL FXSkipDiagonals (HDC, HDC, int, int, int, int);
BOOL FXSkipRectangles (HDC, HDC, int, int, int);



Listing Four

;--------------------------------------------------------------------------
; File: FXdemo.def
; Author: Saurabh Dixit
; Purpose: Module definition file
;--------------------------------------------------------------------------

NAME FXDEMO

DESCRIPTION 'Special Bitmap Effects Demo'
EXETYPE WINDOWS
STUB 'WINSTUB.EXE'
CODE PRELOAD MOVEABLE DISCARDABLE
DATA PRELOAD MOVEABLE MULTIPLE
HEAPSIZE 8192
STACKSIZE 8192



Listing Five 

// File: FXDemo.rc
// Author: Saurabh Dixit
// Purpose: Resource file for including bitmaps

bitmap1 BITMAP class.bmp
bitmap2 BITMAP tiger.bmp









RAMBLINGS IN REAL TIME


The Day The World Changed




Michael Abrash


Michael is the author of numerous programming books, including Zen of Code
Optimization. He can be contacted at mabrash@bix.com.


This must have been what it was like for Jon Landau when he attended a concert
at the Harvard Square Theater in 1974 and afterwards wrote, "I saw rock'n'roll
future and its name is Bruce Springsteen."
Surely this is the way the people at Apple felt when they visited Xerox PARC,
used a strange device called a "mouse" to move bitmapped windows around the
screen, and suddenly glimpsed a future very different from the one they had
envisioned just hours before.
We're talking the apes dancing around the monolith in 2001; an asteroid
hitting the Earth and clearing the way for us mammals; a bunch of amino acids
coming together in just the right way to create the first dim hint of life.
What we're talking here is The Day The World Changed, if you catch my drift.
Let me explain.
At the Game Developers Conference in April 1994, there were rumors flying
around about DOOM running under Windows. Given that DOOM was and is a DOS
phenomenon sui generis, and that Windows had zero mindshare for real-time
games, the rumors seemed improbable, to say the least, but they were confirmed
when Microsoft urged people to stop by Chris Hecker's talk to see WinDOOM. So
the room was packed with people expecting to see DOOM running under
Windows--and still the crowd literally gasped when it actually happened. There
was DOOM, running along in what was indisputably real time--in a window, with
Program Manager, and Clock, and all the rest up there too. And in that moment
I had one of those rare glimpses of the future--a future in which Windows
becomes the mainstream, real-time graphics platform. That future is one in
which all the rules of real-time graphics programming change, a future in
which Mode X and SuperVGA and DOS extenders and adapter support and the other
myriad complexities of DOS programming vanish.
You see, Windows is a DOS extender; it provides driver support in a generally
device-independent way, and it attracts far more drivers than any other
platform. The only thing missing for real-time graphics was fast, direct,
double-buffered drawing--and I do mean "was," because that gap is
now filled by WinG (pronounced win-GEE), the software that made WinDOOM
possible. WinG does nothing more or less profound than support fast,
double-buffered drawing on Win 3.1, Win32s, and Win32, and that's precisely
the missing ingredient in making Windows an excellent real-time animation
platform.
I know what you're thinking: Is that all? Is that what you're getting so
excited about? Well, yes. Think of it this way. Right now, if you want to do
real-time, 256-color graphics above 320x200 under DOS, you have to deal with
the complexities of Mode X. If you want to go past 360x480, you have to deal
with supporting dozens of different SuperVGAs, and you have to handle banked
video memory. You also have to do all the drawing yourself, including text.
Furthermore, you have to deal with protected mode somehow, plus input
handling, and all the rest.
With WinG, the details of protected mode, devices, and input handling are
handled by Windows, and 32-bit programming is a snap with Windows NT or Win32s
or, soon, Windows 95, with tools galore available. Better yet, you can do all
your drawing into a single, linear pixel buffer, with GDI's help, if you'd
like, and then WinG will copy or stretch that buffer to the screen at memory
bandwidth. No banking, no Mode X, no mode-set complexities, just the big,
linear pixel buffer with 32-bit addressing that you've always dreamed of--and
Windows is everywhere nowadays, so we're talking about a huge and rapidly
growing installed base.
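Drawing into such a linear pixel buffer really is as simple as it sounds. Here's a minimal sketch in portable C (the function name and parameters are mine, for illustration); WinG hands your app exactly this kind of buffer pointer, plus a pitch, when you create a WinGBitmap:

```c
#include <stddef.h>

/* Fill a solid rectangle in a linear 8-bpp pixel buffer. "pitch" is the
   number of bytes from one scan line to the next, which may be larger
   than the visible width because of alignment padding. No banking, no
   mode-specific logic -- just pointer arithmetic. */
void fill_rect8(unsigned char *buffer, int pitch,
                int x, int y, int width, int height,
                unsigned char color)
{
    unsigned char *row = buffer + (size_t)y * pitch + x;
    int i, j;
    for (j = 0; j < height; j++) {
        for (i = 0; i < width; i++)
            row[i] = color;
        row += pitch;
    }
}
```

Compare that with the bank-switching gymnastics a SuperVGA version would need, and the appeal is obvious.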
So sure, I think WinG is worth getting excited about, if for no other reason
than because it means that game authors are well on their way to having a
single, standard, hardware-independent PC environment in which to program. I
have a more selfish reason for liking WinG, though; it allows me to have all
of the code in this column draw to linear pixel buffers, freeing me to
concentrate on challenging, high-level, real-time graphics issues rather than
hardware- and OS-specific details. What I'm going to do this month is provide
a quick overview of WinG, then present a simple, WinG-based animation program
that will serve as the foundation for our future ramblings in real time. Once
we've got that basic program in place, we can put just about any kind of
graphics on top of it, without spending any more precious column space on
low-level details. And believe me, column space is indeed precious--I can
already think of more hardware-independent graphics topics than I could cover
in a decade's worth of columns!


WinG


WinG is a set of binaries that can be used by any Win32, Win32s, or Win 3.1
("Win 3.1" means WFW as well, in this column) app to get fast graphics. The
problem solved by WinG is this: You--as a real-time graphics programmer--want
fast, smooth graphics. That requires two things: a buffer in which to compose
each new screen, and a way to get the pixels from each finished buffer onto
the screen quickly. The typical way to do that under DOS is to draw the pixels
to an offscreen buffer with your own carefully optimized code, then display
that buffer as quickly as possible, either by copying it to the screen or by
page flipping. Although page flipping isn't available under Windows, copying a
pixel buffer to the screen is certainly possible, but neither of the two
standard Windows drawing surfaces--compatible bitmaps and Device Independent
Bitmaps (DIBs)--provides everything needed for fast graphics.
DIBs give the app complete control over drawing, which is good. A DIB is a
packed-pixel buffer into which Windows apps can draw directly, and which GDI
can then copy or stretch to the screen. See the Win32 documentation on the
MSDN CD, available from Microsoft, for an astonishing amount of information
about DIBs and everything else about Windows programming; if you don't
subscribe to MSDN, you should--it is the single most valuable Windows
programming resource I've ever seen. However, DIBs can't be copied to the
screen at memory bandwidth for a couple of reasons. First, each DIB comes with
a color table (a table describing the colors to which the DIB pixel values
correspond); the translation between the color table and the hardware palette
must be recalculated every time a normal DIB is copied to the screen with
SetDIBitsToDevice() or StretchDIBits(); see Figure 1. Then, too, a number of
drivers don't implement the DIB APIs very efficiently. Also, under Windows NT,
DIBs must go through the time-consuming process of being copied from the app's
address space into GDI's address space, where the frame buffer resides.
Finally, sometimes it's helpful to have GDI do some of the drawing, especially
of text, so your app doesn't have to--but GDI can't draw into DIBs.
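To see why the per-blt color-table translation hurts, here's a schematic in portable C (not the actual GDI internals; names are mine). The translation is just a 256-entry lookup table from DIB pixel values to hardware palette slots, and the whole point of WinGBitmaps is that this table can be built once and cached instead of being recomputed on every copy:

```c
#include <stddef.h>

/* Apply a cached DIB-to-screen color translation to a run of 8-bpp
   pixels. Rebuilding translate[] on every blt, as plain DIB copies
   must, is pure overhead; caching it is nearly free. */
void translate_pixels(const unsigned char *src, unsigned char *dst,
                      size_t count, const unsigned char translate[256])
{
    size_t n;
    for (n = 0; n < count; n++)
        dst[n] = translate[src[n]];
}

/* An identity table makes the translation a straight copy -- the
   "identity palette" case the WinG help urges you to arrange. */
int is_identity(const unsigned char translate[256])
{
    int i;
    for (i = 0; i < 256; i++)
        if (translate[i] != (unsigned char)i)
            return 0;
    return 1;
}
```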
GDI can draw into compatible bitmaps, which don't have color-table or
address-space problems and can be copied very efficiently--but apps can't draw
directly into compatible bitmaps, as shown in Figure 1. That's a fatal flaw
for real-time graphics.
WinG meets all the needs for real-time graphics by creating a hybrid of a DIB
and a compatible bitmap, called a "WinGBitmap," as shown in Figure 2. On Win
3.1 and Win32s, WinGBitmaps are always in 8-bit-per-pixel DIB format but can
work with any display mode; however, copies from a WinGBitmap to the screen
are fastest when the screen is in 8-bpp mode.
There are three special things about WinGBitmaps. First, both the app and GDI
can draw into them; the app has complete control whenever needed for
performance or quality reasons, and GDI does the rest of the work.
Second, because WinGBitmaps are mapped into both the app and GDI address
spaces under Windows NT, GDI can access them directly when copying, so no
copying overhead is incurred in the process of getting the WinGBitmap's pixels
onto the screen. (On Win 3.1 and Win32s, WinG will seek out and use the
fastest possible way of copying WinGBitmaps to the screen.)
Third, a WinGBitmap's color table is set explicitly when the WinGBitmap is
created and stays that way unless it is changed via WinGSetDIBColorTable(). As
a result, the translation to screen colors can be cached, rather than
recalculated every time the WinGBitmap is copied to the screen. (All
translations are not created equal, though; an identity palette--as described
and illustrated in detail in the WinG online help, and as implemented in the
code in this column--is essential for best performance.)
Although it's a compact API, WinG offers a few extra goodies to complement
WinGBitmaps. WinGBitmaps are selected into WinGDCs, which are passed to
WinGBitBlt() and WinGStretchBlt() for maximum-speed copies of WinGBitmaps to
the screen. WinG can also make a run-time recommendation about the
fastest-blitting DIB orientation--top-down or bottom-up--for the current
system. (This may sound arcane, but it's covered beautifully in the WinG
help--and can make a big difference in performance.) WinG also provides a
handy palette, with a 6-6-6 color set for halftoning, a good selection of
grays, and a guaranteed identity mapping if used as shown in the WinG SDK
(this is the palette used in Listing One). Finally, WinG can create halftoned
brushes to approximate any arbitrary RGB color.
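The 6-6-6 color set is simply a 6x6x6 cube of evenly spaced RGB levels, 216 entries in all. The sketch below shows the quantization idea in portable C; the exact entry ordering in the real WinG halftone palette may differ (see the WinG help for the authoritative layout), so treat this as illustrative only:

```c
/* Quantize an RGB color (components 0..255) to a 6x6x6 cube index.
   Each component is rounded to the nearest of six levels (0, 51, 102,
   153, 204, 255), yielding an index in 0..215. */
int halftone_index(int r, int g, int b)
{
    int rl = (r * 5 + 127) / 255;   /* nearest of 6 levels: 0..5 */
    int gl = (g * 5 + 127) / 255;
    int bl = (b * 5 + 127) / 255;
    return rl * 36 + gl * 6 + bl;
}
```

Halftoned brushes approximate the colors that fall between those 216 levels by dithering neighboring cube entries.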
What's the downside of using WinG? Not much. The WinG software is free and
freely redistributable (read the license agreement first, but that's what it
amounts to), so the only cost is having to put somewhere around 300K of WinG
binaries on your distribution disks. The other drawback is the lack of support
for high-color pixel buffers under Win 3.1 and Win32s, as I'll discuss later.
Where can you get WinG? You can download it from
ftp.microsoft.com:/developer/drg/wing/wing10.zip, or from the WinMM forum on
CompuServe, or you can find it on the MSDN level-2 CD.
One key to effective use of WinG: Read the online help! It's excellent and
thorough, so much so that I'm not going to discuss WinG programming in any
more detail.
When all is said and done, it comes down to this: WinG provides the missing
pieces for real-time graphics under Windows. Enough said.


WinG on Win32


WinG is not required on Win32 proper (excluding Win32s), because native Win32
APIs can do everything WinG does, except generate halftone brushes. (The
palette created by Windows NT's CreateHalftonePalette() call differs from the
WinG halftone palette, but any app can easily create a custom palette that
matches the WinG palette.) On Win32, WinG is mostly a layer onto Win32 APIs
such as CreateDIBSection(), SetDIBColorTable(), BitBlt(), and StretchBlt().
Nonetheless, even if you're developing only 32-bit apps, it often makes sense
to use WinG. Apps don't lose any significant performance or features by using
WinG on Win32, and WinG allows apps to run unmodified on Win32s; if I were
writing a 32-bit Windows game right now, I'd certainly want to expand my
potential market by having it run on Win32s. Better yet, if you should want to
port the code to Win 3.1, you won't have to change the WinG calls at all. If
you use WinG, your code will work well on all current and future x86 Windows
platforms. (Note, however, that WinG doesn't run on RISC Windows NT platforms;
there you'll have to use CreateDIBSection(). Also, be aware that neither WinG
nor CreateDIBSection() works on any version of Windows NT prior to 3.5.)
CreateDIBSection() is a full-fledged part of GDI, so all DIB formats (1, 4, 8,
16, 24, and 32 bpp) are supported by the DIB sections (equivalent to
WinGBitmaps) it creates. Because WinG is just a layer onto Win32 APIs on
Win32, WinG, likewise, supports all DIB formats as WinGBitmaps on Win32.


Win32: Our Platform for Rambling


I'll work primarily with RGB modes in this column, because RGB is both a much
simpler model to work with and the wave of the future for 3-D. The upcoming
generation of low-cost 3-D accelerators will target RGB modes almost
exclusively because it's difficult to do hardware acceleration of
smooth-shaded texture maps in palettized modes. Consequently, the code I'll
develop in this column will be Win32 code developed in Visual C++ 2.0, in
large part so I can use the 16-bpp-and-up WinGBitmap formats that aren't
supported on Win 3.1 and Win32s. The code in this column will mostly use
direct-drawn graphics, so it will be reasonably easy to port it to Win32s and
Win 3.1, but I'll develop and test it on Win32. Because the code is
Win32-specific, it's actually not necessary to use WinG, but I'm going to use
WinG anyway, to make it easier for you to get it running on Win32s and Win 3.1
if you want to, and so I can use the WinG halftone palette.
Why am I targeting Win32, when Win 3.1 is wildly popular right now? With
Windows 95 and Windows NT, Win32 will clearly be the standard soon, and
although neither of those systems is the standard today, I'm guessing that
most of you readers, being developers, already have one or the other. Also, by
not having to deal with the complications of 16-bit code, we'll be better able
to focus on more interesting issues. Finally, the point of this column is
real-time graphics, not Windows, and WinG in a 32-bit environment makes for
very portable graphics code.
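For the curious, "16 bpp and up" means packed RGB pixels rather than palette indexes. A common 16-bpp arrangement is 5-6-5 (a plain BI_RGB 16-bpp DIB is actually 5-5-5; getting 5-6-5 requires BI_BITFIELDS masks), and packing a pixel looks like this:

```c
/* Pack a 24-bit RGB color into the 5-6-5 16-bpp layout. Green gets the
   extra bit because the eye is most sensitive to it. Illustrative of
   high-color pixel formats generally, not of any one DIB header. */
unsigned short pack565(unsigned char r, unsigned char g, unsigned char b)
{
    return (unsigned short)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}
```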



An Animation Framework


Listing One is a simple WinG-based animation program. Simple--but nonetheless
complete, in the sense that it includes all the basic elements needed by a
fast graphics application under Win32: a message loop, input handling, and, of
course, direct drawing to a DIB that is then copied to the screen at high
speed. (Set DRAW_DIRECT to 1 to draw directly into the WinGBitmap, or to 0 to
have GDI do the drawing; this illustrates that WinGBitmaps support both sorts
of drawing.) Listing One will be the foundation for the real-time graphics
software that we'll develop from here on out.
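The per-frame box update at the heart of Listing One is pure logic, independent of Windows; here's the same bounce-and-step rule as a self-contained portable C sketch (the struct mirrors the listing's `box` type; the function name is mine):

```c
/* One animation step: reverse a velocity component before it would
   carry the box past an edge, then advance the position. */
typedef struct {
    int X, Y, Width, Height;
    int XInc, YInc;
} box;

void move_box(box *b, int bound_w, int bound_h)
{
    if ((b->XInc < 0 && b->X + b->XInc < 0) ||
        (b->XInc > 0 && b->X + b->Width + b->XInc >= bound_w))
        b->XInc = -b->XInc;
    if ((b->YInc < 0 && b->Y + b->YInc < 0) ||
        (b->YInc > 0 && b->Y + b->Height + b->YInc >= bound_h))
        b->YInc = -b->YInc;
    b->X += b->XInc;
    b->Y += b->YInc;
}
```

Because the velocity flips *before* the move, a box can never be drawn even partially outside the buffer, which is exactly the invariant direct drawing requires.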
Listing Two is the header file for Listing One, and Listing Three is the
resource file. To build the sample program, put the files in a directory,
start a new project in VC++ 2.0, add the CPP and RC files to the project,
build, and run.
Listing One can run in any display mode, but the WinGBitmap itself is always 8
bpp in Listing One, because WinGRecommendDIBFormat() always returns a
BITMAPINFOHEADER with biBitCount set to 8. However, you can, if you wish, set
biBitCount to any valid DIB value to change a WinGBitmap's color depth on
Win32.
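One detail worth spelling out: every DIB scan line is padded to a 32-bit boundary, which is why Listing One computes its pitch as `(DibWidth+3) & ~0x03`. That expression is the 8-bpp special case of the general stride formula, sketched here (the function name is mine):

```c
/* Bytes per scan line of a DIB: the pixel data for one row, rounded
   up to the next 32-bit (4-byte) boundary. */
long dib_stride(long width_pixels, int bits_per_pixel)
{
    return ((width_pixels * bits_per_pixel + 31) / 32) * 4;
}
```

At 8 bpp this reduces to rounding the width up to a multiple of 4, matching the listing.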


Coming Up


I think the most interesting part of the industry right now is real-time 3-D,
so that's where we'll head next. We'll put a polygon-based rendering layer on
top of WinG, and we'll look at the really interesting stuff that sits on top
of rendering: clipping, transforms, and, most of all, hidden-surface removal.
We'll check out z-buffering, BSP trees (the heart of DOOM), octrees, and more.
Somewhere in there, I'll try to get my X-Sharp package introduced in the
"Graphics Programming" column of Dr. Dobb's Journal back in 1992 onto WinG,
and let you know what I learned in the process.
I have seen the future of real-time PC graphics, and I'm heading there as fast
as I can. I'd be delighted if you'd join me for the trip.
Figure 1: The two standard types of Windows drawing surface.
Figure 2: A WinGBitmap is a hybrid of a DIB and a compatible bitmap.

Listing One 

// ANIMSAMP.CPP: Simple animation demo using WinG. Tested with
// Microsoft VC++ 2.0 running under Windows NT. Always draws to an
// 8-bpp DIB, but the results can be displayed at any color depth.
// Adapted from the Cube sample program in the WinG SDK.

#include <windows.h>
#include <windowsx.h>
#include <wing.h>
#include "animsamp.hpp"

#define DRAW_DIRECT 1 // 0 to draw boxes via GDI, 1 for direct
#define MIN_DIB_WIDTH 100 // minimum dimensions of WinGBitmap,
#define MIN_DIB_HEIGHT 100 // to avoid out-of-bounds drawing
// A box is the only sort of object in the world we'll draw
#define NUM_BOXES 7
typedef struct _box {
 int X;
 int Y;
 int Width;
 int Height;
 int XInc;
 int YInc;
} box;
box Boxes[NUM_BOXES] = {
 {0, 0, 20, 20, 5, 5},
 {0, 0, 30, 10, 4, 7},
 {0, 0, 60, 24, 5, 9},
 {0, 0, 24, 24, 7, 7},
 {0, 0, 16, 30, 8, 1},
 {0, 0, 40, 20, 2, 6},
 {0, 0, 24, 30, 4, 2},
};
static char szAppName[]="Sample WinG animation demo";
static int appActive;
static HWND hwndApp;
static HPALETTE hpalApp = 0;
static HDC hdcWinG;
static HBITMAP hbmOld;
static HBITMAP hbmWinG;
struct {
 BITMAPINFOHEADER Header;

 RGBQUAD aColorTable[256];
} HeaderAndPalette = {
 sizeof(BITMAPINFOHEADER), 50, 50, 1, 8, BI_RGB, 0, 0, 0, 0, 0
};
static int DibWidth, DibHeight, DibPitch;
char *pBits;

LRESULT CALLBACK AppWndProc(HWND hwnd, UINT msg, WPARAM wParam,
 LPARAM lParam);
void AppExit(void);
int AppIdle(void);
void AppPaint(HWND hwnd, HDC hdc);
void ClearSystemPalette(void);

// Load time initialization.

int LoadInit(HINSTANCE hInst,HINSTANCE hPrev,int sw,LPSTR szCmdLine)
{
 WNDCLASS cls;

 ClearSystemPalette(); // Make sure we can get the whole palette

 if (!hPrev) {
 cls.hCursor = LoadCursor(0, IDC_ARROW);
 cls.hIcon = 0;
 cls.lpszMenuName = "AppMenu";
 cls.lpszClassName = szAppName;
 cls.hbrBackground = (HBRUSH)GetStockObject(BLACK_BRUSH);
 cls.hInstance = hInst;
 cls.style = CS_BYTEALIGNCLIENT | CS_VREDRAW | CS_HREDRAW;
 cls.lpfnWndProc = (WNDPROC)AppWndProc;
 cls.cbClsExtra = 0;
 cls.cbWndExtra = 0;
 if (!RegisterClass(&cls))
 return FALSE;
 }

 hwndApp = CreateWindow (szAppName, szAppName,
 WS_OVERLAPPEDWINDOW,
 CW_USEDEFAULT, 0, 350,350, 0, 0, hInst, 0);
 hdcWinG = WinGCreateDC();
 ShowWindow(hwndApp, sw);
 return TRUE;
}

// Main proc and message pump.

int CALLBACK WinMain(HINSTANCE hInst,HINSTANCE hPrev,LPSTR szCmdLine,
 int sw)
{
 MSG msg;

 if (!LoadInit(hInst, hPrev, sw, szCmdLine)) // initialize the app
 return FALSE;

 // Pump messages until quitting time, letting the idle proc
 // draw, if possible, when there's nothing else to do.
 for (;;) {

 if (PeekMessage(&msg, 0, 0, 0, PM_REMOVE)) {
 if (msg.message == WM_QUIT)
 break;
 TranslateMessage(&msg);
 DispatchMessage(&msg);
 } else {
 if (AppIdle())
 WaitMessage();
 }
 }
 return msg.wParam;
}

// Idle loop; draws if the app is the active app or is an icon,
// does nothing otherwise.

int AppIdle()
{
 if (appActive || IsIconic(hwndApp)) {
 // Move all the objects
 for (int i=0; i<NUM_BOXES; i++) {
 // Bounce if at edge
 if (((Boxes[i].XInc < 0) &&
 ((Boxes[i].X + Boxes[i].XInc) < 0)) ||
 ((Boxes[i].XInc > 0) &&
 ((Boxes[i].X + Boxes[i].Width + Boxes[i].XInc)
 >= DibWidth))) {
 Boxes[i].XInc = -Boxes[i].XInc;
 }
 if (((Boxes[i].YInc < 0) &&
 ((Boxes[i].Y + Boxes[i].YInc) < 0)) ||
 ((Boxes[i].YInc > 0) &&
 ((Boxes[i].Y + Boxes[i].Height + Boxes[i].YInc)
 >= DibHeight))) {
 Boxes[i].YInc = -Boxes[i].YInc;
 }
 Boxes[i].X += Boxes[i].XInc;
 Boxes[i].Y += Boxes[i].YInc;
 }
 // Draw the world with the new positions
 HDC hdc = GetDC(hwndApp);
 if (hpalApp) {
 SelectPalette(hdc, hpalApp, FALSE);
 RealizePalette(hdc);
 }
 AppPaint(hwndApp, hdc); // draw the world
 ReleaseDC(hwndApp, hdc);
 return FALSE;
 } else {
 return TRUE; // nothing to do
 }
}

// Draws the current state of the world to the DIB (WinGBitmap),
// then copies the result to the passed-in DC (the screen).

void AppPaint(HWND hwnd, HDC hdc)
{
 // Clear the DIB to black

 PatBlt(hdcWinG, 0, 0, DibWidth, DibHeight, BLACKNESS);
#if DRAW_DIRECT
 GdiFlush(); // make sure this gets drawn right away, so it
 // happens before the direct drawing (GDI batches
 // drawing calls under Windows NT)
#endif

 // Draw the world (all the boxes) to the DIB
 for (int i=0; i<NUM_BOXES; i++) {
#if DRAW_DIRECT
 int Color = GetNearestPaletteIndex(hpalApp,
 RGB(((i+1)&0x04)*63, ((i+1)&0x02)*127, ((i+1)&0x01)*255));
 char *pTemp = pBits + (Boxes[i].Y * DibPitch) + Boxes[i].X;
 int Step = DibPitch - Boxes[i].Width;
 for (int j=0; j<Boxes[i].Height; j++) {
 for (int k=0; k<Boxes[i].Width; k++) {
 *pTemp++ = Color;
 }
 pTemp += Step;
 }
#else
 HBRUSH hbr;
 RECT rect;
 hbr = CreateSolidBrush(RGB(((i+1)&0x04)*63, ((i+1)&0x02)*127,
 ((i+1)&0x01)*255));
 rect.top = Boxes[i].Y;
 rect.left = Boxes[i].X;
 rect.bottom = Boxes[i].Y + Boxes[i].Height;
 rect.right = Boxes[i].X + Boxes[i].Width;
 FillRect(hdcWinG, &rect, hbr);
 DeleteObject(hbr);
#endif
 }

 // Copy the DIB to the screen.
 RECT rc;
 GetClientRect(hwndApp, &rc);
 if (IsIconic(hwndApp)) {
 WinGStretchBlt(hdc, 0, 0, rc.right, rc.bottom, hdcWinG, 0, 0,
 DibWidth, DibHeight);
 } else {
 WinGBitBlt(hdc, 0, 0, rc.right, rc.bottom, hdcWinG, 0, 0);
 }
 GdiFlush(); // make sure this gets drawn right away
}

// Main window proc. Receives all messages.

LRESULT CALLBACK AppWndProc(HWND hwnd, UINT msg, WPARAM wParam,
 LPARAM lParam)
{
 PAINTSTRUCT ps;
 HDC hdc;
 int f;
 int counter;

 switch (msg) {
 case WM_CREATE: 
 // Use the WinG halftone palette

 hpalApp = WinGCreateHalftonePalette();
 GetPaletteEntries(hpalApp, 0, 256,
 (PALETTEENTRY *)HeaderAndPalette.aColorTable);

 for(counter = 0; counter < 256; counter++) {
 // PALETTEENTRYs and RGBQUADs are reversed
 BYTE Temp =
 HeaderAndPalette.aColorTable[counter].rgbBlue;
 HeaderAndPalette.aColorTable[counter].rgbBlue =
 HeaderAndPalette.aColorTable[counter].rgbRed;
 HeaderAndPalette.aColorTable[counter].rgbRed = Temp;
 }
 break;

 case WM_ACTIVATEAPP: // track if app in foreground
 appActive = (int)wParam;
 break;
 
 case WM_COMMAND:
 switch(wParam) {
 case MENU_EXIT:
 PostMessage(hwnd, WM_CLOSE, 0, 0L);
 break;
 }
 return 0L;

 case WM_DESTROY: // clean up before leaving
 if (hpalApp)
 DeleteObject(hpalApp);
 if (hdcWinG) {
 SelectObject(hdcWinG, hbmOld);
 DeleteObject(hbmWinG);
 DeleteDC(hdcWinG);
 }
 PostQuitMessage(0);
 break;

 case WM_PALETTECHANGED:
 if ((HWND)wParam == hwnd)
 break;
 // if not current window doing the changing, fall through

 case WM_QUERYNEWPALETTE:
 hdc = GetDC(hwnd);
 if (hpalApp)
 SelectPalette(hdc, hpalApp, FALSE);
 f = RealizePalette(hdc);
 ReleaseDC(hwnd, hdc);
 if (f) // if we got a realization, force a redraw
 InvalidateRect(hwnd, 0, FALSE);
 return f;

 case WM_PAINT:
 hdc = BeginPaint(hwnd, &ps);
 if (hpalApp) {
 SelectPalette(hdc, hpalApp, FALSE);
 RealizePalette(hdc);
 }
 AppPaint (hwnd, hdc);

 EndPaint(hwnd,&ps);
 return 0L;

 case WM_SIZE:
 if (wParam != SIZE_MINIMIZED) {
 // Create a WinGBitmap to match the client area
 if (hbmWinG) {
 SelectObject(hdcWinG, hbmOld);
 DeleteObject(hbmWinG);
 }
 RECT rect;
 GetClientRect(hwnd, &rect);

 // Set up the header for the WinGBitmap, making sure
 // it never gets so small that objects could draw
 // out-of-bounds
 WinGRecommendDIBFormat((BITMAPINFO *)
 &HeaderAndPalette);
 DibWidth = (rect.right > MIN_DIB_WIDTH) ?
 rect.right : MIN_DIB_WIDTH;
 DibPitch = (DibWidth+3) & ~0x03; // round up to dword
 DibHeight = (rect.bottom > MIN_DIB_HEIGHT) ?
 rect.bottom : MIN_DIB_HEIGHT;
 HeaderAndPalette.Header.biWidth = DibWidth;
 HeaderAndPalette.Header.biHeight *= DibHeight;
 hbmWinG = WinGCreateBitmap(hdcWinG,
 (BITMAPINFO *)&HeaderAndPalette,
 (void **)&pBits);
 hbmOld = SelectBitmap(hdcWinG, hbmWinG);

 // If bottom-up bitmap, point to the top scan & make
 // the scan size negative to scan from top to bottom
 if (HeaderAndPalette.Header.biHeight > 0) {
 pBits += (HeaderAndPalette.Header.biHeight - 1) *
 DibPitch;
 DibPitch = -DibPitch;
 }

 // Reset all the objects to the upper left to make
 // sure they're in the DIB
 for (int i=0; i<NUM_BOXES; i++)
 Boxes[i].X = Boxes[i].Y = 0;
 }
 }
 return DefWindowProc(hwnd,msg,wParam,lParam);
}

// Fills and empties the system palette, so if this is the
// foreground app, it can be sure of grabbing all the non-static
// entries starting at entry #10.

void ClearSystemPalette(void)
{
 static struct {
 WORD Version;
 WORD NumberOfEntries;
 PALETTEENTRY aEntries[256];
 } Palette = {
 0x300,

 256
 };
 
 // Make the whole palette black, with nocollapse to force a
 // separate system palette entry for each black entry. The RGB
 // entries are statically initialized to zero
 for(int counter = 0; counter < 256; counter++)
 Palette.aEntries[counter].peFlags = PC_NOCOLLAPSE;

 // Realize the palette & discard it, to clear the system palette.
 HDC ScreenDC = GetDC(NULL);
 HPALETTE hpalScreen= CreatePalette((LOGPALETTE *)&Palette);
 if (hpalScreen) {
 hpalScreen = SelectPalette(ScreenDC, hpalScreen, FALSE);
 RealizePalette(ScreenDC);
 hpalScreen = SelectPalette(ScreenDC, hpalScreen, FALSE);
 DeleteObject(hpalScreen);
 }
 ReleaseDC(NULL, ScreenDC);
 return;
}




Listing Two

// ANIMSAMP.HPP: Header file for simple animation demo using WinG.
#define MENU_EXIT 1




Listing Three

// ANIMSAMP.RC: Resource file for simple animation demo using WinG.
#include <windows.h>
#include "animsamp.hpp"

AppMenu menu
begin
 POPUP "&File"
 begin
 MENUITEM "E&xit", MENU_EXIT
 end
end

















DTACK REVISITED


Robots Around Us




Hal W. Hardenbergh


Hal is a hardware engineer who sometimes programs. He is the former editor of
DTACK Grounded and can be contacted through the DDJ offices.


Let's look back at Germany a while before Gutenberg, specifically, at a new
city on the edge of the Schwarzwald, the Black Forest. Specialization of
labor, which had been with us since the invention of agriculture, was becoming
formalized and legalized amongst the emerging middle class by the Guild
system. Trade between cities was becoming commonplace.
But not everybody managed to join a Guild; there was still need for common
labor. For example, the Guild workers were too busy and too specialized to go
into the forest for the firewood needed to cook meals and heat houses in
winter. In this small city alone, 80 woodsmen (no woodswomen) earned a living
cutting firewood in the forest, bringing the wood into the city, and hawking
it in the streets to the populace. One of these woodsmen was a man we'll call
"Clumsy."
One day Clumsy's ax struck a knot, and the ax was deflected, striking Clumsy's
left foot just below the ankle and ending Clumsy's career as a woodsman. By
whatever happenstance, Clumsy survived this industrial accident. While
healing, if that's the word, Clumsy had plenty of time to think, and it turned
out he was good at that.
So Clumsy invested his life savings in four chain saws, the only ones
available, and hired four woodsmen to operate them. He had a custom wagon
built with wide, iron tyres, drawn by four oxen, to carry the firewood into
the city. And with a large, steady supply of firewood guaranteed, he drew up
delivery contracts with individual households. Where the traditional woodsmen
spent more time in distribution--hawking firewood from the street is a
time-consuming activity, as is carrying the firewood into town on a donkey's
back--Clumsy's system was highly efficient and could handle the substantial
output generated by the four chain saws.
Clumsy's enterprise--what we would today call a company--employed four chain-saw
men, a drover, and a youthful drover's assistant to help load and unload that
four-ox wagon. These 5.5 employees proved able to produce and deliver as much
firewood as 40 traditional woodsmen. And since there remained 75 traditional
woodsmen (Clumsy and his four chain-saw men deleted from the former assembly
of 80), that meant the city had an oversupply of firewood.
All of you who have taken Economics 101 know that what followed was a short,
intense shakeout while the excess 35 traditional woodsmen were forced out of
the market. Yep, they had to find some other employment to feed themselves and
their wives and children. This economic displacement was Politically Correct:
The displacees were all white European males and hence of no concern.
Now we had 40 traditional woodsmen producing half the city's firewood, and
Clumsy's 5.5 employees producing the other half. The 5.5 employees generated
the same income as the 40 woodsmen--more than seven times as much per
employee. How
should this economic windfall be shared?
It was Clumsy's capital that bought the chain saws, and the improved
distribution method was Clumsy's idea. If you think his employees should be
paid at the same level as the ordinary woodsmen, with the substantial excess
going to Clumsy, then you're a capitalist.
If you think his employees should be paid a substantially higher wage than the
regular woodsmen to reflect their improved productivity, then you're a
liberal. Clumsy still gets rich, if more slowly.
If you think the economic windfall should be shared amongst all the woodsmen,
you're a socialist. And if you think the windfall should properly be shared by
all the residents of the city, you're a communist!
I worked out this little scenario in 1965, shortly after taking Economics 101.
These days we'd have to include the Greens in our political spectrum--they'd
want to absolutely stop the woodsmen to protect the spotted owl--and I don't
even want to discuss fringe groups like the syndicalists. How you'd like to
distribute Clumsy's economic windfall really does reveal your politics, but in
1965, it proved a very bad idea to go around asking people if they were
communists.
Three decades later, this scenario is not a communist detector but a metaphor
of our current economic system, with automation causing massive dislocation
and unemployment. Yesterday's excess woodsman is today's excess middle
manager. What is today's chain saw? Why, spreadsheets and database software. A
dozen years ago, Walmart had four layers of management, while Sears had
seventeen (honest). Which was more successful? This is not an entirely
rhetorical question; if we define success as providing employment for a lot of
middle managers and sustenance for their families, then Sears was, back then,
more successful.
While middle managers are becoming a threatened species, stenographers,
traditionally female, are disappearing under a double whammy: They provided
typing services for all those male middle managers, who are themselves
vanishing, and the middle managers who remain almost all use word processors.
When most of you readers think of automation-related economic replacement,
what probably springs to mind are the industrial robots on Ford's production
lines. Industrial automatons not only don't get sick, take vacations, or go on
strike; they also produce more-consistent and therefore more-reliable
products. A hand-built automobile is an unreliable automobile. When I was a
kid, cars broke down often.
Industrial automatons? The productivity software you have all collectively
written to run on personal computers has been responsible for millions of
displaced workers. How shall we distribute the excess income produced by a
surviving middle manager who uses WordPerfect, Lotus 1-2-3, and dBase IV?
We can't go home again. Nobody wants to un-invent the horse collar so humans
can be employed to pull plows. If you try to take away my copy of WordPerfect
5.1, the one I'm using right now, you'll have a real fight on your hands. I'm
pleased that my highly reliable car was assembled mostly by industrial robots.
A very few of you may be working on the periphery of robotics. To me, a robot
is an autonomous, mobile, industrial automaton whose purpose is to free up
human labor; that is, to create unemployment. The first guy I'd like to
unemploy is the point man on an infantry patrol (as I write this, American
infantrymen have just been sent to the vicinity of Bosnia). But the
first robots--very crude and inefficient--are already experimentally deployed,
not as infantry. They're hospital orderly/assistants, because hospitals are
where the most intense economic activity is found today.
* * *
Now I'd like to introduce you to Dr. Hans Moravec, who has spent the past 25
years working in the field of robotics, and to his book, Mind Children
(Harvard University Press, 1988). This book is chock-full of historical
information on robotics and its practitioners. For example:
We came out of World War II with many operational analog computers, which
aimed guns and bombs and navigated aircraft. This led to the post-WW2
development of "cybernetics," which flourished until the mid '60s, when it was
subsumed by the artificial-intelligence community, a product of yet another
WW2 development, the digital computer.
Not all activity was confined to the academic community. In 1954, Devol
patented the programmable robotic arm (a lineal descendant of the 1801
Jacquard loom). Devol founded Unimation in 1958 to build these arms, and
General Motors began using industrial robots in 1961.
John McCarthy was an MIT prof who had a student named Marvin Minsky. McCarthy
moved to Stanford University in 1963 in time to found Stanford's Artificial
Intelligence Project, later renamed the Stanford Artificial Intelligence
Laboratory (SAIL). McCarthy expected to produce a fully intelligent machine in
a decade. Sigh. Meanwhile, back at MIT, Minsky's students were connecting TV
cameras and mechanical arms to computers.
Stanford had a close relationship to the Stanford Research Institute (SRI),
which was generously funded by NASA to develop a remote-controlled lunar
rover. With this advantage, Stanford surpassed MIT's robotics group. Following
"Shakey" (SRI's experimental robot, completed in 1969), SAIL
acquired a simulated lunar rover from SRI and used it as the basis of the
"Stanford Cart"--the subject of Moravec's PhD thesis.
In the mid '70s, NASA funded Cal Tech's Jet Propulsion Laboratory (JPL) to
develop a Mars Rover. Because of the distance to Mars, remote control is
impractical even at the speed of light, so the Mars Rover had to be
autonomous. JPL developed a prototype, the Robotics Research Vehicle (RRV).
Lacking modern microprocessors, the RRV was connected by an umbilical cord to
a large computer. Although the RRV was achieving many of its objectives, the
project was terminated in 1978 (the 1984 Mars mission was scrubbed).
Moravec, in his 1988 book, expected to see general-purpose robots for the
factory and home by the millennium. My take: Follow the money. Housework isn't
explicitly valued by our society, and factory workers are a dime a dozen (more
or less). Infantry point men can be drafted cheaply; see the Wall in
Washington DC. As previously mentioned, the first experimental robots are
alive and well and working in hospitals, where today's most intense economic
activity is found.
Hans Moravec was a graduate student at Stanford in the 1970s. In 1970, he used
a time-sharing DEC PDP-10 mainframe. In 1980, he was still using a DEC 10,
this time a KL-10. In contrast to the 1970 machine, it was not very fast for
its day (and DEC soon discontinued the product line).
Moravec published an article entitled, "Today's Computers, Intelligent
Machines, and Our Future" in the February 1979 issue of Analog (a
science-fiction magazine) that compared human computational bandwidth to that
KL-10 and noted that computers were increasing in speed by a factor of 10
every seven years (equivalent to doubling every two years). As an Analog
reader since 1952, when it was Astounding Science Fiction, I read Moravec's
article when it first appeared. By 1988 computers had sped up by more than an
order of magnitude, so he wrote the book Mind Children to update and expand on
the article. 1995 marks yet another x10 improvement, and, according to
Moravec, yet another update is in progress.
In 1980, Moravec left Stanford to become director of Carnegie Mellon
University's Mobile Robot Laboratory, a part of the Robotics Institute. (Was
he frustrated by the inferior computational capacity then available at
Stanford?) His interest in comparing human and machine computational bandwidth
continued at CMU and is a principal subject of his book.
Moravec has tried several approaches to estimating human computational
bandwidth and comparing that to machine computational bandwidth. The one he
focuses on in Mind Children is the visual system. He hopes that some of his
inevitable errors will tend to cancel and points out that even an error of two
orders of magnitude isn't terribly important when machine capability is
increasing an order of magnitude every seven years.
Moravec does another very smart (and nonacademic) thing: He concentrates on
computational cost effectiveness, what is commonly called "bang per buck." His
projected trend line shows that personal computers will, if present trends
continue, attain human parity in computational bandwidth shortly after the
year 2030 and that before 2040, human-equivalent machine capability will cost
less than $1000.
This also means that, in 1995, personal computers are (by Moravec's estimate)
about 200,000 times (37 years) slower than the human brain. Moravec
specifically claims to work in robotics, not in AI. Amongst practical folk, AI
has a bad name. I would suggest that factor of 200,000 has much to do with
AI's bad name and its practical failures to this point. Every seven years we
drop a digit off that deficit.
Trends do not necessarily continue forever. Bought any $2/megabyte DRAM
lately? (See Figure 1.) In my article, "CPU Performance: Where Are We Headed?"
(DDJ, January 1994), I predicted that the long-time trend of computer speed
doubling every two years would fall off due to the limits of parallelism. I
predicted that about 1996, the new computational trend would double (only)
every three years. This trend could continue only as long as the dimensions
("design rules") used to fabricate microprocessors could continue to shrink at
the established rate (halving every seven years). Other folks (Nick
Tredennick, for one) believe that ways will be found to end run these limits.
So the basic problem--the Moravec prediction that robots replacing human
beings will become not just a crisis but "oblivion"--may not arrive as
scheduled four decades hence. (Read the book's "Prologue" to find that
predicted oblivion.)
Moravec discusses the bottom-up versus top-down approaches to robotics. In
top-down, you start with a supercomputer and bolt a TV camera and mechanical
arm onto it, then try to figure out how to mount wheels. This is the approach
Minsky's MIT group took in the mid '60s. Moravec favors the bottom-up approach
used at Stanford, where you start with something very much like the prototype
Mars rover--a mobile device that has machine vision and manipulative
ability--and bolt a small, fast computer onto it. The top-down approach is
favored by the AI community, while the bottom-up approach is a
dirty-fingernailed philosophical descendant of cybernetics.
Robots, primitive ones, are here today. As they improve, robots as menial
workers will migrate downward into areas of less-intense economic activity,
and upward into areas of more-intellectual (less-menial) activity. When
household robots clean and cook, and the more cerebral ones write columns for
computer magazines, robots will have subsumed all human activity and Moravec's
oblivion will be upon us.
Every robot deployed represents a person seeking other work. An enormous
number of robots will be deployed in the next several decades. We really don't
want to un-invent the horse collar, or the printing press, or the steam
engine, or the computer. Will we someday want to un-invent robots?
Mind Children is still in print and is only $8.95 in softcover. I know that
Computer Literacy Bookstore (Sunnyvale, California) will accept a special
order for this book because that's where I got mine. Best of all, this book is
not written by some Sunday-supplement journalist hack but by a leading
researcher in robotics. If you have any interest in robotics and/or computer
trends, you really should read this book.
Figure 1 Memory prices versus time.









PATTERNS AND SOFTWARE DESIGN


Patterns for Reusable Object-Oriented Software




Richard Helm and Erich Gamma


Richard and Erich are coauthors of Design Patterns: Elements of Reusable
Object-Oriented Software (Addison-Wesley, 1994). They can be reached at
Richard.Helm@dmr.ca and Erich_Gamma@Taligent.com, respectively.


Component-based software, interoperable objects, reusable application
toolkits, and frameworks are becoming increasingly important development
technologies. Many application architectures and component-interconnect
technologies and standards are being put in place: CORBA, OpenDoc, TalAE, COM,
SOM, and OLE, to mention a few (see Dr. Dobb's Special Report on Interoperable
Objects, Winter 1994/95). However, we still face the problem of creating
designs which can effectively exploit these emerging technologies. A common
theme in these technologies, standards, and implementations is the notion of
object orientation--building systems from objects which offer services to
clients. And a common goal when designing with objects is reuse--how to design
your applications so that the objects used to build them can, in turn, be used
in other applications.
Reuse in object-oriented software is enabled through three primary mechanisms:
parameterized types, class inheritance, and object composition. Parameterized
types allow you to create new functionality by parameterizing software by the
types of objects on which it operates. Inheritance allows you to define new
classes in terms of old, reusing implementation from parent classes in the
implementation of the new child classes. Inheritance is simple, performed at
compile time, and supported directly by most object-oriented languages. Object
composition permits you to create new functionality by composing existing
objects together in new and interesting ways. It relies on polymorphism and
dynamic binding--the ability to substitute objects with similar interfaces for
each other at run time. Object composition lets clients make very few
assumptions about the implementations of objects they deal with, other than
that they support a particular interface. Object composition makes it easy for
new, user-defined objects to work with existing objects.
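The three mechanisms can be seen side by side in a minimal C++ sketch (the class and function names here are ours, purely for illustration):

```cpp
#include <string>

// Parameterized type: new functionality by parameterizing software
// by the type it operates on (a template, in C++ terms).
template <typename T>
T larger(const T& a, const T& b) { return b < a ? a : b; }

// Inheritance: a child class defined in terms of a parent, reusing
// the parent's implementation and specializing one operation.
class Shape {
public:
    virtual ~Shape() {}
    virtual std::string name() const { return "shape"; }
};
class Circle : public Shape {
public:
    std::string name() const { return "circle"; }  // specialized behavior
};

// Object composition: Drawing is built from a Shape it holds. Any
// object supporting the Shape interface can be substituted at run
// time--Drawing assumes nothing about the implementation behind it.
class Drawing {
    const Shape& shape_;
public:
    explicit Drawing(const Shape& s) : shape_(s) {}
    std::string describe() const {
        return "a drawing of a " + shape_.name();  // dynamic binding
    }
};
```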
During the initial stages of object-oriented software's
design-and-implementation life cycle, inheritance is the predominant means to
achieve reuse. Most effort is spent creating and deriving new classes. In
later stages of the life cycle (especially after redesign or refactoring of
class hierarchies), the dominant means of reuse is the composition of objects
having standard interfaces. At this stage, the important abstractions in the
domain have emerged and have their own class hierarchies. Inheritance is only
used as an implementation technique to rapidly define families of objects with
similar interfaces. Once a design begins to focus on object composition rather
than inheritance, the interconnect technologies described here can also be
considered.
Toolkits, class libraries, and frameworks are ways to package and deliver
larger, reusable abstractions. Many vendors now provide toolkits or frameworks
of some sort. Toolkits can be thought of as the object-oriented equivalent of
a subroutine library. They provide low-level, ready-made classes which can be
extended, through inheritance, to provide access to some underlying
abstraction, such as data structures, operating-system services, windowing, or
graphics systems. Frameworks provide higher-level functionality, generally
targeting a particular application domain such as graphical-object editors,
operating systems, or financial engineering. Compared to toolkits and class
libraries, frameworks provide higher-level application infrastructures. In
particular, they usually include classes which define an application's
internal control flow and logic. These classes form the backbone of the
application to which the flesh of the application--created from user-defined
extensions to the framework's classes and toolkits--is attached. Reusers of
the framework customize the framework by:
Instantiating classes provided as-is by the framework. 
Extending the framework by deriving new classes from framework-supplied
classes, specializing their behavior and functionality, and instantiating them
to create new kinds of objects.
Composing these objects with the logic and control-flow classes in
framework-defined ways to create a working application.
To be able to extend a framework, the objects composed with the framework must
work with and respect the internal interfaces and protocols expected by the
framework classes. 
Frameworks give rise to an architecture in which most code resides in the
reused classes. A characteristic feature of such an architecture is its
inverted structure. Most of the high-level application control flow is
determined by the reused code, which periodically makes calls to user-supplied
extensions (usually subclasses of framework classes) to request
application-specific services and data. Larger applications are usually built
from multiple frameworks and toolkits; see Figure 1.
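This inverted structure can be sketched in a few lines of hypothetical C++ (the class and method names are invented for illustration): the framework class owns the control flow and calls back into user-supplied subclasses for application-specific services.

```cpp
#include <string>

// Framework-supplied class: run() defines the application's control
// flow; the steps it calls are supplied by the reuser's subclass.
class Application {
public:
    virtual ~Application() {}
    std::string run() {                       // framework-defined flow
        return renderDocument(loadDocument());
    }
protected:
    virtual std::string loadDocument() = 0;   // user-supplied extension
    virtual std::string renderDocument(const std::string& doc) = 0;
};

// The reuser customizes the framework by deriving and specializing,
// respecting the protocol the framework class expects.
class TextApplication : public Application {
protected:
    std::string loadDocument() { return "hello"; }
    std::string renderDocument(const std::string& doc) {
        return "[" + doc + "]";
    }
};
```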


Designing for Reuse


The preceding discussion touches on some of the issues concerning reuse in
object-oriented applications: frameworks within frameworks, and objects within
each framework communicating with one another. When creating a reusable
application, framework, or toolkit, the questions we have to face are: Exactly
what sort of abstraction will be supported by our design? How will we enable
our design to permit users to extend it easily? How will we design our
application to be reusable?
There are no easy answers. Designing reusable, object-oriented software is
hard and can be an elusive goal. Many issues must be considered: finding
appropriate objects; factoring them into classes at the right level of
granularity; defining inheritance hierarchies; defining object interfaces; and
specifying appropriate relationships between objects. All these issues must be
addressed in designing a system specific to the problem at hand, while
remaining general enough to address--and be extended for--future problems and
requirements. Experienced designers will tell you that a reusable, flexible
design is difficult, if not impossible, to get "right" the first time,
especially under the constraints of project and product deadlines, and that
multiple attempts at reuse with subsequent redesign is the norm.
Despite these difficulties, it is still possible to write reusable
object-oriented software. Many successful object-oriented systems exhibit
idiomatic and recurring patterns and structures of communicating objects that
solve particular design problems. These design structures are what make these
systems flexible, elegant, and ultimately reusable. 


Design Patterns


Design structures occur repeatedly across application domains and programming
languages, providing well-defined, controlled ways to extend and reuse
applications. Unfortunately, these design structures are not well
known, are only learned with much experience, and so are independently
rediscovered over and over again by designers creating reusable software. Many
of us have had this kind of design déjà vu. Wouldn't it be great if there were
a record of the design decisions and experience of others? One way to do this,
which is currently gaining a lot of interest, is through "patterns." 
Patterns are a way to record and codify expertise and experience so that
others may reuse it. Patterns help you base new work on others' prior,
distilled experience. A designer familiar with such patterns can apply them
immediately to design problems. Many ways of writing patterns to record this
experience are being explored, but most agree on the following definition: A
pattern describes a solution to a problem in a particular context in such a
way that others can reuse this solution over and over again. Our personal
interest is in patterns for reusable, object-oriented designs, or "design
patterns." A simple example illustrates the concept.
The Strategy design pattern addresses the problem of defining families of
interchangeable algorithms so that the algorithms may vary independently from
the clients that use them. Situations or "contexts" where it might be
applicable include those in which: 
Many classes only differ in behavior.
There exist variants in algorithms, and you want the flexibility to pick and
choose.
Algorithms have local and private data to which clients should not be exposed.
A class defines many different behaviors, typically spread throughout its
operations as conditionals governed by internal flags.
The solution provided by the Strategy pattern consists of encapsulating each
variation of behavior or algorithm in its own class and accessing this
behavior through a common interface, defined by a Strategy class; see Figure 2.
At run time, an instance of a StrategyContext is composed with an instance of
ConcreteStrategy. They interact through the interface defined by the abstract
Strategy class. Because StrategyContext is only aware of a strategy through
the interface defined by the Strategy class, any ConcreteStrategy may be
composed with it, and we have freed the StrategyContext from any dependencies
on a particular strategy. 
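The arrangement just described can be sketched in C++, using the class names from the description (the text-formatting example and the format/formatText method names are our own illustration):

```cpp
#include <string>

// Abstract Strategy: the common interface through which a context
// accesses every variant of the algorithm.
class Strategy {
public:
    virtual ~Strategy() {}
    virtual std::string format(const std::string& text) const = 0;
};

// Two ConcreteStrategy classes, each encapsulating one variant of
// the behavior in its own class.
class FastFormatter : public Strategy {
public:
    std::string format(const std::string& text) const { return text; }
};
class FancyFormatter : public Strategy {
public:
    std::string format(const std::string& text) const {
        return "** " + text + " **";
    }
};

// StrategyContext knows only the Strategy interface, so any
// ConcreteStrategy may be composed with it at run time.
class StrategyContext {
    const Strategy* strategy_;
public:
    explicit StrategyContext(const Strategy* s) : strategy_(s) {}
    std::string formatText(const std::string& text) const {
        return strategy_->format(text);
    }
};
```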
Many examples of the Strategy pattern are found in frameworks. For example,
word-processor frameworks have families of text-formatting algorithms that
can be interchanged according to how well or how fast you want text formatted.
Compiler frameworks have different instruction-scheduling policies that depend
on the underlying machine architecture. Financial-engineering frameworks have
different ways to value financial instruments. In all these implementations,
what is common (the repeating pattern) is that the family of algorithms is
defined in its own class hierarchy and is accessed through a common interface
in some context.
This touches on only the essentials of the Strategy pattern. A full
description would include details of implementation techniques, design
trade-offs, benefits and liabilities of the pattern, and relationships with
other patterns. The key point is, however, that the Strategy pattern lets you
factor out algorithms, allowing your application to be independent of the
algorithms it uses.
Note that a design pattern does not describe a particular design for any
particular system. The Strategy pattern does not describe how to design
text-formatting algorithms. Rather, it abstracts from many designs and
(hopefully) describes what is essential, common, and intrinsic to the problems
addressed, and solutions found, across all these designs. Just as we can take
abstract pseudocode descriptions of algorithms (quick sort or generational
garbage collectors, for example) and implement our own systems to sort or
collect garbage, so you can take design patterns and create designs and
implementations based on them in some modeling notation or object-oriented
language; see Figure 3. 
An often-asked question is, how does a pattern differ from a framework? Most
simply, a framework is a design realized as code. In contrast, a pattern
describes an abstract design and must first become a design and then an
implementation. A pattern will also usually have multiple implementations,
each providing different design trade-offs. Patterns also tend to describe
designs which are at a different scale than a framework. Think of classes and
objects as building blocks, and of a framework as defining an application's
architecture or macro-architecture. Most patterns describe something in
between, what we call a "micro-architecture"--an architectural element that
contributes to the overall software architecture; see Figure 4. 


Designing for Change


The key to creating reusable software lies in anticipating people's needs and
how they might use your solutions to meet those needs. It's important to
understand and prepare for future changes in requirements and usage. These
could arise from the evolving needs of current users, new users, or both. A
design that doesn't take change into account risks major redesign in the
future. That will involve class redefinition and reimplementation,
modification of existing clients, and retesting. Redesign affects many parts
of the software system, and unanticipated changes are invariably expensive to
correct. If the expense (real or perceived) is too great, a system will not be
reused.
Consequently, you must consider how your system might need to change over its
lifetime, and be aware of typical causes of redesign and rework. Ideally, you
want to design in this capability from the very beginning. A design that stops
people from reusing it may be thought of as containing reuse errors. A reuse
error does not mean that your software is broken; it is just not as reusable
as it might be.
A simple example of a reuse error occurs in C++ when you forget to declare a
member function virtual in a parent class for a set of classes that form a
class hierarchy. Reusers of these classes will not be able to extend them to
change the way their application uses this class hierarchy. The lack of the
virtual member function is one reuse error. A higher-level reuse error is when
an operation defined by a class is not factored at the right level of
granularity to be overridden by subclasses. Subclasses might only want to
customize parts of the operation. Unless the operation is designed with such
extensions in mind, subclasses will typically copy the existing code and
modify it, resulting in duplicated code.
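Both errors, and their fixes, can be shown in a few lines of hypothetical C++ (class and method names invented for illustration): the overall operation stays fixed, while its steps are virtual functions that subclasses can override individually.

```cpp
#include <string>

// The operation save() is factored into overridable steps, so a
// subclass can customize one part without copying the whole thing.
class Document {
public:
    virtual ~Document() {}        // virtual: forgetting this is the
                                  // classic C++ reuse error
    std::string save() {          // fixed overall operation...
        return header() + body(); // ...built from virtual hooks
    }
protected:
    virtual std::string header() { return "HDR:"; }
    virtual std::string body()   { return "empty"; }
};

// The subclass overrides only the step it cares about; the rest of
// save() is reused rather than duplicated.
class Report : public Document {
protected:
    std::string body() { return "report-data"; }
};
```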

These are minor examples, and careful class design allows you to avoid them.
But if you are in the business of providing reusable code to your clients,
your organization, or yourself for later reuse, avoiding reuse errors is
something to strive for.
Reuse errors have many causes. One common cause is exposing to clients too
many details of the implementations of the objects they use. As
implementations change, client code breaks. Another cause is badly
designed inheritance hierarchies, which grow brittle and difficult to extend
as more functionality is added. For example, embedding algorithms into the
classes on which they operate means that as more algorithms are
implemented, the classes tend to be buried under the weight of their
implementations. The original purpose and abstraction defined by the class
will be lost. 
Design patterns allow a design to be extended in controlled ways to avoid
specific kinds of reuse errors. Strategy, for example, avoids having
implementations of algorithms spread all over your code. This property of
design patterns gives you a suite of design techniques that will provide your
designs with flexibility, extensibility, and ultimately reusability.


What's Ahead


During the coming months, we will look at the patterns that occur in existing
systems, new patterns, ways in which patterns help to solve particular design
problems, and how patterns interact to form larger structures. We will also
look at particular reuse errors, and which patterns help you avoid them. You
may be familiar with our Design Patterns: Elements of Reusable Object-Oriented
Software, by Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides
(Addison-Wesley, 1994), which contains a collection of 23 design patterns
(such as Strategy) for object-oriented software. This column will not be
simply an excerpt from the book; still, we will use the book as a basis,
because many of the patterns are useful for object-oriented design. We will
also look at some of the other efforts in applying patterns to software
development and report on experiences using patterns in practice. 


Pattern Resources


While there is currently a lot of interest in applying patterns or handbooks
of design to develop software, patterns are not new. Patterns have already
been used to describe different parts of the software-development process,
including reusable object-oriented designs, team structure and process
organization, reuse of application frameworks, and description of common
themes during systems analysis (see "Evaluating the Software Development
Process," by James Coplien, DDJ, October 1994, and "Patterns and Software
Development," by Kent Beck, DDJ, February 1994, as well as recent articles in
the Journal of Object-Oriented Programming, Object Magazine, and C++ Report).
Within other disciplines of engineering and architecture, it is common to find
handbooks of standard design techniques and practices that form a body of
common knowledge shared by engineers and designers. Christopher Alexander's
book, A Pattern Language: Towns, Buildings, Construction (Oxford University
Press, 1977) is one widely quoted example. Another is a five-volume Russian
handbook of mechanical engineering we recently discovered, which contains over
4000 mechanical devices, one-per-page, ranging from clutches to
aircraft-landing gear. Such a resource is undoubtedly a valuable reference for
designers.
But software has few such equivalents. Part of the focus of those working with
patterns in software is how to best describe something as intangible as
software design and practice. The interest in this effort is reflected in the
patterns being written in various books and papers, and in a recent conference
devoted to the topic (listed shortly). The study of patterns is young but
flourishing. 
Resources which will help you find more information about patterns and how
they are used are included in the World Wide Web home page at
http://st-www.cs.uiuc.edu/users/patterns.html. The home page also contains
details of forthcoming conferences and examples of patterns and pattern
languages and permits you to subscribe to mailing lists about patterns.
Important papers relating to patterns include "Documenting Frameworks Using
Patterns," by Ralph Johnson (Proceedings of OOPSLA '92), "Design Patterns:
Abstraction and Reuse of Object Oriented Design," by Erich Gamma, Richard
Helm, Ralph Johnson, and John Vlissides (Proceedings of ECOOP '93), "Design
and Reuse in Object-Oriented Frameworks," by Richard Lajoie and Rudolph Keller
(ACFAS, 1994), "Progress on Patterns: Highlights of PLoP '94," by Jim Coplien
(Object Expo Europe, 1994), and "Patterns Generate Architectures," by Kent
Beck and Ralph Johnson (Proceedings of ECOOP '94).
Figure 1 The relationship of toolkits, framework objects, and user-supplied
objects.
Figure 2 The Strategy pattern.
Figure 3 Design patterns.
Figure 4 Increasing design reuse.








































SOFTWARE AND THE LAW


Copying Software Concepts Can Be Legal




Marc E. Brown


Marc is a patent attorney and shareholder of the intellectual-property law
firm of Poms, Smith, Lande, & Rose in Los Angeles, CA. Marc specializes in
computer law and can be contacted at meb@delphi.com.


The best things in life are supposed to be free. But what about software?
Surprisingly, the concepts implemented by software can often be copied without
violating the law, as can many of the algorithms which implement them. What
must be respected are intellectual-property rights in copyrights, patents,
trade secrets, and trade dress. The trick is to identify when these
intellectual-property rights are present--then to steer clear of them.


Copyrights are Usually Present but Weak


By far, copyrights are the most prevalent form of protection for computer
software. But no copyright can stop you from lawfully using the ideas upon
which the software is based. 
Virtually every piece of software is protected by copyright. Protection
will usually be provided, even if the software has not yet been registered
with the Copyright Office and even if it is not marked anywhere with a
copyright notice. The duration of copyright protection is also very long--a
minimum of 50 years for software written after 1976. Although the validity of
a copyright can be challenged, such challenges are usually not successful. 
On the other hand, the scope of copyright protection is not particularly
broad. 
First, copyrights only protect against copying. Even identical software will
not be an infringement of a copyright if it was created independently, that
is, not copied from the copyrighted software. Be warned, however, that a court
may not believe that identical software was independently created,
particularly when the software is unique and substantial in length. Wise
software developers also embed dummy, nonfunctional code that sticks out like
a sore thumb if the software is duplicated.
Second, copyrights only protect the "form of expression," not the "underlying
ideas." The real difficulty in copying software lawfully lies in
distinguishing between the protectable form of expression and the
unprotectable underlying ideas. The courts have not yet clearly defined this
demarcation line. There are nevertheless several factors which are helpful in
predicting its location.
One important question is whether the copied information represents the only
way of performing a particular operation, or at least one of only a very few
ways. If it does, a court is likely to conclude that only the underlying idea
has been taken.
Another important factor is the degree of abstraction between the copied
information and the underlying code. Slight variations in code are not likely
to avoid a copyright. The use of high-level flow charts or algorithms, on the
other hand, is likely to be viewed as the underlying idea, rather than the
form of expression. Copyright law does not prohibit the extraction of source
code from object code--that is, reverse engineering--to uncover the higher
levels of abstraction.
A further consideration is the quantity of the information which is copied.
Extraction of only a few high-level flowchart segments or algorithms would
probably be permissible. Slavish use of every one of them, particularly if
organized in the same way as the copyrighted software, may well be deemed a
violation. The problem here is that more than ideas are being taken. The
selection, organization, and presentation of those ideas are also being taken,
an appropriation which a court may regard as within the protectable
form-of-expression category.
When dealing with programming languages, therefore, it probably would be
unwise to appropriate the entire set of commands, particularly the syntax. On
the other hand, it is unlikely that a successful copyright-infringement claim
could be maintained if the functionality of only a few commands was used,
particularly if changes are made in the names of the commands and their
syntax.
Of course, some assemblers, interpreters, compilers, and database engines are
specifically designed to run applications written for a competitor's product.
In this case, the exact command names and syntax must be used or the
application will not run. Here, a court may find the "copying"
to be legal. Although a complete collection of ideas is being used, complete
use is the only way to implement the desired function, namely to run the
application program written for the competitor's product.
Rarely, on the other hand, must a user interface be fully duplicated to
achieve the needed functionality. Making the interface look and operate
differently is the safest course. Copying a few functional or design elements
of the interface would probably be acceptable; copying all of it is risky.
In summary, all software should be viewed as being protected by a copyright,
unless there is clear evidence to the contrary. Large segments of code
should never be copied. High-level flow diagrams or algorithms can probably be
copied, but it is risky to copy them all. Whenever possible, have the person
who writes the code be someone different from the person who studied the
original software and extracted the ideas from it. When it is believed that
the underlying ideas are simply duplicative of ideas already in the public
domain, copy from the material in the public domain instead. When in doubt,
consider obtaining a license from the original copyright owner.


Patents are Uncommon but Powerful


You hear a lot of talk these days about whether software is patentable and
whether it should be patentable. While some people are talking, however,
others are applying for and receiving software patents! Quarterdeck, for
example, recently obtained a patent covering a memory-management technique
used in its well-known QEMM memory manager. There also appears to be a growing
acceptance of these patents by the courts and the Patent Office. Although
still not common, software patents therefore do exist. 
U.S. patents are currently enforceable for 17 years, except when the patent
was issued after December 12, 1980, and maintenance fees have not been paid.
Any U.S. patent bearing a number less than 4,000,000 is older than 17 years
and therefore can no longer be enforced. In June of this year, the enforcement
period will increase to 20 years, but will run from the date the application
for patent is filed, not from the date it is granted, as it does now.
It is far easier to determine the scope of protection afforded by a patent
than by a copyright. Unlike a copyright, each patent specifically delineates
exactly what it covers. This is accomplished at the end of each patent in one
or more separately numbered "claims." Each "claim" contains a description in
words of exactly what it embraces.
The rule is that a patent is infringed upon when each and every element
recited in a single patent claim is found in the accused product or process.
Sometimes, the absence of a recited element will not avoid infringement if the
accused product or process has an equivalent element which performs
substantially the same function, in substantially the same way, to achieve
substantially the same result. The presence of additional elements or even
improvements is irrelevant if all of the elements recited in a single patent
claim are found in the accused product or process. Unlike copyrights,
moreover, independent creation is no defense. Patents can cover an idea, as
well as the form of expression. They can also cover a process, as well as a
product.
Typically, however, the accused software will not itself contain each of the
elements recited in any single patent claim. Each patent claim will usually
recite additional elements, such as a ROM, microprocessor, or storage device.
Nevertheless, the software will still constitute an infringement if it has no
substantial use other than in the product or process recited in the claim, or
if it is promoted as being useful in such a product or process.
Usually, software covered by a patent will be prominently marked "Patented" or
"Pat.," along with the patent number. "Patent Pending" or "Pat. Pend." means
that an application for patent has been filed, but that a patent has not yet
been granted. Until the patent is granted, there is no enforceable patent
right. On the other hand, enforcement steps can sometimes be taken after a
patent has issued against software which was written and sold before the
patent issued. The reason is that use of a patented invention is an
infringement, just as its manufacture and sale are. Thus, not even
"Patent Pending" markings should be ignored.
The absence of patent markings does not always ensure against a patent
infringement claim. As with copyright notices, patent markings are not an
absolute requirement for enforcement, although their absence will reduce the
scope of
remedies which are available. The only sure way of verifying the absence of a
patent is to have a patent search performed. When the name of a suspected
patent owner or developer is known, that name can be looked up in an
"assignee" or "assignor" index. Otherwise, a search of patents by subject
matter can be performed. Subject-matter searches are commonly known as
"infringement searches." Infringement searches must usually be performed at
the Patent Office by a competent searcher under the direction of an attorney. 
There are many sources for obtaining a copy of a U.S. patent. Most large
libraries have them on microfilm. The text of a patent, as well the numbers of
patents owned by a particular company, can also now be obtained from a few
large databases, such as CompuServe. This information can also be obtained
from the Patent Office. 
Once the possibility of an infringement is perceived, the advice of a
knowledgeable patent attorney should be sought. Although the scope of a
patent's protection is far more clearly delineated than with a copyright,
there are numerous factors which bear on that scope. In addition, there are a
broad variety of legal defenses which can be asserted against a patent, even
when it is infringed. 


A Trade Secret is Sometimes Involved 


Some software concepts can be protected as a trade secret. The concept must
not be generally known. It must also be the subject of reasonable efforts by
the owner of the alleged trade secret to maintain its secrecy. Ideas, as well
as expression, can be protected.
Identifying the possibility of trade-secret protection requires an analysis of
the circumstances under which the software was obtained. No trade-secret
rights are likely to be afforded if the software was lawfully obtained and if
restrictions on its use and disclosure were not promised. 
Mass-produced software is typically marketed using a "shrink-wrap" license.
The software is packaged in a box which states on the cover that use of the
software is restricted by terms of an enclosed license. After the software is
purchased, the purchaser opens the box and is usually confronted with a
lengthy "License Agreement" in tiny print on the face of an envelope
containing the floppy disks or CD-ROM. Most users never bother to even read
this material.
These shrink-wrap licenses are not likely to be enforced. Moreover, the
widespread distribution of this type of software probably negates one of the
other fundamental requirements for trade-secret protection--the requirement
that the information not be generally known. 
Once the possibility of trade-secret protection is eliminated, lawful efforts
can usually be made to extract the underlying source code by
reverse-engineering the software. Unless prohibited by an enforceable contract
(that is, something other than a shrink-wrap license), there is probably no
legal impediment to extracting source code from its object code.
Software developers sometimes claim that the portions which they want to copy
are not a trade secret because they are generally known. In this case, the
best course is to copy from the material in the public domain, not from the
software to which protection is claimed. For evidentiary purposes, the person
copying from the publicly available material should not be the person who had
access to the protected software.



Trade Dress is not About Clothing


As noted, it is not clear today how fully user interfaces are protected by
copyright law. As a consequence, software owners sometimes claim that user
interfaces are protected by the law of "trade dress."
Trade-dress and trademark law are very similar. Trademark law protects a name
or insignia. Trade-dress law protects the way a product is packaged or
presented. Neither is intended to protect the creator's concept. Instead,
they are designed to protect the consumer from being confused into erroneously
believing that one product originates from the same source as another.
Imagine seeing a set of golden arches on a fast-food restaurant along with the
sign "The Best Hamburgers." McDonald's would probably be successful in
claiming that this constitutes an infringement of its trade dress. Although
McDonald's name has not been used, the "dress" of the restaurant is
sufficiently similar to create a likelihood of confusion in the mind of the
consumer. 
Duplication of a user interface is sometimes alleged to create the same
problem. But this infringement theory is not an easy one on which to succeed.
There are also steps which can be taken to minimize exposure to it. The entire
user interface should not be duplicated. A distinguishing trademark for the
new software owner should also be prominently displayed, both on the
packaging and on at least the opening screen. If any reasonable
possibility of confusion remains, explicit statements should be added advising
the user that the new software was not created or sponsored by, or associated
with, the owner of the other software.


Conclusion


The basic ideas underlying software can usually be copied with legal impunity.
Only a patent or trade secret could present a bar. Although almost all
software is protected by copyright law, a copyright only bars duplication of
the form of expression, not the underlying idea. Trade-dress problems can also
usually be avoided.
When in doubt, seek the advice of legal counsel. Such advice can help avoid a
potentially expensive and disruptive legal dispute. A favorable legal opinion
can also reduce the chances of a punitive-type remedy being assessed in the
event that an infringement is nevertheless found. 
Insurance against infringement claims is also now being offered by companies
such as Intellectual Property Insurance Corporation. 


For More Information


Copyright Office
Library of Congress
Washington, DC 20559 
202-479-0700
Intellectual Property Insurance Corp.
10503 Timberwood Circle
Louisville, KY 40223
800-537-7863
Software Patent Institute
2901 Hubbard Street
Ann Arbor, MI 48106-1485
313-769-4083
U.S. Patent and Trademark Office
Department of Commerce
Washington, DC 20230
703-557-4636


EDITORIAL


What Next, the Kitchen Sink(TM)?


When it comes to trademarks and patents, you have to admit that Microsoft has
taken it on the corporate chin. First it was the Stac Electronics settlement,
which resulted in Microsoft coughing up $120 million for infringing on Stac's
data-compression patents. Then Microsoft ponied up another $90 million to
quell Wang Labs' claims of patent infringement over object linking and
embedding (OLE) technology. 
Still, the news hasn't been all bad for the world's biggest software producer.
Microsoft more than made up for any loss of face by throwing a full nelson on
Scanrom Publishing, a one-person Long Island software publishing house. In
naming a CD-ROM that included the Jewish Book of Knowledge, excerpts from the
Torah, music, cookbooks, and folklore "The First Electronic Jewish Bookshelf,"
Scanrom stubbed its toe on a Microsoft trademark claiming ownership of the
word "bookshelf." Microsoft's Bookshelf, you recall, is a collection of
standard reference materials, including a dictionary, thesaurus, world
almanac, encyclopedia, ZIP-code directory, and the like. 
Amazingly, Scanrom founder Irving Green wasn't aware of Microsoft's trademark,
which is emblazoned all over its packaging. The first Green heard about it was
in February, when he received a cease-and-desist letter from Microsoft
lawyers--after he'd shipped several thousand CD-ROMs. Green estimates it will
cost at least $100,000 to change the CD-ROM and its packaging. While that's
pocket change to Microsoft executives, to Green it's real money that he simply
doesn't have. 
You'd think that one of Green's options might be to change the name of his
CD-ROM to something less competitive--"bookcase," for instance. Sorry. Been
there, done that, says Allegro New Media's Barry Cinnamon, publisher of
CD-ROMs such as the Allegro Reference Series Business Library. At one point,
Allegro's CD-ROM sported the tagline "The ultimate business bookshelf," but
Microsoft said no can do. How about changing the tag from "bookshelf" to
"bookcase," countered Cinnamon. No dice, said Microsoft. In other words, say
Microsoft lawyers, if you are publishing a CD-ROM or any other interactive
electronic reference material and you suggest that there's a book-anything in
your digital closet, expect to receive a registered letter postmarked Redmond.
Interestingly, the term "bookshelf" was originally registered in 1987 by Ampro
Computers for its "computers, computer programs and manuals sold...for use in
data base management, word processing and data processing." In 1988, Microsoft
challenged Ampro's trademark, claiming the term "is the common descriptive
name for a library, portfolio or collection of books...and therefore in the
public domain and available for [Microsoft] and other commercial users to use
fairly to describe their goods." After some haggling, Microsoft ended up
buying the rights to the "bookshelf" moniker from Ampro. I leave it to you to
decide if Microsoft has changed its mind about whether or not the term is in
the public domain and available for others to freely use.
Green hasn't thrown in the towel yet. He's currently winding his way through
Patent and Trademark Office (PTO) procedures, filing both a "petition to
cancel" Microsoft's trademark claim as well as a "notice of opposition" to
recent changes Microsoft has proposed to the terms of the claim. Green points
out that the original Ampro trademark, which Microsoft has relied upon for the
past few years, applied to computer hardware. Only recently has Microsoft
moved to change the definition of the trademark to cover computer software
that contains a collection of interactive works.
It's little wonder that Microsoft is roughing up Scanrom. It's called
"precedent," particularly when there are bigger fish to fry. This includes
IBM, which produces (and sells through Counterpoint Publishing) a $99.00
disk-based product called "The Health Care Reform Bookshelf" that provides the
text of Clinton's Health Security Act of 1993, annotations, budget
predictions, and competing pieces of legislation. Likewise, BusinessWeek
magazine presents an interactive electronic version of its "Business
Bookshelf" on America Online. Unless I've missed something, Microsoft has yet
to go after either IBM or McGraw-Hill (publisher of BusinessWeek). By
establishing legal precedent, Microsoft will have a better chance of taking
on those outfits when the time is right.
When patents are registered, concerned third parties can object.
Unfortunately, that's not the case with trademarks. To oppose a trademark, a
"damaged" party must file suit against a trademark holder or applicant with
the PTO. That action is then heard by a PTO review board, which bases its
decision on evidence and depositions provided by the involved parties--no
juries, no witnesses, no Judge Itos. In short, there's not much you and I can
do to protest trademark injustice.
The lesson to be learned in all of this is to name products carefully. Never
take anything for granted--even seemingly innocuous terms can be trademarked.
(A recent search on the name "Bob," for instance, turned up 26 pages of
Bob-related trademarks--including one for "Bob Dylan" and, naturally, another
for Microsoft's new interface software. As with "bookshelf," Microsoft
obtained the rights to "Bob" from another company.) If nothing else, as Irving
Green would certainly agree, a few hundred dollars for a trademark search is
money well spent. 
Jonathan Erickson
editor-in-chief


LETTERS


Ada 95


Dear DDJ,
In October 1993, DDJ featured comparisons of various OOP languages. You did a
great service by leveling the playing field and demonstrating how a specific
problem could be coded and solved in each of the selected languages.
I realize that at the time of your article, the updated Ada standard was not
yet available. However, as of February 15, 1995, the Ada 95 language has
become an official ISO, ANSI, and FIPS standard. Given that Ada 95 may in fact
be the first internationally standardized OOPL, I thought I would contribute
an Ada 95 solution to the original DDJ challenge.
Example 1 uses a heterogeneous, linked list of objects that can be dynamically
dispatched (through the polymorphic Print routine) depending on the derivation
of the abstract type called Item.
The freely available GNU GNAT Ada 95 compiler (available via anonymous ftp
from cs.nyu.edu in the /pub/gnat directory) was used to compile this code.
Besides the OO
features, Ada 95 also introduces improvements in exception handling, generic
programming, memory management, tasking, and programming-in-the-large. 
Paul Pukite
Minneapolis, Minnesota


Stepanov and STL


Dear DDJ,
What I would have given to have had Al Stevens' interview with Alexander
Stepanov (DDJ, March 1995) before I attended PLoP '94 last August. Object
versus generic. What a great pattern. But will the name "generic" stand? I
don't believe so. The concept is too powerful. It also has too much history to
be called "generic" for long. Frankly, I'm amazed it has lasted this long. 
As a nonprogrammer visiting PLoP '94, I went seeking patterns as a function of
Natural Structure--as Natural Phenomenon--the way patterns occur in reasoning
styles, which I have studied for 20 years. Programmers at PLoP, it seemed to
me, saw patterns as being invented, not discovered. It has yet to occur to the
community that they are to be studied as phenomena, the way gravity and
electro-dynamics were studied.
Objectivity, as a style of reasoning, has been studied for millennia. And so
has its complement: subjectivity. The excitement and recognition being
showered on Stepanov and Lee is well deserved. For within the realm of
language as object they have clearly differentiated the realm of language as
subject. STL-style programming, I predict, if it hasn't already happened by
the time you receive this letter, will be called "Subject-Oriented
Programming." Stepanov is right. His generic programming is going to launch a
virtual tidal wave of scientific study and understanding of language and its
structures.
What will happen? If my observations at PLoP are right, I predict the
programming community will rediscover both science and history. Pattern
programming is what the authors of the five books of Moses were doing in
Egypt, while the Chinese authors were writing The Book of Changes and The Book
of the Way.
Stepanov and Lee have opened a new struggle: Within the science of language,
there is a complementary relation between object and subject. As that mystery
unfolds, programmers will surely turn to Oriental history, where complementary
styles in structure were studied in great detail. We will all be amazed to
learn just how much our ancestors knew about such subtle phenomena as they
occur in nature.
Dan Palanza
palanza@delphi.com


Image Authentication


Steve Walton's article "Image Authentication for a Slippery New Age" (DDJ,
April 1995) poses an interesting question: how to prove that a digitized image
has not been tampered with. Unfortunately, his proposed answer is not secure.
A quick review of his method: Walton proposes combining secret, user-generated
keys with checksums of an image. He suggests using the keys as seeds to a
pseudo-random number generator, spreading checksum bits around the image in
the low-order bits of selected pixels.
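As a rough sketch of the scheme under discussion (the generator parameters and the reduction of generator states to pixel indices are assumptions for illustration, not Walton's published code), sealing might look like:

```python
# Minimal sketch of LSB sealing as the letter summarizes it: a key seeds a
# linear-congruential generator, which picks the pixels whose low-order
# bits receive the checksum. The LCG parameters here are hypothetical.
A, C, M = 1103515245, 12345, 2**31

def seal(pixels, key, checksum_bits):
    """Embed checksum bits into the low-order bits of key-selected pixels."""
    x = key
    out = list(pixels)
    for bit in checksum_bits:
        x = (A * x + C) % M           # advance the generator
        i = x % len(out)              # pixel chosen by the generator state
        out[i] = (out[i] & ~1) | bit  # overwrite only the low-order bit
    return out
```

Note that every pixel keeps its high-order bits, which is exactly why comparing before-and-after images leaks the embedding locations.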
What would an attack on this system look like? To know that an image has
arrived from Walton without alteration, we must have communicated keys in
advance using some trusted channel. In practice, unless Walton and his
partners want to spend all their time generating, transmitting, and
safeguarding new keys for every image exchanged, they will end up reusing
keys. When they do, the system becomes vulnerable to a chosen-plaintext
attack.
Assume that an enemy has become familiar with the general method (by reading
the source in DDJ). The problem then becomes recovering the actual keys. In a
chosen-plaintext attack, we assume the enemy can actually slip images into the
communication stream, have Walton seal them, and compare the before and after
data. The enemy should choose any two images with the property that the
low-order bits of one image are the one's complement of the low-order bits of
the other; for example, where image A has a 1 in a low-order bit, image B has
a 0 low-order bit in the corresponding pixel.
When the enemy compares images A and B before and after, all pixels where
checksum bits are stored can be identified. (You need two bit-complementary
images to recover all bits, since comparison of one original image with a
sealed version will show only the pixels where the checksum bit is not equal
to the original data bit.)
Let's say that checksums are 32 bits, as in the original article. Remember
that pixel locations are chosen by a pseudo-random number generator. To
recover the original seed for the generator, and thus the key, we need only
determine which of the 32 pixel locations represents the initial state of the
generator. Pick a pixel position, use it as a seed, and if you can run the
generator ahead 31 times without generating a number not in the set of
checksum locations--congratulations, you have broken the code. The whole
procedure takes less than a millisecond of computer time.
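The search just described is tiny. A sketch, assuming (as the letter does) that the stored locations are the raw generator states, and using invented LCG parameters:

```python
# Recover the seed from the set of embedding locations: only the true
# initial state can step forward 31 times without leaving the set.
# The LCG parameters are hypothetical, not Walton's actual generator.
A, C, M = 1103515245, 12345, 2**31

def step(x):
    return (A * x + C) % M

def recover_seed(locations):
    """Find which location is the generator's initial state (the key)."""
    loc_set = set(locations)
    for cand in locations:
        x, ok = cand, True
        for _ in range(len(locations) - 1):
            x = step(x)
            if x not in loc_set:
                ok = False  # stepping forward left the set: wrong start
                break
        if ok:
            return cand     # every forward step stayed in the set
    return None
```

With only 32 candidates and 31 steps each, the attack is a few thousand multiplications at most.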
Using multiple keys does not materially affect solution time, since each key
can be solved for independently. (Walton has designed the system so that bit
locations do not overlap between multiple keys; as he must, since if a key
overlaps in one location, it will overlap in all subsequent locations.)
Okay, we can break the system with two well-chosen Trojan images. How about
the easier and more-practical known-plaintext attack, where an enemy can
compare images before and after sealing but cannot necessarily choose the
images in advance? 
Sure. With only one image before-and-after to work with, the enemy will not
recover all the bits in the checksum--probably only half of them. However, he
could get lucky and recover the first location in the sequence (after all,
there is a 50/50 chance); in which case, he is done. If he isn't lucky, he
needs to determine which is the first location he does have, and then run the
pseudo-random number generator backward from there to recover the initial
state. Knuth shows how to run a linear-congruential generator in reverse.
This, too, is just a few milliseconds of compute time. So, some advice: Don't
use the checksum-and-key method. A better answer to the problem would be to
use the Secure Hash Algorithm to compute a long (160 bits) hash code for an
image, and the Digital Signature Algorithm to apply it as a signature. This
has two advantages: SHA and DSA are believed to be hard to break, and a
public-key scheme such as DSA can provide authentication without having to
exchange and safeguard secret keys.
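Running a linear-congruential generator backward, which the letter mentions via Knuth, requires only the modular inverse of the multiplier. A sketch with invented parameters:

```python
# Reversing an LCG x' = (A*x + C) mod M: because A is odd and M is a
# power of two, A is invertible mod M, and x = A^-1 * (x' - C) mod M.
# The parameters are hypothetical, chosen only so the inverse exists.
A, C, M = 1103515245, 12345, 2**31
A_INV = pow(A, -1, M)  # modular inverse (Python 3.8+)

def forward(x):
    return (A * x + C) % M

def backward(x):
    return (A_INV * (x - C)) % M
```

For any state s, backward(forward(s)) returns s, which is what lets an attacker rewind from a known mid-sequence location to the initial seed.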
Andrew T. Wilson
andyw@ibeam.intel.com. 
Steve responds: Thanks for your letter, Andrew. As near as I can tell, you are
absolutely correct except that you assume that the "enemy" has access to
before-and-after images. Without these, my statements as to security stand.
I'll look up your references on reversing random-number generators (if you can
do that, it solves another problem I'm working on) and think a little bit more
about what you have said. As far as incorporating DSA, SHA, RSA, and the like,
this was the answer I had in mind to one of the "Exercises for the Student"
portion, which was left off the published version. It would be a good
improvement. However, my purpose in illustrating the techniques of hiding
things in noise ("Security by Obscurity") was served. 


Network Options


Dear DDJ,
William Stallings' article "Congestion Control in Frame-Relay Networks" (DDJ,
March 1995) prompted me to write about our experience in Texas.
There's a saying that if you build a highway, people will drive on it. Texas
has adapted this idea to information technology and found that if you build an
information infrastructure, people will use it.
According to the Texas Department of Commerce, TEXAS-ONE, a State of
Texas-led proposal, has been awarded $25 million in the Federal Technology
Reinvestment Project competition. More than 2800 proposals requesting a total
of $8.5 billion were submitted by companies, universities, and state and
local governments nationwide. The Texas Open Network Enterprise, TEXAS-ONE, will
serve small- and medium-sized manufacturers in Texas by providing an
electronic-information network like those previously accessed only by large
corporations able to afford infrastructure investment. TEXAS-ONE is a
partnership led by the Texas Department of Commerce, the Microelectronics and
Computer Technology System, the Texas Department of Information Resources, the
Texas Innovation Network System, NASA's Mid-Continent Technology Transfer
Center, and the University of Texas at El Paso.
TEXAS-ONE is a model of what can and needs to be accomplished as we grapple
with the design and implementation of a national information superhighway.
Jimmy A. Castro
Austin, Texas



Setting the Revolution Record Straight


Dear DDJ,
Regarding Jonathan Erickson's "Editorial" in the January 1995 issue of DDJ:
Actually, it was the Mark-8 on the cover of the July 1974 issue of
Radio-Electronics that ushered in the personal-computer revolution. The Altair
on the cover of Popular Electronics didn't arrive for another six months. By
then, the revolution was already underway.
Jon Titus
Milford, Massachusetts


Flash File Systems


Dear DDJ,
Peter Torelli's article, "The Microsoft Flash File System" (DDJ, February
1995) concluded: "If standards begin to solidify and enough resources are
devoted to the development of other operating-system FFS drivers, flash cards
could become a dominant form of data exchange for computer users."
The fact is that flash cards already are becoming a popular form of data
exchange for computer users because there already is a dominant cross-platform
standard for flash cards called the "PC Card ATA Standard." It was developed
by PCMCIA. Many large vendors, including Hewlett-Packard, IBM, Casio, Fujitsu,
Motorola, 3M, and Verbatim, market ATA cards because they are "plug and play"
in thousands of computers, PDAs, handheld data-collection terminals, and
cellular phones.
Incompatibilities between various FFS and FTL products have been resolved by
companies producing flash cards that meet the PC Card ATA Standard.
Nelson Chan
Santa Clara, California


Running Light


Dear DDJ,
I do not always have time to read the whole DDJ, but I always take time to
read Michael Swaine's "Programming Paradigms," as well as his (e)musings on
the last page--"Swaine's Flames." Mostly I find his reflections well founded,
although I may not always agree. However, when we disagree on such a
fundamental issue as the meaning of "running light," I have to express my
concern. "Light" in this respect should not primarily be associated with
efficient programming related to bytes or machine cycles as Michael states in
"Programming Paradigms" (DDJ, March 1995), but rather to a minimalistic
approach to functionality. Maybe the idea behind Occam's razor best expresses
my thoughts, with featuritis as its antithesis. A small, fast-running program
is very often--but not always--the result. 
Torbjorn Sund
torbjorn.sund@tf.telenor.no
Example 1: Ada 95.
package PList is -- A printable list of heterogeneous items
   type Item is abstract tagged null record;
   procedure Print (This : Item) is abstract;
   type Element is access Item'Class;
   type List is private;
   function New_List return List;
   procedure Append (To_List : in out List;
                     This_Item : in Element);
   procedure Print (This : in List);
private
   type Node;
   type Pointer is access Node;
   type Node is record
      Contents : Element;
      Next : Pointer;
   end record;
   type List is record
      First, Last : Pointer;
   end record;
end PList;

package body PList is -- The implementation of printable list
   function New_List return List is
   begin
      return (List'(First => null, Last => null));
   end New_List;

   procedure Append (To_List : in out List;
                     This_Item : in Element) is
      New_Node : Pointer := new Node'(Contents => This_Item, Next => null);
   begin
      if To_List.First = null then
         To_List.First := New_Node;
         To_List.Last := New_Node;
      else
         To_List.Last.Next := New_Node;
         To_List.Last := New_Node;
      end if;
   end Append;

   procedure Print (This : in List) is
      Current : Pointer := This.First;
   begin
      while Current /= null loop
         Print (Current.Contents.all);
         Current := Current.Next;
      end loop;
   end Print;
end PList;

with PList, Text_IO;
procedure Test is -- The main program
   Object_List : PList.List := PList.New_List;
   type Number is new PList.Item with record
      Value : Integer := 0;
   end record;
   procedure Print (This : Number);
   type Point is new PList.Item with record
      X, Y : Integer := 0;
   end record;
   procedure Print (This : Point);
   N1 : aliased Number := (Value => 10);
   N2 : aliased Number := (Value => 20);
   P1 : aliased Point := (X => 2, Y => 3);
   P2 : aliased Point := (X => 4, Y => 5);
   procedure Print (This : Number) is
   begin
      Text_IO.Put_Line ("Num:" & Integer'Image(This.Value));
   end Print;
   procedure Print (This : Point) is
   begin
      Text_IO.Put_Line ("Pt:" & Integer'Image(This.X) & Integer'Image(This.Y));
   end Print;
begin -- Add Number and Point items to list and then print
   PList.Append (To_List => Object_List, This_Item => N1'Access); -- by name
   PList.Append (Object_List, N2'Access); -- or by position
   PList.Append (Object_List, P1'Access);
   PList.Append (Object_List, P2'Access);
   PList.Print (Object_List);
end Test;















































































Single-Image Stereograms


Seeing (double) is believing




Dennis Cronin


Dennis writes drivers for Central Data's scsiTerminal Servers. He can be
contacted at denny@cd.com.


Three-dimensional illusions are showing up in everything from magazine
advertisements to the comic pages of daily newspapers. Although appearing to
be nothing more than a random field of dots or wavy patterns, striking 3-D
images emerge when you "correctly" view the designs. Once you learn to get a
fix on an image (and almost everyone can), you can look around the virtual 3-D
image just like looking out a window. Figure 1 is a typical stereogram in
which the word "SONY" appears. If you haven't been able to pick the images
out, understanding the concept behind them may help you experience the
illusion.
In this article, I'll discuss how the illusion works and the origins of the
technique. I'll also examine the basic algorithm for generating the images and
present a sample program (available electronically; see "Availability," page
3) that lets you display 3-D images on your PC screen. You'll then be able to
quickly design and generate your own custom 3-D illusions using a standard PC
paint program.


A 3-D Backgrounder


The terms "single-image stereogram" and "autostereogram" refer to a 3-D
illusion composed of only one image and requiring no special viewing
apparatus. Other types of stereograms use two small, side-by-side images or
require special glasses or other optics for viewing.
The most basic single-image stereogram is the single-image, random-dot
stereogram (SIRDS), which looks like a field of random dots with no apparent
texture or pattern. In its simplest form, the image is composed only of black
and white dots, yet a vivid 3-D image is clearly visible when viewed
correctly. Commercial 3-D illusion posters take SIRDS a step further,
replacing the TV-not-tuned-in dot field with a more visually appealing texture
or repeated pattern. Nevertheless, the principle behind the illusion is the
same.
The current crop of 3-D illusions has its roots in basic vision research. Bela
Julesz is generally credited with being the first to use computer-generated,
random-dot images to create a sense of depth. In his early-1960s
depth-perception studies, Julesz used pairs of random-dot images to
demonstrate that a sense of depth could be achieved with no other visual cues.
Christopher Tyler and Maureen Clark, in turn, are generally credited for
combining two images into a single, random-dot image circa 1990, creating the
forerunner of today's gift-shop rage.
Since then, numerous companies and individuals have advanced the art with
clever posters, books, and online images of autostereograms. The newsgroup
alt.3d, for instance, carries a steady discussion of SIRDS-related issues, and
the FTP site katz.anu.edu.au is probably the most active central clearing
house of autostereograms, information, and programs (see the directory
/pub/stereograms).


Making a Point


To understand how single-image stereograms work, I'll first examine the most
fundamental case of how you make a single dot appear at some point out in
virtual 3-D space. 
Assume that you want to make point A appear somewhere off in the distance
beyond the plane of the paper (or screen). Imagine for a minute that the image
is transparent; your eyes will have to converge (or triangulate) to view point
A off beyond the plane of the image. Note the points where the rays from each
eye intersect the plane of the image; see Figure 2. By placing a pair of dots
at exactly those locations, you can imply the first point in the 3-D
landscape.
It's not easy to get much feeling of depth from setting up just one virtual
point with two dots. The brain has no additional cues to help it interpret
those two lonely dots as a magic point in deep space. The effect doesn't kick
in until you start to build a larger set of information for your brain to cue
from.
Now let's see what happens if you want to make a dot appear somewhere further
away than point A. Referring again to Figure 2, notice which rays converge at
point B and where they cross the image plane. They are slightly farther apart
than the two points identified for point A.
With some basic geometry, you can formulate the distance of that virtual point
as a function of dot separation; see Example 1. Figure 3 shows the basic
convergence diagram again, but with the parameters in Example 1 indicated. To
give a convincing illusion of depth, you have to build a complete system of
dots that map out a 3-D scene. In doing so, you will be able to use the
formula in Example 1 to map out a system of dot pairs such that a complete 3-D
scene will be visible to the unaided eye. 


Getting the Effect


As you stare at a stereogram, you shift the convergence of your eyes and let
your focus wander. When you triangulate on a normal object, your brain tends
to select a focal length that will closely match the distance implied by your
eyes' convergence.
To see a stereogram, you need to break focal length away from triangulation.
When you find the exact point of triangulation that makes the dot pairs
overlap, a portion of the 3-D image fuses. When your brain stumbles on the
right triangulation and starts to note this image appearing, it will attempt
to adjust focus to cause the image to solidify. For some people, this
separation of triangulation and focus comes quite easily; others have trouble
with it.
When the image finally locks in for you, it's astounding how naturally the
brain adjusts its vision machinery to this new set of rules. With a little
practice, you can effortlessly maintain a good lock on the image as you look
around in it. The effect for most people is exhilarating--some of the fun
comes from just seeing the image appear, and some of it comes from the
strange, unnatural feeling of having your eyes operating in a way they're not
used to.


Making the Scene


Rendering a scene is a three-step process:
1. Develop a 2-D depth map of the scene to be displayed.
2. Process the depth map and build a map of dot-pair constraints.
3. Assign colors such that the constraints are met.
In Step #1, you scan the scene you want to render, developing a depth map of Z
values for every point in the scene at the desired resolution. There are
various ways to accomplish this, but this simple approach suffices: Imagine a
line perpendicular to the scene scanning the scene side-to-side and
top-to-bottom. You take a perpendicular-depth reading for every point and
record that in your depth map. While this technique ignores some basic tenets
of real 3-D geometry, it is more than adequate for generating these illusions.
Depth maps can also be generated which are not based on any real 3-D scene. I
use a paint program to generate a depth map using different colors to
represent different depths. The test program in Listing One (page 92)
generates this type of color depth map using a simple mathematical formula.
In Step #2 of the rendering process, you develop a constraint map that
describes all the dot pairings necessary to create the final image. You don't
describe what color the dot pairs have to be, just which ones have to match
which other ones.
A major simplifying assumption is that humans keep their heads oriented
vertically with respect to the image. Thus, for every point in virtual 3-D
space, you need to define only two dots along the horizontal axis to imply that
virtual point. In fact, if you tilt your head slightly when viewing a
commercial stereogram, you will quickly lose the image.
This assumption allows you to break the problem into a single case of
rendering one horizontal line of the scene at a time. When you have devised an
algorithm for rendering a single horizontal line, you are ready to generate
the constraint map for the entire scene.

The traditional algorithm for constraint mapping is quite simple. An
adaptation of it (included in the sample program provided electronically)
looks like Example 2, where base_period is the number of pixels between the
dot pairs with the greatest separation. Picking this value correctly is
critical to the success of the illusion. Usually, base_period pixels should
map to 1.5-2 inches on the output device. cmap is the array in which you build
the constraint map. The get_depth function returns a small integer value
representing the z-coordinate (or depth), with larger values for closer
virtual points. An index value is just an ID used to identify points which
must be the same color.
This algorithm simply scans a horizontal line, "looking back" and copying a
previous index value. For closer virtual points, it looks back a smaller
number of points; for maximum depth (get_depth returns 0), it looks back the
full base_period number of points.
While this algorithm does a nice job, several limitations make it unusable for
more-serious stereogram work. A right-shift distortion becomes apparent as the
depth variations increase and, more fatally, it can develop rather distracting
artifacts or echoes around sudden jumps in depth.
In Step #3, you pick colors for all the points in the scene adhering to the
constraints developed in Step #2. For black-and-white stereograms, picking
colors can be as simple as looking at each possible index value in the
constraint map and randomly assigning it to be either black or white. For
color stereograms, you randomly assign a color value from an available
palette. Note that constraint-index values carry no meaning outside an
individual horizontal line. A given value can be assigned one color in one
line and another in the next, as long as consistency is maintained within each
individual line.
As you assign a color to each constraint index in a line, you scan the line
and replace all occurrences of the index with the selected final-output color
value. When you have finished processing all the lines this way, you've
created the classic SIRDS.
While a SIRDS certainly provides the 3-D experience, the random nature of the
color-assignment process can lend a drab sameness to the final product. You
can improve the aesthetics of the image by embellishing Step #3. 


Just Stare, It's There


There are techniques to help spot the image in these illusions. The most
well-known is to put the image behind glass and look for your reflection in
the glass. Since the sample program displays directly to your computer screen,
all you have to do is orient your screen and/or adjust the ambient lighting
until you can see your reflection in the screen. This technique helps you
achieve a fixed triangulation beyond the plane of the image. With a little
practice, your brain will spot the emerging lines and acquire a focal lock on
this image.
Note that this technique will not work with all stereograms--it is suited for
stereograms where the virtual image is rendered to appear at approximately
twice the viewing distance. These stereograms use pattern separations mostly
in the 1.25- to 1.5-inch range. It is also possible to make stereograms that
use wider separations, requiring you to stare farther off into the distance to
get the proper triangulation. Although these can be more difficult for novice
viewers, I find them somewhat more breathtaking, possibly because the vision
machinery is operating even farther out of its normal parameters using close
focal lengths and very distant triangulations.
Some stereograms provide a pair of registration dots immediately above or
below the main image. By staring at these dots and adjusting your
triangulation until you see exactly three dots, you can set your eyes to the
depth for which the image was designed.
Another technique for viewing images on paper is to pick an object in the
distance and focus on it, then slowly interpose the stereogram, trying not to
move your eyes. This is helpful in acquiring the image in stereograms designed
with more-distant triangulations.
If you don't find the image right away, keep trying. Almost anyone with normal
or reasonably corrected binocular vision can eventually perceive the image.
I've seen people spot the image within 30 seconds, and I've seen them struggle
for hours. It generally gets easier with practice.


Going into Depth


In the SIRDS algorithm, the right-hand dot of the dot pair is always in fixed
relation to the point in the depth map. This means that at closer virtual
distances, the center of the dot pair is skewed to the right of the actual
point in the depth map. You can easily correct this by calculating the
necessary separation as before, then splitting it equally and symmetrically
about the point in the depth map.
It is important to clean up artifacts (or echoes)--false images that result
from unplanned repetitions in the pattern or from sudden depth changes. The paper
"Displaying 3D Images: Algorithms for Single Image Random Dot Stereograms," by
Thimbleby, Inglis, and Witten (available via FTP from katz.anu.edu.au in
/pub/stereograms/papers) covers this artifacting phenomenon in some detail.
Thimbleby et al. propose a method for removing artifacts based on hidden
surface removal (HSR); however, this method removes portions of the image
necessary for a solid illusion. This results in blurry areas to the sides of
sudden depth transitions. HSR assumes that if a point in the scene is hidden
from one eye by a foreground object, then constraints need not be developed
for a dot pair describing that partially hidden point. While there is some
geometric basis for this approach, it doesn't benefit the overall effect of
the illusion. In real life, when a point is hidden from one eye, the other eye
can still focus on it solo, providing some information about it. Illusions
benefit from providing information about this partially hidden point to both
eyes, even though this defies real-life physics.
Despite its faults, HSR is on the right track: The solution to artifacts is to
deconstrain points immediately to either side of the edges of foreground
objects. You deconstrain only the number of points necessary to control
unwanted repetitions of the pattern. This lets you reduce the number of points
on which you sacrifice constraints, yielding a more-solid image overall while
still removing artifacts.
The program 3D.C (available electronically in both source and executable form)
allows the artifact-removal parameter to be varied. At its maximum value of
10, it removes more constraints than it really needs, which leads to blurring.
At its nominal value of 7, the program removes artifacts from all images I've
tested, causing only slight, mostly unnoticeable blurring around foreground
objects. (I haven't tried to prove that this technique can fully remove
artifacts--I'll leave that to commercial stereogram developers.) The routine
symmetrical_constraint_map in 3D.C gives details about the artifact-removal
process.
Many commercial stereograms have scenes that appear to have smooth, contoured,
3-D surfaces. In other stereograms, including those generated to the screen by
3D.C, the levels are discrete and distinguishable. This is not a fault of the
rendering algorithm, but a product of the relatively low resolution of the
output device. A 53-dpi EGA/VGA screen can't provide the smooth depth changes
because the pixel width is too coarse to present very fine changes in spacing.


Colors, Textures, and Sprites


Coloring the final image is an art. At this stage, you have a constraint map,
and as long as you assign colors based on the constraints, you can do just
about anything. A simple method for improving the look of the stereogram is to
map a texture onto the constraint map. An interesting texture is more
appealing than a spray of random dots, and it may make it easier to lock onto
the image from a distance, since the elements of the texture are coarser than
little random dots. See the color_texture routine in 3D.C for my approach to
developing textures on top of the constraints.
Possibly the most ingenious method of coloring a stereogram is to map a
repeated smaller image onto the constraint maps. This yields an attractive
wallpaper effect when viewed normally; upon achieving proper focus, the 3-D
image jumps out just as crisply as from any random dot or textured image. The
scene itself then looks as if it's been wallpapered.
I've borrowed the term "sprite" from computer-game animation to describe the
little image to be repeated. The sprite-mapping process should start near the
center of the image and proceed outward. Some distortion of the sprite is
inherent to the process, and starting at the center makes its mangling more
symmetrical. Certain subjects may warrant starting the sprite mapping at a
particular part of the screen so that the main subject matter gets mapped with
relatively intact sprites. 
Clever choice of material for both the sprite and the 3-D scene can conceal
the sprite distortions. Remember that this sprite-mapping technique is as much
art as science and may require extensive fiddling to generate a pleasing final
product.
One additional trick 3D.C provides to conceal some of the side effects of the
sprite distortion is the ability to sweep the sprite image from side to side
in a gentle, wavy motion. This helps disguise fragmentation of the sprite
image and can add some interesting effects to the sprite itself. 


About the Example Program


The 3D.C example stereogram program showcases some of the popular stereogram
techniques, displaying the images directly to the screen of your computer
monitor. It was designed for use by the largest number of PC users: It runs in
DOS mode, requires no extended memory, needs only 640x480 EGA/VGA graphics
mode, and is built with a standard, 16-bit C compiler. To easily and quickly
generate a custom 3-D scene, you use a standard, PCX-format file as a 3-D
depth map. You can use any PC paint program that can generate a 16-color PCX
image to edit a scene, with colors 0-15 being mapped to 16 depth levels.
While it would have been nice to support more levels of depth, higher
resolutions, and larger image sizes, PC memory limitations and compiler issues
constrained the program design. However, you should have no trouble expanding
upon the basic functionality.
The program was built and tested with Borland Turbo C 2.0 and Borland C++ 2.0.
Borland's graphics driver module EGAVGA.BGI must be accessible to the program
at run time. If you use another compiler, you will need the graphics
primitives or equivalents in Table 1.


Painting in Space


Before editing a scene, you will need to work out what colors map to what
color numbers so you can control depth. Using the Edit Colors option, you can
specify the color components. Set up a palette as in Table 2 and save it to a
file using the Save Colors option. With this palette, color 0 (black) will be
the farthest away, while color 15 (white) will yield the closest foreground
image. Once you have established a known palette, you are ready to begin
editing depth maps.
Next, build a simple, depth-map image with the paint program. Start with
large, uncomplicated shapes until you get a feel for how things work in 3-D.
Your image will be more believable if you maintain depth order and don't let
background-depth colors obscure foreground images. Once you have an image you
want to look at in 3-D, save it as a PCX file and exit the paint program.
Run 3D.EXE in DOS mode to render the scene; it runs extremely slowly under
Windows. Invoking 3D with no options other than the filename of the depth map
(3D [filename.pcx]) causes it to default to a textured-coloring mode, picking
random image colors and control parameters. First it loads and displays the
depth-map image, then renders the image to 3D, line by line. An average image
takes about 30 seconds on a 486/33. 
When you are done viewing the image, hit any key to exit. As the program
exits, it prints out the random parameters it selected. You can use the
parameters in command-line options to (more or less) re-create an image. 
Two other coloring modes are possible besides the default. To generate a
traditional SIRDS using random dots, specify the -r option, or use the -s
option to render the image using a sprite as coloring material. To view an
image using the traditional constraint-mapping algorithm, use the -f option.
The artifact-removal option (-a) has no effect in this mode.
Images can be dramatically altered by using the control parameters horizontal,
vertical, and wiggles. These have different meaning in texture and sprite
modes; in random-dot mode, they are meaningless.
In texture mode, the "horizontal" parameter determines the length of
horizontal runs of like-colored pixels. The vertical parameter is a True/False
value (1/0) that enables vertical runs. The wiggles parameter can be anything
in the range 0-500 and only takes effect when the horizontal parameter is
greater than 1. If wiggles is 0, the horizontal runs all drift right; if
wiggles is greater than 480, the horizontal runs all drift left; in all other
cases, the horizontal drifts reverse direction every line.
In sprite mode, the parameters' functions are somewhat different. The
horizontal parameter determines the maximum amount of horizontal waviness that
can be injected into the sprite mapping. The vertical parameter can be: -1,
where wiggles start at top of screen and decrease toward bottom; 0, where
wiggle depth is constant top to bottom; and 1, where wiggle depth increases
toward the bottom of the screen. The wiggles parameter controls the wiggle
frequency, as in the texturing mode.
While the limited depth resolution resulting from the use of a 16-color PCX
image as a depth map is restrictive, clever scene design can still result in
striking stereograms.



The Egg-Carton Test


Listing One is EGGCARTN.C, a test program. Compiling this program lets you see
stereograms without editing a depth-map yourself. The program generates an
example PCX depth-map file called "EGGCARTN.PCX." Invoking the program with an
optional command-line parameter 0-15 causes it to generate different surfaces
of varying degrees of interest.
In DOS mode, you should first run the test program to generate the PCX depth
map (type EGGCARTN and press Enter). Next run the stereogram program to view
the results by entering 3D EGGCARTN.PCX. You should be able to see a repeating
contour not unlike that of an egg carton.


Conclusion


Many areas of single-image-stereogram generation are still being explored. An
example is "shimmering," which involves rendering the image several different
ways, then rapidly flipping the graphics page between the different images.
The image appears very solidly, but has a shimmering quality. Some people find
the images easier to view this way. Shimmering paves the way for possible
stereogram animation.
So keep your eyes peeled. That apparently random texture on the stone front of
a building, the slight shift in the wallpaper of your company's bathroom, the
advertisement with the undulating, repeated logo in the background (that one's
real already)--they all might be carrying secret messages in 3-D.
Figure 1: Typical stereogram (courtesy of NVision Grafix and Sony Corp.).
This figure is unavailable in electronic format.
Figure 2: Making point A appear in the distance beyond the plane of the
screen.
Figure 3: Basic convergence in Figure 1, with the parameters in Example 1
indicated. 
Example 1: Equation for determining the distance of a virtual point as a
function of dot separation. virt=virtual distance of point; sep=spacing
between the dot pair; eyes=spacing between the eyes; view=distance from the
eyes to the image plane.
        view*eyes
virt = -----------
        eyes-sep
Example 2: Basic constraint-mapping algorithm. 
void simple_constraint_map(int y)
{
    int lx,     /* left-hand X coor */
        mx,     /* middle X coor */
        rx,     /* right-hand X coor */
        dx,     /* left-hand coor w/ delta */
        index;  /* next unused constraint index */

    for (rx = 0, lx = -base_period, mx = lx / 2, index = CMAP_INDEX;
         rx < MAX_X;
         rx++, mx++, lx++) {
        /* if dx on screen, copy to right */
        if ((dx = lx + get_depth(mx, y)) >= 0)
            cmap[rx] = cmap[dx];
        /* otherwise, just pick new index */
        else
            cmap[rx] = index++;
    }
}
Table 1: Graphic primitives required by the 3D program.
Function Description
initgraph Sets up 640x480 EGA/VGA graphics mode.
closegraph Restores original screen mode.
setpixel Plots a point on screen with a specified color.
getpixel Returns the color at a point on screen.
setcolor Sets the draw color for line drawing.
line Draws a line from point x1,y1 to point x2,y2.
Table 2: Setting up the palette.
 Color # Red Green Blue
 0 0 0 0
 1 128 0 0
 2 0 128 0
 3 128 128 0
 4 0 0 128
 5 128 0 128

 6 0 128 128
 7 128 128 128
 8 192 192 192
 9 255 0 0
 10 0 255 0
 11 255 255 0
 12 0 0 255
 13 255 0 255
 14 0 255 255
 15 255 255 255

Listing One
/* EGGCARTN.C - generates a sample PCX depth map file called "eggcartn.pcx" 
for testing 3D.C stereogram program. Command-line param 0-15 generates 
different surfaces */
#include <stdio.h>
#include <graphics.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#include <conio.h>
#define MAX_X 640
#define MAX_Y 480
#define BPP 80 /* bytes per plane */
#define NPLANES 4 /* number of color planes */
void main(int argc,char **argv);
void doodle(void);
void dump_screen(void);
void put_line(unsigned char *data,int cnt);
void out_run(int byte,int rep_cnt);
void open_pcx(char *name);
void init_graphics(void);
double func(int val);
int mode;
double eggsz = 200.0;
/* main */
void main(int argc,char **argv)
{
 if(argc == 2)
 mode = atoi(argv[1]);
 open_pcx("eggcartn.pcx");
 init_graphics();
 doodle();
 dump_screen();
 closegraph();
 printf("Output in 'eggcartn.pcx'.\n");
 exit(0);
} 
/* doodle */
void doodle(void)
{
 int x,y,color;
 double xf,yf;
 if(mode & 4) eggsz *= 2.0;
 for(y = 0; y < MAX_Y; y++) {
 if(kbhit()) {
 getch();
 closegraph();
 exit(0);

 }
 yf = func(y);
 for(x = 0; x < MAX_X; x++) {
 xf = func(x);
 color = (int)(7.99 * (1 + xf * yf));
 putpixel(x,y,color);
 }
 }
}
/* func */
double func(int val)
{
 double res;
 
 res = mode & 1 ? val % (int) eggsz : val;
 if(mode & 2) {
 res /= eggsz / 2.0;
 res -= 1.0;
 }
 else {
 res *= M_PI / eggsz;
 res = cos(res);
 }
 return(mode & 8 ? res * 1.666 : res);
}
/* stuff for PCX dump routines */
struct pcx_header {
 char manufacturer, version, encoding, bits_per_pixel;
 int xmin, ymin, xmax, ymax, hres, vres;
 char colormap[16 * 3];
 char reserved, num_planes;
 int bytes_per_line, palette_code;
 char filler[58];
} header = {
 10, /* manu */ 5, /* version */
 1, /* encoding */ 1, /* bits per pixel */
 0, /* xmin */ 0, /* ymin */
 639, /* xmax */ 479, /* ymax */
 800, /* src hres */ 600, /* src vres */
 0,0,0, /* color 0 */ 0x80,0,0, /* color 1 */
 0,0x80,0, /* color 2 */ 0x80,0x80,0, /* color 3 */
 0,0,0x80, /* color 4 */ 0x80,0,0x80, /* color 5 */
 0,0x80,0x80, /* color 6 */ 0x80,0x80,0x80, /* color 7 */
 0xc0,0xc0,0xc0, /* color 8 */ 0xff,0,0, /* color 9 */
 0,0xff,0, /* color 10 */ 0xff,0xff,0, /* color 11 */
 0,0,0xff, /* color 12 */ 0xff,0,0xff, /* color 13 */
 0,0xff,0xff, /* color 14 */ 0xff,0xff,0xff, /* color 15 */
 0, /* reserved */ 4, /* # planes */
 80, /* bytes per line */
};
unsigned char planes[BPP * NPLANES];
FILE *pcx_fp;
#define pcx_putc(c) fputc(c,pcx_fp)
/* dump_screen */
void dump_screen(void)
{
 int x,y,color,mask;
 unsigned char *p;
 static int masktab[] = {0x80,0x40,0x20,0x10,0x8,0x4,0x2,0x1};

 /* write PCX header */
 for(x = 0, p = (unsigned char *)&header; x < sizeof(header); x++)
 pcx_putc(*p++);
 /* write out the screen */ 
 for(y = 0; y < MAX_Y; y++) {
 memset(planes,0,sizeof(planes)); /* clear planes */
 for(x = 0; x < MAX_X; x++) { /* break color into sep planes */
 color = getpixel(x,y);
 mask = masktab[x & 7];
 p = &planes[x >> 3];
 if(color & 1) *p |= mask;
 p += BPP;
 if(color & 2) *p |= mask;
 p += BPP;
 if(color & 4) *p |= mask;
 p += BPP;
 if(color & 8) *p |= mask;
 }
 put_line(planes,BPP * NPLANES);
 } 
}
/* put_line */
void put_line(unsigned char *data,int cnt)
{
 int i,byte,rep_cnt;
 for(i = rep_cnt = 0; i < cnt; i++) {
 if(rep_cnt == 0) { /* no "current byte" */
 byte = data[i];
 rep_cnt = 1;
 continue;
 }
 if(data[i] == byte) { /* same as previous, inc run length */
 rep_cnt++;
 /* full run then output */
 if(rep_cnt == 0x3f) {
 out_run(byte,rep_cnt);
 rep_cnt = 0;
 }
 continue;
 }
 out_run(byte,rep_cnt); /* not equal to previous */
 byte = data[i];
 rep_cnt = 1;
 }
 if(rep_cnt) /* shove out any stragglers */
 out_run(byte,rep_cnt);
}
/* out_run */
void out_run(int byte,int rep_cnt)
{
 if((byte & 0xc0) == 0xc0 || rep_cnt > 1)
 pcx_putc(0xc0 | rep_cnt);
 pcx_putc(byte);
}
/* open_pcx */
void open_pcx(char *name)
{
 pcx_fp = fopen(name,"wb");
 if(pcx_fp == NULL) {

 printf("Can't open output PCX file '%s'.\n",name);
 exit(1);
 }
}
/* init_graphics */
void init_graphics(void)
{
 int graphdriver,graphmode,grapherr;
 /* set VGA 640x480 graphics mode */
 graphdriver = VGA;
 graphmode = VGAHI;
 initgraph(&graphdriver,&graphmode,"" );
 if((grapherr = graphresult()) != grOk) {
 printf("Graphics init error: %s\n",grapherrormsg(grapherr));
 exit(1);
 }
 /* make double sure we hit 640x480 mode */
 if(getmaxx() != MAX_X - 1 || getmaxy() != MAX_Y - 1) {
 closegraph();
 printf("Wrong screen size, x=%u y=%u.\n",getmaxx(),getmaxy());
 exit(1);
 }
}









































A Ray-Casting Engine in C++


Trading realism for speed




Mark Seminatore


Mark is a mechanical engineer with General Electric. He has been programming
for over 15 years and specializes in numerical analysis and optimization. Mark
can be reached on CompuServe at 72040,145.


Ray casting is a real-time, three-dimensional rendering technique that's at
the heart of many computer-graphics applications, particularly games. The
allure of ray casting is its capacity to produce very realistic 3-D images at
extremely high frame rates.
Ray casting is similar enough to ray tracing that some books treat the two
synonymously. Both techniques trace imaginary rays of light from the display
plane through a data structure representing a 3-D world. Both look for
intersections between these rays and the 3-D world to determine what is
visible. Ray casting, however, sacrifices realism for speed by placing
constraints on the 3-D world. In this article, I'll discuss the theory behind
ray casting, present a ray-casting engine called "Raycastr," and address
issues such as performance and optimization. 


Ray-Casting Basics


In ray casting, the 3-D world is represented by a 2-D map of wall
cubes--fixed-size cubes with a texture-mapped image on each face. The 2-D map
is just an overhead view of the 3-D world. Typically, you would like to have
several types of wall cubes, such as a brick wall and a wooden wall. So, a
style code is assigned to the wall cubes and open spaces are given a style
code of 0. A simple 2-D map might then look something like Figure 1. Note that
the room is totally enclosed by wall cubes of style 1 (brick walls), and there
is a style-2 wall cube (perhaps a wooden wall) on the right-most wall. The
point of view (POV) location and orientation is shown with a ">". (POV
identifies the current x,y location and view angle within the 3-D world. For
realism, the POV encompasses a 60-degree field of vision. Only wall slices 30
degrees to either side of the POV are visible.) An imaginary ray of light is
then "cast" from the POV through the 2-D map. The ray continues until it finds
a wall cube or the map boundary. When a wall cube is hit, you use the ray
endpoints as an index into the 2-D map and look up the style code. What the
ray actually sees is a wall slice--a vertical piece of a wall cube, the
smallest visual element in our 3-D world. (Only one wall slice can be drawn
for each column of the video display. The image generated by ray casting is
made up of a number of wall slices; everything visible is either a floor, a
ceiling, or a wall slice.) If the ray reaches the map boundaries, it's assumed
that the ray saw nothing.
Once the ray endpoints (x2,y2) are known, consider the ray to be the
hypotenuse of a right triangle. Then, use the Pythagorean theorem to calculate
the distance from the POV (x1,y1) to the wall slice. In Figure 2, the distance
calculated is relative to a single point, the POV. If you use this distance,
however, your view of the 3-D world will be distorted into a curved image.
This is sometimes referred to as the "fish-bowl" effect. To solve this
problem, you need the distance relative to the flat plane of the video
display, which is found by multiplying the distance by the cosine of the view
angle. This cosine-corrected distance can then be used to determine the height
of the wall slice. I make the assumption that the perceived wall height is
proportional to 1/Distance'. The ratio of wall height to viewport height is
the scale factor used for drawing the wall slice on the screen. (The viewport
height is the part of the video screen used to display the ray-casting image;
viewport width determines the number of ray casts required, while height
scales the texture-mapped wall slices.) The ray endpoints also allow you to
calculate what column of the wall cube was struck. The column determines what
part of the bitmap tile to draw on the wall slice. For example, if a ray
strikes column 6 of a wall cube, draw column 6 of the bitmap tile on the wall
slice.
For a single ray, you know the style code, the distance (and therefore the
height), and which column of the wall cube is visible. You continue to cast
rays for each column of the viewport. Once this information is known, a single
screen of graphics can be generated. The entire process is repeated
continuously for the duration of the program.


Design Decisions


In building the Raycastr ray-casting engine, I had to make several design
decisions, many driven by performance. The code uses the standard VGA mode 13h
(320x200x256). You could use a different video mode, such as the tweaked VGA
mode X (see Michael Abrash's "Graphics Programming" column, DDJ, July 1991),
or one of the Super VGA modes. Mode 13h is the easiest to understand and
implement. The bitmap tiles are 256-color PCX files with a resolution of
64x64, sharing a single color palette.
For keyboard control, I avoided DOS and BIOS interrupts and installed a custom
keyboard-interrupt handler. This improves response and eliminates the keyboard
buffer, which causes an annoying beep when it overflows. The viewport size is
240 columns by 120 rows. A smaller viewport would improve performance, but the
goal is to provide a reasonably large view space. 
The bitmap tiles are stored in an array of pointers; see Listing One. The
first array element is a NULL pointer since style code 0 is reserved for open
spaces. The array index corresponds to the wall-cube style code. Next, I
define a data structure for the POV. The structure holds the current (x,y)
location and view angle within the 3-D world; see Listing Two. A data
structure is also needed to hold the results of the ray casts for each video
column. An array of this structure will hold the wall-cube style, wall-slice
column, and the distance for each viewport column.
Several tables are required for efficient texture mapping. The first, in
Listing One, is a table of offsets into video memory; this avoids calculating
offsets inside loops. The second table relates the wall-slice height to
distance. The last holds the ratio of bitmap height to wall-slice height.


The Map Representation


The previous explanation of ray casting made use of the Pythagorean theorem to
calculate the distance from the POV to a wall cube. In PC graphics, however,
you cannot afford square roots or floating-point math of any kind. Ideally,
the inner loops should involve only integer addition or subtraction.
Therefore, you must find another method of calculating the distance from the
POV to a wall cube. The solution is to cast two rays instead of one: One ray
looks only for walls in the XZ plane, the other for walls in the YZ plane.
Also, since parallel walls are a multiple of a fixed distance (64 units) from
each other, you only need to look for walls at these locations. This optimizes
ray casting considerably.
Consider an XZ ray with the POV at coordinates 96,96 and a view angle of 0
degrees (facing right). The ray is cast at an angle of 30 degrees relative to
the POV. You find the distance dx from the POV to the first x-wall the ray
strikes. The distance dy from the POV to the x-wall is dx multiplied by the
tangent of the ray angle. Now check the 2-D map for a wall cube at location
x+dx and y+dy. If there is no wall cube at this location, you add 64 to x and
add dy to y, then check again. This continues until a wall cube is found or
the end of the map is reached. When a wall cube is found, the ray endpoints
are returned.
A similar process is followed for the YZ ray: Find the distance dy to the
first Y-wall struck by the ray. The distance dx from the POV to the y-wall,
however, is now dy multiplied by 1/tangent of the view angle. Check the
yz-wall map for a wall cube. If no wall cube is found, add 64 to y and dx to x
and continue. Compare the distance for both rays and use only the shorter one.
The paths of the rays and the trigonometry involved are easier to visualize if
they are drawn as right triangles. Consider the two right triangles in Figure
3. The 2-D world map must now be reformatted so that it represents wall
boundaries instead of wall cubes. Since each wall cube has two xz walls, there
is an additional x-wall boundary; likewise for the yz walls. The
wall-boundary maps are stored in arrays of type char, which are allocated at
run time; see Listing One.
All distance calculations use fixed-point arithmetic for performance reasons.
In addition, the standard C floating-point trigonometry functions cannot be
used. Instead, all the trigonometry values are precalculated and stored as
fixed-point numbers in look-up tables. The data structures for these tables
are arrays of long integers allocated at run time.


The Raycastr Code


I compiled Raycastr using Borland C++ 3.1 and Turbo Assembler, but it should
be reasonably compatible with other compilers. The complete source code, which
consists of several modules, is available electronically; see "Availability,"
page 3. The code can logically be grouped into three phases: initialization,
animation, and clean-up and exit. The main() function (see Listing Three)
calls Initialize(), which in turn calls several other initialization routines.
InitKeyboard() installs the new keyboard-interrupt handler, InitVideo() sets
the video mode to 13h, and InitHeightTable() and InitTrigTables() build the
various lookup tables. Finally, InitBitmap() loads the palette file and the
PCX bitmaps, and InitMap() loads the 2-D map file and builds the wall arrays.
The animation phase starts with EventLoop(), which checks the keyboard status.
If the keyboard status indicates motion forward or backward, EventLoop() calls
HitTest() to validate the movement and updates the POV state. The HitTest()
function is really a simplified version of GenView(): It calls the ray
functions to find the distance from the POV to wall cubes in the direction of
movement. If the distance is too short, the function returns a nonzero value.
The return value determines whether movement in a particular direction is
valid.
EventLoop() then calls GenView(), which calls RayXZ() and RayYZ() for each
viewport column. RayXZ() and RayYZ() cast rays to find visible wall cubes.
Based on the results of the ray functions, GenView() fills in the View[]
structure, stores the distance to the farthest wall cube in FarthestWall, and
then calls DrawFrame(); see Listing Four. DrawFrame() uses the information in
the View[] array to perform the texture mapping required to generate one frame
of graphics. The function finds the height (h) of the shortest wall slice
using FarthestWall to index HeightTable[]. If h is greater than VIEW_HEIGHT,
no ceiling or floor is drawn. Otherwise, the ceiling color is drawn from the
top row of the viewport to the row (VIEW_HEIGHT-h)/2. The floor is drawn
similarly, starting from the bottom viewport row.
Next, wall slices are drawn for each viewport column. The Display pointer is
initialized to ScreenBuffer, plus the viewport column, and Bmp points to the
wall-slice style bitmap. The height (h) of each wall slice is found using
View[]. The distance and height are used to look up the scaling factor,
ScaleFactor, from ScaleTable[]. Next, calculate the starting row relative to
the center of the viewport. DrawFrame() then loops over the height, drawing a
bitmap pixel in ScreenBuffer, updating the Display pointer, and then adding
ScaleFactor to BmpOffset. This is the texture-mapping loop that draws the
bitmaps on each wall slice. After the wall slices are drawn in system memory,
the entire frame is copied to video memory.
When the animation is finished, main() calls CleanUp(), which calls
RestoreVideo(), FreeMem(), and RestoreKeybd(). These routines restore the
initial video mode, free all allocated memory, and reenable the DOS/BIOS
keyboard-interrupt handler, respectively. Finally, main() displays performance
statistics, including total frames, clock ticks, and frames per second.


Ray-Casting Optimizations



There are several ways to improve performance. First, the program was written
and tested in C. This served to validate the algorithms used by the code. On a
20-MHz 386SX, the code produces nearly six frames per second, not too bad for
a C program. Next, Turbo Profiler shows that more than 60 percent of the
execution time is spent in the DrawFrame() function. This is not surprising,
since DrawFrame() draws 28,800 pixels per frame. At 24 frames per second, it
must draw and copy roughly 700,000 pixels per second. For this reason, our
optimization efforts will concentrate mainly on DrawFrame().
The first optimization, loop unrolling, improves the ratio of value-added code
to loop overhead code. In DrawFrame(), a for loop accumulates the sum of a
scaling factor for texture mapping a wall slice. Since the amount of code
inside the loop is small, more time is spent processing the loop code
(increments, compares, and jumps) than the code inside the loop. In order to
unroll the loop, I take advantage of the fact that the row variable is always
a multiple of two in order to perform two additions instead of one for the
same amount of loop code. The loop is now faster because it spends more time
doing useful work; see Figure 4. Optimizing compilers will attempt to unroll
such loops, but it's often simpler to do it by hand.
When performance counts, there's no substitute for assembly language. The
easiest way to convert a function from C to assembler is to use the assembly
output of the compiler as a starting point. Conversion to assembly provides
three advantages over compiler-generated code. The first is avoiding redundant
segment-register loads. Segment values rarely change, yet most compilers
insist on reloading segment registers inside loops. Another advantage is the
ability to keep values in registers longer and avoid memory references. Most
compilers use register variables when possible, but a good assembly programmer
is hard to beat. Assembly language also allows us to use REP STOSW and REP
MOVSW as often as possible. These fast, compact instructions are an excellent
way to draw and move graphics in memory.
The real-mode 386 assembly-language instructions REP STOSD and REP MOVSD allow
data to be processed 32 bits at a time; the effect of this is most significant
when the source and destination are in system memory. The BlitTest utility
(available electronically) demonstrates this by testing the performance of 8-,
16- and 32-bit block transfers to both system and video memory. An additional
benefit of 386 code is the ability to perform integer arithmetic using the
32-bit general-purpose registers. This speeds up fixed-point math
considerably. For example, using 16-bit code, the addition of two long-integer
values typically requires six instructions and six memory operands. The same
addition in 32-bit code requires just three instructions and three memory
operands. Figure 5 illustrates this with assembly-language versions of RayXZ()
and RayYZ() that use 16- and 32-bit fixed-point math.
Finally, you can write tighter loops by avoiding memory operands and keeping
everything in registers. Many times, however, there are simply not enough
registers available for this approach. The solution is to replace the memory
operands with immediate values and change the values at run time. All that is
needed is a MOV instruction with a code-segment override. The hardest part is
knowing where in the code segment to write the new values. The answer lies in
the .LST file generated by TASM. Write the self-modifying code, assemble it, then
check the .LST file to verify that the code is overwriting your dummy value.
Note that if the modified code follows too closely (without an intervening
branch) you may have to clear the prefetch queue using a JMP $+2. This breaks
the optimization, so be careful. One more caveat: By definition,
self-modifying code is difficult to debug. Use this optimization only when the
benefits outweigh those of all other options.


Conclusion


The assembly-language version of DrawFrame() uses all these optimizations
heavily, and the results are impressive--nearly an 80 percent increase in
performance. And since ray casting only requires addition in the inner loops,
it is very fast. However, this also means that drawing the graphics will
always be the limiting factor affecting performance. For slower 286 and 386SX
machines, it may be faster to draw the graphics in off-screen video memory and
then flip the video pages. This requires a planar graphics mode such as mode
X. The low cost of the OUT instruction on the 286 (3 cycles versus 16 on the
486) and slower system memory make this option more attractive.
Figure 1: Map of a simple, enclosed room.
Figure 2: Cosine-corrected distance; Distance' = Ray Length * cos(theta).
Figure 3: Ray trigonometry.
Figure 4: (a) Original loop; (b) unrolled loop.
(a) for(j=row;j<0;j++)
 BmpOffset+=ScaleFactor;
(b) for(j=row;j<0;j+=2)
 BmpOffset+=ScaleFactor+ScaleFactor;
Figure 5: Assembly-language versions of RayXZ() and RayYZ(). (a) Using 16-bit
fixed-point math; (b) using 32-bit fixed-point math.
(a)
mov ax,[fp1] ; get lower word of fp1
mov dx,[fp1+2] ; get upper word of fp1
add ax,[fp2] ; add in lower word of fp2
adc dx,[fp2+2] ; add in upper word of fp2
mov [fp1],ax ; store lower word result
mov [fp1+2],dx ; store upper word result
(b)
mov eax,[fp1] ; get a fixed point value
add eax,[fp2] ; add two fixed point numbers
mov [fp1],eax ; store result

Listing One
// Data.c: This module instantiates all the global data.
// Copyright (c) 1994 by Mark Seminatore, all rights reserved.
 #include "pcx.h"
 #include "ray3d.h"
 char *face[NUM_FACES];
 int MapWidth, // run-time map width
 MapHeight; // run-time map height
 char *map, // raw map data
 *xzWalls, // xz wall boundary map
 *yzWalls; // yz wall boundary map
 unsigned VideoRow[200]; // Video row memory offsets
 unsigned char volatile ScanCode; // modified by int 9 handler
 unsigned int volatile FaceIndex; // current face bitmap to display
 unsigned int Frames=0; // number of frames generated
 Pov_t Pov; // current Point-of-View
 long CosAngle, // movement trig values
 SinAngle;
 int HeightScale=10000; // The wall-height aspect ratio
 PcxImage pcx; // structure for loading bitmaps
 int HeightTable[MAX_DISTANCE+1]; // table of height vs. distance
 long ScaleTable[MAX_DISTANCE+1]; // table of bitmap scaling factors
 char *bitmap[NUM_BITMAPS+1]; // array of bitmap tiles
 char *ScreenBuffer, // pointer to frame memory buffer
 *screen; // pointer to video memory
 long RayX,RayY; // last coordinates of x and y
 int iRayX,iRayY; // ...raycasts

 int FarthestWall; // distance to farthest wall in frame
 View_t View[VIEW_WIDTH]; // raycasting data
 long *Sine; // pre-calc'd trig tables
 long *Tangent;
 long *InvTangent;
 long *InvCosine;
 long *InvSine;
 long *Cosine;
 long *dxTable; // pre-calc'd table of 64*tan(angle)
 long *dyTable; // pre-calc'd table of 64/tan(angle)

Listing Two
// Ray3d.h: Declares all global data structures and constants
// Copyright (c) 1994 by Mark Seminatore, all rights reserved.
#ifndef __RAY3D_H
#define __RAY3D_H
 #define FP16 16
 #define FP20 20
 #define ROUND20 ((1L<<20)/2)
 #define CHECK_DIST 48L // minimum distance from wall
 enum {NO_WALL,X_WALL,Y_WALL}; // wall struck codes
 #define MAX_DISTANCE 2048
 #define MIN_DISTANCE 10
 #define BITMAP_WIDTH 64
 #define BITMAP_HEIGHT 64
 #define MAP_SIZE 64
 #define MAP_WIDTH 64
 #define MAP_HEIGHT 64
 #define MAP_MAX MAP_WIDTH*MAP_HEIGHT
 #define MAP_XMAX MAP_WIDTH*MAP_SIZE
 #define MAP_YMAX MAP_HEIGHT*MAP_SIZE
 #define MAP_XMAXLONG (MAP_XMAX*(1L<<FP16))
 #define MAP_YMAXLONG (MAP_YMAX*(1L<<FP16))
 enum {FALSE,TRUE};
 #define CEILING_COLOR 0
 #define FLOOR_COLOR 8
 #define SCREEN_WIDTH 320
 #define VIEW_WIDTH 240
 #define VIEW_HEIGHT 120
 #define VIEW_ANGLE 60
 #define NUM_BITMAPS 16
 #define NUM_FACES 5
 #define MAX_HEIGHT 1024
 #define MIN_HEIGHT 16
 #define BITMAP_BYTES (BITMAP_WIDTH*BITMAP_HEIGHT)
 #define Degree_6 (int)(6L*VIEW_WIDTH/VIEW_ANGLE)
 #define Degree_30 (int)(30L*VIEW_WIDTH/VIEW_ANGLE)
 #define Degree_45 (int)(45L*VIEW_WIDTH/VIEW_ANGLE)
 #define Degree_90 (int)(90L*VIEW_WIDTH/VIEW_ANGLE)
 #define Degree_135 (int)(135L*VIEW_WIDTH/VIEW_ANGLE)
 #define Degree_180 (int)(180L*VIEW_WIDTH/VIEW_ANGLE)
 #define Degree_225 (int)(225L*VIEW_WIDTH/VIEW_ANGLE)
 #define Degree_270 (int)(270L*VIEW_WIDTH/VIEW_ANGLE)
 #define Degree_315 (int)(315L*VIEW_WIDTH/VIEW_ANGLE)
 #define Degree_360 (int)(360L*VIEW_WIDTH/VIEW_ANGLE)
 typedef struct
 {
 int Style,
 Column,

 Distance;
 } View_t;
 typedef struct
 {
 int x,
 y,
 angle;
 } Pov_t;
#ifdef __cplusplus
extern "C" {
#endif
 void Blit(char*,int,int,int,int);
 void DrawFrame(void);
 void GetSineAndCosine(int);
 char RayXZ(int,int,int);
 char RayYZ(int,int,int);
 int InitTables(void);
 void FreeMem(void);
 void InitTimer(void);
 void RestoreTimer(void);
 void CleanUp(void);
 void InitMap(char*);
 void InitBitmaps(char*);
 void InitHeightTable(void);
 void Initialize(void);
 void GenView(void);
 int HitTest(int,int,int);
 void EventLoop(void);
 void InstallKeybd(void);
 void RestoreKeybd(void);
#ifdef __cplusplus
 }
#endif
#endif

Listing Three
// Main.c: This module calls the initialization routines, then jumps into
// the animation loop, and lastly shows some statistics.
// Copyright (c) 1994 by Mark Seminatore, all rights reserved.
 #include <stdlib.h>
 #include <time.h>
 #include <alloc.h>
 #include <stdio.h>
 #include <limits.h>
 #include "ray3d.h"
 #include "globals.h"
 #include "keys.h"
 void GenView(void)
 {
 int i,distance,RayAngle;
 unsigned char yStyleCode,StyleCode;
 unsigned long RayLength,yRayLength;
 unsigned int WallColumn,yWallColumn;
 long dx,dy;
 RayAngle=Pov.angle-Degree_30; // Start by looking 30 to our left
 if(RayAngle < 0) // Keep angle within table bounds
 RayAngle+=Degree_360;
 FarthestWall=0; // No farthest wall yet
 for(i=0;i<VIEW_WIDTH;i++) // Cast two rays for each video column

 {
 RayLength=LONG_MAX; // Start with an impossible distance
// Don't cast an XZ ray if it is not possible to hit an XZ wall
 if(RayAngle!=Degree_90 && RayAngle!=Degree_270)
 {
 StyleCode=RayXZ(Pov.x,Pov.y,RayAngle); // cast XZ ray
 if(StyleCode) // Did we hit a wall?
 {
// Use y-intercept to find the wall-slice column
 WallColumn=(unsigned)(RayY>>FP16)&0x3F;
 if(iRayX < Pov.x) // if looking west
 WallColumn=63-WallColumn; // reverse column
 dx=iRayX-Pov.x; // get ray x-component
 RayLength=(dx*InvCosine[RayAngle]) >> 14; // calc distance
 }
 }
// Don't cast a YZ ray if it's not possible to hit any YZ walls
 if(RayAngle!=0 && RayAngle!=Degree_180)
 {
 yStyleCode=RayYZ(Pov.x,Pov.y,RayAngle); // cast YZ ray
 if(yStyleCode) // was a wall found?
 {
// Use the x-intercept to find the wall-slice column
 yWallColumn=(unsigned)(RayX >> FP16)&0x3F;
 if(iRayY > Pov.y) // if looking south
 yWallColumn=63-yWallColumn; // reverse column
 dy=iRayY-Pov.y;
 yRayLength=(dy*InvSine[RayAngle]) >> 14;
 if(yRayLength < RayLength) // use the shorter ray
 {
 RayLength=yRayLength;
 StyleCode=yStyleCode;
 WallColumn=yWallColumn;
 }
 }
 }
 if(WallColumn < 64) // A wall was found (either X or Y)
 {
 RayLength*=Cosine[i]; // cosine correct distance
 RayLength+=ROUND20; // round up fixed point
 distance=(int)(RayLength>>FP20); // convert to an int
 if(distance < MIN_DISTANCE) // check for min distance
 distance=MIN_DISTANCE;
 if(distance >= MAX_DISTANCE) // check for max distance
 distance=MAX_DISTANCE-1;
// Save the wall-slice data
 View[i].Distance=distance;
 View[i].Style=(StyleCode&0x3F);
 View[i].Column=WallColumn;
 if(distance > FarthestWall) // is new distance the farthest?
 FarthestWall=distance; // update farthest distance
 }
 RayAngle++; // get next ray angle
 if(RayAngle >= Degree_360) // keep angle within tables
 RayAngle-=Degree_360;
 }
 DrawFrame(); // Use View[] to draw the entire screen
 Blit(face[FaceIndex],266,133,50,45); // draw the new bitmap
 } // end GenView()

// This routine validates player moves
 int HitTest(int x,int y,int angle)
 {
 int wall;
 unsigned long distance,ydistance;
 int dx,dy;
 distance=LONG_MAX; // Set to a very large distance
// Correct for angles where sin=cos: push the angle to one side, otherwise
// we can get false readings.
 if(angle==Degree_45 || angle==Degree_135 ||
 angle==Degree_225 || angle==Degree_315)
 angle++;
// Don't cast a XZ ray if it can't possibly hit an XZ wall!
 if(angle!=Degree_90 && angle!=Degree_270)
 {
 if(RayXZ(x,y,angle)) // cast an xz-ray
 { // we hit something
 dx=iRayX-x; // get ray x-component
 distance=(dx*InvCosine[angle])>>14; // calc distance
 wall=X_WALL; // set wall struck code
 }
 }
// Don't cast a YZ ray if it can't possibly hit a YZ wall!
 if(angle!=0 && angle!=Degree_180)
 {
 if(RayYZ(x,y,angle)) // cast a yz-ray
 { // we hit something
 dy=iRayY-y; // get ray y-component
 ydistance=(dy*InvSine[angle])>>14; // calc distance
 if(ydistance < distance) // use the shorter ray
 {
 distance=ydistance; // use y ray distance
 wall=Y_WALL; // set wall struck code
 }
 }
 }
 if(wall) // a wall was hit
 {
 distance*=Cosine[VIEW_WIDTH/2]; // cosine correct distance
 distance+=ROUND20; // round-up fixed point
 distance>>=FP20; // convert back to integer
 if(distance > CHECK_DIST) // check distance
 wall=NO_WALL; // it's OK to move
 }
 return wall; // return wall code
 }
// This is the main program loop.
 void EventLoop(void)
 {
 int dx,dy,RevAngle,result;
 while(ScanCode!=ESC_PRESSED) // do until ESC is hit
 {
 switch(ScanCode)
 {
 case 0: // no keys pressed
 break;
 case PLUS_PRESSED:
 HeightScale+=1000; // increase aspect ratio
 if(HeightScale > 20000) HeightScale=20000;

 InitHeightTable(); // recalc height/distance table
 break;
 case MINUS_PRESSED:
 HeightScale-=1000; // decrease aspect ratio
 if(HeightScale < 1000) HeightScale=1000;
 InitHeightTable(); // recalc height/distance table
 break;
 case LEFT_ARROW_PRESSED:
 Pov.angle-=Degree_6; // turn left 6 degrees
 if(Pov.angle < 0) // keep angle within tables
 Pov.angle+=Degree_360;
 GetSineAndCosine(Pov.angle); // get trig values
 break;
 case RIGHT_ARROW_PRESSED:
 Pov.angle+=Degree_6; // turn right 6 degrees
 if(Pov.angle >= Degree_360) // keep angle within tables
 Pov.angle-=Degree_360;
 GetSineAndCosine(Pov.angle); // get trig values
 break;
 case UP_ARROW_PRESSED: // move forward
 dx=(int)(CosAngle >> 12); // get x-component
 dy=(int)(SinAngle >> 12); // get y-component
 result=HitTest(Pov.x,Pov.y,Pov.angle); // check for all walls
 if(result == X_WALL)
 { // we hit an x-wall
 dx=0; // can't move in x-dir
 if(Pov.angle < Degree_180) // so check for y-walls
 RevAngle=Degree_90; // check north
 else
 RevAngle=Degree_270; // check south
 result=HitTest(Pov.x,Pov.y,RevAngle); // check for y-walls
 }
 if(result == Y_WALL)
 { // we hit a y-wall
 dy=0; // can't move in y-dir
 if(Pov.angle>Degree_270 || Pov.angle<Degree_90)
 RevAngle=0; // look east
 else
 RevAngle=Degree_180; // look west
 result=HitTest(Pov.x,Pov.y,RevAngle); // check for x-walls
 }
 if(result==NO_WALL)
 {
 Pov.x+=dx; // no walls were hit so we can move
 Pov.y+=dy; // ...update POV
 }
 break;
 case DOWN_ARROW_PRESSED: // move backward
 dx=(int)(CosAngle >> 12); // get x-component
 dy=(int)(SinAngle >> 12); // get y-component
 RevAngle=Pov.angle+Degree_180;
 if(RevAngle >= Degree_360)
 RevAngle-=Degree_360;
 result=HitTest(Pov.x,Pov.y,RevAngle);
 if(result == X_WALL)
 {
 dx=0; // can't move in x-dir
 if(RevAngle < Degree_180)
 RevAngle=Degree_90;

 else
 RevAngle=Degree_270;
 result=HitTest(Pov.x,Pov.y,RevAngle);
 }
 if(result == Y_WALL)
 {
 dy=0;
 if(RevAngle > Degree_270 || RevAngle < Degree_90)
 RevAngle=0;
 else
 RevAngle=Degree_180;
 result=HitTest(Pov.x,Pov.y,RevAngle);
 }
 if(result == NO_WALL)
 {
 Pov.x-=dx;
 Pov.y-=dy;
 }
 break;
 default: // handle unrecognized keys
 break;
 }
 GenView(); // cast some rays and draw screen
 Frames++; // update frame counter
 }
 }
// This is the main program start
 void main(void)
 {
 clock_t begin,fini;
 unsigned memleft;
 Initialize(); // setup video, read bitmaps, etc.
 Pov.x=Pov.y=96; // set starting player location
 Pov.angle=0;
 GetSineAndCosine(Pov.angle); // get sin/cos values
 GenView(); // draw initial view
 memleft=(unsigned)(coreleft()>>10); // get free memory kbytes
 begin=clock(); // start timing the program
 EventLoop(); // the animation loop
 fini=clock(); // finish timing the program
 CleanUp(); // free up all memory
 printf("Raycasting Performance\n----------------------\n");
 printf("Memory: %uk\n",memleft); // show free memory
 printf("Frames: %u\n",Frames); // show frame count
 printf(" Ticks: %u\n",(fini-begin)); // show clock count
 printf("Frame rate: %6.2f f/s\n",Frames*18.2/(fini-begin));
 printf("\n\nRayCastr: A raycasting demo for Dr. Dobb's Journal.\n\n");
 printf("Written by: Mark Seminatore\n");
 printf(" CIS: [72040,145]\n");
 }

Listing Four
// Draw.c: This module contains the DrawFrame() routine which texture
// maps the wall-slices in View[].
// Copyright (c) 1994 by Mark Seminatore, all rights reserved.
 #include <stdio.h>
 #include <mem.h>
 #include "ray3d.h"
 #include "globals.h"

 void DrawFrame(void)
 {
 unsigned hceil;
 long ScaleFactor,BmpOffset;
 char *Display,*p,*Bmp;
 int i,j,h,row;
 h=HeightTable[FarthestWall]; // lookup smallest height on screen
 if(h < VIEW_HEIGHT) // is it still > viewport height?
 { // if not then draw floors/ceilings
 row=(VIEW_HEIGHT-h)>>1;
 hceil=VideoRow[row+h];
 Display=ScreenBuffer;
 for(i=0;i<row;i++)
 {
 memset(Display,CEILING_COLOR,VIEW_WIDTH); // draw ceiling row
 memset(Display+hceil,FLOOR_COLOR,VIEW_WIDTH); // draw floor row
 Display+=VIEW_WIDTH;
 }
 }
 for(i=0;i<VIEW_WIDTH;i++)
 {
 Display=ScreenBuffer+i; // init display pointer
 Bmp=bitmap[View[i].Style]; // get ptr to bitmap
 Bmp+=(View[i].Column<<6); // add in column offset
 h=HeightTable[View[i].Distance]; // find wall-slice height
 row=(VIEW_HEIGHT-h)>>1; // calc starting row
 BmpOffset=0; // start with first pixel
 ScaleFactor=ScaleTable[View[i].Distance]; // get scaling factor
 if(row < 0)
 {
 h+=row<<1; // adjust wall-slice height
 for(j=row;j<0;j++) // loop until on screen
 BmpOffset+=ScaleFactor; // update position in bitmap
 }
 else
 Display+=VideoRow[row]; // point to starting row
 for(j=0;j<h;j++) // texture mapping loop
 {
// Copy bitmap pixel to ScreenBuffer
 *Display=*(Bmp+(unsigned)(((0xffff0000L&BmpOffset)>>(16))));
 Display+=VIEW_WIDTH; // get next ScreenBuffer row
 BmpOffset+=ScaleFactor; // update position in bitmap
 }
 }
 Display=ScreenBuffer; // get source pointer
 p=screen+20+SCREEN_WIDTH*40; // get destination pointer
 for(i=0;i<VIEW_HEIGHT;i++) // copy each row
 {
 memcpy(p,Display,VIEW_WIDTH); // copy row to screen
 p+=SCREEN_WIDTH;
 Display+=VIEW_WIDTH;
 }
 }

Listing Five
; Xray.asm: This module implements the optimized assembly version of the
; RayXZ() function.
; Copyright (c) 1994 by Mark Seminatore, all rights reserved.
 ideal

 smart
 p386
 model compact,c
 dataseg
 include "rayasm.inc"
 codeseg
 proc RayXZ
 ARG x:word,y:word,angle:word
 push si ; save Turbo C register vars
 push di
 mov ax,[x] ; get x
 and ax,0ffc0h ; calc (int)(x/64) * 64
 les di,[dyTable] ; get full ptr to table
 mov bx,[angle] ; get ray angle
 shl bx,2 ; multiply by the size of a long
 mov edx,[es:di+bx] ; get value from table
 cmp bx,4*Degree_270 ; compare angle with 270 degrees
 jg short LookingRight ; jump if greater than
 cmp bx,4*Degree_90 ; compare angle with 90 degrees
 jl short LookingRight ; jump if less than
 mov [word cs:dxPos-2],-MAP_SIZE ; dx=-MAP_SIZE
 neg edx ; dy=-dy
 jmp short Continue1
LookingRight:
 add ax,MAP_SIZE ; xLocation=xBeg+MAP_SIZE
 mov [word cs:dxPos-2],MAP_SIZE ; dx=MAP_SIZE
Continue1:
 mov [dword cs:dyPos-4],edx ; store dy inside loop
 mov cx,ax ; cx contains xLocation
 sub ax,[word x] ; xLocation - x
 cwde ; xLocation - x
 lfs di,[dword Tangent] ; get ptr to table
 imul [dword fs:di+bx] ; (xLocation-x)*Tangent[angle]
 movsx edx,[word y]
 shl edx,16
 add edx,eax
 lgs di,[xzWalls] ; get ptr to xzWalls
GiantLoop:
 cmp cx,0 ; is xLocation < 0?
 jl short OuttaHere
 cmp cx,MAP_MAX ; is xLocation > MAP_MAX?
 jg short OuttaHere
 cmp edx,large 0 ; is yLocation < 0?
 jl short OuttaHere
 cmp edx,large MAP_YMAXLONG ; is yLocation > MAP_YMAXLONG?
 jg short OuttaHere
 mov eax,edx
 shr eax,16
 and eax,0ffc0h
 movsx ebx,cx
 shr ebx,6
 add ebx,eax
 mov al,[byte gs:di+bx] ; get xzWalls[bx]
 cmp al,0 ; was a wall found?
 jne short Continue2 ; then quit
 add cx,100h ; add imm16 value
dxPos:
 add edx,1000000h ; add imm32 value
dyPos:

 jmp GiantLoop
Continue2:
; and cx,0ffffh
 mov [iRayX],cx ; iRayX=xLocation
 mov [RayY],edx ; RayY=yLocation
 pop di ; clean up and return
 pop si
 ret
OuttaHere:
 xor al,al ; no wall was found
 pop di ; clean up and return
 pop si
 ret
 endp RayXZ
 end
















































PNG: The Portable Network Graphic Format


Hey, didn't you used to be GIF?




Lee Daniel Crocker


Lee was involved with defining the GIF89a and JFIF image file formats. He
works for Avantos Performance Systems and can be reached at lee@piclab.com.


On December 29, 1994, CompuServe announced new licensing terms for use of the
Graphics Interchange Format (GIF), a previously freely available de facto
standard. Unisys Corp. instigated this with its decision to enforce its U.S.
Patent 4,558,302 on the LZW (Lempel-Ziv-Welch) compression algorithm.
Consequently, developers of GIF-based software have to begin paying royalties
to Unisys or sublicensing fees to CompuServe. Even producers of CD-ROMs with
public-domain software must pay if that software uses GIF. This upset many
developers who have been using GIF free-of-charge since its creation in 1987.
The change affected software developers, graphic artists, Internet and
online-service users, and others. The Usenet newsgroup comp.graphics quickly
filled with discussion, leading to a mailing list looking into ways of
replacing GIF. Many of us joined the effort, which has since received
CompuServe's official support.
We first considered basing our work on an existing file format, but quickly
discovered that existing formats failed to fully address the issues that made
GIF so popular. Portability was our first concern. Formats such as BMP, ILBM,
PCX, PICT, and others have explicit/implicit platform dependencies. The
Microsoft/Aldus Tagged Image File Format (TIFF) can hold any kind of graphic
information in a platform-independent way. However, almost no application
supports every TIFF option, so many low-end programs can't read files written
by high-end programs or programs on different platforms. TIFF also requires
file seeks in order to read, making it difficult to use in streaming
communications applications such as World-Wide Web browsers. GIF, having been
designed by CompuServe, had communications in mind from the outset. In fact,
some images transferred from online services or Web sites to user displays
never exist as a file at all.
We then considered JPEG File Interchange Format (JFIF). It is designed for
streaming applications and offers outstanding quality and compression. Many
applications that currently use GIF--such as distributing photographs from
online services and BBSs--would be far better served by JFIF. Unfortunately,
JFIF is optimal only for photographic images (grayscale or full color); it
performs badly with line art, text, icons, and similar images.
JFIF is also less useful than GIF for medium-quality photographs that need to
be edited extensively. While converting a full-color photo to GIF incurs
considerable loss of data (a process called "color quantization"), any further
editing of the GIF is lossless. JFIF, while starting at a much higher quality
than GIF, loses more data every time an image is edited, much like a photocopy
of a photocopy. Even simple cropping is not immune. JFIF's rarely used
lossless mode suffers from the TIFF problem: very few programs support it.
With all this in mind, we decided to create a new format, one that would be:
- Simple, clean, and easy to implement.
- Completely portable.
- Available free-of-charge in source-code form for reading and writing.
Furthermore, we wanted the format to:
- Support 100 percent lossless conversion of existing GIFs.
- Support streaming communications.
- Compress better than GIF, while still being lossless.
- Support progressive display and transparency as well as or better than GIF.
Most importantly, the new format needed to be patent free, public, and
otherwise legally unencumbered.
We achieved these goals and more with the Portable Network Graphic (PNG)
format. In addition, we provided support for lossless compression of
full-color and medical images. This addition slightly increased the format's
complexity, but it greatly widened the potential market and addressed serious
omissions in GIF. On the other hand, we did not include features better served
by other means. JFIF, for example, handles lossy compression well. Because we
didn't want to complicate the format with support for multiple images (which
are better handled by other tools), PNG is strictly a single-image format.
(For specific details, the PNG specification is available on CompuServe at GO
GRAPHICS and on the Internet at http://sunsite.unc.edu/boutell/.) 


Testing


To determine which features to include in PNG and how to implement them, I ran
more than 6000 tests on different kinds of images, producing files in various
formats, so that our ultimate decisions would be based on solid evidence.
Other team members ran tests to confirm these results and to test issues such
as compression optimization and progressive display.
Many of the test results surprised us. For example, separating scan lines into
red, green, and blue sections didn't prove efficient. This surprised me
because, during development of the GIF89a spec, I had proposed a similar
technique to store full-color data. At that time, scan-line separation
improved compression, but that was with LZW and apparently doesn't apply to
the deflate algorithm.
Another surprising result was that predictor-function filtering not only
failed to help compression of limited-color images, but it made images bigger.
This technique works wonders on gray-scale and full-color images, but was
disastrous once an image went through quantization. This affected some of the
recommendations in the specification and in some current implementations.
I ran another suite of tests to decide whether drawings with few (4 or 16)
colors should be stored with their values packed two or four to a byte, or
expanded to a byte each before compression. Some team members suspected that
the latter might lead to more identical bytes that the Huffman-coding portion
of the deflate algorithm could compress well. As it turns out, packing them
into bytes worked better, so we allowed it even though we wanted to minimize
the number of options.
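The winning layout is easy to sketch in C. This is only an illustration with
an invented helper name, assuming 4-bit palette indices:

```c
#include <stddef.h>

/* Pack 4-bit palette indices two to a byte, high nibble first, the
 * way PNG stores sub-byte depths. n is the pixel count; an odd tail
 * pads the final low nibble with zero. */
void pack4(const unsigned char *px, unsigned char *out, size_t n)
{
    for (size_t i = 0; i + 1 < n; i += 2)
        out[i / 2] = (unsigned char)((px[i] << 4) | (px[i + 1] & 0x0F));
    if (n & 1)
        out[n / 2] = (unsigned char)(px[n - 1] << 4);
}
```

Unpacking for display is the mirror image: shift and mask each nibble back out.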


Compression


Compression is one of GIF's most attractive features. Users could always count
on files not taking up huge amounts of disk space or download time. While not
the best, GIF compression is reliable and good enough for most uses.
We decided to use the deflate-compression scheme Phil Katz designed for PKZIP
because it is well documented, freely usable, powerful, and has an existing
base of code supporting it on many platforms. The algorithm, called a
"sliding-window" method, has a buffer (32K in our case) that is logically
passed over the input stream. Output is based on patterns found in that
window, which are then Huffman-coded for further compression. This method
(without the Huffman pass) was first described by Lempel and Ziv in 1977, but
not patented. In 1978, they described a different method of collecting
patterns in text that was simpler and almost as powerful. Terry Welch (now
with Unisys) described a particularly straightforward way to implement their
LZ78 idea--the dreaded LZW patent.
To further enhance compression, we added prefiltering of image data through
predictor functions. This takes advantage of the slowly varying nature of
true-color and gray-scale photographs. In the simplest case (as with TIFF),
each pixel is stored as the difference between it and the previous pixel. In
images that contain a wide range of values with smooth transitions between
them, predictor functions create many more small values for the compressor.
Programs reading the file perform the same function in reverse--adding the
input value to the previously decoded pixel. As a result, the approach is
completely lossless. 
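The simplest of these predictors (PNG's "Sub" filter) can be sketched in a few
lines of C; the function names are illustrative, not from the PNG sources:

```c
#include <stddef.h>

/* PNG "Sub" predictor: replace each byte with its difference from the
 * byte bpp positions earlier on the same line. Arithmetic is modulo
 * 256, so adding the values back on decode is exactly lossless.
 * Encoding walks right to left so each subtraction sees raw data. */
void sub_encode(unsigned char *line, size_t len, size_t bpp)
{
    for (size_t i = len; i-- > bpp; )
        line[i] = (unsigned char)(line[i] - line[i - bpp]);
}

void sub_decode(unsigned char *line, size_t len, size_t bpp)
{
    for (size_t i = bpp; i < len; i++)
        line[i] = (unsigned char)(line[i] + line[i - bpp]);
}
```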
Predictor functions differ in their methods. So that PNG display programs
would not have to know a dozen filter functions or keep more than one line of
data around, we limited the number of functions. We considered only those that
outperformed all other predictors by at least 5 percent on a number of images,
and from those, we chose the four simple functions now in the specification.
This technique is powerful (sometimes saving 30 percent on file sizes) but
dangerous. Certain filters work best on certain images, and it is hard to
predict which. When these functions don't work, they make things worse. A
PNG-writing program must carefully choose the functions to use for a
particular image. One reasonable method is to use none. This will usually
generate a file smaller than the equivalent GIF, but perhaps not as small as
it could be. Another method is to try each of the functions on the first few
input lines, then measure which one works best.
For streaming applications, it's useful not to have to decide in advance which
predictor function to use for the entire file. The function ID number for each
scan-line is stored at the start of the line, so you can adapt to the image as
it comes. My own code does this and picks predictors with a third method as
well: a heuristic that looks at the bytes in each line and tries to guess
which predictor will work best. It often outperforms even the best of the
single predictors because images often have regions which are different from
each other. The compression levels we achieved are shown in Table 1. 


Progressive Display


Interlacing is another popular GIF feature. If you view an interlaced GIF
while it's being downloaded (a feature available in numerous communications
packages and Web browsers), you see a 1/8-quality version of the image in 1/8
of the time, then 1/4, 1/2, and finally the whole thing. This helps you decide
whether or not to cancel the download. 
Through testing, we discovered that we could extend this concept to a
seven-pass method that gave quicker response and better-looking
intermediates--without adding too much complexity, requiring more memory, or
compromising compression. Storing an image this way expands it by about 8
percent, but lets users get a 1/64-quality image almost immediately, followed
by 1/32, 1/16, and so on. PNG stores each pass as a rectangular image, so the
predictor functions work well, and often overcome the interlace-induced size
expansion. This makes implementation simpler, too. As with GIF, it is possible
to interlace/de-interlace images by storing the passes in temporary files.



Transparency


Sometimes you need to overlay an image onto a background, allowing the
background to peek through parts of the top image. High-end applications
extend this by storing a numerical transparency value for each pixel so that
the top image smoothly transitions into the background.
In this instance, we were torn between keeping the specification simple and
clean and accommodating an obvious need for these features. We knew that we
could have used gradual transparency (also called "alpha channel") for simple
on/off transparency, so we considered making it the only option. However,
testing revealed that using the GIF-like technique of "keying" (picking a
color in the image and treating it as "clear") allowed better compression and
was already supported by existing code. Reluctantly, we allowed both.
In any event, simple image-display programs can ignore transparency
information, so it isn't that much of a burden.


Gamma Correction


The lack of a standard definition of RGB values has always limited GIF's
portability. The numbers you send to your display will look different when
sent to your color printer. Likewise, a scanned photograph that looks great on
a Macintosh will look dark and murky on a PC. With PNG, we wanted the image to
look right on any device.
While there's general agreement that 0 means black and 255 means white, the
human eye, TV phosphors, color printers, and film emulsions all disagree about
what 128 means. With a linear-response medium (many color printers and
Macintosh displays, for instance), the response curve might look like the line
in Figure 1, where 128 roughly equates to 50 percent gray, 64 to 25 percent
gray, and so on. Linear values are simple, easy to work with, and easy to
match. It is not uncommon for a Macintosh to have more than one monitor, and
standardizing on linear RGB values means that colors will closely match on all
of them.
What's wrong with this method? Well, neither your eyes nor television
phosphors work this way. You can get better-looking pictures by using a
response curve like the dashed line in Figure 1, where 128 means 73 percent
gray and 64 means 54 percent gray. This is how TVs treat input voltages, and
the way color values respond on typical PC displays. Unfortunately, there's no
single standard for that curve either. With NTSC television cameras, for
example, Voltage=Luminance^(1/Gamma), where Gamma=0.45, and the Luminance and
Voltage values are scaled into the 0.0 to 1.0 range. Note that the linear case
is the same equation with Gamma=1.0.
We could have picked one of these methods and required developers to convert
from one to the other (as JFIF does). However, this conversion process causes
some loss of original data. Instead we decided to let programmers put their
native values into the file, then provide a clue--the Gamma value--in the file
as to which curve was used. If you are writing a PNG file from data designed
specifically for a PC or digitized from a television camera, 0.45 is a good
choice to put into the file. If you're capturing a Macintosh screen, use 1.0.
Artificial images (such as those created by ray-tracing programs) usually use
1.0, as well.
Correcting for gamma when you display an image is important. It isn't
necessary to perform complex calculations for every pixel. For palette-based
images (all GIFs are palette based, so most PNGs probably will be), you
correct the palette and the pixels take care of themselves. Gray-scale and
full-color images can be corrected by making a 256-byte lookup table and
mapping the pixel values through the table as they come in. 
Color experts will tell you that even gamma correction isn't enough to get
accurate color matching. Strictly speaking, they're right, but gamma
correction will get you 90 percent of the way there, showing colors similar to
the original. High-end applications can add high-definition information.


File Layout


A PNG file (or data stream) consists of a signature identifying the file,
followed by a series of chunks, each of which contains a specific piece of
information about the image. The 8-byte identifying signature (0x89 0x50 0x4E
0x47 0x0D 0x0A 0x1A 0x0A) also detects common transmission problems. For
example, if gateways chop 8-bit data to 7 bits or modify end-of-line
characters (common problems on the Internet), the receiver will detect this
immediately. The long sequence also makes it fairly reliable to find the start
of a PNG stream when it is embedded inside another byte stream. Macintosh
binary headers--128 bytes of Mac operating-system information added to the
start of a binary file--are often attached to files uploaded by Macintosh
users to BBSs and Internet sites. A program that reads PNG files (and any
other format, for that matter) should accept a file as valid if the
identifying number starts 128 bytes into the file instead of at the beginning.
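A reader that follows this advice might look like the following sketch (the
function name is mine):

```c
#include <stdio.h>
#include <string.h>

/* Look for the PNG signature at offset 0 or, to tolerate a MacBinary
 * header, at offset 128. Returns the offset where chunk data begins,
 * or -1 if this is not a PNG stream. */
long find_png_start(FILE *fp)
{
    static const unsigned char sig[8] =
        {0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A};
    unsigned char buf[8];
    long offsets[2] = {0L, 128L};

    for (int i = 0; i < 2; i++) {
        if (fseek(fp, offsets[i], SEEK_SET) != 0) continue;
        if (fread(buf, 1, 8, fp) != 8) continue;
        if (memcmp(buf, sig, 8) == 0) return offsets[i] + 8;
    }
    return -1L;
}
```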
There is no PNG version number. Strict compatibility and version numbers don't
mix. However, the existing PNG specification allows for private extension.
The general layout of a chunk is a 4-byte length, followed by a 4-byte
chunk-type ID, data bytes (which vary with chunk type), and finally a 4-byte
CRC. All 2-byte and 4-byte values are stored high-to-low (Big-endian), so they
must be byte-swapped when read into an integer on machines (such as Intel
processors) that store the low byte first. The data-length field counts only
the number of bytes in the data field, so the total size of the chunk is Data
Length+12.
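Reading a chunk header portably is just a matter of assembling the big-endian
fields a byte at a time, which works unchanged on either kind of machine (the
function names here are illustrative, not from the listings):

```c
/* Assemble a big-endian 32-bit value from a byte buffer. Written
 * this way, the code is correct on both big- and little-endian
 * hosts with no conditional compilation. */
unsigned long be_get32(const unsigned char *p)
{
    return ((unsigned long)p[0] << 24) | ((unsigned long)p[1] << 16) |
           ((unsigned long)p[2] << 8)  |  (unsigned long)p[3];
}

/* Total bytes a chunk occupies: 4 (length) + 4 (type) + data + 4 (CRC). */
unsigned long chunk_total_size(const unsigned char *hdr)
{
    return be_get32(hdr) + 12;
}
```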
The chunk type consists of four ASCII characters. The case of the characters
indicates certain information about the chunk type. Chunks that contain
critical information about the image begin with an uppercase letter. Those
that contain ancillary information begin with lowercase letters. When a
PNG-reader program encounters a chunk it doesn't recognize, it may safely
ignore ancillary chunks, although this might not produce as good an image. If
the unknown chunk is marked critical, the program should warn the user that
vital information is missing, although it may attempt to render the image
anyway.
Chunks defined publicly in the PNG specification have an uppercase letter in
the second position. Those with a lowercase letter can be used by developers
for application-specific needs. The popular fractal-exploration program
Fractint, for example, uses a chunk of this type to store the math formula
that generates the image. Someone who wants to look at the picture will ignore
this chunk. However, if you load the image back into Fractint, the data in the
files lets you continue exploring fractal space from where you left off.
The third letter, which is always uppercase, is currently unused. The final
letter indicates the copy-safe property of a chunk. Programs that make changes
to any critical chunk in a PNG file should look at the fourth letter of any
chunk they don't recognize. If that letter is lowercase, it is safe to copy
that chunk into the new file even if you don't recognize it. If the chunk ID
ends with an uppercase letter, you should not copy that chunk into the new
file. This guarantees data integrity of a file that contains chunks that
depend on other chunks in the image. 
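Because the case of an ASCII letter is just bit 5 (0x20), each of these tests
reduces to a single mask against the 32-bit chunk name. These mirror the
PNG_CF_* flags in Listing One, though the macro names are mine:

```c
/* PNG chunk-name property bits: each property is bit 5 (0x20, the
 * ASCII lowercase bit) of one byte of the 4-character chunk name. */
#define IS_ANCILLARY(name) (((name) & 0x20000000UL) != 0) /* 1st letter */
#define IS_PRIVATE(name)   (((name) & 0x00200000UL) != 0) /* 2nd letter */
#define IS_COPYSAFE(name)  (((name) & 0x00000020UL) != 0) /* 4th letter */
```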
The CRC-32 at the end of each chunk verifies the integrity of the data. This
is important for communication channels. Another use for CRCs is detecting
changes in the chunks. For instance, say you wanted to store a chunk of
application-specific data inside a palette-based image used for palette
animation. Because this chunk depends on the palette chunk, you could store
the CRC of the palette chunk in your data area and test it to make sure that
the palette was not modified. This technique allows you to use copy-safe
chunks for things that depend on other data in the file. The ptot program (see
ptot.h, Listing One, and ptot.c, Listing Two) shows how to do this. 
This program, which implements one approach to PNG-to-TIFF conversion,
attempts to preserve as much data as possible. No attempt is made to compress
the output. I have also borrowed Mark Adler's public-domain inflate.c code
from the Info-Zip package. This makes the decompression routines a little less
lucid than they might be if they had been an integral part of the PNG-to-TIFF
package itself. All of the necessary files to compile the PNG-to-TIFF
converter on MS-DOS, Windows NT/POSIX, and Sun OS w/gcc are available
electronically; see "Availability," page 3. Be sure to use the appropriate
makefile.


Chunk Types


The minimum set of chunks that every PNG file must have is IHDR, IDAT, and
IEND, in that order. Palette-based images must have PLTE. All images should
have a gAMA chunk if the gamma is known, and omit it if not (hence the lowercase "g").
Other chunks may appear anywhere between IHDR and IEND, but some have specific
rules about ordering (gAMA must appear before PLTE, and PLTE before IDAT, for
example). Because a streaming PNG writer may not know the compressed size of
the entire image when it starts writing, it is allowed to split IDAT into
multiple chunks. It is not legal to put other chunks in between these IDATs,
so a PNG reader can always depend on all the image data being in sequential
IDATs.
The pHYs ancillary chunk is used to describe the physical size of the image. A
scanner may put in this chunk, for example, that the photo was 5x5 cm. While
most simple display programs will ignore this and show the picture
pixel-for-pixel, desktop-publishing programs will find this invaluable. Note
that a unit of none is allowed in this chunk. If set, the measurements in the
pHYs chunk can be used to indicate how far from square the pixels are, even
though you don't know the actual size. The oFFs chunk measures the offset of
the upper-left corner of the image from the upper-left corner of whatever it
is embedded in (a page, a background image, or whatever). This can be handy,
for example, when sending an image to a service bureau to be printed or for
converting old multiple-image GIFs (which are rare, and not supported by many
GIF-reading programs).
Ancillary tEXt chunks store short comments or data such as author's name,
software used, and the like. The PNG spec specifies that text should be ISO
8859-1, a common, 8-bit extension of ASCII. Windows ANSI is already a superset
of ISO, so it requires no translation. Macintosh and other machines have to
use a translation table to ensure that the comment "Woodcut by Albrecht
Durer," for example, looks the same on all machines. Care should also be taken
with extra characters in the Windows ANSI set that aren't part of ISO; for
example, typographic quotes and dashes. Although not portable, they are
likely to be found in PNG files, so a Macintosh app should assume they are
Windows ANSI.
Unicode (especially the ASCII-compatible UTF-8 encoding), but chose not to
include it.
If you use more than one line of text in such a block (this isn't
recommended), use ASCII newline (LF) characters only. Lines ending with CR (as
in Macintosh) or CR/LF (as in MS-DOS) should be converted.
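A conversion helper for writers might look like this sketch (name and
signature are mine):

```c
#include <stddef.h>

/* Normalize CR and CR/LF line endings in a tEXt string to bare LF,
 * in place. Returns the new length. */
size_t normalize_newlines(char *s)
{
    size_t r = 0, w = 0;
    while (s[r] != '\0') {
        if (s[r] == '\r') {
            s[w++] = '\n';
            r += (s[r + 1] == '\n') ? 2 : 1;   /* swallow CR or CR/LF */
        } else {
            s[w++] = s[r++];
        }
    }
    s[w] = '\0';
    return w;
}
```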
These chunks are intended for short pieces of human-readable information about
the image, not presentation-quality text. 


Conclusion


Although PNG was primarily designed to replace GIF, it could become a commonly
used format, even in applications that do not use GIF. By the time you read
this, it will be supported by shareware and commercial authors, and free code
will also be available.


Bibliography


Graphics Interchange Format. CompuServe Inc. Columbus, OH, 1990. 
Hunt, R.W.G. The Reproduction of Colour in Photography, Printing, and
Television. Tolworth, U.K.: Fountain Press, 1987.
JFIF, C-Cube Microsystems, Milpitas, CA, 1987.
Liaw, Wilson. "Reading GIF Files." Dr. Dobb's Journal (February 1995).
Murray, James D. and William vanRyper. Encyclopedia of Graphics File Formats.
Sebastopol, CA: O'Reilly & Associates, 1994.
TIFF Revision 6.0, Aldus Corp., Seattle, WA, 1992.
Table 1: Comparative file sizes (in bytes).

Name      Subject    Type        PNG        GIF      JPEG       TIFF
                                                     (Lossless) (LZW/Pred)
dpcreek   landscape  grayscale   190,178    283,794  202,263    263,175
einstein  face       grayscale   208,139    261,922  213,372    275,970
winona    face       grayscale   249,602    400,653  242,896    338,598
bayros    drawing    grayscale   161,652    239,704  161,659    293,552
laura     figure     256 colors  532,043    613,388  N/A        672,010
olivia    drawing    256 colors  149,480    170,134  N/A        224,740
spiral    fractal    256 colors  665,004    948,736  N/A        842,200
wall      landscape  256 colors  169,513    187,588  N/A        242,740
catalina  figure     full color  1,285,476  N/A      1,448,055  1,512,778
goldhill  landscape  full color  703,787    N/A      796,283    892,922
neptune   astronomy  full color  295,197    N/A      390,420    413,376
cwheel    raytrace   full color  229,506    N/A      366,144    383,188
Figure 1: Gamma response.

Listing One
/* ptot.h -- Header file for PNG to TIFF converter.
 * HISTORY: 95-03-10 Created by Lee Daniel Crocker <lee@piclab.com>
 * <URL:http://www.piclab.com/piclab/index.html>
 */
#ifdef _X86_ /* Intel i86 family */
# define LITTLE_ENDIAN
#endif
#ifdef _SPARC_ /* Sun SPARC family */
# define BIG_ENDIAN
# ifndef FILENAME_MAX /* Work around stupid header file bugs */
# define FILENAME_MAX 1024
# endif
# ifndef SEEK_SET
# define SEEK_SET 0
# endif
# ifndef min
# define min(x,y) (((x)<(y))?(x):(y))
# endif
# ifndef max
# define max(x,y) (((y)<(x))?(x):(y))
# endif
#endif
/* Some types and macros for easier porting. Byte swapping is the major issue
 * because we have to convert big-endian PNG to native-endian TIFF on
 * whatever architecture we're compiled on. Code depends heavily on the
 * endianness definition above. Functions would be a lot simpler than macros
 * here, but are less likely to be optimized down to simple inline byte
 * swaps. Some of these macros evaluate the address twice, so don't pass
 * "*p++" to them! */
typedef signed char S8;
typedef unsigned char U8;
typedef signed short S16;
typedef unsigned short U16;
typedef signed long S32;
typedef unsigned long U32;
#ifndef TRUE
# define TRUE 1
# define FALSE 0
#endif
#define LOBYTE(w) ((U8)((w)&0xFF))
#define HIBYTE(w) ((U8)(((w)>>8)&0xFF))
#define LOWORD(d) ((U16)((d)&0xFFFF))
#define HIWORD(d) ((U16)(((d)>>16)&0xFFFF))

#define PUT16(p,w) (*(U16*)(p)=(w)) /* Native byte order */
#define GET16(p) (*(U16*)(p))
#define PUT32(p,d) (*(U32*)(p)=(d))
#define GET32(p) (*(U32*)(p))
#if !defined(BIG_ENDIAN) && !defined(LITTLE_ENDIAN)
# error "No byte order defined"
#endif
#ifdef BIG_ENDIAN
# define BE_GET16(p) GET16(p)
# define BE_PUT16(p,w) PUT16((p),(w))
# define BE_GET32(p) GET32(p)
# define BE_PUT32(p,d) PUT32((p),(d))
# define LE_GET16(p) ((U16)(*(U8*)(p)&0xFF)|\
 (*((U8*)(p)+1)<<8))
# define LE_PUT16(p,w) (((*(U8*)(p))=LOBYTE(w)),\
 ((*((U8*)(p)+1))=HIBYTE(w)))
# define LE_GET32(p) (((U32)LE_GET16(p))|\
 ((U32)LE_GET16((U8*)(p)+2)<<16))
# define LE_PUT32(p,d) (LE_PUT16((p),LOWORD(d)),\
 LE_PUT16((U8*)(p)+2,HIWORD(d)))
#else
# define BE_GET16(p) ((U16)(*(U8*)(p)<<8)|\
 (*((U8*)(p)+1)&0xFF))
# define BE_PUT16(p,w) (((*(U8*)(p))=HIBYTE(w)),\
 ((*((U8*)(p)+1))=LOBYTE(w)))
# define BE_GET32(p) (((U32)BE_GET16(p)<<16)|\
 BE_GET16((U8*)(p)+2))
# define BE_PUT32(p,d) (BE_PUT16((p),HIWORD(d)),\
 BE_PUT16((U8*)(p)+2,LOWORD(d)))
# define LE_GET16(p) GET16(p)
# define LE_PUT16(p,w) PUT16((p),(w))
# define LE_GET32(p) GET32(p)
# define LE_PUT32(p,d) PUT32((p),(d))
#endif
/* Miscellaneous PNG definitions. */
#define PNG_Signature "\x89\x50\x4E\x47\x0D\x0A\x1A\x0A"
#define PNG_MaxChunkLength 0x7FFFFFFFL
#define PNG_CN_IHDR 0x49484452L /* Chunk names */
#define PNG_CN_PLTE 0x504C5445L
#define PNG_CN_IDAT 0x49444154L
#define PNG_CN_IEND 0x49454E44L
#define PNG_CN_gAMA 0x67414D41L
#define PNG_CN_sBIT 0x73424954L
#define PNG_CN_cHRM 0x6348524DL
#define PNG_CN_tRNS 0x74524E53L
#define PNG_CN_bKGD 0x624B4744L
#define PNG_CN_hIST 0x68495354L
#define PNG_CN_tEXt 0x74455874L
#define PNG_CN_zTXt 0x7A545874L
#define PNG_CN_pHYs 0x70485973L
#define PNG_CN_oFFs 0x6F464673L
#define PNG_CN_tIME 0x74494D45L
#define PNG_CN_sCAL 0x7343414CL
#define PNG_CF_Ancillary 0x20000000L /* Chunk flags */
#define PNG_CF_Private 0x00200000L
#define PNG_CF_CopySafe 0x00000020L
#define PNG_FT_Adaptive 0 /* Filtering type */
#define PNG_CT_Deflate 0 /* Compression type */
#define PNG_IT_None 0 /* Interlace types */

#define PNG_IT_Costello 1
#define PNG_CB_Palette 0x01 /* Colortype bits */
#define PNG_CB_Color 0x02
#define PNG_CB_Alpha 0x04
#define PNG_MU_None 0 /* Measurement units */
#define PNG_MU_Pixel 0
#define PNG_MU_Meter 1
#define PNG_MU_Micrometer 1
#define PNG_MU_Radian 2
#define PNG_PF_None 0 /* Prediction filters */
#define PNG_PF_Sub 1
#define PNG_PF_Up 2
#define PNG_PF_Average 3
#define PNG_PF_Paeth 4
/* Miscellaneous TIFF definitions. This is a small subset of the tags, data 
 * types, and values available in TIFF--only those needed for PNG conversion.
 * For example, we are not doing any TIFF compression, so the only TIFF 
 * compression type listed is "None". */
#define TIFF_BO_Intel 0x4949 /* Byte order identifiers */
#define TIFF_BO_Motorola 0x4D4D
#define TIFF_MagicNumber 42
#define TIFF_DT_BYTE 1 /* Data types */
#define TIFF_DT_ASCII 2
#define TIFF_DT_SHORT 3
#define TIFF_DT_LONG 4
#define TIFF_DT_RATIONAL 5
#define TIFF_DT_UNDEFINED 7
#define TIFF_TAG_ImageWidth 256 /* Tag values */
#define TIFF_TAG_ImageLength 257
#define TIFF_TAG_BitsPerSample 258
#define TIFF_TAG_Compression 259
#define TIFF_TAG_PhotometricInterpretation 262
#define TIFF_TAG_ImageDescription 270
#define TIFF_TAG_Make 271
#define TIFF_TAG_Model 272
#define TIFF_TAG_StripOffsets 273
#define TIFF_TAG_SamplesPerPixel 277
#define TIFF_TAG_RowsPerStrip 278
#define TIFF_TAG_StripByteCounts 279
#define TIFF_TAG_XResolution 282
#define TIFF_TAG_YResolution 283
#define TIFF_TAG_PlanarConfiguration 284
#define TIFF_TAG_XPosition 286
#define TIFF_TAG_YPosition 287
#define TIFF_TAG_ResolutionUnit 296
#define TIFF_TAG_TransferFunction 301
#define TIFF_TAG_Software 305
#define TIFF_TAG_DateTime 306
#define TIFF_TAG_Artist 315
#define TIFF_TAG_HostComputer 316
#define TIFF_TAG_WhitePoint 318
#define TIFF_TAG_PrimaryChromaticities 319
#define TIFF_TAG_ColorMap 320
#define TIFF_TAG_ExtraSamples 338
#define TIFF_TAG_Copyright 33432
/* This last tag is registered to me specifically for this program and its 
 * companion TIFF-to-PNG (not included in the DDJ code), so that they can be 
 * invertible. I encourage you to use it for the same purpose--just make sure 
 * you stay compatible with this program. There is no equivalent PNG chunk for
 * TIFF data; the known TIFF tags can be either translated to functionally
 * equivalent PNG chunks or encoded in tEXt chunks. Unknown
 * ones are not copy-safe (according to the TIFF spec). */
#define TIFF_TAG_PNGChunks 34865
#define TIFF_CT_NONE 1 /* Compression type */
#define TIFF_PI_GRAY 1 /* Photometric interpretations */
#define TIFF_PI_RGB 2
#define TIFF_PI_PLTE 3
#define TIFF_PC_CONTIG 1 /* Planar configurations */
#define TIFF_RU_NONE 1 /* Resolution units */
#define TIFF_RU_CM 3
#define TIFF_ES_UNASSOC 2 /* Extra sample type */
/* Structure for holding miscellaneous image information. Conversion program
 * will read an image into this structure, then pass it to the output function.
 * In this implementation, image data bytes are stored in a file, which is 
 * pointed to by this structure. This is so the code will work on small-memory
 * architectures like MS-DOS. On Unix, Win32 (NT/Chicago), and other systems,
 * it might make more sense to allocate one big chunk of memory for the image
 * and replace image_data_file string with an image_data_buffer pointer. */
#define N_KEYWORDS 5
typedef struct _image_info {
 U32 width, height;
 U32 xoffset, yoffset;
 U32 xres, yres;
 double xscale, yscale;
 double source_gamma;
 U32 chromaticities[8]; /* Fixed point x 100000 */
 int resolution_unit; /* Units as in PNG */
 int offset_unit, scale_unit;
 int samples_per_pixel;
 int bits_per_sample;
 int significant_bits[4];
 int background_color[4];
 int is_color, has_alpha, has_trns;
 int is_interlaced, is_palette;
 int palette_size;
 U8 palette[3 * 256];
 U16 trans_values[3];
 U8 palette_trans_bytes[256];
 char *keywords[N_KEYWORDS];
 char *pixel_data_file; /* Where to find the pixels */
 U32 png_data_size;
 char *png_data_file; /* Untranslatable PNG chunks */
} IMG_INFO;
#define IMG_SIZE (sizeof (struct _image_info))
extern char *keyword_table[N_KEYWORDS];
extern U16 ASCII_tags[N_KEYWORDS];
/* Local ASSERT macro. Assumes the function Assert() is defined somewhere in
 * the calling program (in this case, it's in ptot.c). */
#ifndef NDEBUG
# define ASSERT(x) ((x)?(void)0:Assert(__FILE__,__LINE__))
# define TRACE_STR(x) (fprintf(stderr,"TR: %s\n",(x)),\
 fflush(stderr))
# define TRACE_INT(x) (fprintf(stderr,"TR: %ld\n",(long)(x)),\
 fflush(stderr))
#else
# define ASSERT(x)
# define TRACE_STR(x)
# define TRACE_INT(x)

#endif
/* Prototypes */
U32 update_crc(U32, U8 *, U32);
int main(int argc, char *argv[]);
void print_warning(int);
void error_exit(int);
void Assert(char *, int);
int read_PNG(FILE *, IMG_INFO *);
int get_chunk_header(void);
U32 get_chunk_data(U32);
int verify_chunk_crc(void);
int decode_IDAT(void);
U8 fill_buf(void);
void flush_window(U32);
int decode_text(void);
int copy_unknown_chunk_data(void);
size_t new_line_size(IMG_INFO *, int, int);
int get_local_byte_order(void);
int write_TIFF(FILE *, IMG_INFO *);
int create_tempfile(int); 
int open_tempfile(int);
void close_all_tempfiles(void);
void remove_all_tempfiles(void);
/* Interface to Mark Adler's inflate.c */
int inflate(void);
typedef unsigned char uch;
typedef unsigned short ush;
typedef unsigned long ulg;
typedef void *voidp;
#define slide (ps.inflate_window)
#define WSIZE ((size_t)(ps.inflate_window_size))
#define NEXTBYTE ((--ps.bytes_in_buf>=0)?(*ps.bufp++):fill_buf())
#define FLUSH(n) flush_window(n)
#define memzero(a,s) memset((a),0,(s))
#define qflag 1
/* A state structure is used to store all needed info about the reading
 * process so that we don't have to pass 4 or 5 arguments to every function
 * in ptot.c. This is also used to share data with inflate.c. */
#define IOBUF_SIZE 8192 /* Must be at least 768 for PLTE */
typedef struct _png_state {
 FILE *inf, *tf[7];
 char *tfnames[7];
 IMG_INFO *image;
 U8 *buf, *bufp;
 U32 crc, bytes_remaining;
 U32 inflated_chunk_size;
 U32 current_chunk_name;
 S32 bytes_in_buf; /* Must be signed! */
 U32 inflate_window_size;
 U8 *inflate_window;
 U16 inflate_flags;
 U16 sum1, sum2;
 U8 *last_line, *this_line;
 size_t byte_offset;
 size_t line_size, line_x;
 int interlace_pass;
 U32 current_row, current_col;
 int cur_filter;
 int got_first_chunk;

 int got_first_idat;
} PNG_STATE;

Listing Two
/* ptot.c -- Convert PNG (Portable Network Graphics) file to TIFF. Takes a
 * filename argument on the command line.
 * HISTORY: 95-03-10 Created by Lee Daniel Crocker <lee@piclab.com>
 * http://www.piclab.com/piclab/index.html
 */
#include <stdlib.h>
#include <stdio.h>
#include <stdarg.h>
#include <string.h>
#include <ctype.h>
#include <math.h>
#include "ptot.h"
#define DEFINE_ENUMS
#include "errors.h"
#define DEFINE_STRINGS
#include "errors.h"
PNG_STATE ps = {0}; /* Referenced by tempfile.c, etc. */
char *keyword_table[N_KEYWORDS] = {
 "Author", "Copyright", "Software", "Source", "Title"
};
/* Local definitions and statics */
static int decode_chunk(void);
static int decode_IHDR(void);
static int decode_PLTE(void);
static int decode_gAMA(void);
static int decode_tRNS(void);
static int decode_cHRM(void);
static int decode_pHYs(void);
static int decode_oFFs(void);
static int decode_sCAL(void);
static int skip_chunk_data(void);
static int validate_image(IMG_INFO *);
/* Main for PTOT. Get filename from command line, massage the
 * extensions as necessary, and call the read/write routines. */
int
main(
 int argc,
 char *argv[])
{
 int err;
 FILE *fp;
 char *cp, infname[FILENAME_MAX], outfname[FILENAME_MAX];
 IMG_INFO *image;
 image = (IMG_INFO *)malloc((size_t)IMG_SIZE);
 if (NULL == image) error_exit(ERR_MEMORY);
 if (argc < 2) error_exit(ERR_USAGE);
 strcpy(infname, argv[1]);
 strcpy(outfname, argv[1]);
 if (NULL == (cp = strrchr(outfname, '.'))) {
 strcat(infname, ".png");
 } else (*cp = '\0');
 strcat(outfname, ".tif");
 if (NULL == (fp = fopen(infname, "rb")))
 error_exit(ERR_READ);
 err = read_PNG(fp, image);

 fclose(fp);
 if (0 != err) error_exit(err);
 if (NULL == (fp = fopen(outfname, "wb")))
 error_exit(ERR_WRITE);
 err = write_TIFF(fp, image);
 fclose(fp);
 if (0 != err) error_exit(err);
 return 0;
}
/* Print warning, but continue. A bad code should never be
 * passed here, so that causes an assertion failure and exit. */
void
print_warning(
 int code)
{
 ASSERT(PTOT_NMESSAGES > 0);
 ASSERT(code >= 0 && code < PTOT_NMESSAGES);
 fprintf(stderr, "WARNING: %s.\n", ptot_error_messages[code]);
 fflush(stderr);
}
/* Print fatal error and exit. */
void
error_exit(
 int code)
{
 int msgindex;
 ASSERT(PTOT_NMESSAGES > 0);
 if (code < 0 || code >= PTOT_NMESSAGES) msgindex = 0;
 else msgindex = code;
 fprintf(stderr, "ERROR: %s.\n",
 ptot_error_messages[msgindex]);
 fflush(stderr);
 if (0 == code) exit(1);
 else exit(code);
}
void
Assert(
 char *filename,
 int lineno)
{
 fprintf(stderr, "ASSERTION FAILURE: "
 "Line %d of file \"%s\".\n", lineno, filename);
 fflush(stderr);
 exit(2);
}
/* PNG-specific code begins here. read_PNG() reads the PNG file into the 
 * passed IMG_INFO struct. Returns 0 on success. */
int
read_PNG(
 FILE *inf,
 IMG_INFO *image)
{
 int err;
 ASSERT(NULL != inf);
 ASSERT(NULL != image);
 memset(image, 0, IMG_SIZE);
 memset(&ps, 0, sizeof ps);
 ps.inf = inf;
 ps.image = image;

 if (NULL == (ps.buf = (U8 *)malloc(IOBUF_SIZE)))
 return ERR_MEMORY;
 /* Skip signature and possible MacBinary header; verify signature.
 * A more robust implementation might search for the file signature
 * anywhere in the first 1k bytes or so, but in practice, the method
 * shown is adequate for file I/O applications. */
 fread(ps.buf, 1, 8, inf);
 ps.buf[8] = '\0';
 if (0 != memcmp(ps.buf, PNG_Signature, 8)) {
 fread(ps.buf, 1, 128, inf);
 ps.buf[128] = '\0';
 if (0 != memcmp(ps.buf+120, PNG_Signature, 8)) {
 err = ERR_BAD_PNG;
 goto err_out;
 }
}
 ps.got_first_chunk = ps.got_first_idat = FALSE;
 do {
 if (0 != (err = get_chunk_header())) goto err_out;
 if (0 != (err = decode_chunk())) goto err_out;
 /* IHDR must be the first chunk. */
 if (!ps.got_first_chunk &&
 (PNG_CN_IHDR != ps.current_chunk_name))
 print_warning(WARN_BAD_PNG);
 ps.got_first_chunk = TRUE;
 /* Extra unused bytes in chunk? */
 if (0 != ps.bytes_remaining) {
 print_warning(WARN_EXTRA_BYTES);
 if (0 != (err = skip_chunk_data())) goto err_out;
 }
 if (0 != (err = verify_chunk_crc())) goto err_out;
 } while (PNG_CN_IEND != ps.current_chunk_name);
 if (!ps.got_first_idat) {
 err = ERR_NO_IDAT;
 goto err_out;
 }
 if (0 != (err = validate_image(image))) goto err_out;
 ASSERT(0 == ps.bytes_remaining);
 if (EOF != getc(inf)) print_warning(WARN_EXTRA_BYTES);
 err = 0;
err_out:
 ASSERT(NULL != ps.buf);
 free(ps.buf);
 return err;
}
/* decode_chunk() is just a dispatcher, shunting the work of decoding incoming
 * chunk (whose header we have just read) to the appropriate handler. */
static int
decode_chunk(
 void)
{
 /* Every case in the switch below should set err. We set it
 * here to guarantee that we hear about it if we don't. */
 int err = ERR_ASSERT;
 switch (ps.current_chunk_name) {
 case PNG_CN_IHDR: err = decode_IHDR(); break;
 case PNG_CN_gAMA: err = decode_gAMA(); break;
 case PNG_CN_IDAT: err = decode_IDAT(); break;
 /* PNG allows a suggested colormap for 24-bit images. TIFF
 * does not, and PLTE is not copy-safe, so we discard it. */
 case PNG_CN_PLTE:
 if (ps.image->is_palette) err = decode_PLTE();
 else err = skip_chunk_data();
 break;
 case PNG_CN_tRNS: err = decode_tRNS(); break;
 case PNG_CN_cHRM: err = decode_cHRM(); break;
 case PNG_CN_pHYs: err = decode_pHYs(); break;
 case PNG_CN_oFFs: err = decode_oFFs(); break;
 case PNG_CN_sCAL: err = decode_sCAL(); break;
 case PNG_CN_tEXt: err = decode_text(); break;
 case PNG_CN_zTXt: err = decode_text(); break;
 case PNG_CN_tIME: /* Will be recreated */
 case PNG_CN_hIST: /* Not safe to copy */
 case PNG_CN_bKGD:
 err = skip_chunk_data();
 break;
 case PNG_CN_IEND: /* We're done */
 err = 0;
 break;
 /* Note: sBIT does not have the "copy-safe" bit set, but that really only
 * applies to unknown chunks. We know what it is just like PLTE and that it
 * is probably safe to put in the output file. hIST and bKGD aren't 
 * (modifications to the output file might invalidate them), so we leave 
 * them out. */
 case PNG_CN_sBIT:
 err = copy_unknown_chunk_data();
 break;
 default:
 if (0 == (ps.current_chunk_name & PNG_CF_CopySafe))
 err = skip_chunk_data();
 else err = copy_unknown_chunk_data();
 break;
 }
 return err;
}
/* get_chunk_header() reads the first 8 bytes of each chunk, which include the
 * length and ID fields. Returns 0 on success. The stored CRC is
 * preconditioned and then updated with the chunk name read. */
int
get_chunk_header(
 void)
{
 int byte;
 ASSERT(NULL != ps.inf);
 ASSERT(NULL != ps.buf);
 if (8 != fread(ps.buf, 1, 8, ps.inf)) return ERR_READ;
 ps.bytes_remaining = BE_GET32(ps.buf);
 ps.current_chunk_name= BE_GET32(ps.buf+4);
 ps.bytes_in_buf = 0;
 if (ps.bytes_remaining > PNG_MaxChunkLength)
 print_warning(WARN_BAD_PNG);
 for (byte = 4; byte < 8; ++byte)
 if (!isalpha(ps.buf[byte])) return ERR_BAD_PNG;
 ps.crc = update_crc(0xFFFFFFFFL, ps.buf+4, 4);
 return 0;
}
/* get_chunk_data() reads chunk data into the buffer, returning the number of
 * bytes actually read. Do not use this for IDAT chunks; they are dealt with
 * specially by the fill_buf() function. */
U32
get_chunk_data(
 U32 bytes_requested)
{
 ASSERT(NULL != ps.inf);
 ASSERT(NULL != ps.buf);
 ps.bytes_in_buf = (U32)fread(ps.buf, 1,
 (size_t)min(IOBUF_SIZE, bytes_requested), ps.inf);
 ASSERT((S32)(ps.bytes_remaining) >= ps.bytes_in_buf);
 ps.bytes_remaining -= ps.bytes_in_buf;
 ps.crc = update_crc(ps.crc, ps.buf, ps.bytes_in_buf);
 return ps.bytes_in_buf;
}
/* Assuming we have read a chunk header and all the chunk data, now check to
 * see that the CRC stored at the end of the chunk matches the one we've
 * calculated. */
int
verify_chunk_crc(
 void)
{
 ASSERT(NULL != ps.inf);
 ASSERT(NULL != ps.buf);
 if (4 != fread(ps.buf, 1, 4, ps.inf)) return ERR_READ;
 if ((ps.crc ^ 0xFFFFFFFFL) != BE_GET32(ps.buf)) {
 print_warning(WARN_BAD_CRC);
 }
 return 0;
}
/* Read and decode IHDR. Errors that would probably cause the IDAT reader to 
 * fail are returned as errors; less serious errors generate a warning but 
 * continue anyway. */
static int
decode_IHDR(
 void)
{
 ASSERT(NULL != ps.inf);
 ASSERT(NULL != ps.buf);
 ASSERT(NULL != ps.image);
 if (ps.bytes_remaining < 13) return ERR_BAD_PNG;
 if (13 != get_chunk_data(13)) return ERR_READ;
 ps.image->width = BE_GET32(ps.buf);
 ps.image->height = BE_GET32(ps.buf+4);
 if (0 != ps.buf[10] || 0 != ps.buf[11])
 return ERR_BAD_PNG; /* Compression & filter type */
 ps.image->is_interlaced = ps.buf[12];
 if (!(0 == ps.image->is_interlaced ||
 1 == ps.image->is_interlaced)) return ERR_BAD_PNG;
 ps.image->is_color = (0 != (ps.buf[9] & PNG_CB_Color));
 ps.image->is_palette = (0 != (ps.buf[9] & PNG_CB_Palette));
 ps.image->has_alpha = (0 != (ps.buf[9] & PNG_CB_Alpha));
 ps.image->samples_per_pixel = 1;
 if (ps.image->is_color && !ps.image->is_palette)
 ps.image->samples_per_pixel = 3;
 if (ps.image->has_alpha) ++ps.image->samples_per_pixel;
 if (ps.image->is_palette && ps.image->has_alpha)
 print_warning(WARN_BAD_PNG);
 /* Check for invalid bit depths. If a bitdepth is not one we can read,
 * abort processing. If we can read it, but it is illegal, issue a 
 * warning and continue anyway. */

 ps.image->bits_per_sample = ps.buf[8];
 if (!(1 == ps.buf[8] || 2 == ps.buf[8] || 4 == ps.buf[8] ||
 8 == ps.buf[8] || 16 == ps.buf[8])) return ERR_BAD_PNG;
 if ((ps.buf[8] > 8) && ps.image->is_palette)
 print_warning(WARN_BAD_PNG);
 if ((ps.buf[8] < 8) && (2 == ps.buf[9] || 4 == ps.buf[9] ||
 6 == ps.buf[9])) return ERR_BAD_PNG;
 return 0;
}
/* Decode gAMA chunk. */
static int
decode_gAMA(
 void)
{
 ASSERT(NULL != ps.inf);
 ASSERT(NULL != ps.buf);
 ASSERT(NULL != ps.image);
 if (0 != ps.image->palette_size)
 print_warning(WARN_LATE_GAMA);
 if (ps.bytes_remaining < 4) return ERR_BAD_PNG;
 if (4 != get_chunk_data(4)) return ERR_READ;
 ps.image->source_gamma = (double)BE_GET32(ps.buf) / 100000.0;
 return 0;
}
/* Decode PLTE chunk. Number of entries is determined by chunk length. A
 * non-multiple of 3 is technically an error; issue a warning in that case.
 * IOBUF_SIZE must be 768 or greater, so we check that at compile time here. */
#if (IOBUF_SIZE < 768)
# error "IOBUF_SIZE must be >= 768"
#endif
static int
decode_PLTE(
 void)
{
 U32 bytes_read;
 ASSERT(NULL != ps.inf);
 ASSERT(NULL != ps.buf);
 ASSERT(NULL != ps.image);
 if (!ps.image->is_color) print_warning(WARN_PLTE_GRAY);
 if (0 != ps.image->palette_size) {
 print_warning(WARN_MULTI_PLTE);
 return skip_chunk_data();
 }
 ps.image->palette_size =
 min(256, (int)(ps.bytes_remaining / 3));
 if (0 == ps.image->palette_size) return ERR_BAD_PNG;
 bytes_read = get_chunk_data(3 * ps.image->palette_size);
 if (bytes_read < (U32)(3 * ps.image->palette_size))
 return ERR_READ;
 memcpy(ps.image->palette, ps.buf, 3 * ps.image->palette_size);
 ASSERT(0 != ps.image->palette_size);
 return 0;
}
/* Copy transparency data into structure. We will later expand the 
 * TIFF data into full alpha to account for its lack of this data. */
static int
decode_tRNS(
 void)
{

 int i;
 U32 bytes_read;
 ASSERT(NULL != ps.inf);
 ASSERT(NULL != ps.buf);
 ASSERT(NULL != ps.image);
 if (ps.image->has_trns) print_warning(WARN_MULTI_TRNS);
 ps.image->has_trns = TRUE;
 if (ps.image->is_palette) {
 if (0 == ps.image->palette_size) {
 print_warning(WARN_LATE_TRNS);
 }
 bytes_read = get_chunk_data(ps.bytes_remaining);
 memcpy(ps.image->palette_trans_bytes,
 ps.buf, (size_t)bytes_read);
 for (i = bytes_read; i < ps.image->palette_size; ++i)
 ps.image->palette_trans_bytes[i] = 255;
 } else if (ps.image->is_color) {
 if (ps.bytes_remaining < 6) return ERR_BAD_PNG;
 bytes_read = get_chunk_data(6);
 for (i = 0; i < 3; ++i)
 ps.image->trans_values[i] = BE_GET16(ps.buf + 2 * i);
 } else {
 if (ps.bytes_remaining < 2) return ERR_BAD_PNG;
 ps.image->trans_values[0] = BE_GET16(ps.buf);
 }
 return 0;
}
static int
decode_cHRM(
 void)
{
 int i;
 ASSERT(NULL != ps.inf);
 ASSERT(NULL != ps.buf);
 ASSERT(NULL != ps.image);
 if (ps.bytes_remaining < 32) return ERR_BAD_PNG;
 if (32 != get_chunk_data(32)) return ERR_READ;
 for (i = 0; i < 8; ++i)
 ps.image->chromaticities[i] = BE_GET32(ps.buf + 4 * i);
 return 0;
}
static int
decode_pHYs(
 void)
{
 ASSERT(NULL != ps.inf);
 ASSERT(NULL != ps.buf);
 ASSERT(NULL != ps.image);
 if (ps.bytes_remaining < 9) return ERR_BAD_PNG;
 if (9 != get_chunk_data(9)) return ERR_READ;
 ps.image->resolution_unit = ps.buf[8];
 if (ps.buf[8] > PNG_MU_Meter) print_warning(WARN_BAD_VAL);
 ps.image->xres = BE_GET32(ps.buf);
 ps.image->yres = BE_GET32(ps.buf + 4);
 return 0;
}
static int
decode_oFFs(
 void)

{
 ASSERT(NULL != ps.inf);
 ASSERT(NULL != ps.buf);
 ASSERT(NULL != ps.image);
 if (ps.bytes_remaining < 9) return ERR_BAD_PNG;
 if (9 != get_chunk_data(9)) return ERR_READ;
 ps.image->offset_unit = ps.buf[8];
 if (ps.buf[8] > PNG_MU_Micrometer) print_warning(WARN_BAD_VAL);
 ps.image->xoffset = BE_GET32(ps.buf);
 ps.image->yoffset = BE_GET32(ps.buf + 4);
 return 0;
}
/* Decode sCAL chunk. Note: as of this writing, this is not an official PNG
 * chunk. It probably will be by the time you read this, but it might possibly
 * change in some way. You have been warned. It also has no TIFF equivalent,
 * so this only gets read into the structure. */
static int
decode_sCAL(
 void)
{
 ASSERT(NULL != ps.inf);
 ASSERT(NULL != ps.buf);
 ASSERT(NULL != ps.image);
 get_chunk_data(ps.bytes_remaining);
 if (ps.bytes_in_buf == IOBUF_SIZE) {
 --ps.bytes_in_buf;
 print_warning(WARN_BAD_PNG);
 }
 ps.buf[ps.bytes_in_buf] = '\0';
 ps.image->scale_unit = ps.buf[0];
 if (ps.buf[0] < PNG_MU_Meter || ps.buf[0] > PNG_MU_Radian)
 print_warning(WARN_BAD_VAL);
 ps.image->xscale = atof((char *)ps.buf + 1);
 ps.image->yscale = atof((char *)ps.buf + strlen((char *)ps.buf + 1) + 2);
 return 0;
}
/* Skip all remaining data in current chunk. */
static int
skip_chunk_data(
 void)
{
 U32 bytes_read;
 do {
 bytes_read = get_chunk_data(ps.bytes_remaining);
 } while (0 != bytes_read);
 return 0;
}
/* Ensure that the image structure we have created by reading the input PNG is
 * compatible with whatever we intend to do with it. In this case, TIFF can 
 * handle anything, so we use this as a sanity check on basic assumptions. */
static int
validate_image(
 IMG_INFO *image)
{
 if (0 == image->width || 0 == image->height)
 return ERR_BAD_IMAGE;
 if (image->samples_per_pixel < 1 ||
 image->samples_per_pixel > 4) return ERR_BAD_IMAGE;
 if (image->is_palette && (image->palette_size < 1 ||
 image->palette_size > 256)) return ERR_BAD_IMAGE;
 if (NULL == image->pixel_data_file) return ERR_BAD_IMAGE;
 return 0;
}
/* End of ptot.c. */

Implementing and Using BSP Trees


Fast 3-D sorting




Nathan Dwyer


Nathan is a programmer at Starware in Bellevue, WA and can be reached at
nate@netcom.com.


With the arrival of graphics-intensive applications like DOOM, high-speed 3-D
display engines are becoming a necessity for PC systems. These 3-D engines owe
their speed, in part, to binary space partitioning (BSP) trees, data
structures that provide an extremely fast method for one component of the
rendering process. 
Rendering 3-D shapes usually consists of several steps. First, the shape is
oriented by rotation about its center. Next, its location is determined
relative to a viewpoint and view direction. Finally, the shape's component
polygons are sorted, and each is projected and rendered. Writing a fast 3-D
engine consists of speeding up each step as much as possible. BSP trees
address the issue of sorting. 
Sorting the polygons may seem trivial, but is usually fairly complicated. The
difficulty is in determining which polygons lie behind which. If all the
polygons are small and far apart, the task is simple. In that case, it's
usually sufficient to compute the distance from the viewpoint to the center of
each polygon, rendering the farther polygons first. This technique is called
the "painter's algorithm" because polygons are rendered in the same order a
painter would paint them on a flat wall. Consider Figure 1, however. In the
first case, the center of the large polygon is closest to the viewpoint, yet
the large polygon should clearly be drawn before the small one. In the second
case, where the three polygons overlap in a circular manner, the painter's
algorithm will be incorrect, no matter what order the polygons are drawn in. 
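In the well-behaved case, the painter's algorithm is nothing more than a back-to-front sort on distance from the viewpoint. A minimal sketch follows; the Poly record and its fields are hypothetical stand-ins, not taken from the article's listings:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical minimal polygon record: only the center is needed
// for the painter's-algorithm distance test.
struct Poly {
    double cx, cy, cz; // polygon center
    int id;            // identifies the polygon in this demo
};

// Sort polygons back to front by squared distance from the viewpoint,
// so the farthest polygon is rendered first.
void painters_sort(std::vector<Poly>& polys,
                   double vx, double vy, double vz) {
    auto dist2 = [&](const Poly& p) {
        double dx = p.cx - vx, dy = p.cy - vy, dz = p.cz - vz;
        return dx * dx + dy * dy + dz * dz;
    };
    std::sort(polys.begin(), polys.end(),
              [&](const Poly& a, const Poly& b) {
                  return dist2(a) > dist2(b); // farthest first
              });
}
```

As Figure 1 shows, this ordering is exactly what breaks down when polygons overlap or differ greatly in size.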
A number of algorithms have been developed to attack this problem, but the
most popular rely on two techniques: z-buffering and subdivision. 
Z-buffering algorithms represent each display pixel by one element in an
array. The array is used to keep track of the distance from the viewpoint to
the polygon to be drawn at that point. As a polygon is drawn to the screen,
the distance at each pixel is computed and compared with its corresponding
array element. A new polygon pixel is rendered only if it is closer to the
viewpoint than the last polygon drawn at that pixel. While this is an
extremely elegant solution, it requires a lot of memory. Each pixel may
require 16 or 32 bits to accurately represent the distance, and the extra
distance computation can be time intensive. 
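The per-pixel depth test can be sketched as follows. The buffer layout, sizes, and pixel types here are illustrative assumptions for demonstration, not part of any particular renderer:

```cpp
#include <cassert>
#include <limits>
#include <vector>

// Toy z-buffer: one depth and one color entry per pixel.
struct ZBuffer {
    int w, h;
    std::vector<float> depth;     // distance from viewpoint, per pixel
    std::vector<unsigned> color;  // last color written, per pixel

    ZBuffer(int width, int height)
        : w(width), h(height),
          depth(width * height, std::numeric_limits<float>::infinity()),
          color(width * height, 0) {}

    // Write a pixel only if it is closer than what is already there.
    bool plot(int x, int y, float z, unsigned c) {
        int i = y * w + x;
        if (z >= depth[i]) return false; // farther: discard
        depth[i] = z;
        color[i] = c;
        return true;
    }
};
```

The memory cost the article mentions is visible here: every pixel carries a float depth in addition to its color, and every plotted pixel pays for a compare.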
Subdivision algorithms analyze a group of polygons, breaking them into smaller
pieces that don't overlap. This may entail drawing the polygons concurrently,
splitting each scanline into nonoverlapping, horizontal sections.
Alternatively, each polygon may be checked against every other polygon for
overlap. Both approaches require significant extra processing. 
BSP trees were developed to determine the visibility of surfaces. They were
later adapted to represent arbitrary 3-D shapes. BSP trees trade some
(possibly expensive) preprocessing for very fast polygon sorting, making them
best suited to static shapes: floor plans, walls, and other objects that don't
change. 
Generating a BSP tree is straightforward. From a list of polygons, choose one
to be the root. Then separate the remaining polygons into two groups: those in
front and those behind the plane of the root polygon, with respect to the root
polygon's normal vector. Next, recursively build BSP trees out of each group.
The root of each subgroup becomes a child of the root containing it. If any
polygon isn't completely in front of or behind the plane of the root, split it
into two polygons that are. This defines the criteria for choosing a root
polygon: The best choice causes the fewest number of splits in the remaining
polygons. 
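The recursion can be sketched with a deliberately simplified stand-in: each "polygon" is reduced to a scalar and the plane test to a comparison, so the splitting step (the genuinely hard part) disappears. The real BSPTree::_BuildBSPTree() presumably follows this same shape, with _SplitPoly() handling the straddling polygons:

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Simplified BSP build: a 1-D "plane" (a scalar boundary) replaces the
// 3-D plane of a root polygon, so no item can straddle the boundary.
struct Node {
    double plane;                 // stand-in for the root polygon's plane
    std::unique_ptr<Node> front, back;
};

std::unique_ptr<Node> build(std::vector<double> items) {
    if (items.empty()) return nullptr;
    auto node = std::make_unique<Node>();
    node->plane = items.front();  // first item chosen as root, for brevity
    std::vector<double> front, back;
    for (size_t i = 1; i < items.size(); ++i)
        (items[i] > node->plane ? front : back).push_back(items[i]);
    node->front = build(front);   // recursively build each side
    node->back = build(back);
    return node;
}
```

Note that this toy version takes the first item as root; as the article says, a real implementation should pick the root that minimizes splits.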
Almost all the difficulty in creating the BSP tree lies in splitting the
polygons intersected by a root plane. Three pieces of code bear examination:
the plane-plane intersection code, the code that tests for inaccurate
intersections, and the code that actually splits a polygon. 
Plane-plane intersection can be a bear. If brute-force linear algebra is
applied, much complexity results. Luckily, plane-plane intersection can be
simplified to line-plane intersection, which is much easier. The function
BSPNode::_SplitPoly() in node.cpp (Listing Two) tests each line segment in the
boundary of a polygon to see if it intersects the plane of a second polygon.
(Listing One is the bsp.h include file.) The math involved is described in the
accompanying text box entitled "Intersecting a Line and a Plane." The
complete source code, including sample data files, that implements this
approach to BSP is provided electronically; see "Availability," page 3.
If a single vertex or edge from the polygon is contained in the plane, the
polygon need not be split. If the variable t is very close to 1.0 or 0.0, a
vertex is contained in the plane. While it would be nice to simply skip these
occurrences, a plane may cut across the polygon and just happen to pass
through a vertex. To guard against this possibility, the Boolean variable
LastSideParallel indicates whether the last polygon edge was parallel to the
plane. If so (and t equals 0.0), the intersection can be safely ignored. 
The function BSPNode::Split() in node.cpp contains the code that creates two
polygons from one. It calls BSPNode::_SplitPoly() to fill in a sparse array of
CPoint objects with intersection points. BSPNode::_SplitPoly() returns a set
of flags that indicates which boundary segments were split. BSPNode::Split()
builds two polygons from this information by following the original list of
points around the polygon and adding them to the first new polygon. When an
intersection point is encountered, it's added to both new polygons. Subsequent
points are added to the second new polygon until another intersection point is
encountered. Once again, the intersection point is added to both new polygons,
and the first polygon becomes the destination for subsequent points. Once the
point list is exhausted, the new polygons have complete point lists of their
own. The two new polygons have the same original normal, but their centers
must be recalculated. 
Sorting the polygons contained in a BSP tree is simply an in-order,
depth-first traversal of the tree. At each node, a dot product is computed to
determine if the viewpoint is in front of or behind the node's polygon. If in
front, the polygons behind the node are drawn first, then the node's polygon,
then the polygons in front of the node. If behind, the node's children are
visited in reverse order. 
To see how this works, look at Figure 2. If the viewpoint is at position A,
it's behind polygon 1. Therefore, polygon 2 is drawn first, then polygon 1,
and finally polygon 3, which is correct. If the viewpoint is at position B,
the polygon order becomes 3, 1, 2, which is also correct. 
All of the code to traverse the tree is contained in the function
BSPNode::Traverse() in node.cpp. The vector from each polygon's center to the
viewpoint is computed first. The dot product between this vector and the
polygon's normal vector indicates the side on which the viewpoint is located.
Here, the center point is used for clarity, but any vertex of the polygon may
be used with the same result.
Let's take a moment to discuss file formats. The code reads and writes text
files so the input and output may be understood easily. An input file of
unprocessed polygons has the format shown in Example 1(a). Once the BSP tree
has been created, it's saved in the format shown in Example 1(b), which can
also be read back in. 
There are many extensions to this basic data structure that can further
increase rendering speed. Polygons can be back-face culled very easily. If the
viewpoint is behind a node, that node's polygon should not be drawn. Bounding
boxes can be generated for each level of the tree, allowing branches which
don't intersect the viewcone to be skipped entirely. It should also be noted
that limited polygon movement is possible, as long as the polygon never leaves
its original plane and doesn't cross the plane of a root polygon. 
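The back-face cull mentioned above is a single dot product per node. A sketch, with a hypothetical Vec3 type standing in for the listings' CPoint:

```cpp
#include <cassert>

struct Vec3 { double x, y, z; }; // illustrative; the listings use CPoint

// Draw the node's polygon only when the viewpoint lies on the side
// the polygon's normal faces: (viewpoint - center) . normal > 0.
// This is the same test BSPNode::Traverse() uses to pick a draw order.
bool front_facing(Vec3 center, Vec3 normal, Vec3 viewpoint) {
    double dx = viewpoint.x - center.x;
    double dy = viewpoint.y - center.y;
    double dz = viewpoint.z - center.z;
    return dx * normal.x + dy * normal.y + dz * normal.z > 0.0;
}
```

Since Traverse() already computes this dot product at every node, culling costs only the extra branch that skips rendering when the test fails.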
While BSP trees are primarily limited to static shapes and require extra
preprocessing, they provide a substantial speed increase without a lot of
memory overhead. Preprocessing is fine if the goal is speed in the final
presentation. Finally, since the implementation of BSP-tree algorithms is not
overly difficult, the result is easy maintenance, as well as a large speed
increase for a small amount of work. 
Intersecting a Line and a Plane
A plane is defined by a point A and a normal N. Given a point X, X is in the
plane if it satisfies the requirement in Example 2(a). A line is defined as
Example 2(b), which can be rewritten as the set of equations in Example 2(c).
Substituting these equations into the plane equation and solving for t results
in Example 2(d). If t is in the interval [0, 1], then the line intersects the
plane. If the denominator is 0, it is because the dot product of the plane's
normal and the direction of the line is 0. This means the line and the plane
are parallel. Either the line is contained in the plane or it doesn't
intersect it at all. 
--N.D.
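Example 2(d) translates almost directly into code. A sketch, with d standing for -(N . A), the plane constant:

```cpp
#include <cassert>

// Compute the line-plane intersection parameter t of Example 2(d).
// Plane: normal (n1,n2,n3) with constant d = -(N . A).
// Line: start (x1,y1,z1), direction (i,j,k).
// Returns false when the denominator is 0, i.e. the line is parallel
// to the plane (contained in it or missing it entirely).
bool line_plane_t(double n1, double n2, double n3, double d,
                  double x1, double y1, double z1,
                  double i, double j, double k, double* t) {
    double denom = n1 * i + n2 * j + n3 * k;
    if (denom == 0.0) return false;
    *t = -(n1 * x1 + n2 * y1 + n3 * z1 + d) / denom;
    return true; // line crosses the plane if t lands in [0, 1]
}
```

For example, the plane z = 2 has normal (0, 0, 1) and d = -2; a segment from the origin toward (0, 0, 4) crosses it at t = 0.5.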
Figure 1: Sorting polygons.
Figure 2: In-order, depth-first traversal of the tree.
Example 1: (a) Format of input file of unprocessed polygons; (b) format after
BSP tree has been created.
(a) ushort NumPolygons
 for each polygon:
 ushort NumPoints
 for each point:
 long X Y Z

(b) for each node:
 ushort NodeIndex
 ushort FrontIndex, BackIndex
 ushort NumPoints
 for each point:
 long X Y Z
Example 2: Math required to intersect a line and a plane.
(a)
N . (X - A) = 0
N . X - N . A = 0
n1*x + n2*y + n3*z - (N . A) = 0

(b)
X=X1+t(X2-X1)


(c)
x = x1 + t*i
y = y1 + t*j
z = z1 + t*k

(d)
t = -(n1*x1 + n2*y1 + n3*z1 - (N . A)) / (n1*i + n2*j + n3*k)

Listing One
/* BSP.H -- Type definitions for BSP code */
#include <fstream.h>
#define TRUE 1
#define FALSE 0
typedef unsigned char bool;
typedef unsigned short ushort;
typedef unsigned long ulong;
class CPoint
{
 public:
 double x, y, z;
 public:
 CPoint( void );
 void Read( ifstream& Input );
 void Write( ofstream& Output ) const;
 double Magnitude( void ) const;
 void Normalize( void );
 double DotProduct( const CPoint& Pnt ) const;
 CPoint CrossProduct( const CPoint& Pnt ) const;
 short operator==( const CPoint& Pnt ) const;
 CPoint operator-( const CPoint& Pnt ) const;
};
class BSPNodeList;
class BSPNode
{
 private:
 ushort Index;
 BSPNode *FrontNode, *BackNode;
 short PntCnt;
 CPoint *PntList;
 CPoint Center;
 CPoint Normal;
 double D;
 
 ulong _SplitPoly( BSPNode *Plane, CPoint *SplitPnts );
 void _ComputeCenter( void );
 void _ComputeNormal( void );
 void _ComputeD( void );
 public:
 BSPNode( void );
 ~BSPNode( void );
 void ReadPoly( ifstream& Input );
 void ReadNode( ifstream& Input, const BSPNodeList& NodePool );
 void WriteNode( ofstream& Output, short& CurIndex );
 CPoint GetCenter( void ) { return Center; }
 CPoint GetNormal( void ) { return Normal; }
 bool Intersects( BSPNode *Plane );
 BSPNode *Split( BSPNode *Plane );
 BSPNode *GetFront( void ) { return FrontNode; }
 BSPNode *GetBack( void ) { return BackNode; }
 void SetFront( BSPNode *Node ) { FrontNode = Node; }
 void SetBack( BSPNode *Node) { BackNode = Node; }

 void Traverse( const CPoint& CameraLoc );
};
class BSPNodeHeader
{
 friend class BSPListIterator;
 friend class BSPNodeList;
 private:
 BSPNodeHeader *Next;
 BSPNode *Data;
 BSPNodeHeader( BSPNode *DataNode ) { Data=DataNode; Next=NULL;}
 void SetNext( BSPNodeHeader *NextNode ) { Next = NextNode; }
 BSPNodeHeader *GetNext( void ) { return Next; }
 BSPNode *GetData( void ) { return Data; }
};
class BSPNodeList
{
 friend class BSPListIterator;
 private:
 BSPNodeHeader *FirstNode, *LastNode;
 public:
 BSPNodeList( void );
 ~BSPNodeList( void );
 void ReadPolys( ifstream& Input );
 bool Empty( void ) { return FirstNode == NULL; }
 void Insert( BSPNode *Node );
 void Remove( BSPNode *Node );
};
class BSPListIterator
{
 private:
 BSPNodeHeader *Current;
 public:
 BSPListIterator( void );
 BSPNode *First( const BSPNodeList *List );
 BSPNode *Next( void );
};
class BSPTree
{
 private:
 BSPNodeList Nodes;
 BSPNode *Root;
 BSPNode *_FindRoot( BSPNodeList *List );
 BSPNode *_BuildBSPTree( BSPNodeList *List );
 public:
 void ReadPolyList( ifstream& Input );
 void ReadTree( ifstream& Input );
 void WriteTree( ofstream& Output );
 void BuildTree( void );
 void Traverse( const CPoint& CameraLoc );
};

Listing Two
/* BSPNode.cpp -- Implementation of BSPNode class */
#include <assert.h>
#include "bsp.h"
//-------------- Private Functions
ulong BSPNode::_SplitPoly( BSPNode *Plane, CPoint *SplitPnts )
// This is limited to convex polygons with no more than 32 sides
{

 ulong Sides = 0;
 bool LastSideParallel = FALSE;
 if( !( Normal == Plane->Normal ) )
 {
 CPoint EdgeDelta;
 double numer, denom, t;
 for( ushort vertex=0; vertex<PntCnt; vertex++ )
 {
 ushort prevVertex = vertex ? vertex - 1 : PntCnt - 1;
 EdgeDelta = PntList[ vertex ] - PntList[ prevVertex ];
 denom = EdgeDelta.DotProduct( Plane->Normal );
 if( denom )
 {
 numer = PntList[ prevVertex ].DotProduct
 ( Plane->Normal ) + Plane->D;
 t = - numer / denom;
 if( !( LastSideParallel && t == 0.0 ) )
 {
 if( t >= 0.0 && t < 0.999999 )
 {
 Sides |= 1 << vertex;
 if( SplitPnts )
 {
 SplitPnts[ vertex ].x = PntList[
 prevVertex ].x + t * EdgeDelta.x;
 SplitPnts[ vertex ].y = PntList[
 prevVertex ].y + t * EdgeDelta.y;
 SplitPnts[ vertex ].z = PntList[
 prevVertex ].z + t * EdgeDelta.z;
 }
 }
 }
 }
 LastSideParallel = ( denom == 0 );
 }
 }
 return Sides;
}
void BSPNode::_ComputeCenter( void )
{
 Center.x = Center.y = Center.z = 0.0;
 for( ushort i=0; i<PntCnt; i++ )
 {
 Center.x += PntList[ i ].x;
 Center.y += PntList[ i ].y;
 Center.z += PntList[ i ].z;
 }
 Center.x /= PntCnt;
 Center.y /= PntCnt;
 Center.z /= PntCnt;
}
void BSPNode::_ComputeNormal( void )
{
 CPoint a, b;
 assert( PntCnt >= 3 );
 a = PntList[ 0 ] - PntList[ 1 ];
 b = PntList[ 2 ] - PntList[ 1 ];
 Normal = a.CrossProduct( b );
 Normal.Normalize();

}
void BSPNode::_ComputeD( void )
{
 D = -Normal.DotProduct( Center );
}
//-------------- Public Functions
BSPNode::BSPNode( void ) :
 FrontNode( NULL ),
 BackNode( NULL ),
 Index( 0 ),
 PntCnt( 0 ),
 PntList( NULL )
{
}
BSPNode::~BSPNode( void )
{
 if( PntList )
 delete[] PntList;
}
void BSPNode::ReadPoly( ifstream& Input )
{
 Input >> PntCnt;
 assert( PntCnt >= 3 );
 PntList = new CPoint[ PntCnt ];
 for( ushort i=0; i<PntCnt; i++ )
 PntList[ i ].Read( Input );
 _ComputeCenter();
 _ComputeNormal();
 _ComputeD();
}
void BSPNode::ReadNode( ifstream& Input, const BSPNodeList& NodePool )
{
 short FrontIndex, BackIndex;
 Input >> Index >> FrontIndex >> BackIndex;
 if( !Input )
 return;
 Input >> PntCnt;
 PntList = new CPoint[ PntCnt ];
 for( short i=0; i<PntCnt; i++ )
 PntList[ i ].Read( Input );
 _ComputeCenter();
 _ComputeNormal();
 _ComputeD();
 // Find Children
 BSPListIterator Iter;
 BSPNode *TestNode = Iter.First( &NodePool );
 FrontNode = BackNode = NULL;
 while( TestNode )
 {
 if( TestNode->Index == FrontIndex )
 FrontNode = TestNode;
 else if( TestNode->Index == BackIndex )
 BackNode = TestNode;
 if( FrontNode && BackNode )
 return;
 TestNode = Iter.Next();
 }
}
void BSPNode::WriteNode( ofstream& Output, short& CurIndex )

{
 if( FrontNode )
 FrontNode->WriteNode( Output, CurIndex );
 if( BackNode )
 BackNode->WriteNode( Output, CurIndex );
 // write index and child indices
 Index = CurIndex++;
 Output << Index << '\n';
 Output << ( FrontNode ? FrontNode->Index : -1 ) << ' ';
 Output << ( BackNode ? BackNode->Index : -1 ) << '\n';
 // write point list
 Output << PntCnt << '\n';
 for( short i=0; i<PntCnt; i++ )
 {
 Output << '\t';
 PntList[ i ].Write( Output );
 }
 Output << '\n';
}
bool BSPNode::Intersects( BSPNode *Plane )
{
 return ( _SplitPoly( Plane, NULL ) != 0 );
}
BSPNode *BSPNode::Split( BSPNode *Plane )
{
 BSPNode *NewNode = NULL;
 CPoint *SplitPnts;
 ulong Splits;
 SplitPnts = new CPoint[ PntCnt ];
 Splits = _SplitPoly( Plane, SplitPnts );
 if( Splits )
 {
 CPoint *NewPoly1, *NewPoly2;
 ushort Poly1Index = 0, Poly2Index = 0;
 ushort Destination = 0;
 NewPoly1 = new CPoint[ PntCnt ];
 NewPoly2 = new CPoint[ PntCnt ];
 for( ushort i=0; i<PntCnt; i++ )
 {
 // Handle split points
 if( Splits & ( 1 << i ) )
 {
 NewPoly1[ Poly1Index++ ] = SplitPnts[ i ];
 NewPoly2[ Poly2Index++ ] = SplitPnts[ i ];
 Destination ^= 1;
 }
 if( Destination )
 NewPoly1[ Poly1Index++ ] = PntList[ i ];
 else
 NewPoly2[ Poly2Index++ ] = PntList[ i ];
 }
 // Make New node
 NewNode = new BSPNode;
 NewNode->PntCnt = Poly1Index;
 NewNode->PntList = NewPoly1;
 NewNode->Normal = Normal;
 NewNode->_ComputeCenter();
 delete[] PntList;
 PntCnt = Poly2Index;

 PntList = NewPoly2;
 _ComputeCenter();
 }
 delete SplitPnts;
 return NewNode;
}
void BSPNode::Traverse( const CPoint& CameraLoc )
{
    CPoint VecToCam = CameraLoc - Center;
    if( VecToCam.DotProduct( Normal ) < 0 )
    {
        if( FrontNode )
            FrontNode->Traverse( CameraLoc );
        // Process 'this', i.e. render it to screen
        if( BackNode )
            BackNode->Traverse( CameraLoc );
    }
    else
    {
        if( BackNode )
            BackNode->Traverse( CameraLoc );
        // Process 'this', i.e. render it to screen
        if( FrontNode )
            FrontNode->Traverse( CameraLoc );
    }
}





































JPEG-Like Image Compression, Part 1


Here's a C++ class library for JPEG-like compression




Craig A. Lindley


Craig is a founder of Enhanced Data Technology and author of Practical Image
Processing in C, Practical Ray Tracing in C, and Photographic Imaging
Techniques for Windows (all published by John Wiley & Sons). Craig can be
contacted at edt@rmii.com. EDT also maintains a home page on the WWW at
www.mirical.com.


JPEG, the Joint Photographic Experts Group, is an ISO/CCITT-backed
international standards committee that has defined an image-compression
specification (also referred to as "JPEG") for still images. The standard
stipulates the following:
Compression algorithms must allow for software-only implementations on a wide
variety of computer systems.
Algorithms must operate at or near the state of the art in image-compression
rates.
Compressed images must be good to excellent in quality.
Compression ratios need to be user variable.
Compression must be generally applicable to all sorts of continuous-tone
(photographic type) images.
Interoperability must exist between implementations from different vendors.
The JPEG committee realized that different applications required different
implementation approaches, so few absolute requirements are written into the
specification. In terms of interoperability, the JPEG specification was so
vague that a group of vendors established the JPEG File Interchange Format
(JFIF), a de facto file format for encapsulating JPEG compressed images such
that files can be shared between applications from different manufacturers.
Since then, TIFF (Tagged Image File Format) Version 6.0 has been extended to
encompass JPEG compressed images. 
It is important to distinguish between JPEG's image-processing algorithms,
which comprise JPEG image compression, and the file format in which JPEG
images are typically stored. In this two-part article, I'll present an
image-compression technique called "CAL" (my initials) that uses the same
algorithms used for JPEG images, but encapsulates the images in a simple,
proprietary file format. In this installment, I'll focus on JPEG and its
constituent algorithms, including discrete cosine transforms, quantization,
and entropy encoding. I'll also discuss how and why gray-scale and color
images must be treated differently and why compression of color images is much
more difficult. Next month, I'll discuss how CAL differs from JPEG, present
the C++ classes on which CAL is built, and suggest possible uses for CAL. 


JPEG Backgrounder


The JPEG specification defines four different modes of operation for
JPEG-aware software:
A sequential encoding mode, in which each image component is encoded in a
left-to-right, top-to-bottom fashion.
A progressive encoding method, in which images are compressed such that on
decoding they paint an image that is successively refined for each decoded
scan. This is similar to interleaved GIF images.
A lossless encoding method, where a decoded image is guaranteed to be a
bit-for-bit copy of the original image. Lossless encoding cannot achieve the
compression ratios of normal, lossy JPEG compression, so it is rarely
implemented.
A hierarchical encoding method, where multiple copies of an image are encoded
together with each copy being a different resolution. This allows an
application to decode an image of the specific resolution it needs without
having to handle an image with too-high or too-low resolution.
Many JPEG implementations (including CAL) provide only the baseline,
sequential mode of operation because implementing this mode is difficult
enough and it is sufficient for many imaging applications. Consequently, this
is the only mode I'll discuss.
JPEG achieves image compression by methodically throwing away visually
insignificant image information. This information includes the high-frequency
components of an image, which are less important to image content than the
low-frequency components. When an image is compressed using JPEG, the
discarded high-frequency components cannot be retrieved, so baseline,
sequential JPEG is considered lossy. When a JPEG compressed image is
displayed, much of the high-frequency information is missing: The image is not
a bit-for-bit copy of the original. If lossless image compression is
necessary, either the lossless JPEG mode of operation or a different
compression technique must be used. 
Allowing the user and/or application program to make the image quality versus
compression trade-off on an image-by-image basis is an important aspect of
JPEG operation. 


The JPEG Pipeline


Figure 1 illustrates the steps necessary for JPEG encoding/decoding of a
gray-scale image. 
During encoding, an image is broken up into 8x8-pixel blocks that are
processed individually, from left to right and top to bottom. Each block of
pixel data is submitted to a forward discrete cosine transform (FDCT), which
converts the pixel values into their corresponding frequency components. The
frequency-component values, or coefficients, are then resequenced
("zigzagged") in order of increasing frequency.
The resultant frequency coefficients are then quantized, which causes
components with near-zero amplitudes to become zero. The level of
frequency-component quantization depends upon a user-specified image-quality
factor. The more an image is compressed, the more frequency components in a
block become zero.
The entropy encoding process performs two types of image compression on the
block of frequency coefficients. First, it run-length encodes the number of
zero-coefficient values preceding a nonzero coefficient in a block. Secondly,
it bit encodes the coefficient values using statistically generated tables.
The encoded bit stream is written to the output stream or file.
During decoding, the encoded bit stream is read from the input stream or file,
and the quantized frequency coefficients for a block of pixel data are
decoded. These frequency components are then dequantized by multiplying them
by the quantization table with which they were produced.
The frequency coefficients are resequenced from frequency order back into
pixel order. The block of frequency components is then subjected to an inverse
discrete cosine transform (IDCT), which converts the frequency component
values back into pixel values. The reconstructed pixel values are stored in
the decoded image. 


The Discrete Cosine Transform 


Discrete cosine transforms (DCTs) convert 2-D pixel-amplitude and spatial
information into frequency content for subsequent manipulation. In other
words, a DCT is used to calculate the frequency components of a given signal
(in this case, an 8x8-block of pixels) sampled at a given sampling rate. The
mathematics of DCTs are well beyond the scope of this article--simply note
that DCTs are calculated by applying a series of weighting coefficients to the
pixel data. 
The equations in Figure 2 describe the 2-D DCT process for 8x8-pixel blocks.
Note, however, that the transcendental functions cannot be computed with
perfect accuracy. In theory, if the FDCT and IDCT could be computed with
perfect accuracy, the exact pixel data fed into the FDCT could be recovered
when the transformed data was returned from the IDCT. In real life, this
doesn't happen. What you get out is not exactly what you put in. Although the
JPEG specification does not specify the exact algorithms to use for the DCTs,
it does mandate an accuracy figure for the results. This gives implementors a
lot of latitude but still allows for compliance testing. Numerical errors, as
you shall see, are not the source of the loss in JPEG compressed images.
The FDCT is applied to the 8x8 block of pixel data, resulting in a series of
64 frequency coefficients. The first, the "DC coefficient," is the average of
all of the pixel values within the block. Coefficients 1-63, the "AC
coefficients," contain the spectrum of frequency components that make up the
block of image data. As a result of the slowly changing nature of
continuous-tone images, many high-frequency coefficients produced by the FDCT
are zero or close to zero in value. This is exploited during the quantization
process. Note that there is usually a strong correlation between the DC
coefficients of adjacent blocks of image data. The entropy-encoding process
takes advantage of this, as I'll discuss later.
In developing the CAL code, I made multiple attempts at coding the DCTs. The
first attempt was a literal interpretation of the equations in Figure 2,
utilizing floating-point numbers for the calculations. Although this method
worked, life was too short to compress images that way. Subsequent attempts
utilized fixed-point 32-bit integers instead of floating-point numbers, and I
precomputed and scaled the cosine weighting terms before their use. This
approach improved performance by a factor of 12; see dct.hpp (Listing One) and
dct.cpp (Listing Two) for details. Further experimentation pointed to the
speed of the DCTs as the determining factor in overall image-compression
performance. To improve CAL's performance, I replaced my DCT code with highly
optimized DCT code extracted from the Independent JPEG Group's (IJG) JPEG
software. The new DCT code is contained in the files dct1.hpp and dct1.cpp,
which are available electronically; see "Availability," page 3. The optimized
DCT code was a full three times faster than the fastest code I had written. 



The Zigzag Sequence


Once the block of image pixels is processed with the FDCT, the result is a
frequency spectrum for the pixels. The coefficients that make up the frequency
spectrum are not conveniently arranged in ascending order at the conclusion of
FDCT processing. Instead, they are clustered with the DC coefficient in the
upper left, surrounded by the lower-frequency components. The higher-frequency
components are grouped towards the lower right. This is shown in Figure 3. The
application of the zigzag sequence after the FDCT converts the 64 frequency
coefficients into ascending order, as required for further processing. With
the frequency coefficients in ascending order, the DC and low-frequency
coefficients (which are less likely to be zero) are grouped together, followed
by the high-frequency coefficients. 
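The reordering is driven by a 64-entry lookup table (the CAL code keeps its table in tables.cpp; the names below are illustrative). The table shown here follows the zigzag ordering defined in the JPEG specification:

```cpp
#include <cassert>

// Zigzag ordering for an 8x8 block: entry i gives the row-major index of
// the coefficient that occupies position i of the frequency-ordered output.
static const int ZigZag[64] = {
     0,  1,  8, 16,  9,  2,  3, 10,
    17, 24, 32, 25, 18, 11,  4,  5,
    12, 19, 26, 33, 40, 48, 41, 34,
    27, 20, 13,  6,  7, 14, 21, 28,
    35, 42, 49, 56, 57, 50, 43, 36,
    29, 22, 15, 23, 30, 37, 44, 51,
    58, 59, 52, 45, 38, 31, 39, 46,
    53, 60, 61, 54, 47, 55, 62, 63
};

// Resequence a block from pixel (row-major) order into frequency order.
void ZigZagForward(const int In[64], int Out[64]) {
    for (int i = 0; i < 64; i++)
        Out[i] = In[ZigZag[i]];
}
```

The inverse reordering simply runs the table the other way, writing `Out[ZigZag[i]] = In[i]`.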


Quantization


The quantization step is where most image compression takes place and is
therefore the principal source of loss in JPEG image compression. Quantization
is performed by dividing each frequency coefficient by a corresponding
quantization coefficient mandated by the JPEG specification. One quantization
table is specified for the luminance (brightness) portion of the image data,
and another, for the chrominance (color). Quantization coefficients with
values approaching one allow the corresponding frequency coefficient to pass
through the quantization process unmodified. Large quantization coefficients
force the corresponding frequency coefficients to approach zero in value.
Thus, visually insignificant, high-frequency information is discarded. 
In the CAL code presented here, you can specify a quality factor value in the
range 10-100, where a value of 10 results in severe image compression with
noticeable image degradation, and a value of 100 results in much lower
compression but with generally unnoticeable image distortion. The quality
factor that you specify is used to manipulate the quantization tables in the
JPEG specification. A quality factor of 100 causes all of the quantization
coefficients to become 1, resulting in no quantization (any number divided by
1 is still that number). A quality factor of 50 causes the quantization tables
in the specification to be used unaltered. Quality factors approaching 10
result in large quantization coefficients, which causes many of the frequency
coefficients to be quantized to 0. As more and more of the frequency
coefficients become 0, the more an image can be compressed. 
During quantization, the image data is divided by the values in the
quantization table, but during JPEG decoding, image data must be dequantized.
Dequantization is performed by multiplying the decoded image data by the value
in the quantization table, thereby restoring it to a value close to the
prequantization value.
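A minimal sketch of the quantize/dequantize pair. The quality-to-divisor scaling below is the mapping used by the IJG software, which matches the behavior described here (quality 100 clamps every divisor to 1, quality 50 leaves the table untouched); whether CAL uses this exact formula is an assumption on my part:

```cpp
#include <cassert>

// Scale one quantization-table divisor for a quality factor in 10..100.
// IJG-style mapping: 50 leaves the table unchanged; 100 clamps to 1.
int ScaleQuant(int BaseCoeff, int Quality) {
    int Scale = (Quality < 50) ? 5000 / Quality : 200 - 2 * Quality;
    int Q = (BaseCoeff * Scale + 50) / 100;
    return (Q < 1) ? 1 : Q;
}

// Quantization: integer division discards low-order information.
int Quantize(int FreqCoeff, int Q)    { return FreqCoeff / Q; }

// Dequantization restores only a value *close* to the original.
int Dequantize(int QuantCoeff, int Q) { return QuantCoeff * Q; }
```

The loss is visible immediately: a coefficient that is not an exact multiple of its divisor cannot be recovered after the round trip.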


Entropy Encoding


The final step in the JPEG encoding process is entropy encoding. The JPEG
specification allows for either arithmetic or Huffman encoding. It is
generally acknowledged that arithmetic encoding performs marginally better for
some images, but is much more complex to implement. Also, there are currently
patent problems with using arithmetic encoding, so most implementors steer
clear of it. Huffman encoding is in the public domain and can be used without
worry of patent infringement. Consequently, Huffman encoding is utilized in
most JPEG implementations and in the CAL code.
Using Huffman encoding as the entropy-coding mechanism provides additional
lossless compression for the already highly processed image data. Huffman
compression is based upon the statistical characteristics of the data to be
compressed: Symbols that occur frequently in the data are assigned shorter
Huffman codes; those that occur infrequently are assigned longer codes.
Compression will occur as long as there is a large difference between the
occurrence counts of the most common and the least common symbols. Note also
that Huffman coding is bit oriented, not byte oriented. The Huffman codes
assigned to the various symbols are bit packed together into the tightest
possible configuration of bytes. This makes the code for Huffman
encoding/decoding difficult to write and debug because the data stream has to
be examined at the bit level. Convenient byte boundaries do not exist at the
lowest level. 
During the encoding process, the JPEG frequency coefficients for a block of
image data are bit encoded as a variable-length Huffman code, followed by a
variable-length integer. During decoding, the variable-length codes and
accompanying variable-length integers are converted back into integer values
for subsequent processing.
Two additional forms of data compression occur in the entropy-coding step:
delta coding of the DC coefficients of adjacent blocks of image data; and
run-length encoding of zero-valued frequency coefficients. These ancillary
compression mechanisms contribute to the overall compression achieved for
JPEG-compressed images.
As previously mentioned, there is usually a high level of correlation between
the DC coefficients of adjacent blocks of image data, and the values of the DC
coefficients are generally large (requiring many bits to Huffman encode).
Therefore, significant compression can be achieved by encoding the differences
in the DC coefficient values between adjacent blocks instead of their actual
values. In most cases, the difference in DC coefficient values can be encoded
in fewer bits than can the actual value. Of course, during decoding of the
Huffman bit stream, the encoded difference values must be converted back to
actual values for the DC coefficients.
Run-length encoding of the zero coefficient values also provides significant
compression. As mentioned, quantization causes many high-frequency
coefficients in a block of image data to become 0. Picture the frequency
coefficients as an array of 64 integers with nonzero values for the first five
or ten entries, followed by mostly zero values. Significant compression can be
achieved by counting up the number of zero coefficients in the block preceding
a nonzero coefficient and encoding that number along with the value of the
nonzero coefficient. This is so important that two special Huffman codes have
been assigned to assist this process: code 0xF0 (referred to as ZRL), used to
signal the special case where 16 zero coefficients in a row were detected; and
code 0x00 (called EOB or end of block), which signifies that all of the
remaining coefficients in a block are 0, so no further processing of the block
is necessary. 
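The zero-run scan can be sketched as follows. Each nonzero AC coefficient becomes a (run, value) pair; ZRL stands in for each full run of 16 zeros, and EOB terminates the block. The symbol values 0xF0 and 0x00 are as described above; the pair representation itself is illustrative (the real encoder packs the run into a Huffman symbol):

```cpp
#include <cassert>
#include <vector>
#include <utility>

const int ZRL = 0xF0;  // 16 zero coefficients in a row
const int EOB = 0x00;  // all remaining coefficients are zero

// Encode 63 frequency-ordered AC coefficients as (symbol, value) pairs.
// Symbol is the zero-run length (0..15), ZRL, or EOB.
std::vector< std::pair<int,int> > RunLength(const int Ac[63]) {
    std::vector< std::pair<int,int> > Out;
    int Run = 0;
    for (int i = 0; i < 63; i++) {
        if (Ac[i] == 0) {
            Run++;
            continue;
        }
        while (Run >= 16) {                    // a ZRL per 16-zero run
            Out.push_back(std::make_pair(ZRL, 0));
            Run -= 16;
        }
        Out.push_back(std::make_pair(Run, Ac[i]));
        Run = 0;
    }
    if (Run > 0)                               // trailing zeros become EOB
        Out.push_back(std::make_pair(EOB, 0));
    return Out;
}
```

A block whose last fifty-odd coefficients are zero collapses into a single EOB symbol, which is where most of the run-length savings come from.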
For maximum compression, the Huffman codes used to compress an image should be
derived from the image itself. The JPEG specification, however, provides a
general set of Huffman code tables that can be used for encoding but that are
not optimal for any one image. The tables provided were generated by
statistical sampling of a large number of continuous-tone images considered
suitable for JPEG compression. With CAL, I use the provided Huffman tables
instead of generating unique tables from the image data on the fly. This
approach was chosen for three reasons: first, performance. It takes time to
analyze and build a Huffman table for an image. Building the tables on the fly
would slow things down quite a bit. Second, for decoding reasons, the unique
Huffman codes would have to be stored along with the encoded image data in the
CAL file, making the compressed image files larger. Finally, this approach
would increase the complexity of the already complex Huffman code.
The file tables.cpp (available electronically) shows JPEG-specified Huffman
code tables. The Huffman tables have been organized in many different ways to
optimize the encoding/decoding processes. The code that performs Huffman
encoding/decoding is contained in huffman.cpp.


Multiple-Component Images


To this point, I have discussed the compression of single-component images
only; that is, gray-scale images in which each pixel represents the luminance
of the image sample at a specified point. The 8x8 blocks into which a
gray-scale image is chopped for processing are filled with 8-bit samples taken
from the image. As expected, the image would be processed from left-to-right
and top-to-bottom, although the application must define the image top. CAL
works with Windows DIB images, which are stored upside down in memory. CAL
therefore defines the top of the image to be the image data at the lowest
memory address, opposite to how the data is actually stored.
Compressing color images is more complex. Images with multiple color
components must be handled in addition to single-component images. The JPEG
specification allows processing of images with up to four color components.
Much of the complexity involved in decoding standard JPEG images is a result
of the extreme flexibility allowed for processing of multiple-component
images. To understand the inherent complexities, the following concepts must
be addressed: 
Color-space conversions. Color images come in many different formats depending
upon how they were acquired and how they are meant to be used. The most common
color-space coding for color images is probably RGB, but many others exist,
including CMYK and YCbCr. 
RGB-format image data is not optimum for lossy image compression because an
RGB pixel's brightness and hue information is distributed among all three
color components--red, green, and blue--forcing all three components to be
treated equally. Any unequal treatment results in distortion in image color
and/or brightness. The human visual system is less sensitive to changes in
color than to changes in brightness, so it makes sense to convert RGB image
data into a color space that treats luminance and chrominance separately. This
conversion takes advantage of the sensitivities of the human visual system. In
fact, all major video-broadcast standards (including NTSC, PAL, and SECAM)
have exploited this fact for years. Each standard transmits the luminance
information of an image at full bandwidth and the chrominance information at a
reduced bandwidth. 
For these reasons, RGB images are generally converted into the YCbCr color
space as a prerequisite to compression. Thus, the luminance component, Y, can
be treated differently than the two chrominance components, Cb and Cr, without
undue image degradation. Figure 4 provides formulas for color-space
conversions. I'll discuss how these formulas are applied in Part 2 of this
article.
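Applied directly in floating point (the CAL code defers to integer arithmetic, covered next month), the Figure 4 formulas look like this; note that Cb and Cr come out signed in this form:

```cpp
#include <cassert>
#include <cmath>

// RGB to YCbCr per Figure 4(a). Inputs are 0..255 samples; Cb and Cr
// come out signed, roughly in the range -128..127.
void RGBToYCbCr(double R, double G, double B,
                double& Y, double& Cb, double& Cr) {
    Y  =  0.29900*R + 0.58700*G + 0.11400*B;
    Cb = -0.16874*R - 0.33126*G + 0.50000*B;
    Cr =  0.50000*R - 0.41869*G - 0.08131*B;
}

// YCbCr back to RGB per Figure 4(b).
void YCbCrToRGB(double Y, double Cb, double Cr,
                double& R, double& G, double& B) {
    R = Y + 1.40200*Cr;
    G = Y - 0.34414*Cb - 0.71414*Cr;
    B = Y + 1.77200*Cb;
}
```

A gray pixel (R = G = B) maps to Y with zero chrominance, which is exactly why the chrominance components can tolerate rougher treatment.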
Component subsampling. Once the RGB image data is converted into YCbCr format,
it could be compressed directly. In that case, one 8x8 block each of Y, Cb,
and Cr data extracted from the image would be processed sequentially through
the JPEG pipeline. While easy, this technique misses an opportunity for
further image compression through subsampling the chrominance portion of the
image data. Various chroma subsampling techniques could be applied to the
YCbCr image data, including 4:2:2 and 4:1:1. 4:2:2 specifies that for every
four samples of Y information, there are two samples each of Cb and Cr. 4:1:1
implies that for every four samples of Y, there is one sample each of Cb and
Cr. To give you an idea of how much compression can be achieved by
subsampling, consider that each pixel of RGB data (or converted YCbCr data)
would require 24 bits for storage. The same pixel value using 4:2:2
subsampling would require 16 bits. 4:1:1 subsampling of the same pixel would
require 12 bits for storage.
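The per-pixel storage figures follow directly from counting 8-bit samples in a four-pixel group:

```cpp
#include <cassert>

// Bits per pixel for a Y:Cb:Cr sampling ratio, assuming 8-bit samples
// and a four-pixel group (so 4:4:4 is unsubsampled 24-bit data).
int BitsPerPixel(int YSamples, int CbSamples, int CrSamples) {
    return (YSamples + CbSamples + CrSamples) * 8 / 4;
}
```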
To further complicate matters, chrominance subsampling can be either 1- or
2-D; that is, it can be applied across an image or both across and down an
image. Subsampling in 2-D reduces the amount of image data even further. The
choice of subsampling technique depends upon the intended uses of the
compressed images. CAL uses 2-D 4:2:2 chroma subsampling, which seems to be an
acceptable trade-off between image quality and compressed image size for the
photographic-quality images I tend to deal with.


Minimum Coded Units 


Each 8x8 block of color-component information is referred to as a "data block"
or "data unit." To define a region of an image, 2-D 4:2:2 subsampling requires
four blocks of Y image data for each block of Cb and Cr data. These six blocks
are referred to as a minimum coded unit (MCU). Without subsampling, an MCU
would consist of one block of Y, one block of Cb, and one block of Cr data.
For gray-scale images, each block of Y data would be considered an MCU.
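Counting data units per MCU for the cases above can be written as a one-liner (the function name and factor parameters are illustrative):

```cpp
#include <cassert>

// Number of 8x8 data units in one MCU. HFactor and VFactor are the
// horizontal and vertical sampling factors of Y relative to Cb/Cr.
int BlocksPerMCU(bool GrayScale, int HFactor, int VFactor) {
    if (GrayScale)
        return 1;                    // a single Y block is an MCU
    return HFactor * VFactor + 2;    // Y blocks plus one Cb and one Cr
}
```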


Next Month


Up to this point, I've discussed the various concepts and algorithms utilized
in JPEG compression. Next month, I'll focus on the CAL implementation of JPEG
technology. In doing so, I'll describe the design and operation of the C++
classes and discuss the practical considerations in implementing DCTs and
color-space conversions using only integer arithmetic. Additionally, I will
provide a series of images that show the effects of various levels of CAL
compression and give some figures on CAL performance. 


References 


ISO JPEG Standards (DIS 10918-1 and draft DIS 10918-2), ANSI Sales
(212-642-4900). 
JFIF File Format Specification, Literature Department, C-Cube Microsystems
Inc., 399A West Trimble Road, San Jose, CA 95131 (408-944-6300).
Loeffler, C., A. Ligtenberg, and G. Moschytz. "Practical Fast 1-D DCT
Algorithms with 11 Multiplications." Proceedings of the International
Conference on Acoustics, Speech, and Signal Processing 1989 (ICASSP '89).
Mattison, Phillip E. Practical Digital Video with Programming Examples in C.
New York, NY: John Wiley & Sons, 1994.

Nelson, Mark. The Data Compression Book. New York, NY: M&T Books, 1991.
Pennebaker, William B. and Joan L. Mitchell. JPEG Still Image Data Compression
Standard. New York, NY: Van Nostrand Reinhold, 1993.
TIFF 6.0 File Format Specification, Aldus Corp. (206-628-6593) and via ftp at
sgi.com (192.48.153.1). See the file graphics/tiff/TIFF6.ps.Z. 
Wallace, Gregory. "The JPEG Still Picture Compression Standard."
Communications of the ACM (April 1991). 
Figure 1: The JPEG encoding/decoding pipelines.
Figure 2: (a) Forward discrete cosine transform (FDCT); (b) inverse discrete
cosine transform (IDCT).
Figure 3: The zigzag sequence.
Figure 4: (a) RGB-to-YCbCr conversion; (b) YCbCr-to-RGB conversion.
(a) Y  =  0.29900*R + 0.58700*G + 0.11400*B
    Cb = -0.16874*R - 0.33126*G + 0.50000*B
    Cr =  0.50000*R - 0.41869*G - 0.08131*B

(b) R = Y + 1.40200*Cr
    G = Y - 0.34414*Cb - 0.71414*Cr
    B = Y + 1.77200*Cb

Listing One
// Discrete Cosine Transform Class Interface File
#ifndef DCT_HPP
#define DCT_HPP
#include "misc.hpp"

#define FORWARDREORDER 1 // Direction of zigzag
#define INVERSEREORDER 0 // 1 is pixel order to freq order
                         // 0 is freq order to pixel order
// The DCT Class Definition
class DCT {
  private:
    long Coefficients[MAXRESULTS][MAXSAMPLES];
  public:
    DCT(void);
    void FDCT(BYTEBLOCKPTR InBlock, INTBLOCKPTR OutBlock);
    void IDCT(INTBLOCKPTR InBlock, BYTEBLOCKPTR OutBlock);
    void ZigZagReorder(int *InBlock, int *OutBlock, BOOL ForwardReorder);
};
#endif

Listing Two 
// Discrete Cosine Class Member Functions

#include <math.h>
#include <mem.h>
#include "dct.hpp"
#include "tables.hpp"

#define SCALE4BITS 4
#define SCALE6BITS 6
#define DIVIDEBYFOUR 2

#define SCALEBYBITS 10 // Scale by this number of bits
#define SCALEFACTOR (1 << SCALEBYBITS)
#define SCALEBITSANDDIVIDE (SCALEBYBITS + DIVIDEBYFOUR + SCALE6BITS)

// Class constructor for the DCT class. Creates an array of weighting
// coefficients used for the discrete cosine transforms. Entries in the
// Coefficients array are scaled up by SCALEFACTOR to remove the need
// for floating-point arithmetic. Coefficients array has dimensions of
// [MAXRESULTS][MAXSAMPLES].
DCT::DCT(void) {
    // Create the coefficient array
    double const PI = 4 * atan(1.0);
    double Constant = PI/16.0;
    for (register int k = 0; k < MAXRESULTS; k++) {
        for (register int m = 0; m < MAXSAMPLES; m++) {
            if (k == 0)
                Coefficients[k][m] = ((1.0 / sqrt(2.0)) * (double) SCALEFACTOR) + 0.5;
            else
                Coefficients[k][m] = (cos(Constant * k * (2.0 * m + 1)) *
                                      (double) SCALEFACTOR) + 0.5;
        }
    }
}
// Forward 2D discrete cosine transform. Transforms block of pixels in
// InBlock into a block of frequency components in OutBlock.
void DCT::FDCT(BYTEBLOCKPTR InBlock, INTBLOCKPTR OutBlock) {
    long TempBlock[8][8];
    memset(TempBlock, 0, sizeof(TempBlock));
    long *plDest;
    // Do a one-dimensional row FDCT
    for (register int Row = 0; Row < 8; Row++) {
        for (register int K = 0; K < 8; K++) {
            plDest = &TempBlock[Row][K];
            for (register int Col = 0; Col < 8; Col++)
                *plDest += (*InBlock)[Row][Col] * Coefficients[K][Col];
            *plDest >>= SCALE4BITS; // Limited scaling to keep as much
        }                           // precision as possible
    }
    long TempLong;
    // Do a one-dimensional column FDCT
    for (register int Col = 0; Col < 8; Col++) {
        for (register int K = 0; K < 8; K++) {
            TempLong = 0;
            for (register int Row = 0; Row < 8; Row++) {
                TempLong += TempBlock[Row][Col] * Coefficients[K][Row];
            }
            (*OutBlock)[K][Col] = (int)(TempLong >> SCALEBITSANDDIVIDE);
        }
    }
}
// Inverse 2D discrete cosine transform
void DCT::IDCT(INTBLOCKPTR InBlock, BYTEBLOCKPTR OutBlock) {
    long TempBlock[8][8];
    memset(TempBlock, 0, sizeof(TempBlock));
    long *plDest;
    // Do a one-dimensional column IDCT
    for (register int Col = 0; Col < 8; Col++) {
        for (register int K = 0; K < 8; K++) {
            plDest = (long *) &TempBlock[K][Col];
            for (register int Row = 0; Row < 8; Row++)
                *plDest += (*InBlock)[Row][Col] * Coefficients[Row][K];
            *plDest >>= SCALE4BITS; // Limited scaling to keep as
        }                           // much precision as possible
    }
    long TempLong;
    // Do a one-dimensional row IDCT
    for (register int Row = 0; Row < 8; Row++) {
        for (register int K = 0; K < 8; K++) {
            TempLong = 0;
            for (register int Col = 0; Col < 8; Col++)
                TempLong += TempBlock[Row][Col] * Coefficients[Col][K];
            TempLong >>= SCALEBITSANDDIVIDE;
            // Clamp pixel value into valid range 0..255
            TempLong = (TempLong > 255) ? 255 : TempLong;
            TempLong = (TempLong < 0) ? 0 : TempLong;
            // Store pixel value into output block
            (*OutBlock)[Row][K] = (BYTE) TempLong;
        }
    }
}
// Reorder the coefficients between pixel format and frequency content
// format. ZigZag table is in file tables.cpp.
void DCT::ZigZagReorder(int *InBlock, int *OutBlock, BOOL ForwardReorder) {
    if (ForwardReorder)  // Convert from pixel order to frequency order
        for (register int Index = 0; Index < 64; Index++)
            OutBlock[Index] = InBlock[ZigZagTable[Index]];
    else                 // Convert from frequency order to pixel order
        for (register int Index = 0; Index < 64; Index++)
            OutBlock[ZigZagTable[Index]] = InBlock[Index];
}












































The Future for Programmable Logic


Will tomorrow's embedded-control industry be PLD based?




Nick Tredennick


Nick is chief scientist at Altera and can be contacted at nickt@altera.com.


With the invention of the integrated circuit, TTL (transistor-transistor
logic) displaced the transistor in embedded-system designs because TTL
components increased the designer's efficiency. Similarly, in the 1980s,
microprocessors began displacing TTL. In the coming years, however,
microprocessors themselves will be displaced by programmable-logic devices
(PLDs) in many embedded-system designs. From both a hardware and software
perspective, this transition will have significant impact on the design and
implementation of embedded systems, and designers need to prepare for this
inevitable change. To determine when this transition will occur, it's useful
to examine the development of the microprocessor itself. 
Figure 1 illustrates the conceptual design difference between TTL and a
microprocessor. TTL designs use a catalog of TTL macro functions. The state
sequencer and data unit are wired directly into the implementation. 
A microprocessor system, on the other hand, implements a standard design, such
as that illustrated on the right side of Figure 1, and implements the
algorithm in a program in memory. Figure 2 illustrates the conceptual mapping
of the application into a microprocessor design. The algorithm must take into
account the instruction set of the microprocessor (it wouldn't do to pick an
algorithm which depended heavily on floating-point instructions if the
microprocessor didn't implement them). The algorithm is then mapped into a
high-level language (HLL). The HLL description of the algorithm is translated
to object code by a compiler. The compiler must have access to the computer
architecture to know what binary codes to generate for the high-level language
instructions. The object code implementing the algorithm runs on the computer
to manipulate the data and produce a result.
This introduces considerable complexity in the conceptual model of the
implementation. In TTL implementations, the state sequencer and the data unit
are fixed in the hardware. This gives low cost and excellent performance for a
single application. In microprocessor implementations, the state sequencer is
a program driving a general-purpose data unit. This gives low cost and
adequate performance for a broad range of applications. Improvements in
process technology have led to a proliferation of TTL components. A TTL
catalog contains hundreds of part types (corresponding to hard macro functions
available to the designer). Individual TTL-style designs are customized for
individual applications. As the TTL parts catalog grew, system manufacturers
were forced to stock an increasing variety of TTL components. Those same
improvements in process technology led to the development of the
microprocessor. Instead of a custom design of selected TTL parts, the
microprocessor design consisted of a smaller variety of standard components:
microprocessor, memory, and I/O components. A single, basic design could be
used for a large variety of applications by changing the program in memory.
Since the microprocessor and its associated components could be used in a
range of designs, the microprocessor attained high-production volume, leading
to low cost--a basic requirement for embedded-control applications.


Embedded Control


Simple designs, such as those in consumer appliances, drive the vast majority
of component sales. For these designs, problem size is small and performance
isn't an issue. Cost is the driving issue. Figure 3 illustrates my guess for
the position of the bubble representing the majority of component dollars in
the performance and problem-size domain. Most of the volume in the
four-billion-unit microprocessor market lies in the overlap between the
"Zillions of Component Dollars" and the "Embedded Microprocessor" bubbles.
In the zillions-of-component-dollars bubble in Figure 3, PLDs compete with TTL
and microprocessor designs. At the very low end of the problem-size and
performance scale, process improvements benefit PLDs, but TTL devices are
stalled in cost and performance improvement as they become pad-limited.
Programmable logic offers the same performance and component cost, but there
are fewer component types to stock and fewer components in the final design.
As Figure 4 shows, PLDs offer a more-direct solution than microprocessors for
some applications. Fewer components means a cheaper, more-reliable design. The
same process-technology improvements driving the expansion of the
embedded-microprocessor bubble drive improvements in programmable-logic
devices.


What's Needed


Figure 5 is the conceptual view of PLD implementation. The PLD implementation
is less direct than the TTL and transistor implementation of Figure 6 and more
direct than the microprocessor implementation in Figure 2. The application is
mapped to an algorithm which is compiled directly into a processor design.
Because there is a compiler in the design process, components fit a large
range of applications, which drives down component cost. The PLD design has
the same advantages over the microprocessor design that the
microprocessor-based design had over the direct implementation--it is cheaper,
uses fewer components, and is more reliable. As Figure 7 illustrates, PLDs
occupy the lower-left corner of the performance and problem-size domain,
somewhere directly underneath the zillions-of-component-dollars bubble,
guaranteeing them the gigantic volumes necessary to drive process improvements in
the technology. As process technology improves, the PLD bubble expands away
from the origin, encroaching on the TTL domain and the embedded-microprocessor
bubble.
What's needed for rapid expansion of the PLD application bubble in Figure 7 is
a compelling development environment that includes sophisticated compilers
(see the compiler bubble in Figure 5). Will that happen?


The History of Microprocessor Development 


As much as we would like to believe otherwise, our industry isn't driven
solely by logic. Like the fashion, toy, and other pop industries, high-tech is
swept by fads, too. Fads determine how press attention, R&D dollars, and
individual enthusiasm are channeled. Dollars and enthusiasm drive the industry
forward. Press attention complicates the process by creating a feedback loop.
Predicting the future for programmable logic in the presence of this feedback
loop is dangerous. But we have been in the midst of a similar development
cycle for the microprocessor since 1971. An analogy with the history of
microprocessor development should help us predict what will happen in the
programmable-logic industry.
The history of the microprocessor hasn't been completely rewritten yet, so
I'll give a rendition that suits my purposes. The microprocessor was invented
for embedded control. The first commercial microprocessor, introduced by Intel
in 1971, was derived from a calculator design. The original design proposal
called for seven custom chips to implement a calculator. Ted Hoff countered
with a four-chip proposal, the central component of which was the precursor to
today's microprocessor.
Even though computers used a microprocessor as the CPU much earlier, the
introduction of the IBM PC in 1981 clearly split microprocessor applications
into two categories: embedded control and CPU. In about 1982, the RISC fad
began, and further split the CPU branch of the taxonomy into CPUs for PCs and
CPUs for workstations and other computers. 
Today, the total microprocessor market should be about four billion units.
Almost four billion of those microprocessors will end up in embedded-control
applications. About 50 million microprocessors will become the CPU in a
computer system. Almost all of those 50 million microprocessors will become
the CPU in a PC. Fewer than one million microprocessors will become the CPU in
a workstation or other computer. Microprocessors used as the CPU in a computer
system, therefore, represent less than 2 percent of the unit volume.
Microprocessors used as the CPU in a workstation or other computer, excluding
the lowly PC, represent less than 2 percent of the CPU unit volume, making
them an almost invisible percentage of the total microprocessor unit volume. 
Computers solve very large problems for which a direct TTL solution is not
practical. The computer has two cost-reducing advantages: Its hardware can be
multiplexed (that is, it can iterate to solve a problem), and the same
hardware can be shared across many applications. As performance requirements
increase, however, computers lose ground. In these cases, a TTL solution may
be possible, but prohibitively expensive.
Zillions of research dollars will be spent on process development, which
benefits PLDs, computers, and microprocessors, but not TTL. Many dollars will
also be spent on computer development, which includes the development of
high-end microprocessors. A lot of the money will be spent on microprocessor
development, so performance will improve, but microprocessors are facing a
barrier: today's superscalar designs already exploit most of the available
parallelism in the instruction stream. Future microprocessors may be able to
issue six or eight instructions per clock tick, but existing code contains a
branch about every five instructions. Address and data dependencies (an
operand or address for the current instruction is calculated in a preceding
instruction) further reduce available parallelism. Additional improvements in
the microprocessor will not return much additional performance for the
research investment. Instruction execution is the bottleneck. Researchers are
studying exotic alternatives to superscalar microprocessors, such as very long
instruction word (VLIW) processors and parallel processors.
Meanwhile, in the PC marketplace, operating systems and applications grow more
sophisticated. The operating system increasingly isolates the applications
from the underlying hardware. Applications no longer directly manipulate
system resources; instead, they simply request services from the operating
system. This leads to the system model in Figure 8, with applications talking
to the operating system and the operating system talking to the hardware.
Since instruction execution on the microprocessor is a major bottleneck in the
system, it has led to the development of application accelerators, as in
Figure 8. These application accelerators intercept service calls from the
applications and execute the function directly, avoiding low-level instruction
execution on the processor. There are, for example, Windows accelerators for
the PC from about 100 manufacturers. Likewise, there are accelerators for
Adobe Photoshop and other applications. The effort to improve performance by
avoiding instruction execution could lead to a proliferation of custom
accelerator cards. A PLD-based accelerator card, however, could be configured
to accelerate Windows and then reprogrammed to accelerate Photoshop or
AutoCAD. Eventually, this reconfigurable hardware could migrate onto the
microprocessor itself, intercepting and directly executing high-level function
calls between applications and the operating system.


Conclusion


If their development follows the same path as microprocessors, PLDs have great
potential for long-term growth. The microprocessor has had outstanding
long-term growth because gigantic volumes at the low end continuously reduce
cost, while at the high end, process and component R&D ensure the future. 
PLDs can expect to achieve the gigantic volumes at the low end necessary to
continuously reduce cost because they are a superior solution to TTL and
embedded microprocessors in regions where they compete. Many R&D dollars will
be spent on reconfigurable hardware, which is based on PLDs. I predict PLDs
will grow at the expense of both TTL and the microprocessor and will also
absorb the growth in system complexity.
Figure 1: For some designs, microprocessors offer a more efficient solution
than TTL.
Figure 2: Mapping an application into a microprocessor.
Figure 3: The low end of problem size and performance is the largest component
market.

Figure 4: The PLD is a more-direct solution to some design problems than a
microprocessor. 
Figure 5: Mapping an application into a PLD.
Figure 6: Mapping an application into hardware using TTL or transistors.
Figure 7: Low-end PLD applications expand as the underlying technology
improves.
Figure 8: Application accelerators intercept service requests to avoid
instruction execution.


























































An Architecture for Network Simulation


A flexible system based on a blocks language




Peter D. Varhol


Peter is chair of the graduate computer science and mathematics department at
Rivier College in New Hampshire. He can be contacted at
pvarhol@mighty.riv.edu.


My need to simulate computer networks arose out of research on intelligent
architectures for evaluating network traffic and routing packets along the
most-efficient path. Without a large-scale computer network with which to
experiment, I could not prototype and test different types of architectures,
nor show with any certainty that they would actually work as planned. Before
proceeding with this research, I needed to find a way to demonstrate its
effectiveness; a simulation seemed in order. Of course, commercial
network-simulation software is available--NetMaker (Make Systems), Bones
PlanNet (Comdisco Systems), L-Net (CACI), and SES (Scientific and Engineering
Software) spring to mind--but none of these completely satisfied my needs.
Consequently, I developed my own simulation package as a visual blocks
language.
A blocks language is composed of graphical blocks representing, in this case,
simulation or networking concepts. With blocks languages, you write a program
by positioning blocks on the screen, connecting them with wires, and adding
any necessary parameters. In the case of my blocks language, a simulation
might look like the diagram in Figure 1.
Writing a block is a multiple-step process. Within the block itself, you can
assume arrays of input and output values corresponding to the block's inputs
and outputs. Parameters for individual blocks are entered in a dialog box and
are available to the block as an array of variables.
The block itself is a procedure or function that takes the data, manipulates
it according to the specification for that block, and outputs the result. For
example, the create-packet block might look like Listing One.
In addition, several optional functions are associated with each block. These
functions perform operations such as creating the dialog box, labeling
dialog-box inputs, allocating memory for parameters, and initializing
variables at the start of a simulation run. Some associated functions are
shown in Listing Two.
As I began developing a language for simulating computer networks, I
discovered that many people I talked to were interested in network simulation
as an end unto itself. Many researchers, network managers, and
information-system planners are looking for ways to experiment with different
network configurations for both abstract and practical purposes. This
description of a network-simulation system represents one possible approach to
doing just that.


Queuing and Computer Networks


A packet network is, in effect, a queuing system. Packets are generated by a
source system and passed to a server. The server evaluates the destination
address of the packet, wraps it in a different protocol if necessary, and
sends it on its way. Most networks are simply a sequence of these structures
operating in a serial fashion.
The server, hub, or router (or internetworking host) is a finite resource that
all packets must use. The packets may come into that servicing agent from
multiple sources, and may have to be routed to multiple destinations,
according to their destination address. If multiple packets arrive too quickly
for the server to process them all at once, they have to queue up and wait for
service. These intermediate servers may also wrap the packet in another
protocol, requiring more processing time.
Granted, more than this basic activity goes on in computer networks; in fact,
many types of computer networks do not even use this transmission paradigm.
The Token Ring, for example, is literally a ring that continuously runs a
token, much as a race car runs on a racetrack. When some data has to be
transmitted from a system on the network, the token picks it up and deposits
it at the destination system. I wanted to be able to model these different
types of networks also.
Therefore, my simulation-blocks language became a general queuing language
geared toward simulating bus-oriented packet networks, since these networks
were easiest to visualize as a queuing system. In fact, the queue/server
concept can serve as the basis for simulating other types of networks (such as
Token Ring) and even nonnetworking applications. (I've been asked about the
possibility of using it to simulate computer I/O subsystems and even to
simulate riverboat traffic on the Mississippi River.)
Written in Borland Pascal and sitting on top of the VisSim simulation engine
from Visual Solutions (Westford, MA), my blocks language includes blocks for
creating, queuing, servicing, and passing a packet on to its destination. The
packet can be destroyed, or it can be passed on to as many servers as
necessary to reach its destination. (For more information on VisSim, see my
article, "Extending a Visual Language for Simulation," DDJ, June 1993, which
describes how you can enable discrete-event simulation via DLLs.)


The Language Internals


The base unit of management is the packet--a Pascal record that carries all of
the information that you may need to know about individual packets. It has
fields for a packet ID number, priority, destination, waiting time, service
time, and number of "servers," or routing gateways. User-defined fields let
the user specify things like packet size or protocol type. While it doesn't
contain nearly all of the information contained in a TCP/IP packet, for
example, it can be configured with most of the information that could possibly
influence a simulation. 
Example 1 shows the packet's structure.
Queues are arrays of linked lists, with
the linked list representing the actual queuing of packets. This lets me
consecutively number packets and reference them by array element. The linked
list can be ordered in a variety of different scheduling disciplines,
including FIFO, LIFO, and priority orders, with FIFO as the default.
Implementing the alternative scheduling disciplines is a simple matter of
ordering the linked list every time a new packet enters the queue.
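The priority discipline amounts to an ordered insert into the linked list. Here is a minimal C sketch of that idea (the article's implementation is in Borland Pascal; the names `Packet` and `insert_by_priority` are illustrative, not taken from the listings):

```c
#include <stddef.h>

/* Illustrative C sketch of a priority-ordered queue insert. */
typedef struct Packet {
    double priority;          /* higher value = served sooner */
    struct Packet *link;
} Packet;

/* Insert so the list stays sorted by descending priority.
   Skipping past equal priorities keeps FIFO order among equals. */
void insert_by_priority(Packet **head, Packet *p)
{
    while (*head != NULL && (*head)->priority >= p->priority)
        head = &(*head)->link;
    p->link = *head;
    *head = p;
}
```

FIFO falls out as the special case where every packet carries the same priority; LIFO is the same routine with the comparison reversed.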
The server is also an array, but one whose elements can contain only a single
packet structure. The server element simply holds onto a packet until the
service time expires.
In between the queue and the server is a block called a "transaction manager,"
a structure that accepts packets from any number of different queues and
parcels them out among any number of different servers. It is a traffic
manager, so to speak. While there is no analogous construct in a computer
network, the transaction manager gives a queue/server combination the
flexibility to simulate many different combinations of networks. The
transaction managers are also implemented in arrays, so that it is possible to
develop a simulation with multiple queue-server combinations.
The transaction manager also provides for the implementation of several
different service disciplines across multiple queues and servers. It can
select packets for service according to their arrival time in the queue,
irrespective of which queue they join. For multiple-queue systems, it offers
several different service protocols, including packet or queue priority and
priority preemption with or without saving the processing state.
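One of the transaction manager's disciplines, serving packets strictly by arrival time regardless of which queue they joined, can be sketched as a scan across the queue heads. This is a hypothetical C rendering (the article's transaction managers are Pascal arrays; the sentinel convention here is my own):

```c
/* Pick the queue whose head packet arrived earliest (global FIFO across
   queues). head_arrival[i] holds the arrival time of queue i's front
   packet; a negative value marks an empty queue (illustrative sentinel). */
int pick_queue(const double *head_arrival, int nqueues)
{
    int best = -1;
    for (int i = 0; i < nqueues; i++) {
        if (head_arrival[i] < 0)
            continue;                     /* empty queue: skip */
        if (best < 0 || head_arrival[i] < head_arrival[best])
            best = i;
    }
    return best;                          /* -1 if every queue is empty */
}
```

Queue-priority or packet-priority service is the same scan with a different key.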
In addition to these basic simulation constructs, I collect the data from
packets as they are created and use it to produce statistics on
packet-creation rate, average service time, average waiting time, number of
packets created and processed, and server-utilization rate. As a result, you
can monitor the progress of individual packets and also locate bottlenecks in
individual servers, routers, or hubs.


How It Works


The entire simulation process is event driven. Each event represents either
one or more packets being created, or one or more packets leaving one or more
of the service elements. (If the queue has more packets, this also represents
a dequeue.) All packets are created with a random number that designates the
time required to service that packet (this number can also be constant and
deterministic). At the same time, another random number is generated to
designate the time until the next transaction is created.
At an event, if a packet is created, it is assigned a service time, an ID
number, a destination address, a priority (if desired), and any user-defined
data you want to include. It is queued, and the queue is ordered if necessary.
If both the queue and an available server are idle, the packet goes directly
to the server.
At the next event, if the service time completes before the next packet is
created, the server releases the packet. If that server is its destination, it
is destroyed; if it was an intermediate processing location, it is passed on.
If the next packet is created first, it waits in the queue until the next
event occurs. If service is completed at that time, that packet is released
and the next waiting packet enters the queue. If, instead, it's time to create
yet another packet, the new packet will also be queued up.
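The event logic above, compare the next arrival against the next service completion and advance the clock to whichever comes first, can be sketched as a next-event skeleton for a single queue and server. This C sketch is illustrative (the `Sim` structure and fixed times are mine; the real system draws both times from random-number generators):

```c
/* Minimal next-event skeleton for one queue and one server. */
typedef struct {
    double clock;
    double next_arrival;    /* time of next packet creation */
    double next_departure;  /* time service completes; IDLE if server empty */
    int in_queue;           /* packets waiting */
    int in_service;         /* 0 or 1 */
    int served;             /* packets that have left the server */
} Sim;

#define IDLE 1e30

void step(Sim *s, double interarrival, double service)
{
    if (s->next_arrival <= s->next_departure) {      /* arrival event */
        s->clock = s->next_arrival;
        s->next_arrival = s->clock + interarrival;
        if (!s->in_service) {                        /* idle server: go straight in */
            s->in_service = 1;
            s->next_departure = s->clock + service;
        } else {
            s->in_queue++;                           /* otherwise wait in queue */
        }
    } else {                                         /* departure event */
        s->clock = s->next_departure;
        s->served++;
        if (s->in_queue > 0) {                       /* next waiting packet enters service */
            s->in_queue--;
            s->next_departure = s->clock + service;
        } else {
            s->in_service = 0;
            s->next_departure = IDLE;
        }
    }
}
```

With an interarrival time of 1 and a service time of 2.5, three arrivals occur before the first packet departs at time 3.5, so the queue visibly builds, exactly the bottleneck behavior the statistics blocks are meant to expose.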
For multiple-queue/multiple-server configurations, the transaction manager
comes into play. It effectively manages the flow of packets from multiple
queues and parcels them out to one or more servers. It can service multiple
queues in round-robin ordering, or it can service queues either by queue
priority or by individual-packet priority, regardless of queue. If you service
by packet priority, you can choose to preempt service for packets with a
higher priority. The service protocol can be specified from a pop-up menu.
Random numbers play an important part in this architecture. They represent the
time between packets and the time required to service a packet. I developed
random-number generators that follow the uniform, normal, Poisson, and
exponential distributions. The user has full control over the mean of each
random-number generator and can even adjust the mean values with a slider
control during a simulation run.


Building a Network Simulation


The VisSim blocks editor lets you select blocks from the menu and place them
on the workspace, then connect them with wire lines to signify the flow of
control. It also lets you define your own menu selections through a C-language
interface. These menus are mounted in the editor's blocks menu during load
time. For flexibility (and to let me work in Pascal), my menu selections use
the C-language interface (see Listing Three), which subsequently calls a DLL
with my Pascal functions.

Creating a network simulation is a simple matter of loading the correct menus
during application launch, then placing the blocks on the display and
connecting them. The VisSim blocks editor also lets you collect individual
blocks together and create compound blocks so that you can abstract details
and create large network simulations. The level of abstraction is effectively
unlimited, so you can create a hierarchy of components and collect statistics
that examine the behavior at each level.
It would be relatively easy to add other constructs to this simulation
language. The block-programming interfaces are deliberately simple, and there
is little dependency between blocks to restrict the ways in which the constructs
can be assembled. Each block places a packet in a memory location identified
by a pointer. The next block in the sequence checks to see if there is a
packet in the previous block; if there is, it simply assigns a pointer
variable to that packet and continues processing from that point.
As a result, knowing the data interface and the conventions for developing
user-defined components for the VisSim simulation engine, it is possible to
develop additional blocks to interact with the existing ones. They can even be
added to the VisSim menu and completely integrated with existing blocks.
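The hand-off convention between blocks, each block leaves its packet at a known location, and the next block claims it via a pointer, can be sketched in C (illustrative names; the actual VisSim/Pascal convention may differ in detail):

```c
#include <stddef.h>

/* Sketch of the block-to-block hand-off: the upstream block leaves its
   finished packet in a shared slot; the downstream block takes it if
   one is present, or sees NULL and does nothing this event. */
typedef struct { int id; } Pkt;

static Pkt *slot = NULL;             /* output slot of the upstream block */

void upstream_emit(Pkt *p) { slot = p; }

Pkt *downstream_take(void)
{
    Pkt *p = slot;                   /* NULL means nothing to process yet */
    slot = NULL;                     /* claim the packet */
    return p;
}
```

Because the only coupling is this one pointer, a new block that honors the convention can be wired anywhere in the chain, which is what makes the language easy to extend.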


An Environment for Network Planning


There are many directions to take from here. I intend to build neural-network
and fuzzy-logic controls for an intelligent feedback architecture that
monitors network performance and changes the routing scheme in response to
faults and bottlenecks. Imagine an unsupervised neural net that categorizes
network traffic by traffic intensity or by projected time to destination.
These categories are, in effect, fuzzy and can be used to construct likelihood
distributions of the relevant variable. The likelihood distributions can then
be defined as inputs to a supervised neural net, which will develop the
routing algorithm. In Figure 2, the unsupervised neural net establishes fuzzy
categories of network traffic, the backpropagation neural net develops the
algorithm and feeds data back to refine fuzzy categories, and the network
traffic is routed across multiple pathways.
Eventually, this type of simulation approach can also become a more complete
environment for network planning and implementation. For many network
managers, the implementation and maintenance of a large network can be a
matter of hit or miss. Imagine the possibility of amassing and codifying
expertise on network types, configurations, implementations, and management;
in other words, an expert system (or combination of cooperating expert
systems) for networking. Such an expert system could evaluate the data created
by a network simulation and recommend improvements to the configuration. This
is, of course, much more difficult to do than to describe, but it is a worthy
effort and certainly a needed tool to deal with the growing size and
complexity of today's internetworks.
Figure 1: A multiple-queue, single-server configuration that simulates the
behavior of a networking bridge.
Figure 2: Routing algorithm.
Example 1: Packet structure.
packetptr = ^packet;
packet = record
 packet_number : double;
 priority : double;
 wait_time : double;
 serve_time : double;
 preempt_time : double;
 destination : double;
 t_mgr : double;
 servers : double;
 use : user;
 link : packetptr
end;

Listing One
procedure create(var param, inV, outV : VisSimArg);
export;
var i : integer;
begin
  if param[0] <> 0 then
    q_number := trunc(param[0])
  else
    begin
      q_number := 1;
      param[0] := q_number
    end;
  created[q_number] := false;

  if {(inV[0] > 0) and} q_remaining[q_number] <= 0 then
    begin
      new(next[q_number]);
      created[q_number] := true;
      q_remaining[q_number] := inV[0];
      packet_created[q_number] := packet_created[q_number] + 1;
      packet_id[q_number] := packet_id[q_number] + 1;
      next[q_number]^.packet_number := packet_id[q_number];
      next[q_number]^.destination := q_number;
      next[q_number]^.priority := inV[1];
      next[q_number]^.wait_time := 0;
      next[q_number]^.serve_time := 0;
      next[q_number]^.preempt_time := 0;
      next[q_number]^.link := nil;
      next[q_number]^.use[1] := inV[2];
      if inV[3] <> 0 then
        if first[q_number,1] = 0 then
          next[q_number]^.use[1] := inV[2]
        else
          for i := 1 to max_data do
            next[q_number]^.use[i] := first[q_number,i];
    end;
  if created[q_number] = false then
    outV[0] := 0
  else
    outV[0] := q_number;
end;

Listing Two
procedure createSS(var param : VisSimArg; var runCount : shortint);
export;
var i : integer;
begin
  for q_number := 1 to maxqueues do
    begin
      created[q_number] := false;
      packet_id[q_number] := 0;
      q_remaining[q_number] := 0;
      arrive_sum[q_number] := 0;
      packet_created[q_number] := 0;
      q_time[q_number] := 0;
      for i := 1 to max_data do
        first[q_number,i] := 0;
    end
end;

function createPA(var ppCount : shortint) : longint;
export;
var paramalloc : byteptr;
begin
  ppCount := 1;
  GetMem(paramalloc, 16);
  createPA := longint(paramalloc)
end;

function createPC(var param : VisSimArg) : PChar;
export;
begin
  createPC := 'Queue #'
end;

Listing Three
#include "windows.h"
#include "vsuser.h"

int DLLInst;

 USER_MENU_ITEM trans[] = {
 {"Discrete Event", "tran2", -1,-1,0, "Discrete event simulation building
blocks"},
 {"createTransaction", "create", 3,1,48, "Creates new transactions"},
 {"queue", "queue_block", 1,1,184, "First-in, first-out queue"},
 {"transactionManager", "t_manager", 1,1,32, "Manages transaction flow between
queues and servers"},
 {"server", "server", 2,1,32, "Services a transaction"},
 {"departSystem", "depart", 1,0,16, "Removes a transaction at the end of a
system"},
 {0}
 };

 USER_MENU_ITEM stats[] = {
 {"Discrete Statistics", "tran2", -1,-1,0, "Collects statistical data on
simulation"},
 {"utilizationRate", "utilization", 1,1,16, "Cumulatively computes utilization
rate"},

 {"queueLength", "queue_length", 1,1,16, "Computes the length of the queue"},
 {"waitingTime", "waiting_time", 1,1,16, "Cumulatively computes time waiting
in queue"},
 {"timeInSystem", "system_time", 1,1,16, "Cumulatively computes waiting and
service times combined"},
 {"transactionsGenerated", "transactions_generated", 1,1,16, "Computes number
of transactions produced in a simulation run"},
 {"transactionsServed", "transactions_served", 1,1,16, "Computes number of
transactions served in a simulation run"},
 {"transactionsLost", "transactions_lost", 1,1,16, "Cumulatively computes
proportion of transactions lost in a finite queue"},
 {"transactionTime", "trans_time", 1,1,16, "Computes elapsed transaction time"},
 {0}
 };

 USER_MENU_ITEM utils[] = {
 {"Discrete Utilities", "tran2", -1,-1,0, "Utilities for manipulating a
simulation"},
 {"poissonRandomNumbers", "poisson", 1,1,16, "Produces a stream of
Poisson-distributed random numbers"},
 {"exponentialRandomNumbers", "exponential", 1,1,16, "Produces a stream of
exponentially distributed random numbers"},
 {"generatePriorities", "generate_priority", 1,1,16, "Randomly generates
priority levels for the priority queue"},
 {"readTransactionID", "read_transaction", 1,1,16, "Reads the unique ID number
of a transaction"},
 {"insertData", "insert_data", 1,1,160, "Writes user-defined data into a
transaction"},
 {"retrieveData", "retrieve_data", 1,1,160, "Reads user-defined data from a
transaction"},
 {"fork", "fork", 1,2,8, "Causes a branching of a random number stream"},
 {"delayTransaction", "delay", 2,1,16, "Delays a transaction passing from a
server to a serial queue"},
 {"simulationTime", "sim_time", 1,1,16, "Computes the internal simulation
running time"},
 {0}
 };


void EXPORT PASCAL vsmInit()
{
 setUserBlockMenu(trans);
 setUserBlockMenu(stats);
 setUserBlockMenu(utils);
} 

































Examining the VESA VBE 2.0 Specification


Extending the VESA standard




Brad Haakenson 


Brad is the manager of advanced software research at Cirrus Logic and a member
of the VESA Software Standards Committee. He can be contacted at
brad@corp.cirrus.com.


In November 1994, the Video Electronic Standards Association (VESA) released
Version 2.0 of the VESA BIOS Extension (VBE). Based upon the
earlier-generation VBE (originally the "Video BIOS Extension"), the VESA VBE
specification is a widely accepted, device-independent interface that allows
programmers to access high-resolution/color-depth video modes on SVGA graphics
controllers.
Our goals for VBE 2.0 were to provide an extensible framework for future
growth, a high-performance interface for application programs, and support for
advanced features present in current-generation graphics controllers--all
while maintaining compatibility with the VBE specification. Consequently, VBE
2.0 is a true superset of VBE 1.2. All of the features and behaviors from
previous versions are present. New VBE 2.0 features include support for
non-VGA compatible controllers, a protected-mode interface for
higher-performance graphics, support for linear frame buffers,
digital-to-analog converter (DAC) services for palette operations, more OEM
data, BIOS certification, and support for supplemental specifications. All VBE
supplemental specifications are named "VBE/xxx," where xxx identifies the
supplemental specification. For this reason, the name of the basic
specification has been changed to "VBE Core Functions."


VBE 2.0


The hard copy of the VBE 2.0 specification is about three times the size of
Version 1.2. While new features contribute about half of the bulk, we also
added material to clarify existing features. (One of the biggest problems with
previous versions was the ambiguity that led programmers to interpret it
differently.)
VBE 2.0 also provides many more implementation notes within the VBE functions,
along with new sections on BIOS implementation and application programming.
These sections provide rules and suggestions on writing code that will work
cleanly with all versions of VBE. We tried to make VBE 2.0 compatible with all
properly written VBE applications so that, with a little work, any VBE 2.0
application will work properly with all previous versions of the
specification. 
To this end, VBE 2.0 includes an example program that demonstrates a "correct"
VBE 2.0 application. This code is the basis for the program in Listings One
and Two. While it can be compiled into a working program, the example code's
main purpose is to demonstrate as many of the programming elements of a
correctly written VBE 2.0 program as possible. If your compiler supports
in-line assembly, you can combine the two listings into a single module. You
are welcome to lift the code from the specification (or this article) and use
it as-is. If you want to write your own code or adapt existing code to work
with VBE 2.0, that's okay too. However, if your code behaves differently from
the source code in the specification, it's your program that needs to change.


Supplemental Specifications


Even though VESA was founded to define the interface that evolved into VBE, it
has also endorsed a variety of other software specifications. Since most
supplemental software specifications use the same extensions to the INT 10h
interface as VBE, Version 2.0 provides a mechanism for them to extend VBE
without requiring the core functions to be rewritten. 
The first 16 VBE functions are reserved for the VBE Core Functions. Currently,
only 11 have been assigned. The other five functions allow for future growth.
Since the core functions were designed as an unaccelerated application-program
interface to the graphics controller, any new requirements in this area will
be added to future versions of the core functions. 
Each supplemental specification is assigned one of the remaining function
calls. This allows for a total of 240 specifications. The calls for each
supplemental specification are subfunctions of the basic function call.
Currently, VBE 2.0 defines the supplemental specifications listed in Table 1. Some
of these supplemental specifications, such as DPMS power management services,
were released prior to the adoption of VBE 2.0 and will be rewritten to comply
with all of the requirements for VBE Supplemental Specifications. Most of the
other interfaces are currently being worked on by the VESA Software Standards
Committee (SSC), which is responsible for VBE 2.0.


VBE Video Modes 


In the past, the SSC has defined double-byte-mode numbers to be used by VBE
applications. For VBE 2.0, the Software Standards Committee decided that,
rather than define new double-byte-mode numbers, we would encourage vendors to
use their own single-byte-mode numbers for VBE-mode numbers. The
double-byte-mode numbers have always caused problems: They can't be used with
a standard VGA set-mode function call; they require video-card manufacturers
to support two different numbers for every extended video mode; and the BIOS
data area only reserves a single byte for the mode number at 0040:0049h.
The VBE specification recommends looking in the VbeInfoBlock returned by
function 0h (Return VBE Controller Information) to determine the available
video modes. Function 1h (Return VBE Mode Information) needs to be called for
each reported video mode to find out if it is the correct color depth and
resolution. If so, use it; if not, check the next mode until the end of the
list.
VBE 2.0 requires that any mode reported in function 0h be fully supported by
function 1h. Standard VGA modes may or may not be supported, depending on the
OEM. Applications which assume that a specific mode number represents a given
resolution and color depth may have serious problems in VBE 2.0. For
compatibility, we recommend that VBE BIOSs support the existing
double-byte-mode numbers wherever possible (but don't assume that they will
always be there).


Function by Function


The best way to see how VBE 2.0 works is to examine the 11 core functions and
how they differ from VBE 1.2. The VBE 2.0 Specification is over 80 pages long.
Table 1 provides an overview of the function calls and how they relate to each
other.
Function 0h (Return VBE Controller Information) requires a larger buffer than
previous versions. The VbeInfoBlock size has been increased from 256 to 512
bytes. This allows for a longer video-mode list as well as increased vendor
information. The vendor strings in VBE 2.0 include the VBE BIOS release
number, the vendor and product names, and the revision of the hardware that
the VBE BIOS is meant to work with. In ROM implementations, the mode list is
also returned in the VbeInfoBlock. The calling program tells the VBE BIOS that
there is a 512-byte block available by placing the string VBE2 in the first
four bytes of the VbeInfoBlock when function 0h is called. Because of the
different types of data that can be returned by function 0h, application
programs cannot use any areas of the VbeInfoBlock for their own data storage. 
Function 1h (Return VBE Mode Information) returns a ModeInfoBlock that has
minor extensions from the previous versions of VBE. The biggest differences
are the new mode-attribute flags, an address for a linear frame buffer, and
off-screen memory information. The mode-attribute flags tell you whether or
not VGA-compatible hardware is available in this mode and whether the mode can
use linear addressing, segmented addressing, or both. If the mode isn't VGA
compatible, don't directly program any registers or make assumptions about the
location of the display memory windows. The off-screen memory size is
represented as an offset from the start of display memory and a size. This is
important, since some graphics controllers need to use part of their
off-screen memory for internal features like hardware cursors and scratch
data. If you destroy this data, bad things can happen, ranging from corrupted
pop-up icons or hardware cursors to lost screen-centering data. In extreme
cases, you would need to reboot the computer.
Function 2h (Set VBE Mode) reserves bit 14 of the mode number as a flag for a
linear frame buffer. If the bit is set, the mode will be set with the linear
frame buffer enabled. If bit 14 is not set, the mode will use standard
segmented addressing. If bit 14 is set to a state not supported by the mode,
it will be ignored. Bit 15 of the mode is the only bit that can be used as a
flag to not clear memory. Previous versions of VBE were ambiguous about this,
so some programs were written using bit 7 when setting a single-byte mode.
Function 3h (Return Current VBE Mode) is the same as in VBE 1.2 except that
the linear-frame-buffer flag and the clear-memory flag are returned in the
mode number. These flags use the same bits as in function 2h (Set VBE Mode).
Functions 4h and 5h (Save/Restore State and Display Window Control) are the
same as in VBE 1.2, but function 5h has no effect while using the
linear-frame-buffer memory model.
Function 6h (Set/Get Logical Scan Line Length) has had a safety net added.
When the line length was longer than could be supported by the hardware, VBE
1.2 returned a failure code and went on its way. A VBE 2.0 BIOS will set the
maximum line length that the hardware will support and return that value in
CX. In VBE 2.0, you can select the scan-line length in bytes or pixels. (VBE
1.2 only supported line lengths in pixel units, making it impossible to set
common scan-line lengths, such as 1024 bytes, in 24 bit-per-pixel modes.) VBE
2.0 also provides a parameter to request the maximum scan-line length in
either bytes or pixels without setting it.
Function 07h (Set/Get Display Start) can be set to wait for the next vertical
retrace to take effect. This will prevent the screen from flickering as two
sets of data are displayed during the same frame. Function 8h (Set/Get DAC
Palette Format) is the same as in VBE 1.2.
Function 9h (Set/Get DAC Palette data) is new in VBE 2.0. It is equivalent to
the standard VGA BIOS function 0Bh (Set Color Palette), except that it adds
support for a second palette (if the DAC supports the feature). VBE 2.0 also
adds a flag to delay palette writes until the next vertical retrace to prevent
a snow-like effect from showing up on the screen during RAMDAC programming. We
added this function because VBE 2.0 doesn't require VGA-compatible hardware or
a VGA-compatible BIOS.
Function Ah (Return VBE Protected Mode Interface) provides pointers and
offsets to the protected-mode versions of the three time-critical functions.
This was added to VBE 2.0 to support high-performance, protected-mode access
to the VBE 2.0 interface. For most of the VBE 2.0 function calls, your program
can switch to real mode, but functions 05h (Display Window Control), 07h
(Set/Get Display Start), and 9h (Set/Get DAC Palette data) are likely to be
used while your program is in the middle of drawing the screen. For a
real-time graphics program, the state transitions will seriously degrade your
performance. The protected-mode versions of these functions are completely
relocatable, and are intended to be copied directly into your code segment and
executed locally for the best performance. You can also execute them directly
without relocating them. Function Ah returns all of the information necessary
to locate, copy, and execute the three functions.


Conclusion



VBE 2.0 is the result of two years of hard work. When we started on the spec,
we thought it would be simple to clean up a few issues, toss in some new
functions, and get it out the door. Upon closely examining the comments and
requests programmers had sent VESA, however, we realized we had more work
ahead. VBE 2.0, the result of our efforts, has been formally approved by the
VESA membership. For copies of the VBE 2.0 Programmer's Toolkit (which
includes the specification), contact VESA. 


For More Information


Video Electronic Standards Association
VBE 2.0 Programmer's Toolkit
2150 N. First Street, Suite 440
San Jose, CA 95131
408-435-0333
$100.00
Table 1: Supplemental specifications in VBE 2.0.
Function Description
Function 10h Power Management Extensions (PM), formerly known
 as "VESA DPMS."
Function 11h Flat Panel Interface Extensions (FP) provide a
 software interface to control the flat panel in
 portable computers.
Function 12h Cursor Interface Extensions (CI), formerly known
 as "VESA VCI," are an interface to hardware
 cursors.
Function 13h Audio Interface Extensions (AI) are a
 specification that provides a hardware-
 independent audio interface.
Function 14h OEM Extensions provide an address space for
 graphics-card manufacturers to define their own
 function calls for the specific configuration
 of their graphics controllers. Normally used
 only by the individual controller manufacturers.
Function 15h Display Data Channel (DDC) provides plug-and-play
 for monitors.
Function 16h Graphics System Configuration (GC) is an interface
 that provides a low-level configuration for
 graphics controllers.
Function 17h Accelerator Functions (AF) give application
 programs access to accelerator functions such as
 blitters and video playback.
Table 2: VBE 2.0 function summary; changes from VBE 1.2 are in italics. 
Input: AX=4F00h Return VBE controller information
 ES:DI= Pointer to buffer in which to place
 VbeInfoBlock structure.
 (VbeSignature should be set to 'VBE2' when
 function is called to indicate VBE 2.0
 information is desired and the information
 block is 512 bytes in size.)
Output*: AX= VBE return status

 
Input: AX=4F01h Return VBE mode information
 CX= Mode number
 ES:DI= Pointer to ModeInfoBlock structure 
Output*: AX= VBE return status 
Input: AX=4F02h Set VBE mode
 BX= Desired mode to set
 D0-D8= Mode number
 D9-D13= Reserved (must be 0)

 D14=0 Use windowed frame buffer model
 =1 Use linear/flat frame buffer model 
 D15=0 Clear display memory 
 =1 Don't clear display memory
Output*: AX= VBE return status 
 
Input: AX=4F03h Return current VBE mode 
Output*: AX= VBE Return Status 
 BX= Current VBE mode
 D0-D13= Mode number
 D14=0 Windowed frame buffer model
 =1 Linear/flat frame buffer model
 D15=0 Memory cleared at last mode set
 =1 Memory not cleared at last mode set
 
Input: AX=4F04h Save/restore state
 DL=00h Return save/restore state buffer size 
 =01h Save state
 =02h Restore state
 CX= Requested states 
 D0= Save/Restore controller hardware state 
 D1= Save/Restore BIOS data state 
 D2= Save/Restore DAC state 
 D3= Save/Restore Register state 
 ES:BX= Pointer to buffer (if DL <> 00h) 
Output*: AX= VBE return status 
 BX= Number of 64-byte blocks to hold the state
 buffer (if DL= 00h)
 
Input: AX=4F05h VBE display-window control 
 BH=00h Set memory window 
 =01h Get memory window 
 BL= Window number 
 =00h Window A 
 =01h Window B 
 DX= Window number in video memory in
 window-granularity units (set memory
 window only)
Output: AX= VBE return status 
 DX= Window number in window-granularity units
 
Input: AX=4F06h VBE set/get logical scan-line length
 BL=00h Set scan-line length in pixels
 =01h Get scan-line length
 =02h Set scan-line length in bytes
 =03h Get maximum scan-line length
 CX= If BL=00h desired width in pixels
 If BL=02h desired width in bytes (ignored
 for get functions)
Output: AX= VBE return status 
 BX= Bytes per scan line 
 CX= Actual pixels/scan line (truncated to
 nearest complete pixel)
 DX= Maximum number of scan lines 
 
Input: AX=4F07h VBE set/get display start control
 BH=00h Reserved and must be 00h 
 BL=00h Set display start 
 =01h Get display start 

 =80h Set display start during vertical retrace
 CX= First displayed pixel in scan line (set
 display start only)
 DX= First displayed scan line (set display
 start only) 
Output: AX= VBE return status 
 BH= 00h reserved and will be 0 (get display
 start only) 
 CX= First displayed pixel in scan line (get
 display start only) 
 DX= First displayed scan line (get display
 start only) 
 
Input: AX=4F08h VBE set/get palette format
 BL=00h Set DAC palette format 
 =01h Get DAC palette format 
 BH= Desired bits of color per primary (set
 DAC palette format only) 
Output: AX= VBE return status 
 BH= Current number of bits of color per primary
 
Input: AX=4F09h VBE load/unload palette data
 BL=00h Set palette data
 =01h Get palette data
 =02h Set secondary palette data
 =03h Get secondary palette data
 =80h Set palette data during vertical retrace
 with blank bit on
 CX= Number of palette registers to update
 DX= First palette register to update
 ES:DI= Table of palette values (see below for format)
Output: AX= VBE return status
Format of palette values: alignment byte, red byte, green byte, blue byte
 
Input: AX=4F0Ah VBE 2.0 protected-mode interface
 BL=00h Return protected-mode table
Output: AX= Status
 ES= Real-mode segment of table
 DI= Offset of table
 CX= Length of table, including protected-mode
 code in bytes (for copying purposes)

*All other registers are preserved.

Listing One
/****************************************************************************
* Hello VBE!
* Language: C (Keyword far is by definition not ANSI; therefore, to make it
* true ANSI, remove all far references and compile under MEDIUM model.)
* Environment: IBM PC (MSDOS) 16 bit Real Mode
* Original code contributed by: - Kendall Bennett, SciTech Software
* Conversion to Microsoft C by: - Rex Wolfe, Western Digital Imaging
* - George Bystricky, S-MOS Systems
* Description: Simple 'Hello World' program to initialize a user-specified 
* 256-color graphics mode, and display a simple moire pattern. Tested with
* VBE 1.2 and above. This code does not have any hard-coded VBE mode 
* numbers, but will use the VBE 2.0 aware method of searching for available
* video modes, so will work with any new extended video modes defined by a
* particular OEM VBE 2.0 version. For brevity, we don't check for failure 
* conditions returned by the VBE (but we shouldn't get any).
****************************************************************************/
#include <stdio.h>
#include <stdlib.h>
#include <dos.h>
#include <conio.h>
/* Comment out the following #define to disable direct bank switching. 
 * The code will then use Int 10h software interrupt method for banking. */
#define DIRECT_BANKING
#ifdef DIRECT_BANKING
/* only needed to setup registers BX,DX prior to the direct call.. */
extern far setbxdx(int, int);
#endif
/*---------------------- Macro and type definitions -----------------------*/
/* SuperVGA information block */
struct
{
 char VESASignature[4]; /* 'VESA' 4 byte signature */
 short VESAVersion; /* VBE version number */
 char far *OEMStringPtr; /* Pointer to OEM string */
 long Capabilities; /* Capabilities of video card */
 unsigned far *VideoModePtr; /* Pointer to supported modes */
 short TotalMemory; /* Number of 64kb memory blocks */
 char reserved[236]; /* Pad to 256 byte block size */
} VbeInfoBlock;
/* SuperVGA mode information block */
struct
{
 unsigned short ModeAttributes; /* Mode attributes */
 unsigned char WinAAttributes; /* Window A attributes */
 unsigned char WinBAttributes; /* Window B attributes */
 unsigned short WinGranularity; /* Window granularity in k */
 unsigned short WinSize; /* Window size in k */
 unsigned short WinASegment; /* Window A segment */
 unsigned short WinBSegment; /* Window B segment */
 void (far *WinFuncPtr)(void); /* Pointer to window function */
 unsigned short BytesPerScanLine; /* Bytes per scanline */
 unsigned short XResolution; /* Horizontal resolution */
 unsigned short YResolution; /* Vertical resolution */
 unsigned char XCharSize; /* Character cell width */
 unsigned char YCharSize; /* Character cell height */
 unsigned char NumberOfPlanes; /* Number of memory planes */
 unsigned char BitsPerPixel; /* Bits per pixel */
 unsigned char NumberOfBanks; /* Number of CGA style banks */
 unsigned char MemoryModel; /* Memory model type */
 unsigned char BankSize; /* Size of CGA style banks */
 unsigned char NumberOfImagePages; /* Number of images pages */
 unsigned char res1; /* Reserved */
 unsigned char RedMaskSize; /* Size of direct color red mask */
 unsigned char RedFieldPosition; /* Bit posn of lsb of red mask */
 unsigned char GreenMaskSize; /* Size of direct color green mask */
 unsigned char GreenFieldPosition; /* Bit posn of lsb of green mask */
 unsigned char BlueMaskSize; /* Size of direct color blue mask */
 unsigned char BlueFieldPosition; /* Bit posn of lsb of blue mask */
 unsigned char RsvdMaskSize; /* Size of direct color res mask */
 unsigned char RsvdFieldPosition; /* Bit posn of lsb of res mask */
 unsigned char DirectColorModeInfo; /* Direct color mode attributes */
 unsigned char res2[216]; /* Pad to 256 byte block size */

} ModeInfoBlock;
typedef enum
{
 memPL = 3, /* Planar memory model */
 memPK = 4, /* Packed pixel memory model */
 memRGB = 6, /* Direct color RGB memory model */
 memYUV = 7 /* Direct color YUV memory model */
} memModels;
/*--------------------------- Global Variables ----------------------------*/
char mystr[256];
char *get_str();
int xres,yres; /* Resolution of video mode used */
int bytesperline; /* Logical CRT scanline length */
int curBank; /* Current read/write bank */
unsigned int bankShift; /* Bank granularity adjust factor */
int oldMode; /* Old video mode number */
char far *screenPtr; /* Pointer to start of video memory */
void (far *bankSwitch)(void); /* Direct bank switching function */
/*------------------------ VBE Interface Functions ------------------------*/
/* Get SuperVGA information, returning true if VBE found */
int getVbeInfo()
{
 union REGS in,out;
 struct SREGS segs;
 char far *VbeInfo = (char far *)&VbeInfoBlock;
 in.x.ax = 0x4F00;
 in.x.di = FP_OFF(VbeInfo);
 segs.es = FP_SEG(VbeInfo);
 int86x(0x10, &in, &out, &segs);
 return (out.x.ax == 0x4F);
}
/* Get video mode information given a VBE mode number. We return 0 if the mode
 * is not available, or if it is not a 256 color packed pixel mode. */
int getModeInfo(int mode)
{
 union REGS in,out;
 struct SREGS segs;
 char far *modeInfo = (char far *)&ModeInfoBlock;
 if (mode < 0x100) return 0; /* Ignore non-VBE modes */
 in.x.ax = 0x4F01;
 in.x.cx = mode;
 in.x.di = FP_OFF(modeInfo);
 segs.es = FP_SEG(modeInfo);
 int86x(0x10, &in, &out, &segs);
 if (out.x.ax != 0x4F) return 0;
 if ((ModeInfoBlock.ModeAttributes & 0x1)
 && ModeInfoBlock.MemoryModel == memPK
 && ModeInfoBlock.BitsPerPixel == 8
 && ModeInfoBlock.NumberOfPlanes == 1)
 return 1;
 return 0;
}
/* Set a VBE video mode */
void setVBEMode(int mode)
{
 union REGS in,out;
 in.x.ax = 0x4F02; in.x.bx = mode;
 int86(0x10,&in,&out);
}

/* Return the current VBE video mode */
int getVBEMode(void)
{
 union REGS in,out;
 in.x.ax = 0x4F03;
 int86(0x10,&in,&out);
 return out.x.bx;
}
/* Set new read/write bank. Set both Window A and Window B, as many VBE's have
 * these set as separately available read and write windows. We also use a 
 * simple (but very effective) optimization of checking if the requested bank 
 * is currently active. */
void setBank(int bank)
{
 union REGS in,out;
 if (bank == curBank) return; /* Bank is already active */
 curBank = bank; /* Save current bank number */
 bank <<= bankShift; /* Adjust to window granularity */
#ifdef DIRECT_BANKING
 setbxdx(0,bank);
 bankSwitch();
 setbxdx(1,bank);
 bankSwitch();
#else
 in.x.ax = 0x4F05; in.x.bx = 0; in.x.dx = bank;
 int86(0x10, &in, &out);
 in.x.ax = 0x4F05; in.x.bx = 1; in.x.dx = bank;
 int86(0x10, &in, &out);
#endif
}
/*-------------------------- Application Functions ------------------------*/
/* Plot a pixel at location (x,y) in specified color (8 bit modes only) */
void putPixel(int x,int y,int color)
{
 long addr = (long)y * bytesperline + x;
 setBank((int)(addr >> 16));
 *(screenPtr + (addr & 0xFFFF)) = (char)color;
}
/* Draw a line from (x1,y1) to (x2,y2) in specified color */
void line(int x1,int y1,int x2,int y2,int color)
{
 int d; /* Decision variable */
 int dx,dy; /* Dx and Dy values for the line */
 int Eincr,NEincr; /* Decision variable increments */
 int yincr; /* Increment for y values */
 int t; /* Counters etc. */
#define ABS(a) ((a) >= 0 ? (a) : -(a))
 dx = ABS(x2 - x1);
 dy = ABS(y2 - y1);
 if (dy <= dx)
 {
 /* We have a line with a slope between -1 and 1. Ensure that we are 
 * always scan converting the line from left to right to ensure that 
 * we produce the same line from P1 to P0 as the line from P0 to P1. */
 if (x2 < x1)
 {
 t = x2; x2 = x1; x1 = t; /* Swap X coordinates */
 t = y2; y2 = y1; y1 = t; /* Swap Y coordinates */
 }

 if (y2 > y1)
 yincr = 1;
 else
 yincr = -1;
 d = 2*dy - dx; /* Initial decision variable value */
 Eincr = 2*dy; /* Increment to move to E pixel */
 NEincr = 2*(dy - dx); /* Increment to move to NE pixel */
 putPixel(x1,y1,color); /* Draw the first point at (x1,y1) */
 /* Incrementally determine the positions of the remaining pixels */
 for (x1++; x1 <= x2; x1++)
 {
 if (d < 0)
 d += Eincr; /* Choose the Eastern Pixel */
 else
 {
 d += NEincr; /* Choose the North Eastern Pixel */
 y1 += yincr; /* (or SE pixel for dx/dy < 0!) */
 }
 putPixel(x1,y1,color); /* Draw the point */
 }
 }
 else
 {
 /* We have a line with a slope outside the range -1 to 1 (ie: includes
 * vertical lines). We must swap x and y coordinates for this. Ensure
 * that we are always scan converting the line from top to bottom to
 * ensure that we produce the same line from P1 to P0 as from P0 to P1.*/
 if (y2 < y1)
 {
 t = x2; x2 = x1; x1 = t; /* Swap X coordinates */
 t = y2; y2 = y1; y1 = t; /* Swap Y coordinates */
 }
 if (x2 > x1)
 yincr = 1;
 else
 yincr = -1;
 d = 2*dx - dy; /* Initial decision variable value */
 Eincr = 2*dx; /* Increment to move to E pixel */
 NEincr = 2*(dx - dy); /* Increment to move to NE pixel */
 putPixel(x1,y1,color); /* Draw the first point at (x1,y1) */
 /* Incrementally determine the positions of the remaining pixels */
 for (y1++; y1 <= y2; y1++)
 {
 if (d < 0)
 d += Eincr; /* Choose the Eastern Pixel */
 else
 {
 d += NEincr; /* Choose the North Eastern Pixel */
 x1 += yincr; /* (or SE pixel for dx/dy < 0!) */
 }
 putPixel(x1,y1,color); /* Draw the point */
 }
 }
}
/* Draw a simple moire pattern of lines on the display */
void drawMoire(void)
{
 int i;
 for (i = 0; i < xres; i += 5)

 {
 line(xres/2,yres/2,i,0,i % 0xFF);
 line(xres/2,yres/2,i,yres,(i+1) % 0xFF);
 }
 for (i = 0; i < yres; i += 5)
 {
 line(xres/2,yres/2,0,i,(i+2) % 0xFF);
 line(xres/2,yres/2,xres,i,(i+3) % 0xFF);
 }
 line(0,0,xres-1,0,15);
 line(0,0,0,yres-1,15);
 line(xres-1,0,xres-1,yres-1,15);
 line(0,yres-1,xres-1,yres-1,15);
}
/* Return NEAR pointer to FAR string pointer*/
char *get_str(char far *p)
{
 int i;
 char *q=mystr;
 for(i=0;i<255;i++)
 {
 if(*p) *q++ = *p++;
 else break;
 }
 *q = '\0';
 return(mystr);
}
/* Display a list of available resolutions. Be careful with calls to function
 * 00h to get SuperVGA mode information. Many VBE's build the list of video
 * modes directly in this information block, so if you are using a common 
 * buffer (which we aren't here, but in protected mode you will), then you
 * will need to make a local copy of this list of available modes. */
void availableModes(void)
{
 unsigned far *p;
 if (!getVbeInfo())
 {
 printf("No VESA VBE detected\n");
 exit(1);
 }
 printf("VESA VBE Version %d.%d detected (%s)\n\n",
 VbeInfoBlock.VESAVersion >> 8, VbeInfoBlock.VESAVersion & 0xFF,
 get_str(VbeInfoBlock.OEMStringPtr));
 printf("Available 256 color video modes:\n");
 for (p = VbeInfoBlock.VideoModePtr; *p !=(unsigned)-1; p++)
 {
 if (getModeInfo(*p))
 {
 printf(" %4d x %4d %d bits per pixel\n",
 ModeInfoBlock.XResolution, ModeInfoBlock.YResolution,
 ModeInfoBlock.BitsPerPixel);
 }
 }
 printf("\nUsage: hellovbe <xres> <yres>\n");
 exit(1);
}
/* Initialize the specified video mode. Notice how we determine a shift factor
 * for adjusting the Window granularity for bank switching. This is much
 * faster than doing it with a multiply (especially with direct banking
 * enabled). */

void initGraphics(unsigned int x, unsigned int y)
{
 unsigned far *p;
 if (!getVbeInfo())
 {
 printf("No VESA VBE detected\n");
 exit(1);
 }
 for (p = VbeInfoBlock.VideoModePtr; *p != (unsigned)-1; p++)
 {
 if (getModeInfo(*p) && ModeInfoBlock.XResolution == x
 && ModeInfoBlock.YResolution == y)
 {
 xres = x; yres = y;
 bytesperline = ModeInfoBlock.BytesPerScanLine;
 bankShift = 0;
 while ((unsigned)(64 >> bankShift) != ModeInfoBlock.WinGranularity)
 bankShift++;
 bankSwitch = ModeInfoBlock.WinFuncPtr;
 curBank = -1;
 screenPtr = (char far *)(((long)0xA000 << 16) + 0);
 oldMode = getVBEMode();
 setVBEMode(*p);
 return;
 }
 }
 printf("Valid video mode not found\n");
 exit(1);
}
/* Main routine. Expects the x & y resolution of the desired video mode to be
 * passed on command line. Will print out a list of available video modes if
 * no command line is present. */
void main(int argc,char *argv[])
{
 int x,y;
 if (argc != 3)
 availableModes(); /* Display list of available modes */
 x = atoi(argv[1]); /* Get requested resolution */
 y = atoi(argv[2]);
 initGraphics(x,y); /* Start requested video mode */
 drawMoire(); /* Draw a moire pattern */
 getch(); /* Wait for keypress */
 setVBEMode(oldMode); /* Restore previous mode */
}
/*----------------------------------------------------------------------*/
/* The following commented-out routines are for Planar modes */
/* outpw() is for word output, outp() is for byte output */
/*----------------------------------------------------------------------*/
/* Initialize Planar (Write mode 2)
 * Should be Called from initGraphics
void initPlanar()
{
 outpw(0x3C4,0x0F02);
 outpw(0x3CE,0x0003);
 outpw(0x3CE,0x0205);
}
*/
/* Reset to Write Mode 0
 * for BIOS default draw text

void setWriteMode0()
{
 outpw(0x3CE,0xFF08);
 outpw(0x3CE,0x0005);
}
*/
/* Plot a pixel in Planar mode
void putPixelP(int x, int y, int color)
{
 char dummy_read;
 long addr = (long)y * bytesperline + (x/8);
 setBank((int)(addr >> 16));
 outp(0x3CE,8);
 outp(0x3CF,0x80 >> (x & 7));
 dummy_read = *(screenPtr + (addr & 0xFFFF));
 *(screenPtr + (addr & 0xFFFF)) = color;
}
*/

Listing Two
; This module performs the bank switching for HELLOVBE.EXE. If your compiler
; supports in-line assembly, this code can be incorporated into HELLOVBE.C
public _setbxdx
.MODEL SMALL ;whatever
.CODE
set_struc struc
 dw ? ;old bp
 dd ? ;return addr (always far call)
p_bx dw ? ;reg bx value
p_dx dw ? ;reg dx value
set_struc ends
_setbxdx proc far ; must be FAR
 push bp
 mov bp,sp
 mov bx,[bp]+p_bx
 mov dx,[bp]+p_dx
 pop bp
 ret 
_setbxdx endp
END 























Programming with OpenGL


3-D graphics for Windows NT




Ron Fosner


Ron is a principal software developer at Lotus Development, where he
researches and develops graphical and interactive techniques for data analysis
and exploration. Ron can be contacted at ron@lotus.com.


If you've worked with the Windows GDI, you're painfully aware of its
limitations, particularly when trying to create anything other than a flat,
2-D, static scene. And whether you program games or business graphics, you
know that in Windows, attempting to create any effects beyond a simple
gradient fill usually means some complicated programming. Recognizing these
shortcomings, Microsoft has added to Windows NT 3.5 (and promised for all
Microsoft 32-bit operating systems) a graphics library called "OpenGL," which
provides the advanced 3-D rendering and animation that is difficult to do with
GDI. 
OpenGL is a computer-industry standard based upon Silicon Graphics' internal
graphics library. OpenGL was designed and is maintained by an industry-wide
review board composed of SGI, Microsoft, IBM, Intel, and DEC. Until recently,
OpenGL was usually found only on UNIX workstations. However, with the
availability of a standardized (and well-known) interface for 3-D graphics,
along with advances in dedicated 3-D rendering hardware, it's possible to
create some amazingly complicated and realistic scenes in Windows and render
them quickly. In this article, I'll provide an overview of OpenGL and
illustrate how you can start writing your own OpenGL programs.


OpenGL Primitives 


OpenGL provides primitives for points, lines, and polygons. Everything you
create is based on these three primitives. The library also provides support
routines that draw curves, surfaces, or text; you can also create filled
polygons (thereby creating a surface). Once you've created a scene out of
primitives, you can specify lighting effects, specialized effects (fog or
transparency), and viewing angle. OpenGL takes care of the rest: shading,
hidden-surface removal, and perspective rendering. If you don't like the
viewpoint, simply change it and OpenGL will recalculate the scene for you. In
fact, once the objects are created, you can dynamically alter their location
and rotation, your viewpoint, the lighting effects, shading, and so on; these,
too, will be recalculated for you. The hard part is locating and describing
the objects themselves.
OpenGL is designed to run efficiently as a state machine in a client/server
model. For instance, in a typical environment, you might have one powerful
computer generating the drawing commands (the server), while a networked
client workstation receives these commands and does the actual rendering on
its screen. While there's nothing in NT 3.5 preventing you from creating such
a program using remote-procedure calls (RPCs), OpenGL works just as well if
the same computer is both client and server. Again, the tricky part is
learning how to create an OpenGL scene and trying to interface between OpenGL
and Windows, since OpenGL (as a hardware-independent library) knows nothing
about Windows, device contexts, pens, or brushes.


OpenGL Libraries 


Three libraries are provided with the NT version of OpenGL, the main one being
opengl32.lib. By convention, functions in this library (such as
glDrawPixels()) use the prefix "gl". Next is the OpenGL utility library,
glu32.lib, containing functions such as gluBeginPolygon(), which use the
prefix "glu". These are helper routines for OpenGL that provide services such
as creating a sphere, performing matrix manipulations, and tessellation. If
you think of opengl32.lib as the workhorse of OpenGL, then the utility library
provides higher-level functionality. The final library is the auxiliary
library written for the OpenGL Programming Guide. Routines in this library
contain an "aux" prefix, as in auxInitWindow(). These functions are not
strictly part of OpenGL, and needn't be included for most OpenGL programs.
However, you will likely find them in most OpenGL implementations, including
NT's. I'll use the auxiliary library, since it allows you to ignore the
Windows-specific portions of a program and concentrate on the OpenGL parts.
Finally, six new, implementation-specific interface routines allow OpenGL to
work on a Windows platform. Interface routines like wglGetCurrentContext() use
a "wgl" prefix and are referred to as "wiggle" routines. These routines
provide the interface between straight OpenGL and Windows; they are analogous
to the "glx" interface functions in an X Window System implementation.
In addition, four Win32 functions allow access to the pixel formats. These are
important because you must match your program's needs to your system's
hardware. One more Win32 function handles swapping the buffers in a
double-buffered window.


Watch the Bouncing Ball


Listing One (page 106) is an OpenGL program that displays a bouncing ball on a
checkerboard surface; see Figure 1. I'm using the auxiliary library, which
lets me ignore Windows and concentrate on OpenGL. It also lets me write
more-traditional C code rather than Windows-style code. The first three aux
functions in the localInitialize() procedure of Listing One initialize the
display mode, set the window's size and position, and open the window.
Clearly, using the aux library hides a lot of the Windows code. 
After initialization, the next few routines in main() take function pointers
as arguments and show how the auxiliary library operates. The function
registered with auxReshapeFunc is called when the window needs to be reshaped;
the one registered with auxKeyFunc, when a specified key is pressed; the one
registered with auxIdleFunc, when there is idle time; and auxMainLoop runs the
main loop of the program.
the Windows messaging system, making OpenGL programming straightforward. Of
course, when you write an OpenGL program for Windows, you have to worry about
all the other nastiness that accompanies writing for the Windows API, plus the
additional worries of an OpenGL Windows app. 


Creating and Viewing a 3-D Object


The biggest change that comes from rendering a 3-D scene is learning how to
specify both objects and a viewing volume. In the 2-D world, you could just
specify a line to be drawn from, say, 100,100 to 200,300, and there it would
be, on your screen. Things aren't that simple in 3-D, because 3-D objects are
described by their vertices using x-, y-, and z-coordinates. The difficulty is
compounded by the fact that you must specify the coordinates of both an object
and the viewpoint. 
When a vertex is rendered to the screen, it goes through a couple of
transformation matrices. Figure 2 shows the steps that a single point goes
through. An object is initially specified in what's usually called "object"
coordinates, which are considered local for each object. The object is then
usually translated, rotated, and scaled into "world" coordinates. Objects in
world coordinates are positioned with respect to all other objects in the
world. When everything is set, all of the viewing and projection calculations
are performed to render your object to a collection of pixels on the screen.
When an object is created, its default origin is 0,0,0. Any initial
transformations to an object are called "modeling transformations." For
example, if you create a rendering of a car, you'd probably have one routine
to draw a wheel in object coordinates, and then just call the routine four
times with four different transformations that would place the wheels in the
correct location and orientation about the car body in world coordinates. In
this way, you can create complex objects out of simply rendered objects. When
all of the objects are correctly positioned in world coordinates, we can
specify the viewing transformation. This will determine the viewpoint from
which we "see" the objects. 
As a performance improvement, OpenGL combines the modeling and viewing
transformations into a single modelview matrix. What this means for the
programmer is that the viewing transformation is specified first and the
modeling transformations follow. This
is one of the trickier issues about 3-D graphics, particularly OpenGL's
implementation. 
Next, the projection matrix is applied to take the specified viewing volume
and clip out everything outside it. (Parts of one object obscured by another
are removed separately, by depth buffering.) The perspective division then
adjusts the results from the projection matrix, yielding normalized device
coordinates, and these are mapped to physical screen coordinates by the
viewport transformation. Fortunately, the only complex part of this whole
procedure is the specification of the modelview matrix and the projection
matrix. For now, I'll just use a simple set for both. The localReshape
function in Listing One selects, initializes, then sets up the projection so
that the result is a simple perspective projection. This is done each time the
window is resized to maintain the correct aspect ratio. The localIdle function
controls the modelview matrix, which is selected and initialized, and then
translates our viewpoint along the z-axis. Next, rotations are applied along
all three axes. In Listing One, all these values are controlled by the user,
so that you can manipulate the view.


Bouncing Ball Revisited


The real substance of the Listing One program is contained in two areas. The
first is the visible part--the program functions that render the ball and the
surface. The functions localDrawSurface and localDrawSphere are
straightforward. The localDrawSphere function simply draws a white (glColor3f)
solid sphere (auxSolidSphere) along the y-axis (glTranslatef). Since OpenGL is
a state machine, you must first modify the state (in this case, the color and
position). Hence, you set the color and position and then draw a sphere. Note
that I've taken advantage of the aux library function to draw a sphere, rather
than the more-complicated gluSphere.
Drawing the surface is similar, except that you have to explicitly create the
surface out of polygons, and the polygons out of vertices. Inside the two
nested for loops that divide up the surface into squares, OpenGL primitives
are created between calls to glBegin and glEnd (auxSolidSphere handled this in
localDrawSphere). This is similar to a WM_PAINT message, where you call
BeginPaint, do some painting, then call EndPaint. In the case of OpenGL
primitives, you signal OpenGL that you are going to create an object out of
some vertices, construct the object, then signal you're done. 
For the localDrawSurface function, the glBegin(GL_QUADS) call tells OpenGL
that we are going to construct a four-sided polygon (quad). You set the color
of a vertex with glColor3fv, then the position of the vertex with glVertex3fv.
After the fourth vertex, glEnd signals that you are done. You do this for each
square in the checkerboard surface. 
Note that a more powerful primitive type passed to glBegin (GL_QUAD_STRIP, for
instance) could have been used for a simple checkerboard, making up the top
side of a surface. However, I decided to add
an interesting feature to the program: For the underside of the surface, I
only draw alternating squares. If you rotate the view around the x- or z-axis,
you can flip the scene over so that you are looking at it from underneath.
From this viewpoint you can see the bouncing ball through the checkerboard!
Try to imagine how many calculations are required to perform this feat and
you'll quickly appreciate what OpenGL can do. 



Animating the Scene


In my original plan, I was just going to render a scene of a ball bouncing on
a red and blue checkered surface. With just a few additional lines of code, I
added the capability to spin the scene around the y-axis, then the x- and
z-axes. At this point, the bottom of the surface was visible, so I added code
to change its color and place holes in it. The ease with which I added the
code created a classic example of feature creep; I had to stop or I'd never
finish. Here's how the scene is animated. 
The main() procedure of Listing One makes a call to auxMainLoop(localDisplay).
localDisplay is normally your display routine for nonanimated scenes. However,
if you're creating an animated scene, the localIdle function is where the
rendering is done. In localIdle, the call to glClear clears out the display
buffer, and the call to localAdjustParameters then adjusts the animation
parameters, adding spin to each axis and moving the ball along its trajectory.
Next, set up the modelview matrix by first initializing it,
then accounting for the rotations and translations. Then call the routines to
draw the surface and sphere, and finally, since this is a double buffered
window (as specified in the call to auxInitDisplayMode), swap the front and
rear buffers, which sends the rendering to the screen.
With the program running in its startup state, you'll just see the ball
hovering above the checkered surface. If you press the "a" key, you'll start
the ball bouncing. The "a" key toggles the animation on and off. Eventually
the bounces will diminish in magnitude, reaching a cutoff point at which the
ball is reset to above the surface. The "x," "y," and "z" keys increase the
rotation of their respective axes for each time through the animation loop.
The more times you press a key, the larger each change will be through the
loop. Press the uppercase letter to subtract from the rotation. The up- and
down-arrow keys move you closer and farther away (they add to or subtract from
the initial z-axis translation). Strictly speaking, the center of the
animation moves relative to the specified viewpoint. You'll see this when
moving the surface away from you until it's beyond the far clipping plane of
the viewing volume.
If you just start the y-axis spinning from the initial position, you can see
the far corner disappear when it rotates into the clipping plane. The "f" key
will freeze the display, and the "r" key will reset it to the initial viewing
parameters.
One thing you'll immediately notice about the animation is its constant speed.
Since all of the matrix calculations are done each time through the loop, the
values of the matrices are not important. In other words, the scene renders at
the same rate even when the animation is running and spinning about the axes.
The only thing that will affect the scene-rendering rate (aside from different
hardware or a different scene) is the size of the window. If you make the
scene full screen, the render rate drops off. If you make the window smaller,
the speed picks up.


Conclusion


OpenGL is not a high-level toolkit, but it provides excellent capabilities
that are pretty hard to program yourself. The low level of functionality
creates opportunities both for video-hardware manufacturers, who can provide
dedicated OpenGL hardware, and for third-party developers, who can provide
high-level wrappers for 3-D graphics applications. We may, in fact, see an
explosion of
OpenGL applications including games, virtual reality, analytical graphics, and
architectural applications. 


References


Crain, Dennis. Windows NT OpenGL: Getting Started. Microsoft Developer Network
CD-ROM, Disk #8, July 1994. 
OpenGL Architecture Review Board. OpenGL Reference Manual. Reading, MA:
Addison-Wesley, 1992.
OpenGL Architecture Review Board. OpenGL Programming Guide. Reading, MA:
Addison-Wesley, 1992.
Prosise, Jeff. "Advanced 3-D Graphics for Windows NT 3.5: Introducing the
OpenGL Interface, Part 1." Microsoft Systems Journal (October 1994). 
Figure 1: An OpenGL program displaying the underside of a checkerboard
surface.
Figure 2: The path from 3-D coordinate space to screen pixel.

Listing One
// MS supplied file to turn off compiler warnings
#include "glos.h"

// OpenGL, utility, and aux header files
#include <windows.h>
#include <GL/gl.h>
#include <GL/glu.h>
#include <GL/glaux.h>
#include <math.h>   // for fabs()

// local functions
static void localInitialize(int argc, char** argv);
static void localDrawSurface( void );
static void localDrawSphere( void );
static void localAdjustParameters( void );
static void localAdjustRotationalParameters( float * , float * );

// These functions are called by the AUX library
// These are declared CALLBACK, the calling convention the AUX library expects
static void CALLBACK localDisplay(void);
static void CALLBACK localReshape(GLsizei w, GLsizei h);
static void CALLBACK localIdle(void);

static void CALLBACK Key_a(void);
static void CALLBACK Key_Z(void);
static void CALLBACK Key_z(void);
static void CALLBACK Key_X(void);
static void CALLBACK Key_x(void);
static void CALLBACK Key_Y(void);
static void CALLBACK Key_y(void);
static void CALLBACK Key_r(void);
static void CALLBACK Key_f(void);

static void CALLBACK Key_up(void);
static void CALLBACK Key_down(void);

#define INITIAL_HEIGHT (7.0)
#define INITIAL_ACCEL (0.1)
#define INITIAL_ANGLE (15.0)
#define SPHERE_RADIUS (0.5)
#define ANGULAR_CHANGE (1.0)
#define INITIAL_TRANSLATION (-15.0)

#define MAX_DIMENSION (5.0)
#define SUBDIVISION (10.0)

// State variables
int animate = 0;
int freeze = 0;
float sphere_height = INITIAL_HEIGHT;
float sphere_drop_speed = 0;
float sphere_drop_accel = INITIAL_ACCEL;
float z_axis_rotation = 0;
float z_axis_rotational_speed = 0;
float y_axis_rotation = 0;
float y_axis_rotational_speed = 0;
float x_axis_rotation = INITIAL_ANGLE;
float x_axis_rotational_speed = 0;
float z_axis_translation = INITIAL_TRANSLATION;

// main -- just like any other main you've seen before
void main(int argc, char** argv)
{
 // Initialize our program and OpenGL
 localInitialize(argc, argv);
 // if the window is resized, call this function
 auxReshapeFunc(localReshape);
 // Assign some keys to some functions
 auxKeyFunc(AUX_a, Key_a);
 auxKeyFunc(AUX_z, Key_z);
 auxKeyFunc(AUX_Z, Key_Z);
 auxKeyFunc(AUX_x, Key_x);
 auxKeyFunc(AUX_X, Key_X);
 auxKeyFunc(AUX_y, Key_y);
 auxKeyFunc(AUX_Y, Key_Y);
 auxKeyFunc(AUX_r, Key_r);
 auxKeyFunc(AUX_f, Key_f);
 auxKeyFunc(AUX_UP, Key_up);
 auxKeyFunc(AUX_DOWN, Key_down);
 // what to do with our idle time
 auxIdleFunc(localIdle);
 // which function to call when the window needs to be repainted
 auxMainLoop(localDisplay);
}
// localReshape -- called whenever the window is resized, moved, or
// uncovered. The two arguments are the new window's width & height.
static void CALLBACK localReshape(GLsizei w, GLsizei h)
{
 GLfloat adjust_height, adjust_width;
 // Resize the viewport to the new window's size
 glViewport(0, 0, w, h);
 // scale the width/height by the size of the window so that
 // aspect ratio is retained, i.e., a sphere remains a sphere
 if ( w <= h )
 {
 adjust_height = 1.0;
 adjust_width = (GLfloat)h/(GLfloat)w;
 }
 else
 {
 adjust_height = (GLfloat)w/(GLfloat)h;
 adjust_width = 1.0;
 }
 // Set up a projection matrix
 glMatrixMode(GL_PROJECTION);
 glLoadIdentity();
 gluPerspective( 60.0, // field of view in degrees
 (GLfloat) w/(GLfloat) h, // aspect ratio
 1.0,
 20.0);
}
// Adjust a rotational value, keeping it between 0 and 360 degrees
static void localAdjustRotationalParameters( float *rot_value, float *rot_rate )
{
 *rot_value += *rot_rate;
 *rot_value = *rot_value < 0 ?
 (*rot_value + 360) :
 (*rot_value > 360 ? (*rot_value - 360) : *rot_value );
}
static void localAdjustParameters( void )
{
 if ( freeze )
 return;
 localAdjustRotationalParameters(&x_axis_rotation,&x_axis_rotational_speed);
 localAdjustRotationalParameters(&y_axis_rotation,&y_axis_rotational_speed);
 localAdjustRotationalParameters(&z_axis_rotation,&z_axis_rotational_speed);
 if ( animate )
 {
 sphere_drop_speed += sphere_drop_accel;// effect of gravity
 sphere_drop_speed *= 0.95; // effect of resistance
 sphere_height -= sphere_drop_speed;
 // Detect when we've hit the floor
 if ( sphere_height < 0 )
 {
 sphere_height = -sphere_height*.95;
 sphere_drop_speed *= -0.95;
 sphere_drop_accel *= 0.95; // battle roundoff errors
 }
 // Detect when we've stopped bouncing
 if ( sphere_height <= 0.001 && fabs(sphere_drop_speed) <= 0.001 )
 {
 sphere_height = INITIAL_HEIGHT; // Start it over again
 sphere_drop_speed = 0.0;
 sphere_drop_accel = INITIAL_ACCEL;
 }
 }
}
// localDrawSphere -- Draw a sphere above (+Y) the surface
static void localDrawSphere( void )
{
 if ( sphere_height <= 0 )
 return;
 // Now create a sphere along the +Y axis
 glColor3f (.9, .9, 0.9);
 glTranslatef (0.0, SPHERE_RADIUS+sphere_height, 0.0);
 auxSolidSphere(SPHERE_RADIUS); 
}
// localDrawSurface -- Draw a checkerboard surface, explicitly creating each
// square. Make one side (the "front") two-color, and the other side (the
// "back") alternating squares and blanks. The surface is centered about
// 0,0,0 and is perpendicular to the Y axis
static void localDrawSurface( void )
{
 int x,y;
 GLfloat vertices[4][3];
 GLfloat red_color[3] = {0.8, 0.0, 0.0};
 GLfloat blue_color[3] = {0.0, 0.0, 0.8};
 GLfloat *color1, *color2, *color3, *color4;

 GLfloat mesh_delta = 2.0*MAX_DIMENSION/SUBDIVISION;

 for ( x=1 ; x <= SUBDIVISION ; x++ )
 {
 for ( y=1 ; y <= SUBDIVISION ; y++ )
 {
 // Orient them counterclockwise
 // vertex 1
 vertices[0][0] = -MAX_DIMENSION+mesh_delta*(x-1); // x
 vertices[0][1] = 0.0; // y
 vertices[0][2] = -MAX_DIMENSION+mesh_delta*(y-1); // z
 // vertex 2
 vertices[3][0] = -MAX_DIMENSION+mesh_delta*(x-0); // x
 vertices[3][1] = 0.0; // y
 vertices[3][2] = -MAX_DIMENSION+mesh_delta*(y-1); // z
 // vertex 3
 vertices[2][0] = -MAX_DIMENSION+mesh_delta*(x-0); // x
 vertices[2][1] = 0.0; // y
 vertices[2][2] = -MAX_DIMENSION+mesh_delta*(y-0); // z
 // vertex 4
 vertices[1][0] = -MAX_DIMENSION+mesh_delta*(x-1); // x
 vertices[1][1] = 0.0; // y
 vertices[1][2] = -MAX_DIMENSION+mesh_delta*(y-0); // z

 // Color squares such that four squares form a pattern
 if ( x%2 == 1 && y%2 == 1 ) // quadrant ul
 {
 color1 = blue_color; // ul
 color2 = blue_color; // ll
 color3 = red_color; // lr
 color4 = blue_color; // ur
 }
 else if ( x%2 == 1 && y%2 == 0 ) // quadrant ll
 {
 color1 = blue_color;
 color2 = blue_color;
 color3 = blue_color;
 color4 = red_color; 
 }
 else if ( x%2 == 0 && y%2 == 1 ) // quadrant ur
 {
 color1 = blue_color;
 color2 = red_color;
 color3 = blue_color;
 color4 = blue_color;
 }
 else // quadrant lr
 {
 color1 = red_color;
 color2 = blue_color;
 color3 = blue_color;
 color4 = blue_color;
 }
 glBegin(GL_QUADS);
 glColor3fv ( color1 );
 glVertex3fv(vertices[0]);
 glColor3fv ( color2 );
 glVertex3fv(vertices[1]);
 glColor3fv ( color3 );
 glVertex3fv(vertices[2]);
 glColor3fv ( color4 );
 glVertex3fv(vertices[3]);
 glEnd();
 // now, draw alternating back faces in different colors
 if ( (x+y)%2 )
 {
 glBegin(GL_QUADS);
 // note that these "face" a different direction
 glColor3f (0.4, 0.4, 0.6); // blue-grey
 glVertex3fv(vertices[3]);
 glColor3f (0.0, 1.0, 0.0); // green
 glVertex3fv(vertices[2]);
 glColor3f (0.4, 0.4, 0.6); // blue-grey
 glVertex3fv(vertices[1]);
 glColor3f (1.0, 1.0, 0.); // yellow
 glVertex3fv(vertices[0]);
 glEnd();
 }

 }
 }
}
// localIdle -- Called whenever there is idle time. Use it
// for rendering frames when using double buffering
static void CALLBACK localIdle(void)
{
 // clear the viewport buffers, in this case the color & depth buffers
 // (there are other buffers we could include)
 glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
 // Select the modelview matrix
 glMatrixMode(GL_MODELVIEW);
 // Initialize it
 glLoadIdentity();

 // adjust all of the animation and rotation parameters
 localAdjustParameters();

 // apply Z axis translation. i.e. move viewpoint along the Z axis
 // (by default we're facing -Z with Y+ as the up vector)
 glTranslatef (0.0, 0.0, z_axis_translation);

 // apply the rotations
 glRotatef(x_axis_rotation, 1.0, 0.0, 0.0);
 glRotatef(y_axis_rotation, 0.0, 1.0, 0.0);
 glRotatef(z_axis_rotation, 0.0, 0.0, 1.0);

 // draw the objects
 localDrawSurface();
 localDrawSphere();

 // aux wrapper around the new Win32 function that swaps the buffers
 auxSwapBuffers();
}
// localDisplay -- Called whenever we need to redisplay the scene
static void CALLBACK localDisplay(void)
{
 ; // do nothing for now, the Idle function takes care of it
}
// localInitialize -- Initializes program and sets up the initial OpenGL state
static void localInitialize(int argc, char** argv)
{
 // double buffering, RGBA mode, 16 bit depth buffer
 auxInitDisplayMode (AUX_DOUBLE | AUX_RGBA | AUX_DEPTH16);

 // create a default window centered at 0,0, that's 400 pixels wide
 // (note that this is shifted to Windows coordinate system)
 auxInitPosition (0, 0, 400, 400);

 // Open the window, specify the title
 auxInitWindow ("OPENGL DEMO");

 // turn on depth testing
 glEnable( GL_DEPTH_TEST );
 // turn on back face removal
 glEnable( GL_CULL_FACE );
}
// These are the misc key functions
static void CALLBACK Key_z(void)
{
 z_axis_rotational_speed += ANGULAR_CHANGE;
}
static void CALLBACK Key_Z(void)
{
 z_axis_rotational_speed -= ANGULAR_CHANGE;
}
static void CALLBACK Key_y(void)
{
 y_axis_rotational_speed += ANGULAR_CHANGE;
}
static void CALLBACK Key_Y(void)
{
 y_axis_rotational_speed -= ANGULAR_CHANGE;
}
static void CALLBACK Key_x(void)
{
 x_axis_rotational_speed += ANGULAR_CHANGE;
}
static void CALLBACK Key_X(void)
{
 x_axis_rotational_speed -= ANGULAR_CHANGE;
}
static void CALLBACK Key_up(void)
{
 z_axis_translation += 1;
}
static void CALLBACK Key_down(void)
{
 z_axis_translation -= 1;
}
static void CALLBACK Key_f(void)
{
 // toggle all movement
 freeze = !freeze;
}
static void CALLBACK Key_a(void)
{
 // toggle animation
 animate = !animate;
}
static void CALLBACK Key_r(void)
{
 x_axis_rotation = INITIAL_ANGLE;
 z_axis_translation = INITIAL_TRANSLATION;
 z_axis_rotation = y_axis_rotation = 0;
 z_axis_rotational_speed = y_axis_rotational_speed = 
 x_axis_rotational_speed = 0;
}
DDJ

PROGRAMMING PARADIGMS


The Mac, the Web, and Errant Pedantry




Michael Swaine


Reader Neil MacDonald sounds like a kindred spirit, I thought, as I read his
e-mail message. There are those years he spent in book and magazine
publishing, and his strong views on the use and misuse of text. These days, he
develops text apps, using HyperCard for prototyping: in short, a word kinda
guy.
Inspired by our December 1994 piece on HTML and by an old (June 1990) column
of mine, Neil e-mailed me as follows on the issue of style in the age of
electronic communication: 
I object to the growing use of underlines, especially as seen in [World Wide
Web pages]. My roots in ink-based orthography influence me away from ever
using underscore. [One] should never actually underline text passages!
Underscore [must be] carried over from TELETYPES. Are these still in use?
(Perhaps in Basecamp Antarctica.) As electronic documents will someday
displace imagesetter documents, so will many amateurs be hijacking bold and
italics for unseemly uses. 
Is what I submit here really just errant pedantry? Is the Web already out of
control, right from the beginning? Does a letter like this lead you to
reconsider appending your e-mail address in DDJ?
You see why I identify with this guy. I'm just a sucker for errant pedantry.
Coincidentally, I was working on this column about the World Wide Web just as
this querulous missive dropped into my inbox. That was Coincidence One.
Coincidence Two was another item that fell into my inbox that day, from Chuq
Von Rospach, List Mom of apple-internet-providers@abs.apple.com. This was
Apple's announcement about its Internet offerings.
I had planned to say some things this month about Web software, especially Mac
software, since that's the flavor I've been sampling lately. I probably would
have touched on Neil's concerns anyway, but Chuq's message shifted my focus
abruptly.


Apple, the Low-Price Leader? 


That's the claim in the press release. I'll say this: Somebody knows how to
get this skeptic's attention. 
The announcement was really two announcements and a preannouncement.
Announcement One: Apple is offering three Internet server bundles, PowerMacs
with a CD full of the software necessary to set up a World Wide Web server.
Announcement Two: Apple is also offering Internet-access packages for
educators, including all necessary software, a modem, and a trial Internet
account. The preannouncement was the less-than-earthshaking news that Apple's
eWorld service will provide World Wide Web access in the next release,
projected for this summer.
The educator package could be the most significant of the announcements, given
Apple's popularity in that market and the widespread perception that every
schoolchild ought to be on the Internet. Most significant for Apple and for
the world at large, that is.
But the server bundle is of most interest to us.


Plug-and-Play Web Serving


Apple describes this package as a "WWW server in a box"--all the hardware and
software required to create Web pages and run a Web server. Everything but the
phone line. Apple claims you can be up and running in minutes after opening
the box. The appealing phrase "single-click installation" is heard.
There are also some strong-sounding claims regarding security. "One of the
most secure right out of the box," Apple says, which of course isn't
necessarily the same as "one of the most secure." Of course, when the
alternative is UNIX, it's easy to score points re: security. One of Apple's
points is that the Mac OS doesn't allow remote administration as UNIX does,
and these servers are pure Mac-OS machines. No UNIX a-tall.
There are three machines. The 6150/66 has a 66-MHz PowerPC 601 processor, 16
Mbytes of RAM, a 700-Mbyte hard drive, and a quad-speed CD-ROM drive; it sells
for $2909. The 8150/110 has a 110-MHz PPC processor, 16 Mbytes of RAM, a
1-gigabyte hard disk, and the same CD-ROM drive; its price is $5319. The top
of the line is the 9150/120, with a 120-MHz 601, 16 Mbytes of RAM, two
1-gigabyte hard drives, the CD-ROM drive, and a DAT drive with backup
software. That one is $8209.
Whether or not these prices justify Apple's claim to the lowest-cost route to
establishing a Web presence is an empirical question and subject to market
correction. Sometimes the claim of lowest price lasts only as long as it takes
your competitors to read your claim and adjust their pricing; then, too, the
real lowest-price solution rarely involves buying new hardware. That said, I
have to admit that I was impressed.
So did I run out and buy one?
No. I had just spent a big chunk of change on a PowerMac about six months ago,
and all I would be interested in is the CD-ROM. And Apple isn't selling that
separately. It's the bundle or nothing. Which is not to say that the software,
or equivalent software, isn't available elsewhere. In fact, you could probably
find most of what you need as shareware, and often very good shareware. (I'd
already looked at most of this software before I knew it would be part of this
bundle.)
Nice to have it all on one CD, though, and the one-click installer sounds so
good, doesn't it?


The Bundle Deal


About that software: As I mentioned, it's all Mac OS-based; there's no UNIX
supplied or required. The bundle includes System 7.5, which in turn includes
MacTCP, the without-which-nada of Mac Internet connectivity.
The central component of the server software, though, is MacHTTP, which IS the
server; that is, it allows you to serve text documents, like the HTML
documents that Web pages are, as well as binary files, like GIF and JPEG
graphics. MacHTTP supports AppleScript, so you can integrate FileMaker,
HyperCard, and SQL applications with your Web pages. Convenient, then, that
versions of FileMaker, HyperCard, and an SQL application are included in the
bundle, along with some sample databases for each and the hooks to make them
work with MacHTTP.
For creating and editing HTML documents, Apple is providing BBEdit, the most
widely praised code editor for the Mac, now featuring a set of HTML-editing
extensions.
To see what you've created, you need a Web browser. Apple is bundling NetScape
from NetScape Communications, the company formed by key Mosaic developer Marc
Andreessen and former SGI president Jim Clark. About NetScape: Both the
company and the product were initially overpraised in the press and are now
suffering from some backlash. Wired magazine (or rather HotWired, its online
persona) raised an eyebrow when NetScape didn't have a booth at a recent trade
show, and the errant pedants on the Web object strenuously to certain Web
makers' use of NetScape-only extensions to HTML. There's even a Web page
devoted to ridiculing those who seem to regard NetScape's blinking text as a
big improvement over underlining.


An Aside Regarding Style


A personal aside that will annoy my pal Neil: I recently finished writing a
HyperCard-based HTML editor, and found myself using the underline style to
flag the hypertext links (URLs) in the documents that the editor produces.
No, that's not quite right. I used underlining early on in the development of
the product, but I wanted to change it right from the start. HyperCard has its
own mechanisms for supporting hypertext links, and I thought that the URLs
that my editor placed in documents ought to use some of HyperCard's linking
techniques. I implemented that, so the editor now assigns the HyperCard Group
Text style to URLs, rather than the Underline style.
Group Text is crucial for hypertext linking: It allows the user to click on a
word or arbitrary string of characters and incur some action. I use this
capability to make the links hot. That is, while the editor is not a Web
browser itself, it will let you click on a link as though it were, and it will
fire off the URL to a Web browser if you happen to have one running.
Maybe that's not clear. The editor produces documents that a Web browser can
read. The Web browser shows links by its preferred style, such as color, and
jumps to the associated address. I wanted my editor to show the links, too,
and to invoke the Web browser to jump to the address.

Talking to the Web browser was easy, so the only trick was how to make these
links visible so the user would know where to click. HyperCard has a solution
for that: By turning on the Show Groups property at launch, my editor can use
HyperCard's built-in style convention for grouped text.
That convention is (sorry, Neil) underlining.


Back to the Bundle


Of course, HTML could turn out to be a flash in the pan. Some time back,
pundit Tony Bove expressed the opinion that the efforts of Adobe to produce a
portable document format in Acrobat have been in vain, and that a
noncommercial product, viz Mosaic, had won that battle. Not everyone agrees
that Mosaic and its kin are the last word in portable-document formats; many
would say that the Acrobat alternative is particularly interesting when you're
publishing over a heterogeneous LAN. Well, Apple supplies Acrobat in the
bundle, too.
AppleSearch is also bundled, including AppleSearch for Windows so Windows
clients can access files on your AppleTalk network.
And there is some software support for clickable maps and for e-mail.


The Missing Pieces


It's interesting that Apple positions these as Web servers. That's consistent
with Apple's deep need to be trendy, of course, but it also finesses the
detail that the software bundle isn't all you might want to be an Internet
provider.
One missing component is MacDNS, the domain-name server software. Although
it's promised to ship with the servers by summer, it's not in the first
release.
Apple had its own name-resolution problem with this product. Another company,
having had the genius to come up with it first, objected to Apple using the
name MacDNS. The matter has since been resolved, and aren't we all relieved.
Apparently the delay in the release of MacDNS was not caused by this nonsense.
Another missing component: a news server.
If Apple delivers MacDNS this summer and does something quickly for NNTP
support, the missing-pieces issue could be moot. It just may be that Mac-based
service providers, coming late to the net, will choose to ignore a lot of
baggage of chiefly historical interest. How many services that service
providers now provide will be of little to no interest in six months?
NNTP? Definitely gotta have a news server, though maybe you can get it
elsewhere for now. Internet Relay Chat? Well, okay, but bandwidth allocation
is bound to be a problem. Same for CU-SeeMe. But do you really need to put up
an Archie server? A Veronica server?
Here's one view I saw expressed in the chat flurry following the announcement
of the new servers: The cream will float to the top and only the tried, the
true, and the downright wickedly cool will survive. A quintessentially
Mac-headed attitude.


Apple Guide Complete


Meanwhile, off the net and over the transom here at Stately Swaine Manor drop
not one but two fat copies of the aptly titled Apple Guide Complete
(Addison-Wesley, 1995).
Between the covers of this tome are 500-plus pages and a CD-ROM containing
everything you ever wanted to know about designing, building, and scripting
Apple Guide files, as well as integrating them with your applications.
Always assuming, of course, you are interested in knowing anything about Apple
Guide. If you do any Mac programming, you should be, because Guide will be a
key tool in the next big Mac OS release (code-named "Copland"), which will be
hardware independent and is due out sometime next year. With the arrival of
Copland, Guide will become a tool for developing agents that perform
repetitive tasks.
Right now, though, Apple Guide is a help system that makes it relatively easy
to produce and deliver on-screen interactive help for users of your
applications. Unlike Apple's Balloon Help, which generally answers the
question "What does this gizmo do?," Guide specializes in instruction of the
"How do I accomplish this task?" variety. When users invoke Guide help for
your application, they see a small, floating help panel on top of the
application window. You can give them a sequence of panels to walk through,
and you can put in branches that depend on user feedback or external
conditions, like the presence of a particular piece of software. The help
panel typically says "How do I <something>?" at the top and then answers the
question with specific "Do this" instructions below. A nice, irreverent touch
is the "Huh?" button, which you generally link to definitions of terms or
more-detailed explanations of a process's substeps.
So far, I might almost be talking about Windows Help. In fact, Windows Help
files can be imported to Guide and converted into Guide files.


The Red Circle


Guide, though, is considerably more interactive. Rather than just saying "Do
the following" and supplying a text description of the steps in a process,
Guide documents can say "Here, let me show you." They walk the user through
the task step by step, pointing to the correct button to press or menu to
select and waiting until the user presses that button or selects that menu.
The coolest aspect of this is how Guide points to the object in question. As
though an unseen hand wielding an invisible magic marker were writing directly
on the user's screen, a big red circle will be drawn around the object,
slashing heedlessly across window boundaries with an attitude that can't help
but warm the heart of anyone who has ever chafed under the strictures of the
Apple-interface police.
The red circles are rad.
There are basically three phases to implementing Guide help for your
application: designing the Guide files, scripting them, and integrating them
into your application. Apple realizes that the three phases might be the
responsibility of three different individuals or teams. The book covers the
whole process, but broken down into those three phases.
Guide scripting is within the scope of anyone experienced with HyperTalk or
AppleScript or any scripting system. Actually, Guide scripts look more like
HTML than anything else; Guide Script commands are named keywords enclosed in
angle brackets, like this:
<Define Text Block>
To start, click Topics, Index, or Look For.
<End Text Block>
Naturally, integrating Guide files with your app requires more programming
experience, although the Guide API is pretty straightforward. From within your
app, you can start up Guide and check its status; get the number and types of
Guide files available; open, close, manipulate, and get information about
Guide files; and more. Integrating Guide into your app isn't obligatory, but
without integration you don't get the level of interaction that makes Guide
special.
Designing Guide files is really an instructional-design task. Apple has broken
out the task and, in this book, the documentation for that task, so that a
nonprogrammer with instructional-design knowledge can do the job.
The book offers style advice for designing Guide files, but some style
decisions have already been made by Apple. You can safely expect any two Guide
documents to look more similar than any two Web pages; Mother Apple is more
controlling than those Internet anarchists. Display text defaults to a proper
10-point Espy Serif Plain Black, title and prompts default to Espy Sans 10
Plain, and so forth. You rebels out there can, however, override the defaults,
and the book gives some style tips to follow if you do so. The first one:
Use an underscore to indicate hot text.
Sorry, Neil.


C PROGRAMMING


CD-ROMs, Classes, and GTIs (Graphically Tattooed Individuals)




Al Stevens


Once upon a time, magazine columnists never had to buy blank diskettes.
Compiler and tool vendors regularly send us betas and review copies of their
products. A typical Windows compiler takes more than a dozen diskettes, and
there are always new versions and betas coming out, so my scratch diskette
drawer was always full. I've sent many book chapters, articles, and other
stuff to publishers, readers, and associates on diskettes that were paid for
by Microsoft, Borland, Symantec, and other generous suppliers.
Microsoft must have gotten wise to what I was doing. They started popping out
the little sliders that write-protect the diskette to thwart my reuse of their
media, but that was only a minor inconvenience. The drawer still holds a lot
of those adhesive tabs from the days of the 5.25-inch diskettes, and they plug
up that write-protect hole just fine. I figured when I ran out of the tabs,
I'd use chewing gum.
Then came the CD-ROM. A wonderful medium to be sure, but the free diskette
supply quickly dried up when vendors started shipping their goods on the
shiny, read-only discs. I can see the bottom of that diskette drawer now.
Guess I'll have to break down and buy some diskettes. Shows how things have
gone to hell.
Now I have a different problem. A big stack of outdated CD-ROMs languishes in
another drawer. Beta releases. Superseded versions. Junk free software that
came with this or that piece of hardware or shrink-wrapped with some
overpriced magazine. One thousand clip-art samples. Five hundred crummy DOS
games. NT 3.5 in every language on the face of the earth. Version 1 of the Dr.
Dobb's Journal CD.
What to do with all those useless silver discs? They last forever, are not
reusable, and can't be recycled. The environmentalist in me won't let me
contribute them to the nonbiodegradable rubbish at the county landfill. Judy
says it's because I'm a pack rat and can't throw away anything that might
still work.
If I had a dog, the discs would make good Frisbees for playing fetch, but I
have no dog, and the cat won't play Frisbee with me.
A mobile hangs in my office made of Borland C++ CD-ROMs. There weren't enough
of them to balance the mobile, and I didn't want to spoil the set with
compilers from those other guys, so I filled the mobile out with Philippe's
saxophone CDs.
I don't need any more mobiles, though, and there are still plenty more
obsolete CD-ROMs to go.
The Chicago Beta 1 CD-ROM is a pretty disc, with gold lettering on a navy blue
background and a pattern of black steel girders to suggest something under
construction. It's a copy of the wallpaper that beta 1 starts up with. Quality
artwork, so the Chicago Beta 1 CD-ROM is a nice clock on my office wall now.
For six bucks the local craft shop offers a matching quartz clockwork with gold
hands (battery not included). Later Windows 95 (née Chicago) betas are not as
pretty as that first one. I wouldn't hang one of them on my wall. Too cheap
looking. Besides, they don't have the heirloom appeal of old number one.
Judy wants wind chimes on the patio. I tried a dangling farrago of CD-ROMs,
but the dumb plastic discs have no musical timbre. They just rattle drearily
in the wind. No tinkle to speak of. 
Someone else suggested skeet shooting. Pull! Kaboom! Another TrueType font
collection blown to smithereens.


Teaching C and C++


The Contract On America promises to downsize government. There are
repercussions in my home town. The Kennedy Space Center (KSC) employs a lot of
people, both contractors and government employees. The contractors do the
technical work, and the feds manage the projects.
Contractors are accustomed to job changes and relocations when the government
alters how work is done and who does it. But the federal workers' jobs have
always been relatively secure. There were always projects of one kind or
another, which meant that there was always a need for project managers.
That security is evaporating. Some government jobs will be eliminated, which
means that the government employees who are doing those jobs will have to find
something else to do. The lucky ones will find reassignment within the
government. Many others will be cut from the payroll, either through early
retirement or, in the case of younger staff, that government euphemism for
getting canned, the Reduction In Force (RIF).
Downsizing also demands more efficient spending, which, on a bottom line
somewhere, means fewer contractors. The feds who survive the RIF have to start
doing more of the technical stuff. Many of them have technical aptitude, but
their skills have been dulled by a career spent counting beans and
administering contracts. To sharpen their skills, NASA has begun to offer
classes to government employees who want to improve their chances to avoid the
RIF and, if that doesn't work out, to enhance their employability elsewhere.
As part of this retraining program, I was recently hired to teach two classes,
an introduction to C and an introduction to C++. The classes were contracted
to a local training company. They found me by asking around, I guess.
The teaching environment was unique in my experience. Each class was for five
days, six hours each day, with 15 students per class. Each student sat at a
PC, and I used a laptop with a projection system. The training company
provided the PCs and whatever software was to be used in the class, and the
government provided the classroom at KSC.
I was permitted to choose the teaching materials I wanted, so I used the two
books that I wrote as tutorials on those subjects. PowerPoint was invaluable
for making teaching slides. Importing the Word outlines for the book chapters
created instant slides, which needed only some formatting and the addition of
some figures.
The first day of class had a bummer of a downside. Actually, we did not learn
about the problem until the second morning. NASA had told us the first day
that we could not lock the classroom, because they did not have the
combination to the cipher lock. They assured us that the building itself would
be locked, however, and our stuff would be safe. We dutifully left the door
unlocked, and that night someone stole two brand-new Gateway 2000 4DX2-66V
computers from our classroom. Guess what? NASA told us that the loss was our
responsibility because we failed to lock the door.
When they get around to choosing RIF candidates, I hope they call me for an
opinion.
The C-class outline assumed that the students understood programming, which
they did, although none of them were actively working as programmers at the
time. The C++-class outline assumed that the students understood C well enough
to keep up, which most of them did. There were no dropouts, and all the C
students finished satisfactorily. One of the C++ students had no C experience
at all, and he was soon lost, but I gave him a copy of the C book and let him
teach himself at his own pace. It wasn't his fault. The organizers did not
properly advertise the prerequisites.
During each class, I ran examples from the book and had the students follow
along on their PCs. Whenever we reached a critical point, I had the students
write programs that used what they had learned so far. They were allowed to
work together, keep the book open, and ask questions, a learning environment
that I believe in. A classroom should provide students with the same
information available to them in the office.
While they worked on their programs, I went from student to student and made
sure that everyone had a running program before we went to the next lesson.
This, I learned, is an important technique. If students are forced, due to
schedule demands, to go to the next plateau without having successfully
completed the current one, they lose confidence and, consequently, interest.
Fortunately, the exercises were easy enough that no one's individual pace
slowed the class down.
Given the level of experience of the students, I couldn't cover either
language completely in one week by using the methods I chose. If the C
students were active programmers in other languages, and if the C++ students
were active C programmers, we could have covered more ground, but I doubt that
we would have exhausted either subject.
It is possible, I am sure, to teach all of C or C++ in a single week, but only
if you restrict the lessons to lectures and examples demonstrated by the
instructor from the podium. There would be no time for lab work or much class
participation. When asked, the students enthusiastically agreed that by having
the PCs and writing their own programs in class they learned much more than
they would have from a nonstop lecture, even if more material could have been
covered. For one thing, the class was more interesting because they had
something to do besides sit and take notes for hours on end. For another, at
each step their learning was reinforced by immediate experience.


Serial Port and Modem Classes


Last month I described IMail, an application that calls an Internet host and
collects and sends e-mail. That project is now complete and available for
download. Recently I've been working on a project with PC games in C++, and
one of its features supports multiuser games across a serial port and
optionally through a modem. I adapted the two classes from IMail that
encapsulate the serial port and the modem for the game library, and they
ported with only a few changes. After congratulating myself on having built
such reusable software components, I decided to use the two classes as the
code-related part of this month's column.
There are many applications for connecting computers with the serial port both
directly and through a modem. These two classes saw first light several years
ago as the serial port and modem function libraries for a C project called
SMALLCOM. Now they are C++ classes.


The CommPort Class


Listings One and Two are serial.h and serial.cpp, the source files that
implement the CommPort class. Serial.h begins by defining a simple Timer class
to track timeouts during port input/output. There are a number of constant
variables (I hate that--how can a variable be constant, or vice versa?) that
specify some port addresses and control values, a buffer size for serial
input, buffer fill thresholds for sending XOFF during reception, and a
bit-field structure that defines the serial port's initialization byte.
A structure contains the usual initialization parameters to be used when
constructing an object of the CommPort class: port number, parity, number of
stop and data bits, and baud rate.
The CommPort class includes data members for the initialization byte, pointers
to the input buffer, a switch to indicate whether Xon/Xoff protocols are
enabled, indicators to control the protocol, an instance of the parameter
structure, a timer object, and a set of inline functions that compute port,
IRQ, and interrupt-vector addresses based on the port chosen in the
parameters. The public interface for the CommPort class provides for object
construction from a copy of the parameter structure, a function to initialize
the port, and functions to read and write characters from the port, clear the
input queue, test for carrier detect, set the timeout value, and enable and
disable Xon/Xoff protocols.
When a program constructs a CommPort object, the data members are initialized
based on the parameter structure passed. An input buffer is allocated from the
heap. Then the port is initialized, which includes conditioning the port to
send and receive characters in the specified format at the specified baud rate
and intercepting the serial-input and system-timer interrupts.

When the CommPort object goes out of scope, the destructor deletes the
serial-input buffer and restores the timer and serial input-interrupt vectors.
The timer's interrupt-service routine counts down the timer if it is running.
The serial-input interrupt service routine (ISR) maintains a circular buffer
of input characters into which it reads serial characters--one for each
interrupt. This ISR also manages the Xon/Xoff protocols. If the input-buffer
count has reached the fill threshold, the ISR transmits the XOFF character to
the other computer and sets a flag that says that it is waiting to send the
XON character. The protocol assumes that the other computer will receive the
XOFF character and will not transmit any further data characters until it
receives the XON character. If the serial-input ISR receives the XOFF
character, it sets a flag telling the local CommPort object to wait for an XON
before sending any further data characters. If the serial-input ISR receives
the XON character, it clears that flag.
The input_char_ready function returns a true value if there are data bytes in
the input buffer. The readcomm function waits for that condition to be true
and fetches the next character from the input buffer to be returned to the
caller. The function also tests to see if the object is waiting to send the
XON protocol byte and if the fetch operation reduced the number of bytes in
the input buffer to below the fill threshold. If so, the function sends the
XON byte and turns off the flag that says that it is waiting to do so.
The writecomm function is called to send a byte to the other computer. First
it goes into what looks like a dead loop while the flag is set that says the
object is waiting for an XON character. Since that flag is cleared by the ISR,
the loop is not really dead. Then the function sets a timer and waits for the
port to be available to transmit a byte or for a timeout to occur. If the port
becomes available, the function transmits the character.


The Modem Class


Listings Three and Four are modem.h and modem.cpp, the source code that
implements the Modem class. A program that makes a direct connection
instantiates an object of the CommPort class. If you are going to use a modem,
you instantiate only a Modem object, since it constructs its own CommPort
object. The Modem class holds a pointer to that CommPort object, which it
instantiates during construction. It also has a flag that tells whether the
connection has been made and references to the serial-port and modem
initialization parameters.
Modem parameters consist of the strings that reset and initialize the modem,
dial a number, put the modem in answer mode, and go on hook (hang up the
phone).
You instantiate a Modem object by providing communications parameters and
modem parameters. A program usually gets these values from a configuration
file that the user can modify. When the program is ready to make a call, it
calls the Dial function, which waits for the connection to be made or to fail
and returns true or false accordingly. The HangUp function breaks the
connection. The program can call the InputCharReady function to see if there
is a character to be read from the modem. The ReadChar and CharOut functions
read and write characters between the local computer and the computer at the
other end of the line. TestCarrier returns true as long as the connection
holds, which lets a program detect when either end breaks the connection.
The Answer function puts the modem in answer mode. The program can call this
function and then wait until the TestCarrier function returns a true value to
process a call that has been received.
The IMail version of modem.cpp changes the baud rate and resets the serial
port if the connection indicates that the other end answered with a lower baud
rate than the one that was used to place the call. This version does not do
that because it is meant to be used when the two users agree in advance about
how they are placing the call. That version also supports the XModem file
transfer. You can get the IMail source code if you want to add that feature to
this version of the serial and modem classes.
Listing Five is comm.cpp, a program that uses the Modem and CommPort classes
to connect two computers for a chat. The caller specifies the port number (1
or 2) and a phone number on the command line. The answerer specifies only the
port number.


Source Code


The source code files for IMail and the D-Flat libraries are free. You can
download them from the DDJ Forum on CompuServe, from the Internet by
anonymous ftp, and from other sources; see "Availability," page 3.
If you cannot get to one of the online sources, send a 3.5-inch diskette and
an addressed, stamped mailer to me at Dr. Dobb's Journal, 411 Borel Avenue,
San Mateo, CA 94402, and I'll send you the source code. Make sure that you
include a note that says which project you want. The code is free, but if you
care to support my Careware charity, include a dollar for the Brevard County
Food Bank. 


Petzold's Permanent Pixelderm Patch


At Software Development 95 in San Francisco, I was sitting in the corner at a
vendor's party and feeling morose. Christi Westphal of Sax Software had just
broken the heart of every red-blooded man in the room by announcing her
impending nuptials. I needed something to cheer me up. Then I saw a peculiar
thing. Across the room Windows programming guru Charles Petzold was facing
down a colleague and taking off his shirt. "A fight!" I thought. Great. Just
the thing to brighten up a dull evening. Maybe Charles was talking to a
Macintosh programmer and, emotions running high as they will, words were
exchanged, and things were about to get out of hand. But, alas, no fight
ensued. She must not have been a Mac programmer. As it turned out, Charles was
showing off the Windows logo prominently tattooed on his bicep. I read about
that tattoo in Dvorak's column. For once Dvorak was right about something.
By now we all know what you must do to be permitted to display that logo on
your packaging. We may, therefore, conclude that Charles has been carefully
examined by Microsoft and that their impartial panel of judges certifies that
Charles is made of 32 bits, drags and drops properly, uses long names, runs
under NT, is mail-enabled, and, when pressed on his right mouse button, blurts
out a context menu.

Listing One
//  serial.h
#ifndef SERIAL_H
#define SERIAL_H
#include <dos.h>
#undef disable
typedef int bool;
const int true = 1;
const int false = 0;
const int systimer = 8;
class Timer {
 int timer;
public:
 Timer()
 { timer = -1; }
 bool timed_out()
 { return timer == 0; }
 void set(int secs)
 { timer=secs*182/10+1; }
 void disable()
 { timer = -1; }
 bool running()
 { return timer > 0; }
 void countdown()
 { --timer; }
 bool disabled()
 { return timer == -1; }
};

const int xon = 17;
const int xoff = 19;
const int PIC01 = 0x21; // 8259 Programmable Interrupt Controller
const int PIC00 = 0x20;
const int EOI = 0x20; // End of Interrupt command
// - line status register values
const int XmitDataReady = 0x20;
//  modem control register values
const int DTR = 1;
const int RTS = 2;
const int OUT2 = 8;
//  modem status register values
const int RLSD = 0x80;
const int DSR = 0x20;
const int CTS = 0x10;
// - interrupt enable register signals
const int DataReady = 1;
// - serial input interrupt buffer
const int BufSize = 1024;
const int SafetyLevel = (BufSize/4);
const int Threshold = (SafetyLevel*3);
// - com port initialization parameter byte
union portinit {
 struct {
 unsigned wordlen : 2;
 unsigned stopbits : 1;
 unsigned parity : 3;
 unsigned brk : 1;
 unsigned divlatch : 1;
 } initbits;
 char initchar;
};
//  parameters to initialize the com port
struct CommParameters {
 int port;
 int parity;
 int stopbits;
 int databits;
 int baud;
};
//  CommPort class
class CommPort {
 friend class Modem;
 portinit initcom;
 char *mp_recvbuff;
 bool xonxoff_enabled;
 char *mp_nextin, *mp_nextout;
 int buffer_count;
 CommParameters commparms;
 bool waiting_for_xon;
 bool waiting_to_send_xon;
 static CommPort *mp_CommPort;
 int timeout;
 static Timer serialtimer;
 int BasePort()
 { return (0x3f8-((commparms.port-1)<<8)); }
 int TxData()
 { return BasePort(); }
 int RxData()
 { return BasePort(); }
 int DivLSB()
 { return BasePort(); }
 int DivMSB()
 { return BasePort()+1; }
 int IntEnable()
 { return BasePort()+1; }
 int IntIdent()
 { return BasePort()+2; }
 int LineCtl()
 { return BasePort()+3; }
 int ModemCtl()
 { return BasePort()+4; }
 int LineStatus()
 { return BasePort()+5; }
 int ModemStatus()
 { return BasePort()+6; }
 int irq()
 { return 4-(commparms.port-1); }
 int vector()
 { return 12-(commparms.port-1); }
 int ComIRQ()
 { return ~(1 << irq()); }
 void CommInterrupt();
 friend void interrupt newcomint(...);
 friend void interrupt newtimer(...);
public:
 CommPort(const CommParameters& cp);
 ~CommPort();
 void Initialize();
 int readcomm();
 bool writecomm(int ch);
 void clear_serial_queue();
 bool carrier()
 { return (inp(ModemStatus()) & RLSD) != 0; }
 bool input_char_ready()
 { return mp_nextin != mp_nextout; }
 void SetTimeout(int to)
 { timeout = to; }
 const CommParameters& CommParms()
 { return commparms; }
 void EnableXonXoff()
 { xonxoff_enabled = true; }
 void DisableXonXoff()
 { xonxoff_enabled = false; }
};
#endif

Listing Two
// - serial.cpp
#include "serial.h"
static void interrupt (*oldtimer)(...);
static void interrupt (*oldcomint)(...);
Timer CommPort::serialtimer;
CommPort *CommPort::mp_CommPort;
//  ISRs to count timer ticks
void interrupt newtimer(...)
{
 (*oldtimer)();
 if (CommPort::serialtimer.running())
 CommPort::serialtimer.countdown();
}
//  serial input interrupt service routine
void interrupt newcomint(...)
{
 CommPort::mp_CommPort->CommInterrupt();
}
void CommPort::CommInterrupt()
{
 int c;
 outp(PIC00,EOI);
 if (mp_nextin == mp_recvbuff+BufSize)
 mp_nextin = mp_recvbuff; // circular buffer
 c = inp(RxData()); // read the input
 if (xonxoff_enabled)
 if (c == xoff) // test XOFF
 waiting_for_xon = true;
 else if (c == xon) // test XON
 waiting_for_xon = false;
 if (!xonxoff_enabled || (c != xon && c != xoff)) {
 *mp_nextin++ = (char) c; // put char in buff
 buffer_count++;
 }
 if (xonxoff_enabled && !waiting_to_send_xon &&
 buffer_count > Threshold) {
 while ((inp(LineStatus()) & XmitDataReady) == 0)
 ;
 outp(TxData(), xoff); // send XOFF
 waiting_to_send_xon = true;
 }
}
CommPort::CommPort(const CommParameters& cp) : commparms(cp)
{
 mp_CommPort = this;
 mp_recvbuff = new char[BufSize];
 mp_nextin = mp_nextout = mp_recvbuff;
 xonxoff_enabled = true;
 buffer_count = 0;
 waiting_for_xon = false;
 waiting_to_send_xon = false;
 oldtimer = 0;
 oldcomint = 0;
 timeout = 10;
 Initialize();
}
CommPort::~CommPort()
{
 delete [] mp_recvbuff;
 if (oldcomint) {
 setvect(vector(), oldcomint);
 oldcomint = 0;
 }
 if (oldtimer) {
 setvect(systimer, oldtimer);
 oldtimer = 0;
 }
}
//  initialize the com port
void CommPort::Initialize()
{
 initcom.initbits.parity =
 commparms.parity == 2 ? 3 : commparms.parity;
 initcom.initbits.stopbits = commparms.stopbits-1;
 initcom.initbits.wordlen = commparms.databits-5;
 initcom.initbits.brk = 0;
 initcom.initbits.divlatch = 1;
 outp(LineCtl(), initcom.initchar);
 outp(DivLSB(), (char) ((115200L/commparms.baud) & 255));
 outp(DivMSB(), (char) ((115200L/commparms.baud) >> 8));
 initcom.initbits.divlatch = 0;
 outp(LineCtl(), initcom.initchar);
 //  intercept the timer interrupt vector
 if (oldtimer == 0) {
 oldtimer = getvect(systimer);
 setvect(systimer, &newtimer);
 }
 //  hook serial interrupt vector
 if (oldcomint == 0) {
 oldcomint = getvect(vector());
 setvect(vector(), &newcomint);
 }
 outp(ModemCtl(), (inp(ModemCtl()) | DTR | RTS | OUT2));
 outp(PIC01, (inp(PIC01) & ComIRQ()));
 outp(IntEnable(), DataReady);
 outp(PIC00, EOI);
 // - flush any old interrupts
 inp(RxData());
 inp(IntIdent());
 inp(LineStatus());
 inp(ModemStatus());
}
int CommPort::readcomm()
{
 CommPort::serialtimer.set(timeout);
 while (!input_char_ready())
 if (CommPort::serialtimer.timed_out())
 return 0;
 if (mp_nextout == mp_recvbuff+BufSize)
 mp_nextout = mp_recvbuff;
 --buffer_count;
 if (waiting_to_send_xon && buffer_count < SafetyLevel) {
 waiting_to_send_xon = false;
 writecomm(xon);
 }
 return *mp_nextout++;
}
bool CommPort::writecomm(int ch)
{
 while (waiting_for_xon)
 ;
 CommPort::serialtimer.set(timeout);
 while ((inp(LineStatus()) & XmitDataReady) == 0)
 if (CommPort::serialtimer.timed_out())
 return false;
 outp(TxData(), ch);
 return true;
}

void CommPort::clear_serial_queue()
{
 mp_nextin = mp_nextout = mp_recvbuff;
 buffer_count = 0;
}

Listing Three
//  modem.h
#ifndef MODEM_H
#define MODEM_H
#include <fstream.h>
#include "serial.h"
//  default modem control strings
#define RESETMODEM "~+++~ATZ"
#define INITMODEM "AT&C1E0M1S7=35V1X4S0=0"
#define DIAL "ATDT"
#define ANSWER "ATS0=1"
#define HANGUP "~+++~ATH0S0=0"
struct ModemParameters {
 char resetmodem[20]; // reset string
 char initmodem[50]; // initialize string
 char dial[10]; // dial command
 char answer[10]; // answer command
 char hangup[50]; // hangup string
};
const int MaxStrings = 15;
class Modem {
 CommPort *mp_comport;
 bool connected;
 const CommParameters& commparms;
 const ModemParameters& modemparms;
 bool WaitForConnect();
public:
 Modem(const CommParameters& cp, const ModemParameters& mp);
 ~Modem();
 bool Dial(const char *phoneno);
 void HangUp();
 void Answer()
 { CommandOut(ANSWER); }
 int WaitForStrings(char *tbl[]);
 void CommandOut(const char *s, int sendcr = true);
 void StringOut(const char *s);
 void FlushInput();
 bool CharOut(int c);
 bool InputCharReady()
 { return mp_comport->input_char_ready(); }
 int ReadChar()
 { return mp_comport->readcomm(); }
 bool TestCarrier()
 { return mp_comport->carrier(); }
 CommPort *Port() const
 { return mp_comport; }
};
#endif

Listing Four
//  modem.cpp
#include "modem.h"
Modem::Modem(const CommParameters& cp,
 const ModemParameters& mp) :
 commparms(cp), modemparms(mp)
{
 connected = false;
 mp_comport = new CommPort(commparms);
 CommandOut(modemparms.resetmodem);
 sleep(1);
 CommandOut(modemparms.initmodem);
 sleep(1);
 mp_comport->clear_serial_queue();
}
Modem::~Modem()
{
 HangUp();
 sleep(1);
 CommandOut(modemparms.resetmodem);
 sleep(1);
 delete mp_comport;
}
// - write a character to the modem
bool Modem::CharOut(int c)
{
 if (mp_comport->writecomm(c)) {
 if (mp_comport->input_char_ready())
 mp_comport->readcomm();
 return true;
 }
 return false;
}
//  flush the input buffer
void Modem::FlushInput()
{
 while (mp_comport->input_char_ready())
 mp_comport->readcomm();
}
// - write a string to the modem
void Modem::StringOut(const char *s)
{
 while(*s && CharOut(*s))
 s++;
 FlushInput();
}
// - write a command string to the modem
void Modem::CommandOut(const char *s, int sendcr)
{
 while(*s) {
 if (*s == '~')
 sleep(1);
 else if (!CharOut(*s))
 break;
 s++;
 }
 if (sendcr)
 CharOut('\r');
}
// - place a call
bool Modem::Dial(const char *phoneno)
{
 CommandOut(modemparms.dial, false);
 CommandOut(phoneno);
 connected = WaitForConnect();
 return connected;
}
//  hang up
void Modem::HangUp()
{
 CommandOut(modemparms.hangup);
 connected = false;
}
//  modem result codes
static char *results[] = {
 "CONNECT\r\n",
 "CONNECT 1200",
 "CONNECT 2400",
 "CONNECT 9600",
 "CONNECT 14400",
 "OK",
 "RING",
 "NO CARRIER",
 "ERROR",
 "BUSY",
 "NO DIAL TONE",
 "NO ANSWER",
 NULL
};
// - wait for the modem to connect to the remote
bool Modem::WaitForConnect()
{
 int lbaud = 0;
 int rtn = false;
 while (lbaud == 0) {
 rtn = WaitForStrings(results);
 switch (rtn) {
 case 0: lbaud = 300; break; // CONNECT 
 case 1: lbaud = 1200; break; // CONNECT 1200 
 case 2: lbaud = 2400; break; // CONNECT 2400 
 case 3: lbaud = 9600; break; // CONNECT 9600 
 case 4: lbaud = 19200; break; // CONNECT 14400 
 case 5: // OK 
 case 6: break; // RING 
 case 7: // NO CARRIER 
 case 8: // ERROR 
 case 9: // BUSY 
 case 10: // NO DIAL TONE 
 case 11: // NO ANSWER 
 case -1: lbaud = -1; break; // time-out 
 default: break; // anything else 
 }
 }
 return (lbaud != -1);
}
// - wait for one of a table of strings as input
int Modem::WaitForStrings(char *tbl[])
{
 int c;
 int i;
 int done = false;
 char *sr[MaxStrings];
 for (i = 0; tbl[i] != NULL; i++)
 sr[i] = tbl[i];
 while (!done) {
 CommPort::serialtimer.set(60);
 while (!mp_comport->input_char_ready())
 if (CommPort::serialtimer.timed_out())
 return -1;
 c = mp_comport->readcomm();
 for (i = 0; tbl[i] != 0; i++) {
 if (c == *(sr[i])) {
 if (*(++(sr[i])) == '\0') {
 done = true;
 break;
 }
 }
 else
 sr[i] = tbl[i];
 }
 }
 return i;
}

Listing Five
#include <iostream.h>
#include <conio.h>
#include <stdlib.h>
#include "modem.h"
CommParameters cp = { 1, 0, 1, 8, 2400 };
ModemParameters mp = {
 RESETMODEM,
 INITMODEM,
 DIAL,
 ANSWER,
 HANGUP
};
void dispchar(int ch)
{
 putch(ch);
 if (ch == '\r')
 putch(\n);
}
int getkey()
{
 int ch = 0;
 if (kbhit())
 ch = getch();
 return ch;
}
int main(int argc, char *argv[])
{
 int ch = 0;
 if (argc < 2) {
 cout << "usage: comport [phoneno]\n";
 return 1;
 }
 cp.port = atoi(argv[1]);
 Modem *modem = new Modem(cp, mp);
 if (argc > 2) {
 cout << "Dialing " << argv[2] << "\n";
 if (modem->Dial(argv[2]))
 cout << "Connected\n";
 else
 cout << "Call aborted\n";
 }
 else {

 cout << "Waiting for call\n";
 modem->Answer();
 while (!modem->TestCarrier())
 if ((ch = getkey()) == 27)
 break;
 if (ch != 27)
 cout << "Call received\n";
 else
 cout << "Aborted\n";
 }
 while (ch != 27 && modem->TestCarrier()) {
 for (;;) {
 if ((ch = getkey()) == 27)
 break;
 if (ch) {
 modem->CharOut(ch);
 dispchar(ch);
 }
 if (modem->InputCharReady()) {
 int ch = modem->ReadChar();
 dispchar(ch);
 }
 }
 }
 cout << "\nHanging up\n";
 delete modem;
 return 0;
}
DDJ


ALGORITHM ALLEY


The Popularity Algorithm




Dean Clark


Dean is a programmer/analyst developing graphics and imaging applications and
doing user-interface design. He can be reached on CompuServe at 71160,2426.


Compared to the once-standard 16-color VGA, 256-color graphics systems are a
significant improvement. They give us enough range to be subtle about the
colors in our designs, and we don't have to worry about falling off the edge
of the world at every turn. But are 256 colors really enough? What about
displaying realistic color imagery such as scanned photos and ray-traced
simulations? Most of these large images contain thousands of colors.
Does this mean that we need graphics hardware capable of displaying at least
several thousand colors before we can see pretty pictures on our PCs? Of
course not. Practically everyone with a 256-color VGA card has seen GIF, PCX,
or TIFF scanned images of incredible realism, even on 256-color Super-VGA
hardware. It's unlikely that the original pictures had only 256 colors in the
first place. At some point, someone picked the 256 best colors to use for
those images--a process called "color quantization."
The basic problem of color quantization is straightforward: Out of the set of
all possible system colors, select the N colors most representative of a
particular image and then map the colors in the image to those
representatives. There are several techniques commonly used to implement color
quantization, one of which is called the "popularity algorithm." (For a brief
discussion of other techniques, see the accompanying text box entitled, "Other
Color-Quantization Methods.")


Eight-Bit Color Basics


All 256-color workstations and PCs work pretty much the same way. The graphics
hardware provides a universe of colors composed of red, green, and blue
primaries (RGB). Varying the proportion of individual RGB primary values
results in distinct colors on the screen. If all the values are 100 percent,
you see white; if all of them are 0, you see black.
The range of individual values depends on the graphics device. UNIX
workstations typically support 256 shades of each primary color for a total of
16 million potential colors--each primary is 8 bits (2^8 = 256 shades), so
three of them together give (2^8)^3 = 2^24, or 16,777,216. That's how many
distinct colors the hardware is capable of, but only 256 of them can be
displayed at a time.
A 256-color VGA card supports 64 (2^6) shades of each primary color. Putting
all the primaries together gives (2^6)^3 = 2^18, or 262,144 distinct colors.
This is fewer than what's available on workstations, but it's still a lot of
colors.
The RGB colors available for display are stored in the graphics system's
palette table, essentially an array indexed from 0 to 255. The software sets a
color on the screen by specifying a palette-table index. Applications can set
any palette-table location to any RGB color the system can support.


The Popularity Algorithm


The maximum number of possible distinct colors in a computer image is its
width in pixels times its height in pixels; that is, there can be no more than
one distinct color per image pixel. A 640x480 image, for example, can have as
many as 307,200 different colors. A quantization function should map each
distinct color in the original image to one of the colors in your palette
table so that when the image is displayed, it looks like the real thing. 
The popularity algorithm uses the most-frequently occurring colors in the
image as the palette colors. Every other color is then mapped to the popular
color it's closest to. The steps are:
1. Scan the image, building a list of all colors found in the image. Keep a
count of the number of occurrences of each distinct color.
2. Sort the colors in descending order by count (ties can be broken
arbitrarily) and select the top 256 colors for the palette map.
3. Rescan the image to map image colors to palette colors. For image colors
that don't exactly match a palette color, use the closest color in the
palette.
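The three steps above can be sketched directly in C. This is a minimal illustration, not the article's popular.c: it ignores memory constraints and uses a flat 2^18-entry histogram instead of the sparse matrix described below, and the tally/pick_palette names are mine.

```c
#include <assert.h>

#define LEVELS 64                      /* 6 bits per primary (VGA) */
#define CUBE   (LEVELS * LEVELS * LEVELS)

static unsigned counts[CUBE];          /* flat histogram: ~1 MB of counters */

/* Step 1: tally one pixel; r, g, b are VGA levels, 0..63 */
void tally(int r, int g, int b)
{
    counts[(r * LEVELS + g) * LEVELS + b]++;
}

/* Step 2: select the npal most frequent colors (simple repeated
** selection scan; a real implementation would sort). Returns the
** number of palette slots actually filled. */
int pick_palette(int pal[][3], int npal)
{
    int n, i, best;
    unsigned bestcount;
    for (n = 0; n < npal; n++) {
        best = -1;
        bestcount = 0;
        for (i = 0; i < CUBE; i++)
            if (counts[i] > bestcount) {
                bestcount = counts[i];
                best = i;
            }
        if (best < 0)
            break;                     /* image had fewer distinct colors */
        pal[n][0] = best / (LEVELS * LEVELS);  /* recover r from the index */
        pal[n][1] = (best / LEVELS) % LEVELS;  /* g */
        pal[n][2] = best % LEVELS;             /* b */
        counts[best] = 0;              /* don't pick it again */
    }
    return n;
}
```

Step 3, mapping each remaining image color to its nearest palette entry, is the straight-line distance search discussed under "Vector Basics."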


Implementation Details


RGB colors are commonly thought of as a cube, where each axis of the cube
corresponds to one primary component of a color. For VGA, each axis of the
color cube is 64 units long, for a total of 262,144 individual color cells.
For Step #1 of the algorithm, you can allocate a 3-D array of integers and use
the RGB values as indexes into the array.
Unfortunately, this array would be over 1 Mbyte in size, and for most images
nearly all of it would be empty. Instead of statically allocating a cube, this
implementation builds one dynamically by using a sparse 3-D matrix. The matrix
is built as three orthogonal linked lists, where each list represents an axis
of the color cube. Each node of the sparse matrix contains two pointers--one
to its neighbor and another to the next color axis. That is, a red node
contains pointers to the next red node and to a list of green nodes. Each
green node contains pointers to the next green node and to a list of blue
nodes. The structure is something like Figure 1.
Every color is completely specified by the path to its blue component, so you
store the color counts there. To store a color, each axis is scanned in turn
until all three color components have been matched. If any component is not
found, a new node is created. To speed the search a little, each list is kept
sorted.
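The orthogonal-linked-list structure just described can be sketched as three node types plus a sorted find-or-insert routine for one axis. The type and function names here are illustrative, not popular.c's actual identifiers.

```c
#include <assert.h>
#include <stdlib.h>

/* The count lives on the blue node: the path red -> green -> blue
** fully identifies one (r,g,b) color. */
typedef struct BlueNode {
    int value;                 /* blue component, 0..63 */
    long count;                /* occurrences of this exact color */
    struct BlueNode *next;     /* next blue value, kept sorted */
} BlueNode;

typedef struct GreenNode {
    int value;
    BlueNode *blues;           /* blue list for this (r,g) pair */
    struct GreenNode *next;
} GreenNode;

typedef struct RedNode {
    int value;
    GreenNode *greens;         /* green list for this r value */
    struct RedNode *next;
} RedNode;

/* Find-or-insert on one sorted axis; bumps the count on a hit.
** The same pattern applies to the red and green axes. */
BlueNode *add_blue(BlueNode **list, int value)
{
    BlueNode **p = list;
    while (*p && (*p)->value < value)
        p = &(*p)->next;       /* walk to the insertion point */
    if (*p && (*p)->value == value) {
        (*p)->count++;         /* seen before: just count it */
        return *p;
    }
    {
        BlueNode *n = malloc(sizeof *n);
        n->value = value;
        n->count = 1;
        n->next = *p;          /* splice in, preserving sort order */
        *p = n;
        return n;
    }
}
```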
To find the most popular colors in the image, you need to access them in order
by count. The function AddColorToList() in Example 1 maintains an array of
pointers (popcolors) to the most popular nodes in the color matrix. When the
first image scan is finished, popcolors will point to the 256 colors for the
new color-palette table. (Example 1 is excerpted from the popular.c program,
which is available electronically; see "Availability," page 3.)
Finally, the new quantized color image is built. As the original image is
rescanned, the popcolors table is searched for a matching RGB value. If a
color in the table exactly matches the image color, then the color-table index
is displayed or saved. If you reach the end of the table without an exact
match, you use the index of the table color closest to the image color.


Vector Basics


What do I mean by "closest" color? In a qualitative sense, you want the color
in the table that is the least distinguishable from the color in the image. In
quantitative terms, you want the color in the table that is the shortest
straight-line distance from the actual color. Remember that all the colors are
inside a 3-D cube, so the distance between two colors is d = |c2 - c1|, where
c1 and c2 are two points in the color cube defined by their individual R, G,
and B values. In algebraic terms,
d = sqrt((r2-r1)^2 + (g2-g1)^2 + (b2-b1)^2), where d is always nonnegative
because of the squared terms, implying that |c2-c1| = |c1-c2|. The smaller d
is, the closer the two colors are; if d is 0, then the colors are the same. In
practice, you eliminate the square-root calculation, since the square-root
function doesn't change the ordering of the numbers (if a>b then
sqrt(a)>sqrt(b)).
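The closest-color search reduces to a few lines of C. This is a sketch with names of my own choosing; popular.c's actual search runs over its popcolors table.

```c
#include <assert.h>

/* Squared distance between two colors in the RGB cube. The square
** root is omitted: it never changes which entry is nearest. */
long dist2(int r1, int g1, int b1, int r2, int g2, int b2)
{
    long dr = r2 - r1, dg = g2 - g1, db = b2 - b1;
    return dr * dr + dg * dg + db * db;
}

/* Index of the palette entry closest to (r,g,b). */
int nearest(int pal[][3], int npal, int r, int g, int b)
{
    int i, best = 0;
    long d, bestd = dist2(r, g, b, pal[0][0], pal[0][1], pal[0][2]);
    for (i = 1; i < npal; i++) {
        d = dist2(r, g, b, pal[i][0], pal[i][1], pal[i][2]);
        if (d < bestd) {
            bestd = d;
            best = i;
        }
    }
    return best;
}
```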


The Code



The program, popular.c (available electronically), reads a 24-bit image file,
creates a 256-color table, then either displays the image or writes it to a
PCX file. The input image file is assumed to be an ASCII file, where the first
line is the image title; the second, a description; the third, the number of
columns (x) and rows (y); and the fourth, a floating-point number for the
maximum image-intensity value (this number is ignored in this application).
What follows are rows x columns RGB "vectors" of floating-point values, where
1.0 is the maximum value for any color component. I use this simple file
format in my rendering applications because it's very easy to work with while
debugging the hard stuff.
As each image RGB value is read, it is converted to VGA (6 bits per primary)
and added to the color matrix, or its count is incremented if it's already in
the matrix. The popular-colors list is then updated. When the entire image has
been read, LinearizePopcolors() extracts the most popular RGB components from
the popular-colors list. This is simply a convenience to help make the next
scan a little easier and faster.
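The floating-point-to-VGA conversion is essentially a one-liner. Here is a plausible rounding helper; the name and the clamping are my own, not necessarily popular.c's exact code.

```c
#include <assert.h>

/* Convert one component in 0.0..1.0 to a 6-bit VGA level, 0..63,
** rounding to the nearest level and clamping out-of-range input. */
int to_vga(double v)
{
    int level = (int)(v * 63.0 + 0.5);
    if (level < 0)  level = 0;
    if (level > 63) level = 63;
    return level;
}
```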
The program then asks whether it should display the image or write it to a
file. As written, the program uses the MetaWINDOW graphics-kernel system from
Metagraphics Software (Scotts Valley, CA) for displaying images. However, all
you really need are graphics initialization, VGA color palette, and
pixel-drawing functions, which are universal in any graphics library or even
through BIOS calls. The PCX code is also straightforward.


Results and Additional Heuristics


The proof of the pudding is in the eating, and the proof of any computer
graphics technique is the result on the screen. Figure 2 is a
computer-generated graphic at 24-bit color resolution. The image is 512x375
and contains about 4200 distinct colors. Figure 3 shows the same image
quantized to 256 colors using popular.c. By most standards, the quantized
image is an acceptable approximation.
The algorithm might miss small but visually significant areas of colors if
none of them are popular enough. This is much more likely to occur on
workstations, where each primary color is eight bits instead of VGA's six. The
result can be that no color in the quantized palette table is close enough to
these areas to produce good results.
One way of addressing this is to reduce the color resolution. Instead of eight
or six bits of color, use five or even four bits. The sample program does this
by prompting for a color "compression factor" for each primary color. These
factors are used to reduce the number of distinct primary colors on each axis.
Lowering the effective color resolution allows more image colors to become
clumped around a single point in the system color cube; that is, the points in
the color cube get "bigger." Fewer colors fall outside the clumps, so more
fringe colors are represented.
Figure 4 shows the sample image quantized with five bits of color; Figure 5
shows it with four. Note that the highlights in the spheres and light
reflections in the background mirrors are becoming more distinct, but the
smooth-colored red floor begins to show serious banding. This is a typical
result when color resolution is lowered. Which image--6-bit, 5-bit, or
4-bit--is "best" is subjective.
Most images have far fewer than their theoretical maximum number of colors,
and those colors tend to group into a relatively small number of regions of
similar colors. This is called "color coherence." Consider a photograph of a
child on a sunny day in the park. The main elements of the photo are the
child's face and clothing, the sky, grass and trees, and clouds. The sky is
likely subtle shades of blue, the clouds are mostly gray-white, the grass and
trees are expanses of green, and so on. A careful scan of the photo at VGA
resolution and 24 bits of color might result in 60,000 distinct colors.
Yet most of those colors are in one of a few regions; blues for the sky,
greens for the grass and trees, grays for clouds, and skin tones for the
child. The many distinct colors in the scan are largely the result of minute
subtleties of shading.
You might be able to improve some quantized images by making sure you get a
representative sample from different regions in the color cube. Let's
implement a heuristic that divides the output color palette into four regions;
red-dominant, green-dominant, blue-dominant, and gray-dominant. A color is
dominated by its brightest component. For the gray-dominant region, no primary
color is significantly brighter than the others. I'll allow 64 elements of the
256-color palette for each dominant region.
In the sample program, if the user chooses to apply the color-dominance
heuristic, the initial scan distributes input colors into four
popularity-ordered lists instead of just one. After scanning, the four lists
are combined into a single color palette by taking the most-popular 64 colors
from each region. If a region doesn't have 64 distinct colors, its leftover
palette space is distributed among the others.
Just as you have to define "closest color," you also have to define
"dominant." There's no hard rule. A fast test would be that a primary color is
at least N greater than either of the other colors, say 15. So the rgb value
(56,37,18) is red-dominant, but (12,1,2) isn't. A different rule would make
the dominant color N% greater, say 15 percent. By this rule, both of the
example colors are red-dominant; popular.c uses a percentage of 20 percent.
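Both dominance rules are easy to express in C. This is a sketch, with function names of my own; popular.c applies the percentage rule with pct = 20.

```c
#include <assert.h>

/* Absolute rule: a primary dominates if it is at least n greater
** than each of the other two. Returns 'r', 'g', 'b', or 'x' (gray). */
int dominant_abs(int r, int g, int b, int n)
{
    if (r >= g + n && r >= b + n) return 'r';
    if (g >= r + n && g >= b + n) return 'g';
    if (b >= r + n && b >= g + n) return 'b';
    return 'x';
}

/* Percentage rule: a primary dominates if it is more than pct percent
** greater than each of the others (integer math avoids floats). */
int dominant_pct(int r, int g, int b, int pct)
{
    if (r * 100 > g * (100 + pct) && r * 100 > b * (100 + pct)) return 'r';
    if (g * 100 > r * (100 + pct) && g * 100 > b * (100 + pct)) return 'g';
    if (b * 100 > r * (100 + pct) && b * 100 > g * (100 + pct)) return 'b';
    return 'x';
}
```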
You could extend this approach by distributing the colors among the eight
octants of the color cube, or even use image characteristics to drive the
distribution, but the code quickly becomes complex and, it turns out, the
benefits tend to diminish. Figure 6 shows the sample image with the
color-dominance heuristic applied.


Conclusion 


For a long time I thought I had "invented" the popularity algorithm. It turns
out, however, that Paul Heckbert published a paper describing the algorithm in
1982: "Color Image Quantization for Frame Buffer Display," (ACM 1982 SIGGRAPH
Proceedings, Vol. 16, No. 3). The same paper describes another quantization
technique called "Median-cut" (see also "Median-Cut Color Quantization," by
Anton Kruger, DDJ, September 1994).
The algorithm's running time is bounded mainly by the number of pixels in the
original image and the size of the quantized color table. These values drive
the two image scans, the sparse matrix insertions, and the code that maintains
the popularity list. The memory costs are bounded by the number of distinct
colors in the original image. In practice, popular.c processes about 2800
image pixels per second on a 486/33 and spends most of its time accessing the
disk. The program could be sped up somewhat by postponing the popularity
ordering (AddColorToList()) until the color matrix has been built. There's a
little extra unused space in each matrix node. The program assumes there will
be at least 256 distinct colors in an image.
The popularity algorithm is easy to implement and tinker with, making it well
suited for experimentation. Best of all, the resulting images are generally
quite good.
Other Color-Quantization Methods
The popularity algorithm isn't the only technique for mapping 24-bit imagery
to 8-bit lookup-table systems. Two other common techniques are median-cut and
octree quantization. 
The median-cut algorithm was first published by Heckbert, in the same article
as the popularity algorithm. As in the popularity algorithm, you first create
a cube structure containing all the colors in the original image. The cube is
then recursively subdivided along its axes such that about the same number of
pixels are represented in each subdivision. When there are 256 subcubes (or
some other target number), the colors within them are averaged to find the
lookup-table colors. 
Michael Gervautz and Werner Purgathofer published their octree quantization
technique in New Trends in Computer Graphics (edited by Nadia Magnenat-Thalman
and David Thalman, Springer-Verlag, 1988). This technique was later summarized
in Graphics Gems (edited by Andrew S. Glassner, Academic Press, 1990). As the
name implies, an octree is a tree structure in which each node has eight
children. As the original image is scanned, unique colors are inserted into
the octree. Inserting the 257th color (or N+1) causes the tree to be reduced
by merging the two closest tree colors into a single color. This method is
unique in that there are never more than the final number of colors stored in
the tree.
--D.C.
Example 1: The AddColorToList function.
/****************************************************************************
** AddColorToList
** Adds a color to the list of most popular colors in the image, if necessary
*****************************************************************************/
void AddColorToList(COLOR_NODE **popcolors, int npal,
 int *currentcount, COLOR_NODE *color)
{
 COLOR_NODE *temp;
 int i;
 /* If table is empty then insert this color */
 if (*currentcount < 0) {
 *currentcount = 0;
 popcolors[*currentcount] = color;
 return;
 }
 /* Search the table for the same color */
 for (i = 0; i <= *currentcount; i++) {
 if (popcolors[i] == color) break;
 }
 if (popcolors[i] == color) {
 /* Found it. Since the color is already in the table, adjust it
 ** to its proper position */
 while (i && (popcolors[i-1]->count < popcolors[i]->count)) {
 temp = popcolors[i-1];
 popcolors[i-1] = popcolors[i];
 popcolors[i] = temp;
 i--;
 }
 return;
 }
 /* This color isn't in the list. See if it belongs there. First of all,
 ** if the list isn't full, this color must have a count of 1 and can be
 ** simply added to the end */
 if (*currentcount < npal-1) {
 (*currentcount)++;
 popcolors[*currentcount] = color;
 }
 else {
 /* Otherwise the list is full and this color may belong in the list.
 ** Start at the low end (if the color had a high count it would
 ** already be in the list) */
 if (color->count > popcolors[npal-1]->count) {
 i = npal - 1;
 popcolors[i] = color;
 i--;
 while ((i >=0) && (popcolors[i]->count < popcolors[i+1]->count)) {
 temp = popcolors[i+1];
 popcolors[i+1] = popcolors[i];
 popcolors[i] = temp;
 i--;
 }
 }
 }
}
Figure 1: Sparse 3-D matrix structure.
Figure 2: Original 24-bit image with a 512x375 resolution and about 4200
colors.
Figure 3: Same image as Figure 2, but quantized to 256 colors using the
popular.c program.
Figure 4: Same image as Figure 3, but quantized with five bits of color.
Figure 5: Same image as Figure 3, but quantized with four bits of color.
Figure 6: Sample image with color-dominance heuristic applied.


PROGRAMMER'S BOOKSHELF


Books from Inside the Walls




Lou Grinzo and Steve Gallagher


Lou, a programmer who lives in Endwell, New York, can be reached on CompuServe
at 71055,1240. Steve is president of G&A Consultants, an OS/2 consulting firm
in Research Triangle Park, North Carolina. He can be reached at ganda@ibm.net.


Dave Edson is a Windows-programming support engineer at Microsoft, which means
he spends his days answering real-world questions from Windows developers who
use Visual C++. Dave's Book of Top Ten Lists for Great Windows Programming is
a compilation of tips and techniques that he invented or discovered in that
daunting job. 
As its title suggests, the book is organized into top-ten lists, after the
fashion of that other Dave. At a higher level, the book consists of four
sections: "General Windows Stuff," "Windows NT & Windows 95 Specialties,"
"MFC" (the book never mentions OWL or any other framework, which is not too
surprising, considering Edson's employer), and "Programming à la C." The last
section (Chapters 17 through 22) was written by Robert Schmidt, not Dave
Edson, and contains nothing that is Windows specific.
In the 16 chapters he writes, however, Edson presents some valuable material.
He covers numerous techniques that I don't remember seeing anywhere else, and
for that reason alone I'm glad I bought the book. (Even one such item used in
a real program makes this book worth several times its price.) He presents
chapters on Windows architecture, resource leaks, common Windows-programming
problems and mistakes, MFC tips and techniques, and Win32 and Windows 95
programming. 
For example, two of the more inventive techniques are in Chapter 7, "Tricks,
Hints, and Techniques for Programming in Windows." The first, "Create
Invisible Windows to Eat Input During Gigunda Tasks," addresses the problem of
impatient users. As Edson points out, even if you display the hourglass cursor
during a long-running task, a user will often go on a mouse-clicking binge,
queuing up events that are processed in rapid order once your program
completes its processing. Edson's first solution is to create an invisible
window that does a SetCapture and funnels all events to DefWindowProc.
Freezing the user out of all mouse input is "extremely non-multi-tasking
aware," as Edson says, so you and your users might prefer his second approach:
Create a transparent window that covers your application and swallows
messages, but doesn't use SetCapture. Your user can't create an event backlog,
but is still free to work with other programs while yours is busy.
The second technique is "Change a Window from a Popup to a Child Window and
Back--Without Hacking." While Edson doesn't actually show you how to do this,
he does describe a way to simulate it. I'm sure Edson is right, and there is
no way to toggle a window between being a popup and a child by fiddling its
style bits. But his solution is clever and nearly as good: Create two windows,
one a child and the other a popup, that both use the same window callback
function. Your program then hides one window and displays the other, and can
switch between them as circumstances require different window characteristics.
The centerpiece of the book is the 280-page Chapter 11, "Windows 95 User
Interface Goodies," a strong introduction to the Windows 95 common
controls--image lists, tree views, rich-edit controls, toolbars, status bars,
property sheets, header controls, list views, hotkeys, and spinners. Instead
of presenting short tips as he does elsewhere in the book, Edson provides a
primer on each control, complete with API details and sample programs. The
property-sheet control, which is likely of greatest interest to readers, has
two sample programs: one demonstrating the MFC 3.0 support for this control,
and another showing a bare API implementation via COMCTL32.DLL. Oddly, neither
program is included on the accompanying diskette, although both are listed in
the book.
Robert Schmidt's book-within-a-book is well done and presents considerable
forward-thinking advice for all C and C++ coders, not just Windows
programmers. Schmidt covers issues such as the difference between K&R C and
ANSI C, as well as surprises you might encounter when moving to C++. He
finishes with his most thoughtful chapter, "Reasons Not to Migrate from C to
C++." Based on much of the production C and C++ code I see, I hope Schmidt's
chapters are widely read and taken to heart.
Still, the book has its quirks, beginning with Edson's colorful writing style
(which he acknowledges by thanking his editors for "letting me get away with
so much goofiness in my articles"). Whether you find this "goofiness"
refreshing or annoying is a matter of personal taste. In almost all cases I
thought Edson's colorization was fun without being intrusive. However, I felt
he went overboard with Chapters 3, 12, and 15: "Party Phrases to Ensure that
You Maintain Your Computer Nerd Status," "Names of Pets Owned by Windows
Programmers," and "Pre-IBM PC Trivia." These self-indulgent chapters, while
short, don't add enough to the book to justify their presence.
I was bothered by the underlying assumption of the book--that the topics
covered can or should be addressed in lists of exactly ten items. This
assumption leads to oddities, such as item #1 in Chapter 16, "MFC Programming
Checklist: Use OLE," which is nothing more than a suggestion to use OLE in
your programs, and a pointer to Kraig Brockschmidt's book Inside OLE 2
(Microsoft Press, 1994). I doubt that any intermediate/advanced Windows
programmer (the stated target audience for this book) still needs this advice
in 1995.
The biggest drawback of Edson's book is that it appears to have been rushed
through production. I was disappointed at the number of typos (for instance,
"wierd" and "anyting" on the same page). Similarly, the accompanying diskette,
while a useful resource, has its own problems. Only one of the over 20
programs has a .MAK file, a small but needless hassle. Most annoying of all,
three of the programs would not compile without errors until I made minor
changes that had nothing to do with the still-fluid Windows 95 API. In one
case, I spent over an hour trying to determine why one of the rich-edit
examples worked, while the other wouldn't allow me to change text attributes.
I finally discovered that the defective program wasn't setting the length
field in a structure that it passed to the RTF control.
These problems and quirks aren't reason enough to avoid this book, but they're
bothersome in a $40 package. Edson's book feels like a useful Windows program
that lacks online help and is saddled with a conspicuously ugly icon. Whether
that's enough to keep you from adding this title to your shelf is a personal
decision.
--L.G.
Every newbie needs a guru. You know who I mean: those wise and experienced
hackers from the dumb-terminal era who have forgotten more about bit-hammering
than you or I can ever hope to learn. When I began programming, my guru was
archetypal, prone to offering Yoda-like koans at the drop of a paradigm. (Come
to think of it, he looked a lot like Yoda.) For instance, one of his favorite
Delphic proclamations was: "As you grow as a programmer, you're always hitting
walls. And you'll hammer at that wall for the longest time, to no avail. Til
one day you will find a weapon that will smash the wall and allow you to break
through to a whole new level of expertise. That weapon will always turn out to
be a book."
Looking back, his words are still true--and never more so than in the last few
months, when my time has been spent pounding on the endless mysteries of the
OS/2 C++ Class Library, an exercise designed to make even the most hard-bitten
keyboard cowboy whimper with frustration. I finally broke through the wall
with the help of a book whose uninviting title is OS/2 C++ Class Library:
Power GUI Programming with CSet++. Weighing in at 879 pages, the book is
intimidating. Don't let this put you off, however. Deal with this intimidation
factor, and sink your teeth into the book. If you're programming in OS/2 C++
and hope to master the Class Library, you can't afford not to read it.
Besides, you will be surprised and delighted at how approachable and downright
sensible the authors, who are all IBM employees, make it all seem.
Make no mistake: The authors expect you to understand programming to the OS/2
Presentation Manager (PM) and C++ programming at the outset. This is not a
book for beginners. The authors make no bones about having a lot of ground to
cover and little time to dawdle, waiting for you to play catch-up.
If you are not quite sure if you are an "advanced beginner" or a "novice
seasoned professional," the first three chapters will help you get situated.
You will be led through the theory behind object-oriented user interfaces, the
basics of programming using the User Interface Class Library, and a number of
PM concepts that are important within the object-oriented framework. If you're
more advanced, you may not need this material, but give it a skim anyway--it's
good stuff.
The real guts of the book begin with Chapter 4, which gives an in-depth
analysis of the Class Library from an architectural perspective and maps this
to the message-based paradigm that drives PM. The authors also reveal the
behaviors and attributes that are common to all the window classes. The
chapters that follow assume you have digested and understood the concepts
advanced in Chapter 4, so tuck in and give it some serious attention. Read it
twice if need be, to avoid foundering later on.
If you are programming to PM, you are by definition living and breathing
nothing but windows and controls all day long. This is the subject matter of
Chapters 5 through 15. If you understand Chapter 4, you will be okay here,
although that is not to imply that you will find it easy going. If you are
like me, you feel that you've already paid your dues once by learning how to
program to the straight PM C API. Prepare to ante up a whole new set of dues
as you learn how the Class Library manages frames, menus, static and input
controls, buttons, and the relatively new CUA '91 controls (sliders, containers,
and notebooks). You will even get to cut your teeth on the powerful,
underutilized Canvas control. It is to the authors' credit that they are able
to tie the concepts and APIs together so beautifully and keep you from being
overwhelmed.
Although the OS/2 sun rises and sets on windows, buttons, and frames, the
Class Library also delivers a lot of solid function in terms of encapsulating
things that are not graphical. Chapters 19 through 26 analyze this capability
in detail, easing you as gently as possible into a whole new way of dealing
with the back-end coding that actually gets the real work done. For a set of
classes that were, as the authors modestly claim, essentially cobbled together
on an ad hoc basis, you can't help but be impressed with the thorough job the
developers did with these areas. Applications and Threads both have a solid
set of classes, and the Class Library drag-and-drop classes are exponentially
less painful to deal with than enabling drag-and-drop with straight PM. The
chapter on the DDE classes is well written, a blessing given the surprising
(and occasionally mind-numbing) wealth of classes that exist to enable dynamic
data exchange within and between programs.
I found the chapter on the Help classes odd; I could not help wondering if it
were really necessary to create classes to manage Help interactions, which I
personally always found to be fairly painless under regular old PM. However,
give it a browse and see if you prefer the OO method. Maybe I am getting old,
but I had the same reaction when reading the chapters on the Resource classes
and the Profile classes. A little voice in my head kept chiming in, "yes,
that's an interesting approach, but is it really an improvement over the
old-fashioned way of getting things done?" I don't think so, but, again, you
may find the indefinable flavor of the OO approach to these aspects of OS/2
programming to be just the ticket.
The book finishes with two elegant chapters on problem determination,
packaging, and performance. Gone are the days when we debugged our apps using
cleverly placed printf() statements; in the object-oriented universe, the
inner workings of a given object are obscure by design, and you will want to
devote the time and energy to mastering the Class Library's ITrace class. The
authors also address the issue of performance and packaging and offer handy
hints to reduce the bloats (let's face it, C++ GUI programming is sexy, but
without some hand-crafted attention, the GUIs run hog-slow and the bloody
executables are huge).
This is yet another outstanding technical guide authored by a group of IBMers.
This trend gives me hope that IBM has finally abandoned the "If You Build It
They Will Come" mindset that has hobbled efforts to make OS/2 more
approachable to the programming community. Maybe, just maybe, IBM has realized
that it has to turn its people loose to do what so many of them seem quite
good at--namely, showing us how to get things done in the real world using
OS/2.
--S.G.
Dave's Book of Top Ten Lists for Great Windows Programming
Dave Edson
M&T Books, 1995, 751 pp., $39.95
ISBN 1-55851-388-4
C++ Class Library: Power GUI Programming with CSet++
Kevin Leong, William Law, Robert Lowe, Hiroshi Tsuji, and Bruce Olson
Van Nostrand Reinhold, 1994, 879 pp., $39.95
ISBN 0-442-01795-2












































































SWAINE'S FLAMES


Battering Bob


The third Tuesday of the month is Journalist's Roundtable at Foo Bar, the
hidden bistro where I occasionally moonlight as backup bartender. The best
brains in the computer press attend these cerebral soirees. Typically, you'll
find Cringe, Spence, and the Knife huddled in their respective corners; Rory
O'Connor of the Merc holds forth over a pint at the bar, while the Times' John
Markoff sips mineral water and tells jokes to Dilbert cartoonist Scott Adams.
The brain trust ebbs and flows as the evening unrolls, with Brock Meeks
usually logging in around midnight, by which time Dvorak and the hors
d'oeuvres are but a bittersweet memory.
This particular Tuesday was a slow night. Only witty Jennifer J. Sun-Lee,
earnest Jimmy Stalwart, and the crusty old editor from my LA daily days, Ed S.
Nurr, had responded to my call to discuss "The Concept of the Intensely
Personalized Endearing Cartoon Agent User Interface, as Exemplified in
Microsoft's Bob."
Jennifer bit the end off a cigar and scratched a match across the bar. "Thanks
for the tip, Jimmy. I now know to give your family reunions a pass. Fill it,
Mike, and don't scrimp. The question is, boys, when you're on deadline, do you
want your word processor engaging you in sappy chit-chat? Maybe you do. I
don't."
"What I'm getting at," Jimmy went on, "is this: Ya don't want flaws in a
computer interface. You're not gonna put up with a computer interface that
misunderstands everything you say and takes snuff. But, but without the flaws,
ya see, it's just not endearing, if ya see what I mean."
Ed was slumped forward, both forearms flat on the bar, his chin resting on his
glass. "I had this typewriter I called Daisy," he sniffed. "I loved that
typewriter. In spite of her flaws."
Jennifer tapped the ash off her stogie and faced me. "Here's the nub:
Microsoft wants you to think of Bob as a person. But there are too many things
you do or say to a computer that you would feel peculiar doing or saying to a
person. Or should. Even to one of Jimmy's relatives."
"For example?" I prompted. 
"For example: Bob just crashed. I think he's corrupted. I'm going to boot Aunt
May. Better shut down Flopsie." 
"I can imagine saying all those things, in the right mood," I said.
She sloshed her highball gloriously across the bar. "It really comes down to
this, boys: Do you want to become attached to something you know you're just
going to throw out?"
I manipulated the bar rag. "You mean, how attached can you get to Bob -"
"Bingo. When you know that Bob '96 will be out in a couple of years."
Well, she had me there.
Michael Swaine, editor-at-large
MikeSwaine@eworld.com








































OF INTEREST
At this year's Computer Game Developers Conference, Microsoft announced the
Windows 95 Game SDK. Developers attending the announcement received a free
beta copy of the SDK along with a beta release of Windows 95. The company
hopes to gain the mindshare of game developers who have traditionally targeted
DOS for their apps. Microsoft concedes that over 90 percent of the games
currently on the market and in development run under DOS. Windows 3.1's
sluggish display performance has been a sore point with games programmers and
has been a primary reason for its slow acceptance in the game industry. The
company hopes that the new features of Windows 95 along with improvements
provided through the Game SDK will change that.
The SDK includes a Windows 95 game subsystem to enable high-performance game
play. The game subsystem, a new component of Windows 95, will eventually be
included in a future release of the operating system. In the interim,
developers can use the game subsystem on a royalty-free basis to create games
and will have permission to redistribute the run time with their software.
The SDK also features four C APIs: DirectDraw, DirectSound, DirectPlay, and
DirectInput. DirectDraw supports Blt, transparent Blt, and page flipping
through hardware acceleration. On a 486/66, Blts can be moved at rates up to
70 frames per second at 640x480 pixels using 256 colors. DirectSound allows
mixing and playback of up to eight audio streams with full volume, frequency,
and pan control over individual streams. Latency on playback has been cut from
the roughly 200 milliseconds it takes for current WAV drivers to approximately
50 milliseconds. The DirectPlay API allows multiplayer games to connect either
through Winsock or modem. DirectPlay provides the functionality to open a
connection, create a player, and send messages between players. This is
accomplished using a "send-receive-reply" model. DirectInput supports digital
joysticks using a mini-driver model; drivers will be supplied by individual
vendors.
In a related announcement, Microsoft detailed plans to include the Reality Lab
real-time 3-D engine in future versions of Windows. Microsoft obtained the
product through its acquisition of RenderMorphics in February of this year.
Reality Lab, an API for creating and rendering 3-D graphics in real time,
already takes advantage of hardware acceleration at any stage of the graphics
pipeline. According to a Microsoft spokesperson, "the 3-D engine will also
benefit from hardware acceleration features of DirectDraw." Microsoft plans to
release a beta version of Reality Lab in early 1996. Questions regarding
Reality Lab can be sent to reality3@microsoft.com. 
Microsoft
One Microsoft Way
Redmond, WA 98052-6399
206-882-8080
Metagraphics has released Media!Lab, a graphics-programming toolkit for
developing Windows-based, interactive graphic and multimedia programs.
Media!Lab supports imaging, special effects, animation, video, and sound in a
C++ programming environment. The library, which is in optimized assembly
language, integrates with Windows enhanced multimedia and WinG facilities and
can be used alone or in conjunction with class libraries such as MFC or OWL.
Media!Lab supports BMP, PCX, and DCX image files; AVI, FLC, and FLI video;
sprite animation; sound; timer controls; fade and wipe effects; color-palette
animation; object hit and collision detection; bitmap tiling; and the like.
The class library, which works with Visual C++ and Borland C++ compilers,
sells for $349.00.
Metagraphics
200 Clock Tower Place, Suite 201E
Carmel, CA 93923
408-622-8950
3Dlabs has announced its GLINT 3D graphics-accelerator chip, which supports
VRML (Virtual Reality Modeling Language) 3-D graphics and virtual-reality
applications available on the World Wide Web. Silicon Graphics (SGI), Template
Graphics Software (TGS), and Netscape Communications have endorsed the
accelerator for their VRML-based software. With GLINT-based graphics boards,
VRML browser software will be seamlessly accelerated for a real-time 3-D
experience on low-cost desktop PC systems.
As part of this announcement, SGI and TGS introduced WebSpace, a VRML-based
Internet browser that's an add-on module to existing Web browsers. TGS will be
providing its Open Inventor SDK with VRML Internet extensions compatible with
GLINT boards.
Both Open Inventor and WebSpace use OpenGL as their graphics-rendering engine,
and GLINT silicon was designed to enable the high-speed processing of all
OpenGL commands and operations.
The 3Dlabs GLINT processor incorporates the equivalent of a high-end
workstation graphics board set in a single chip. Target platforms include
desktop PCs, workstations, and embedded systems. GLINT is capable of
processing 300,000 shaded, depth-buffered, anti-aliased polygons/sec. The chip
provides 32-bit color, 2-D and 3-D acceleration, an on-chip, PCI-compliant
local bus interface, and integrated LUT-DAC control, making a complete
graphics subsystem possible with minimal chip count. GLINT implements
sophisticated rendering operations in silicon, including Gouraud shading,
depth buffering, anti-aliasing, and texture mapping.
3Dlabs Inc.
2010 N. 1st Street, Suite 403
San Jose, CA 95131
408-436-3456
Template Graphics 
9920 Pacific Heights, Suite 200
San Diego, CA 92121
619-457-5359
InContext has released its InContext Spider, an HTML Web processor that lets
you create dynamic Web pages while browsing on the World Wide Web. The company
claims that Spider, unlike other HTML editors, does not require specialized
knowledge of HTML to produce valid Web documents. You can run InContext Spider
on any Windows-based system, and do everything from simple text editing to
creating interactive pages linked to any number of Web sites. Spider speaks
directly to the browser, allowing you to create Web pages interactively by
inserting URLs to other topics, or external links which connect to other sites
on the Web. By using the InContext SDK with InContext Spider, you can
customize the interface, making it as simple or as advanced as the work
environment demands. InContext Spider retails for about $100.00.
InContext Corp.
2 St. Clair Ave. West, Suite 1600
Toronto, ON 
Canada M4V 1L5
416-922-0087
GX Sounds, from Genus Microprogramming, is a DOS sound-programming toolkit for
C programmers. GX Sounds lets you control and utilize digital and MIDI sound
on a variety of audio-output devices without knowing the details of the
hardware. The software also provides the ability to automatically detect
multiple sound devices. Detection drivers are loadable so that new detection
routines can be added without requiring program modification. For MIDI
playback, GX Sounds supports the OPL2 and OPL3 sound chips. The GX Sounds
library sells for $249.00.
In a related announcement, Genus has released GX Images, a DOS graphics
programming toolkit that lets you incorporate a variety of bitmapped
image-file formats into application programs. The supported formats include
Windows BMP, Dr. Halo CUT, GEM/Ventura IMG, JPEG (JFIF), GRASP/PICTOR PIC, HP
PCL, ColorRIX, TARGA, TIFF, WordPerfect WPG, PCX, DCX, and EPS. The GX Image
toolkit includes more than 50 library functions and several utility programs.
The library supports C, Pascal, Fortran, and Clipper compilers in both real-
and 16-bit protected mode. GX Images sells for $499.00 ($549.00 for 16-bit
protected mode). 
Genus Microprogramming
1155 Dairy Ashford, Suite 200
Houston, TX 77079
800-227-0918
Realtime Performance has announced ControlPro, an integrated development
environment for developers specializing in distributed control applications.
The toolkit includes a drawing tool that lets you create custom screen
objects, specify dynamic behaviors, and preview behaviors before writing
application code. The GUI also provides a set of input objects called
"graphical interactive screen management objects," which respond to input
events. 
At the heart of the C/C++ toolkit is an object-registry database for the
storage and retrieval of configuration data, object definitions, and
behavioral models. ControlPro is available for Windows NT, Windows 95, OS/2,
and SunOS. Non-GUI components of the environment are portable to OS/2,
iRMX-III, and VxWorks. 
Realtime Performance
349 Cobalt Way
Sunnyvale, CA 94086
408-245-6537
Virtual Media Technology has released its Virtual Media Hypertext Development
Kit (HDK), which lets you create intuitive, context-sensitive help systems for
Windows-based software. The Virtual Media HDK automatically converts
previously created documents into full-featured hypertext systems that are
displayed by the Windows Help engine.
The HDK includes graphics outlines; provides full-text indexing; and extends
standard WinHelp capabilities by giving you ways of delivering pop-up
glossaries, animation, 256-color and watermark bitmaps, and support for audio.
In addition to generating source code, the HDK generates RTF and HPJ files.
The HDK sells for $395.00.
Virtual Media Technology
1843 The Woods II
Cherry Hill, NJ 08003
609-424-6565
The Kalendar from Artic Software is a calendar custom-control VBX that lets
you develop calendars that have multiple views, international formats, data
awareness, 3-D features, graphic wallpaper, and more. In all, the VBX provides
20 events so that you can customize the day boxes, draw your own calendar day
boxes, and draw directly on the day boxes. The control, which works with
Visual Basic, Visual C++, Borland C++, and Delphi, sells for $69.00. 
Artic Software
P.O. Box 28
Waterford, WI 53185-0028
414-534-4309
XVT Software has announced its XVT Graphical Extensions, a library of graphics
and plotting programs. In addition to common 2-D and 3-D bar charts and pie
charts, Graphical Extensions includes line plots, area plots, impulse
plots, scatter plots with error bars, and carpet plots with hidden-line
removal. Linear, logarithmic, and semilog scales may be used on any axis.

For data-acquisition applications, real-time data updates are supported. Hot
spots can be defined for user interaction and automatic zoom capabilities are
included. Fonts, colors, and pen-brush styles are customizable. All 3-D images
can be rotated for viewing from any direction. In addition, an
intermediate-level Viewport API offers a broad set of geometry management
functions for all graphics drawing needs.
XVT Graphical Extensions is available now for both the C and C++ versions of
the XVT framework. It sells for $895.00 for Macintosh, Windows, and SCO UNIX
systems, and $1600.00 for workstations.
XVT Software 
4900 Pearl East Circle
Boulder, CO 80301
303-443-4223
Beame & Whiteside Software has announced its BW-Connect News Server for
Windows NT, which will let network administrators set up and maintain network
news groups on any Windows NT-based PC. BW-Connect News Server for Windows NT
supports multiple Network News Transfer Protocol (NNTP) clients and allows
administrators to synchronize messages on news servers scattered throughout a
network.
The software includes INN-like news extensions, such as NNTP commands
(article, body, group, head, help, ihave, last, list, newsgroups, newnews,
next, post, quit, slave, stat, listgroup, and xhdr). Since it is
multithreaded, BW-Connect News Server for Windows NT can support an unlimited
number of users; the only limitation is the capacity of the hardware platform
and the memory space available. BW-Connect News Server for Windows NT also
includes support for news synchronization so NNTP messages can be
automatically forwarded at preset intervals to update other NNTP news servers
located around the network. By using news forwarding, each news server can be
synchronized to other NNTP servers so all network servers can carry the same
newsgroup information, which should help localize network news traffic.
Administrators will be able to set the news feeds to run fully synchronized or
to only synchronize limited news groups. 
BW-Connect News Server for Windows NT retails for $599.00.
Beame & Whiteside Software 
706 Hillsborough Street
Raleigh, NC 27603-1655
919-831-8989
BlueWater Systems has announced a Windows 95 version of its WinRT Tool Kit for
developing Win32 hardware-control applications. The toolkit lets you write
programs that directly access port I/O, memory I/O, and interrupts without
dealing with the complexities of the Windows 95 Device Driver Kit. WinRT-based
code is portable between NT and Windows 95. The WinRT Tool Kit, which includes
the WinRT Device Driver (a set of dynamically loaded VxDs), the WinRT
preprocessor, DOS simulator, NT Registry editor, and sample programs, sells
for $395.00.
BlueWater Systems
144 Railroad Ave., Suite 217
Edmonds, WA 98020
206-771-3610
Phar Lap and Periscope have teamed up to release Embedded Periscope for Phar
Lap, a debugging tool for embedded-systems and real-time developers using Phar
Lap's TNT Embedded ToolSuite system. The debugging tool, which sells for
$3000.00, eliminates the need for in-circuit emulators to track down bugs in
real-time software. 
The Periscope Company
1475 Peachtree Street, Suite 100
Atlanta, GA 30309
404-888-5335








































EDITORIAL


Winners and Losers


Sometimes you just can't win for losing. Prodigy, for instance, recently lost
a $200 million lawsuit because it didn't practice enough censorship--this,
after its ears were soundly boxed in 1991 for too much censorship. 
Four years ago, Prodigy screened 100,000 public messages per week for
potentially offensive words or phrases. When word of the censorship leaked
out, civil libertarians and Prodigy customers didn't censor their comments
about infringements of First Amendment rights. Of course, Prodigy's checkered
history regarding censorship didn't help, especially considering the company's
1989 decision to edit and eventually pull the plug on a public forum where
religious fundamentalists and gay-rights activists were verbally battling. 
Still, neither of these controversies went to the bottom line like this recent
ruling. In this case, the investment bank Stratton Oakmont charged a Prodigy
subscriber with making libelous statements about the bank. The bank claimed
that Prodigy should have edited or deleted the remarks, since it was known to
edit or delete other comments. Prodigy countered that, like the phone company
or bookstore, it's a common carrier and not responsible for the contents of
what it carries.
New York State Court Judge Stuart Ain sided with Stratton Oakmont, describing
Prodigy as a publisher subject to the rules of libel because the online
provider took steps toward censorship in the first place. "Prodigy's conscious
choice, to gain the benefits of editorial control has opened it up to a
greater liability than CompuServe and other computer networks that make no
such choices," wrote Judge Ain. This was in line with a 1991 ruling in which a
federal judge threw out a libel suit against CompuServe because that provider
simply distributed--but did not edit--a newsletter that contained potentially
libelous material. 
And then there's Intel. Still smarting from a ton of bad publicity over
defective Pentiums, the semiconductor giant came up with a thinly veiled
philanthropy scheme to recoup some of its public-relations losses. Alas, The
Wall Street Journal shellacked Intel for the plot--and on the front page, no
less. At issue was Intel's "generous" offer to help finance and build a Rio
Rancho, New Mexico, high school by guaranteeing $28.5 million in construction
financing. 
But the WSJ reports that there's nothing generous about the Intel offer.
Through out-of-state land developers who were in cahoots with local
government, Intel finagled lower corporate-income taxes, exemptions from
property taxes and gross-receipts taxes on equipment purchases,
taxpayer-funded employee recruitment and training, guarantees of rapid grants
for permits, and deep discounts on everything from moving fees to employee
utility deposits. Furthermore, the Sandoval County Commission will issue up to
$8 billion in industrial-revenue bonds to finance Intel's plant expansion. 
In the long run, however, Rio Rancho taxpayers will still have to pay for the
school. The plan reportedly requires that the school district lease the high
school for $1 a year for the 30-year life of the bonds, then buy the
facilities when the bonds are paid off.
Granted, busing school kids up to 70 miles a day to already-overcrowded
Albuquerque schools isn't necessarily safe or conducive to learning. However,
the scenario that led to this less-than-desirable situation was, to some
extent, of Rio Rancho's own making. Since the late 1980s, developers had been
pushing to secede from the Albuquerque district and build their own schools.
With this kind of background noise, Rio Rancho understandably didn't get much
sympathy from Albuquerque when it came to planning new schools. Into this mix
strolled Intel which, with the help of a $114 million incentive package that
included tax breaks, quickly put up a manufacturing plant and started cranking
out microprocessors. As befits its roots in the orchards of the Santa Clara
Valley, Intel knows when plums are ripe for picking. 
When it comes time to pay the piper, a new generation of Rio Rancho taxpayers
will be held accountable. The out-of-state developers will be gone, Intel will
have other CISCs to fry, and taxpayers will be scratching their heads while
digging deeper into their pockets. 
Of course, until the Rio Rancho school issue came up, Intel had never
suggested that philanthropy was part of its corporate mission. If the company
really did care about the community, it might have followed the lead of
Microsoft's Bill Gates, who in 1992 donated $6 million of his own money
towards constructing an information-science building at Stanford
University--an institution he has no affiliation with whatsoever. From his
perspective, Gates simply wanted "to invest in the future of the industry"
(okay, and get a nice tax deduction in the process). Or maybe Intel should
look to HP's William Hewlett, who donated $15 million as part of a $50 million
grant to the Bay Area School Reform Collaborative. Since Intel wants to
promote its "benevolent" interest in education, maybe the company should study
Gates' and Hewlett's unselfish acts and create a situation in which everyone
wins all the time.
Jonathan Erickson, editor-in-chief












































LETTERS


C and History 


Dear DDJ,
Regarding Al Stevens' inquiry as to the choices behind the current structure
of C parameter declarations ("C Programming," DDJ, December 1994).
Functions in K&R C had arguments typed with parameter-declaration, allowing
Example 1(a). The argument would effectively be "moved" into register storage
by the compiler. Prior to this arrangement (predating K&R C), you had to do
this manually; see Example 1(b). In other words, this was an improvement in
the language. (Early C compilers used many compromises of this ilk.) ANSI C,
of course, uses Example 1(c).
When ANSI C came on the scene, compilers were allowed to intermix K&R
argument-declaration syntax with that of ANSI for backwards compatibility.
Declaration semantics thus must be interchangeable. (All ANSI compilers will
balk at this.) 
With declaration syntax set, prototypes must again use the same syntax, since
they may be generated by automatic means. (It would be unwise to have
different ones, although in effect, compilers restrict you to a "specifier
qualifier list.") However, in the case of prototypes, the point Al was making
is well taken.
In fact, we could take this inquiry further. For example, what if we were to
extend the language and add new storage classes? Might these be desirable to
include in a prototype? For that matter, could there be an advantage in
allowing a function prototype to enforce a "register" storage class in a
prototype (meaning that the argument was always passed via register)?
One of the great strengths of C is that it permitted an implementor a good
degree of latitude in exploiting machine architecture. Unlike PL/I, C attempts
to be more of a high-level assembler than a high-level language. While PL/I
allowed a rich set of operations to be embedded in the language itself, use of
those features could involve considerable run-time overhead that the
programmer needed to be aware of. However, with C, all of the operations
typically matched the machine language almost one for one. (Many PL/I
compilers of the time would detect and "fix" typing errors, allowing you to
run programs with significant errors in the code. This was at loggerheads with
the obsessive type-checking of Pascal, which would flat-out refuse to execute
any imperfect code--so much for the differences between the then "old" and
"new" schools.)
In sum, C's role as a language is tightly tied to low-level implementation
details, so that the syntax and semantics of variables must be fluid enough to
match machine characteristics of the future. This is completely unlike "big"
high-level languages, which attempt to offer versatility for the price of the
same old complexity we've seen before.
Incidentally, what Al describes as "C minus minus" reminds me a lot of the B
programming language. (B, the predecessor of C, was an inspired approximation
of the Basic Combined Programming Language, BCPL.) B was revived by Mark
Horton (then at the University of California at Berkeley) for his thesis work
on a language editor (bedit) that merged the concepts of text editor and
incremental compiler. The language was simplified from C to allow for
incremental parsing and "code" generation as the program was being composed in
the editor (thus, it always was immediately syntax-checked and runnable). In
effect, it worked much like Microsoft's Visual products (which may not take
the concept quite as far).
The code was part of past Berkeley BSD releases (in the /usr/src/contrib
directory) and is probably still available on the Internet.
Bill Jolitz
Oakland, California


More FFTs


Dear DDJ,
In his article "Faster FFTs" ("Algorithm Alley," DDJ, February 1995), Iwan
Dobbe describes a fast radix 2 FFT algorithm for complex data. It is good to
see some numerical-methods articles in DDJ; I hope we will see more. Here are
a couple of minor changes that can be used to improve speed and accuracy.
A small improvement in speed may be had by noting that in the routine
Shuffle2Arr data is only swapped if the bit-reversed number is greater than
the index. This means that you can avoid the unnecessary calls in Example 2(a)
by using Example 2(b). This works because all the numbers after 2^b - 2^((b+1)/2)
will become smaller (or remain the same) when bit reversed. Unfortunately, the
savings are not great because the array shuffle is a minor component of total
time. I liked the idea of tabulating the sin and cos coefficients.
The second improvement involves the best way to generate the Qk phase factors
by recurrence relation. This is worth optimizing because with large
transforms, the numerical rounding errors accumulate quickly. Classical wisdom
is to use double precision for the Qk relations. Starting from Example 2(c)
and using the identity in Example 2(d) instead of using cos(x) directly, you
can minimize rounding errors, as in Example 2(e).
Finally, and most interesting, because the coefficients have been stored in a
table, it is possible to optimize them to minimize the rms error in the FFT
phase factors, allowing for real truncation. This has to be done on the target
hardware platform.
Basically the idea is to minimize the difference between phase factors
computed by recurrence and the true trigonometric functions. To do this, work
out all the exact phase factors once using double-precision, standard
trigonometric functions, then truncate to real precision and save them in
arrays TQr[], TQi[]. Set initial values for s, s2, Qr[0], and Qi[0]. Generate
Qr[], Qi[] using the recurrence formulas. Compute the chi-squared error in
Example 2(f). Then use any minimization routine that depends only on function
evaluation (for example, golden-section search) to optimize chi-squared by varying
the initial starting values and recompute Qr, Qi (a search range of ±1e-6 is
plenty). For example, calculated on a 486DX2, the chi-squared error made by
using the recurrence relations instead of explicit trig functions works out as
illustrated in Table 1.
In other words, by optimizing the coefficients, you can decrease the rms
errors by a factor of 60 for long transforms (roughly 1.8 extra significant figures). Just put
the optimized values for s and s2 into the coefficient tables. It is only
worth optimizing if you will compute a lot of big FFTs. This is a useful gain
in accuracy for signal-processing work.
It is also worth mentioning that many experimental datasets are real, rather
than complex data, and there is a useful trick to transform a real dataset to
a complex conjugate symmetric form. (See, for example, Numerical Recipes in
C.)
Martin Brown
East Rounton, England


Interoperable Objects


Dear DDJ,
I read with interest the articles in Dr. Dobb's Special Report on
Interoperable Objects (Winter, 1995). As a developer in a large corporation, I
am very interested in choosing a technology to rebuild our applications for
the hardware of today and tomorrow. We have found out the hard way that it is
impossible to keep a business-application portfolio up to date without a great
deal of reuse. The easiest way to write a new transaction may be to copy an
old one, but it is impossible to keep hundreds of them up to date with the
needs of the business.
I am being asked to deliver new business functions to the users within a
couple of weeks of the request. If someone is already marketing an object with
the function I need, then there is no problem. But what if the best available
object has only 90 percent of the function? What if I need to modify an object
in some way? What if I need to develop it from scratch?
I look for a few fundamental criteria in a software technology. Can it do what
I want? How many lines of code will I have to write? How independent are the
objects? How likely is it to succeed in the marketplace?
I would give Microsoft OLE a high score on the last point. I expect that it
will not be too long before I use some OLE components in a business
application. But regardless of what Microsoft says, I believe that
object-oriented technology with inheritance is the easiest way to reuse code.
If an object has 90 percent of the function I need, then I get all that
function with a few lines of code. I only need to write the 10 percent that is
different. In my case, a large portfolio of similar transaction objects makes
this especially powerful.
The biggest shortcoming of object-oriented development tools has been the
matter of independence. With SOM offering binary compatibility between
different versions of an object, this problem is reduced. I can modify a SOM
object that I bought without having access to its source code. I write a new
object that inherits from the original one and override the methods I need to
modify. The base class or the derived class can be changed without having to
recompile the other.
In short, Microsoft can say "The problem with..." all it wants. SOM seems to
me to be the best available technology for me to begin to build systems that
can keep up with the needs of the business.
Ron Brubacher
London, Ontario


Ewoes


Dear DDJ,
Regarding "Swaine's Flames" in the March, 1995 DDJ: In medieval times, church
was people's television, rock concert, and ball game all in one. Not only had
the sermons (and writings and teachings) gotten self-referential (religion
talking about religion) but they were in Latin, a language most people did not
even understand. Yet, people kept going to the show that the church gave. 
The most powerful institution of its day had created a totally artificial,
self-referential reality in which people actually lived--they thought those
things were real, and they saw their real lives in terms of these artificial
creations.
I think our popular culture is headed in the same direction. Writers write
about writers. Violence in [Quentin] Tarantino's movies is inspired by
violence he'd seen in other movies, and so on. You get e-mail about e-mail.
These are baby steps, but as less and less time is spent feeding us and
keeping us dry and warm, we can afford to create yet another artificial world.
But this time the technology is going to be a lot better, so it is going to be
a lot more real. Cathedrals are nothing compared to what we are going to build
next. 
Wilhelm Sarasalo 

pacsoft@netcom.com
Dear DDJ,
I'm sorry to hear that Michael Swaine's elife collapsed under the estrain of
the eresponses to his emoticontest. If I had had any suspicion that he was
still taken with the madness that he could, as an ecelebrity, expect to eread
and eanswer even a modest portion of his email, I would have eoffered a more
friendly and voluble eletter than my (correct, eattached) eentry of November
8.
Upon reading the March 1995 "Swaine's Flames," I have decided to ewrite this
emissive and esend it redundantly to you and the DDJ editors, to help limit
the erisk that you might elose this one!
Even if this isn't the right answer, the emoticons bear an uncanny resemblance
to Siskel and Ebert.
Lyle Wiedeman
wiedeman@altair.acs.uci.edu
Example 1: C and History.
(a) foo(a)
 register int a;
 {
 ...
 }

(b) foo(a)
 int a;
 {
 register int ra;
 ra = a;
 ...(code uses ra instead of a)
 }

(c) foo(register int a)
 {
 ...
 }
Example 2: More FFTs.
(a) do
 {
 N = N*2;
 bitlength = bitlength-1;
 } while (bitlength > 0);
 for (IndexOld = 0; IndexOld <= N-1; IndexOld++)
 {....

(b) N = (1<<bitlength) - (1<<((1+bitlength)/2));
 for (IndexOld = 0; IndexOld < N; IndexOld++)
 {....

(c) c = cos(x); s = sin(x); /* from table */
 temp = Qr;
 Qr = Qr*c - Qi*s;
 Qi = Qi*c + temp*s;

(d) cos(x) = 1 - 2*(sin(x/2))^2

(e) s2 = 2*(sin(x/2))^2 ; /* from table - replaces c */
 temp = Qr;
 Qr = Qr - Qi*s - Qr*s2;
 Qi = Qi + temp*s - Qi*s2; /* note this adds two extra
 subtractions */

(f) for (i=0; i<= N; i++)
 {
 dr = (TQr[i]-Qr[i]);
 di = (TQi[i]-Qi[i]);
 chisq = chisq + dr*dr + di*di;
 }
Table 1: More FFTs.

 Size Original (c,s) Modified (s2,s) Optimized (s2,s)
 28 1.05E-11 5.83E-12 4.06E-14
 1024 6.90E-10 4.49E-10 2.17E-13
16384 5.83E-7 1.10E-7 1.49E-10



























































Generic Programming and the C++ STL


Focusing on data representation and algorithms




Dan Zigmond


Dan, a principal of Avatar Software Inc., can be contacted at djz@avasoft.com.


The C++ object-oriented programming model focuses so heavily on specifying
interfaces for encapsulation that programmers sometimes gloss over the meat of
the software-development process--data representation and algorithms. Once a
class interface has been specified, the details of its implementation are
often said to be "irrelevant" to the outside world. There is some truth to
this, but only in a limited sense. Anyone who actually uses the class cares
whether or not it has been implemented correctly and efficiently, and whether
or not the internal algorithms and data representations chosen by the class
designer were appropriate. 
Generic programming provides an alternative to this black-box approach. Where
object-oriented programming attempts to take abstract "objects" and give them
real-world representations, generic programming tries to do the same with
algorithms. It does this by focusing on two questions: 
How can you represent efficient algorithms independently of any particular
data-representation scheme? 
How can you provide an interface to a diverse set of data-representation
strategies that gives us the flexibility to choose appropriate
representations, but also facilitates the development of more-abstract
algorithms?
Until recently, C++ didn't directly address either problem. The assumption was
that programming techniques that worked in C would also work in C++. Once the
class hierarchy was laid out, C++ simply became a "better C" in terms of the
tools it provided for implementing the class. 
With the C++-standard committee's adoption of Alexander Stepanov and Meng
Lee's Standard Template Library (STL), however, everything has changed. (For
background information on STL, see Al Stevens' "C Programming" column, DDJ,
March and April 1995.) STL addresses both questions by introducing the
"iterator"--a generalization of a standard C pointer that represents a
position in an abstract data structure in the same way that a pointer
represents a position in memory. There are several types of iterators in STL,
each of which defines a distinct subset of the operations we normally
associate with pointers. At the very least, all iterators support some sort of
incrementing and dereferencing operations. 
STL uses iterators to support generic programming in the following way: First,
all generic algorithms use iterators to access data structures, rather than
interacting with data structures directly. Secondly, all generic data
structures provide the most advanced type of iterator they can implement
efficiently. Thus, iterators serve as the interface to both generic algorithms
and generic data structures, allowing you to mix and match the two at will. In
Example 1(a), for instance, the STL find algorithm will return a pointer to
the first element of numbers containing the number 37. This same algorithm
will also work if you switch from a C array to an STL list, as in Example
1(b).


The Lexicon Example


To illustrate how the STL works, I'll present a filter program called
"Lexicon" that takes ASCII text and outputs an alphabetized list of all the
unique words in that text, ignoring case and punctuation. The program starts
by reading data and storing it in a structure you can work with, and ends by
printing the results. In the middle, it must convert each word to a standard
form in which both case and punctuation are ignored, remove duplicate words
from the data structure, and sort the words alphabetically. One of the
interesting things about this problem is that, at first glance, even the
ordering of these middle steps is unclear.
Reading the words into a data structure is easy. You start by using an STL
vector to store your data; see Example 2. The first line creates a vector of
strings named "words." The copy function (like find) is among the most basic
algorithms of the STL. It takes three iterators: a start, end, and
destination. If the start and end are the same, the algorithm does nothing. If
not, it begins by copying the data stored at the start iterator to the memory
location indicated by the destination iterator. Then both start and
destination are incremented by one. If the start iterator still does not equal
the end iterator, the process repeats.
This seems simple enough, but the actual code in Example 2 looks somewhat more
complicated because you're copying from a C++ istream, not from another vector
or sequence. The istream_iterator class allows you to treat an istream as a
sequence by providing an iterator into it. The constructor istream_iterator<
string, ptrdiff_t >(cin) builds an iterator pointing to the next string to be
read from cin. The default constructor istream_iterator< string, ptrdiff_t >()
builds an iterator effectively pointing to EOF. STL guarantees that the first
iterator will equal the second once you've read everything there is to read
from cin, so this is the standard idiom for reading everything from a stream
into an STL container.
The third iterator is also a bit complicated. Normally, iterators operate in
overwrite mode. When you iterate through a sequence like a vector with a
standard iterator, you pass over its existing elements rather than creating
new elements, just as you'd expect. If you replaced this third iterator with
something more intuitive, say, words.begin(), the copy algorithm would try to
copy all the strings from cin on top of the existing elements of vector words.
But because words is empty, you would quickly run out of space. 
Note that inserter (Example 2) is a simple way of creating a new type of
iterator, called an "insertion iterator." The first argument specifies the
container you're using, and the second argument describes where in the
container you want to start inserting. Unlike standard iterators, the
insertion iterator that is returned will insert new elements when incremented,
rather than passing over existing elements. This is what you want when
creating a new sequence based on an existing sequence of unknown size.
At this point, you need to decide what to do next. For starters, you can't
remove the duplicates before putting all the words in a standard form.
Furthermore, if you sort the words as they are, you'll just have to sort them
again later. Consequently, standardizing seems like the most logical next
step. Luckily, transforming all the elements of your vector into a
standardized form is straightforward. A simple standard form for your words
would be all lowercase with no punctuation. Suppose you have a function
standardize() that does just that; it takes a string argument and returns that
same string in all lowercase letters and with punctuation stripped out. All of
your strings can then be standardized with a single call to the transform
algorithm; see Example 3(a). This example traverses a sequence from
words.begin() to words.end(), calling your standardize function on each
element and placing the result of each call into the sequence starting at
words.begin(). Note that you are not limited to transforming a sequence in
place; you can use any forward iterator as the third argument. This usage is
common for in-place transformations, but you can use the full generality of
transform to combine your first two steps into one; see Example 3(b).
To wrap up the Lexicon filter example, you could remove duplicates first,
following the reasoning that it's better to sort a shorter sequence than a
longer one. However, Example 3(c) presents an alternative. The first line
simply sorts the vector alphabetically from start to finish, using the STL's
implementation of quicksort. The second line calls unique, an algorithm that
compacts a sorted sequence by removing adjacent duplicates, then returns an
iterator pointing just past the last element in the compacted sequence. The
order of these two operations is crucial: unique removes only adjacent
duplicates, so it works correctly only after the sequence has been sorted and
all duplicates sit next to one another. Although sorting a shorter sequence is
faster than sorting a longer one, removing duplicates from an unsorted
sequence is so expensive that it's impractical.
Finally, the third statement copies all the relevant elements to an output
iterator constructed based on cout. The second argument to this constructor
means that a newline will be added after each element written to this
iterator.
The only thing left is to define standardize. The exact implementation of such
a function depends on the string class you use. The STL-based class I use
(from Modena Software's STL++ package) lends itself to a fairly simple
definition. In Example 4, the first two lines remove all punctuation.
Specifically, remove_if will compact the string by removing all characters for
which the standard-library function ispunct returns True; the erase member
function actually trims the trailing characters from the string. The third
line of the function transforms the string in place by calling the standard
C-library function tolower on each character.


Another Approach


Listing One is the code for the Lexicon example. The program's run-time
performance is fairly good. Reading the data from standard input, transforming
it, and storing it in our vector is an O(n) operation. Quicksort completes in
O(nlogn) time on average. The unique algorithm and the final copy to cout are
both O(n) operations. In short, the algorithm makes three complete passes
through the data and does one efficient sort. 
Still, it seems there may be a better way. If you could somehow maintain the
internal data structure in alphabetical order throughout, you could simply
discard duplicates as they are read rather than storing, sorting, and using
unique to weed them out later. One way to do this is to use the STL set class
instead of a vector. Example 5(a) shows the set-required changes to the first
two lines of Example 2.
What difference does this small change make? The first line creates a set of
strings that is sorted with respect to the functor less< string >, which means
"sorted alphabetically." Sets are used in the STL primarily to facilitate the
fast lookup of their keys; for that reason, they are always stored in sorted
order. This also makes it efficient to check for duplicates when a new element
is inserted, which is important because STL sets are defined to have no
duplicate elements. Using sets instead of vectors, the transform algorithm can
be used exactly as before, but with profoundly different results--duplicates
are automatically discarded and the entire data structure is maintained in
sorted order. The program is therefore considerably simplified, since all
that's left to do is output the result; see Example 5(b). Likewise, the other
steps (sort and unique) have become redundant. (Listing Two is the complete
code for this version.)
The difference in speed between the approach illustrated in Example 2 and that
of Example 5 is modest. On my machine, the set implementation is consistently
about 13 percent faster than the original on large pieces of text. The
set-based algorithm is much simpler--it has only one O(n) output pass, plus
the original insertions--but inserting into a red-black tree is a much more
expensive operation than appending on the end of a vector. Each lookup into a
red-black tree requires O(logn) steps. If the new word is a duplicate, the
"insertion" stops there; if not, a new node must be created and the tree
adjusted so as to remain balanced within the strict rules of red-black trees.
The time required to construct the set of words is therefore difficult to
calculate because it depends on the degree of duplication within the word list,
and each insertion becomes progressively more costly as the size of the set
increases. (For more information on red-black trees, see "Red-Black Trees," by
Bruce Schneier, DDJ, April 1992.)


Why Use the STL?


What has the STL bought you in these examples? The easiest way to answer this
question is to look at all the things you didn't have to do:
Write your own dynamic data structures. Although an unbounded vector class is
not very difficult to write, it is nontrivial, and having to write one from
scratch would certainly have more than doubled the development time. A
red-black tree is much harder; I've seen programmers wrestle with the
balancing rules for a week and still not be 100 percent sure they are right.
Write your own sorting and searching routines. Many of us have written
quicksorts and tree lookups before. This time, someone's done that for us.
Pay much extra for changing data structures and algorithms midstream.
The first point could be made about almost any class library--most already
provide data structures such as vector and set that attend to memory
management. Likewise, the second point could also be made about most
libraries. 
It's the third point, however, that I want to emphasize. My understanding of
the problem evolved as I worked out the two solutions presented here. Although
the algorithms are functionally equivalent and look similar at the highest
level, the actual steps taken underneath are quite different. One version
takes several passes along a simple, linear structure; the other painstakingly
builds a complicated structure that, when fully constructed, essentially
solves the problem for you. Yet when switching from one to the other, I only
had to change one line of code and delete two.
When I implemented the set version of the algorithm, I intuitively expected it
to be faster than the vector implementation. It wasn't as much of an
improvement as I thought, and the memory-usage calculations were discouraging
because of all the extra pointers the red-black tree needs to maintain. Yet
the price I paid for a little experimentation was small enough that it
encouraged me to look for a better way. In doing so, I learned something about
red-black trees and the diminishing returns of clever optimization when you
start with solid, efficient data structures and algorithms in the first place.
I also developed two programs with very different run-time characteristics,
each of which might be a good solution under certain circumstances. Without
the STL, the experience would have been different. For instance, once I had
written a dynamic vector class and quicksort algorithm from scratch, it's
unlikely I would have been willing to take the time to write a completely
different program based on balanced trees. In the end, my program would likely
have represented my first guess as to the best solution, because the cost of
change would have been too great to warrant any experimentation.
Generic programming tools such as STL let us take a more evolutionary and
experimental approach to programming. In this way, they can be a healthy
complement (or even an antidote) to standard object-oriented design and
programming techniques. Where classes encourage us to define interfaces in
advance, STL allows us to experiment with the implementation until we get it
right, gradually refining our understanding of problems and the trade-offs
inherent in their solutions.



For More Information


The original reference implementation of the STL developed by Stepanov and Lee
is available free-of-charge from HP Labs. You can download it from the
directory /stl of butler.hpl.hp.com. Unfortunately, most compilers can't yet
handle STL's sophisticated template operations (Borland C++ 4.5 is an
exception). Two commercial STL implementations are available that work on a
wider variety of platforms: STL++ from Modena Software (Los Gatos, CA) and
STL<ToolKit> from ObjectSpace (Dallas, TX). The latter is unique in that it
supports virtually all major UNIX C++ compilers, including those based on
cfront. Compiler vendors will probably start bundling the STL with their
products soon; it is already bundled with Metrowerks' CodeWarrior 6 for the
Macintosh. Likewise, Symantec has announced upcoming STL support for its
Macintosh C++ environments.
Information about STL is also available on the World Wide Web
(http://www.cs.rpi.edu/~musser/stl.html) from David Musser, one of the
original generic-programming researchers. The Modena string class used in this
article is also available at this site, along with code samples and
documentation.
Example 1: (a) STL find algorithm; (b) switching from a C array to an STL
list.
(a) int numbers[ 100 ];
 ....
 int* i = find( numbers, numbers + 100, 37 );

(b) list< int > numbers;
 ....
 list< int >::iterator i = find( numbers.begin(), numbers.end(), 37 );
Example 2: Using an STL vector to store data.
vector< string > words;
copy( istream_iterator< string, ptrdiff_t >( cin ),
 istream_iterator< string, ptrdiff_t >(),
 inserter( words, words.end() ) );
Example 3: (a) Standardizing all of the strings with a single call to the
transform algorithm; (b) using the full generality of transform to combine the
two steps into one; (c) sorting and removing duplicates.
(a) transform( words.begin(), words.end(), words.begin(),
 standardize );
(b) transform( istream_iterator< string, ptrdiff_t >( cin ),
 istream_iterator< string, ptrdiff_t >(),
 inserter( words, words.end() ), standardize );

(c) sort( words.begin(), words.end() );
 vector< string >::iterator i = unique( words.begin(), words.end() );
 copy( words.begin(), i, ostream_iterator< string >( cout, "\n" ) );
Example 4: Using a string class to define standardize.
string standardize( string s ) {
 string::iterator i = remove_if( s.begin(), s.end(), ispunct );
 s.erase( i, s.end() );
 transform( s.begin(), s.end(), s.begin(), tolower );
 return s;
}
Example 5: (a) Required changes to Example 2 for discarding duplicates; (b)
outputting the result.
(a) set< string, less< string > > words;
 transform( istream_iterator< string, ptrdiff_t >( cin ),
 istream_iterator< string, ptrdiff_t >(),
 inserter( words, words.end() ), standardize );

(b) copy( words.begin(), words.end(), ostream_iterator< string >( cout, "\n" ) );

Listing One
#include <vector.h>
#include <mstring.h>
#include <algo.h>
#include <ctype.h>
// Return a copy of the string in "standard" form (lowercase, no punctuation)
string standardize( string s ) {
 string::iterator i = remove_if( s.begin(), s.end(), ispunct );
 s.erase( i, s.end() );
 transform( s.begin(), s.end(), s.begin(), tolower );
 return s;
}
// Filter a text file into an alphabetized list of unique words contained 
// in that file, ignoring case and punctuation.
int main( int argc, char** ) {
 if ( argc != 1 ) throw("usage: lexicon\n");
 vector< string > words;
 transform( istream_iterator< string, ptrdiff_t >( cin ), 
 istream_iterator< string, ptrdiff_t >(),
 inserter( words, words.end() ),
 standardize );
 sort( words.begin(), words.end() );
 vector< string >::iterator i = unique( words.begin(), words.end() );
 copy( words.begin(), i, ostream_iterator< string >( cout, "\n" ) );
 return( 0 );
}

Listing Two
#include <set.h>
#include <mstring.h>
#include <algo.h>
#include <ctype.h>
// Return a copy of the string in "standard" form (lowercase, no punctuation)
string standardize( string s ) {
 string::iterator i = remove_if( s.begin(), s.end(), ispunct );
 s.erase( i, s.end() );
 transform( s.begin(), s.end(), s.begin(), tolower );
 return s;
}
// Filter a text file into an alphabetized list of unique words contained 
// in that file, ignoring case and punctuation.
int main( int argc, char** ) {
 if ( argc != 1 ) throw("usage: lexicon2\n");
 set< string, less< string > > words;
 transform( istream_iterator< string, ptrdiff_t >( cin ), 
 istream_iterator< string, ptrdiff_t >(),
 inserter( words, words.end() ),
 standardize );
 copy( words.begin(), words.end(), ostream_iterator< string >( cout, "\n" ) );
 return( 0 );
}








































Standard C: An Update


Whither goest Standard C?




Rex Jaeschke


Rex is the chair of X3J11, the committee responsible for the ANSI C Standard.
Rex can be reached at rex@aussie.com.


ANSI and ISO rules require that standards be reviewed five years after their
adoption. Because the original ANSI C Standard was adopted in 1989 and
replaced by the ANSI/ISO C Standard in 1990, the standard must be reviewed
this year. There are three possible outcomes from such a review:
Withdraw the standard because it is no longer applicable.
Re-endorse the standard as is. That is, determine that it still meets the
needs of industry and requires no enhancements at this time.
Revise the standard, primarily to incorporate features deemed necessary or
useful. These features might reflect new demands in the industry, for example,
or they may be the result of experimentation through vendor extensions.
In 1994, at the request of national member bodies, P.J. Plauger, convener of
SC22/WG14 (the committee responsible for the ISO C Standard), obtained
permission to begin a review of ISO C prior to the required review. Serious
discussion regarding a review began at the Tokyo meeting in June 1994, where
the committee decided to revise the standard. The committee also decided not
to let C be constrained to a subset of C++. To paraphrase an unofficial C++
Standard principle, we want to be as compatible with C++ as possible, but no
more so.
Consequently, we re-endorsed the guiding principles used in defining the
original ANSI C Standard:
Existing code is important, existing implementations are not.
C code can be portable.
C code can be nonportable.
Avoid quiet changes.
A standard is a treaty between implementor and programmer.
Keep the spirit of C.
To these, we added some new principles:
Support international programming.
Codify existing practice to address evident deficiencies.
Minimize incompatibilities with C90.
Minimize incompatibilities with C++.
Maintain conceptual simplicity.
Although there has been no real opposition within the standard's committee,
more than a few have criticized our decision to revise the C Standard. The
most common complaint I hear is: "Why bother? It is obvious that C++ is the
future of C!" Our response is: "Regarding our relationship with C++, we are
content to let C++ be the big and ambitious language. While we may well
embrace some features of C++, it is not our intention that C become C++."
At a previous meeting, I volunteered to draft the charter for C9X (as the
revision is unofficially known). This charter was refined at the Tokyo
meeting, then again at the Plano, Texas, meeting in December 1994.
To help us focus on the job at hand and to give the user community a good idea
of our timetable, we propose to spell out in detail, by December 1996, what
C9X will look like, and to have a revised standard adopted within three years
of that date.
To help reduce the amount of pure invention, we adopted the principle "Codify
existing practice to address evident deficiencies." That is, we accept only
those concepts that have some prior art (not necessarily from C
implementations). Unless some proposed new feature addresses a deficiency felt
by more than a few C programmers, we will not entertain it. To achieve this
essential goal, some good (but inventive) proposals will probably have to be
rejected.
By the time you read this, we will have met in Copenhagen, Denmark (June
12-16, 1995) to debate technical proposals for C9X. At this stage, eight major
proposals are expected: 
Restricted pointers.
Variable-length arrays.
Designated initializers.
Compound literals. 
Floating-point extensions.
Complex arithmetic.
Extended integer support.
Some form of C++-like class support.
The first seven are the result of the Numerical C Extensions Group work I
started back in 1989. Eventually, this project became part of X3J11's charter.
This project has been completed and is in the process of being accepted by
ANSI as a Technical Report (TR). The "Data Parallel C Extensions" component of
the TR was not proposed for inclusion in C9X. We felt this field was rapidly
evolving and that a standard for these features was premature.


Restricted Pointers


The proposal concerning restricted pointers involves the addition of a type
qualifier called restrict. In some sense, restrict is complementary to
volatile; whereas volatile inhibits certain optimizations, restrict explicitly
allows some optimizations. 
A significant obstacle to optimization is aliases to objects, via the use of
pointers. Often, the information available in a function, or even within a
compilation unit, is insufficient to determine whether two pointers can point
to the same object.
By declaring a pointer restrict-qualified, we indicate that it points to a
unique object, as if that object were allocated by a call to malloc. Note that
the restrict qualifier can only be applied to a data pointer; it is a
constraint violation to apply it to a nonpointer object or to a function
pointer.
Late in the development process for the original ANSI C Standard, the keyword
noalias was added to the draft; three months later, it was withdrawn. The
definitions of noalias and restrict overlap, but restrict is a more
conservative approach that, for those vendors that have already implemented
it, provides a good deal of bang for their buck.



Variable-Length Arrays


The proposal for variable-length arrays allows the size of an array dimension
to be specified at run time, provided that array has automatic storage
duration or is a formal parameter; see Example 1. A variable-length array
cannot be a member of a structure or union. 
If an array is declared using a size of * (as in int i[*];), then it is a
variable-length array type of unspecified size, which can only be used in
declarations with function-prototype scope. If the size is a nonconstant
expression, that expression must have some integer type, and shall evaluate to
a value greater than 0 at run time. The expression may contain side effects.
The size of a variable-length array does not change until the execution of the
block containing its declaration has ended. Because the size of a
variable-length array cannot normally be determined until run time, sizeof
usually must determine its size at run time.


Designated Initializers


This designated-initializers proposal extends initializer syntax to allow two
new forms. It allows specific elements of an array or specific members of a
structure to be initialized without regard to their relative position within
the array or structure. For instance, Example 2(a) initializes elements 0, 1,
17, and 18 explicitly, with elements 2-16 and 19 taking on the value 0;
Example 2(b) allows new enumeration constants to be added at the front or
middle of the list, or the list order to be rearranged, without requiring the
array's initializer list to be changed. As Example 2(c) illustrates, structure
members can be initialized in a like manner.


Compound Literals


The proposed compound literals allow an unnamed object to be constructed at
run time using syntax like that of a cast, providing a capability similar to a
C++ constructor; see Example 3.
In case 1, you construct an array at run time from the expression list shown.
The const qualifier prohibits write access to that array.
In case 2, you construct two unnamed Point structures, which you then pass by
address to drawline.
In the case of array and structure compound literals, this proposal can take
advantage of designated initializers.


Floating-Point Extensions


The floating-point extensions proposal provides machinery to achieve more
predictable floating-point arithmetic in general, as well as a binding for
IEEE-based implementations. It achieves this via the addition of numerous
operator-like macros, several pragmas, some predefined macros, and an
extensive library of functions declared in several headers. While the library
portion proposes support for overloaded functions, this mechanism is not
intended for general use by programmers.


Complex Arithmetic


The proposal for complex arithmetic specifies a set of extensions to support
float, double, and long double complex types and arithmetic operations on
expressions having those types. For C++ compatibility, the type names are
float_complex, double_complex, and long_double_complex rather than float
complex, double complex, and long double complex. A family of imaginary types
is also proposed, with names float_imaginary, double_imaginary, and
long_double_imaginary. The imaginary constant i is available via the macro I,
defined in complex.h. The standard math library is extended to support complex
arguments. 


Extended Integer Support


The extended-integer-support proposal defines a header inttypes.h, containing
a family of macros and type synonyms that allow the programmer to find out the
available set of integer precisions and to define objects of a minimum
precision. Table 1 provides a sampling of the type synonyms.
A family of macros similar to those in limits.h defines the minima and maxima
for these types. If a type is not supported, its corresponding macros will be
undefined, allowing code to be conditionally compiled.
Another family of macros allows portable calls to printf and scanf, even
though different implementations may define different conversion specifiers.


C++-like Class Support


The initial discussion has focused primarily on encapsulation and the addition
of the keywords class, private, and public, with some discussion of single
inheritance and virtual functions. At the time of writing, the latest revision
of this proposal has not yet been finalized. However, previous versions
excluded constructors and destructors, which eliminated a number of technical
problems, as well as some benefits.


Miscellaneous Proposals


The following issues have been debated via our e-mail reflector and/or are on
the agenda for the Copenhagen meeting:
Allow #line lines within macro calls to help programs that mechanically
generate C source code. (The proposal was rejected.)
Improve random-number guidelines to enhance the quality of number sequences
generated by the rand function.
Add a Boolean type.
Add string classification and conversion functions. This proposes the addition
of equivalent string functions for the ctype character routines.
Add signed-integer division. Currently, if either operand is negative,
signed-integer division has implementation-defined semantics. This proposal
suggests we make them well defined, using the same rules as Fortran.
Add a big-integer library, a specification for an extensive set of functions
that provide support for at least 64-bit integers. This includes issues such
as representation and Endianness.
Add the predefined identifier __FUNC__ to allow access to the name of the
enclosing function for use in debugging statements and the assert macro.

Deprecate the keyword auto. This keyword is never needed, so why do we
continue to support it?
Allow empty arguments in macro replacement. Allowing macro calls such as
M(10,) or M(,20) is useful, particularly when used with the preprocessor
operators.
Add //-style comments. Mainstream C compilers provide these as an extension
and many C programmers think these comments are already part of Standard C.
Deprecate implicit int in declarations. Currently, if a type specifier is
omitted, int is assumed (for example, in static i;). Also, in an old-style
function definition such as void f(i){}, i has an implied type of int.
Move toward a single type for character. This is an attempt to somehow combine
char and wchar_t. The whole issue of bytes versus chars versus characters
needs to be reexamined, particularly in light of the expanding support for and
standardization of large character sets.
Add nested functions.
Extend the integer-type system. Define syntax to allow an open-ended scheme
for specifying the minimum or exact bit width desired for an integer object.
Add tag compatibility to address issues relating to the compatibility of
multiple structure, union, or enumeration types declared in separate
translation units.


Conclusion


It is too early to predict just what will be in C9X. However, the seven
proposals arising from the ANSI TR have been under development for three to
five years, and all have been implemented by one or more vendors. Therefore,
it seems reasonable that at least some of these will make the final cut, but
not necessarily in their current form. As to the adoption of something
substantial from C++, I think that the considerable sentiment for such
proposals will generate the most controversy and animated debate. No doubt
there will also be numerous wording improvements and minor additions, such as
the inclusion of //-style comments. In any event, it's a safe bet that the
resulting language will still look like the C we currently know and love.
Example 1: Variable-length arrays.
extern int n;
void f(int m, int x[m][n])
{
 char c[m];
 int (*p)[n];
}
Example 2: (a) Initializing elements; (b) adding new enumeration constants;
(c) initializing structure members.
(a)
int i[20] = {5, 10, [17] = 50, 60};

(b)
enum color { red, green, blue };
unsigned int total[] = {
 [red] = 100,
 [blue] = 200,
 [green] = 50
};

(c)
struct date {
 int year;
 int month;
 int day;
};
struct date birthday = {
 [month] = 1,
 [day] = 2,
 [year] = 1950
};
Example 3: Compound literals.
struct Point {
 int x;
 int y;
};
void drawline(struct Point *, struct Point *);
void f(int x2, int y2)
{
/*1*/ const int *p = (int []){x2, 5, y2, 4};
/*2*/ drawline(&(struct Point){0,0}, &(struct Point){x2,y2});
}
Table 1: Extended integer support.
Macro Description
int8_t 8-bit signed integer.
int16_t 16-bit signed integer.
int32_t 32-bit signed integer.
int64_t 64-bit signed integer.
intptr_t Signed integer type capable of holding a void *.
uintfast_t Most-efficient unsigned integer type.
uint_least8_t Smallest unsigned integer having at least 8 bits.







A Pooling Memory Manager for C++


Plugging memory leaks and chasing null pointers




Kirit Saelensminde


Kirit works for Motion Graphics Limited in London. He can be contacted at
kgs@mgl.win-uk.net.


Raytracers make huge demands on memory. In the process of rendering a
96x128-pixel image, for instance, a ray-tracing program was allocating and
deallocating small blocks of memory (between 20 and 100 bytes) millions of
times, although the maximum memory needed at any one time was typically less
than 11 KB; see Figure 1. Most implementations of new and malloc are not
designed to cope with this many memory allocations and deallocations.
Applications such as raytracers can also have trouble detecting memory leaks.
If a leak is introduced, the program can munch its way through 10 MB of
memory, causing severe thrashing when the program starts to use virtual
memory. Although the MFC debugging memory allocator I was using checked for
memory leaks, it didn't work with the memory pooling that I had added.
Consequently, I had to implement my own memory-allocation facility. In this
article, I'll focus on the strategy and tools that I developed to handle
dynamic-memory allocation in the process of writing an object-oriented
raytracer.


A Memory Strategy


Since the memory-management problems the raytracer introduced are complex,
I'll describe the strategy I implemented in relatively simple terms of classes
that deal with "companies" and their "addresses."
When first designing classes, a declaration of the Company and Address classes
(see Example 1) makes it possible to store and retrieve addresses, as well as
reuse the Address class in other places. Clearly, Company will contain an
instance of each of the classes String and Address. This causes no problem
until you want to add international addresses, where each address may have a
slightly different format. In the U.S., you would store a zip code, while the
U.K. requires a post code. Each address has a different format and should
appear differently in dialog boxes and when printed. To do this, you could
subclass Address for both U.S. and U.K. addresses, as in Example 2.
Because of the way C++ handles polymorphism, you would also need to change
Company so that it uses a pointer to (rather than an instance of) the Address
class (Example 3). This is not a trivial change. There is no longer an
instance of Address in Company, but a reference--and this means you must now
change every single line of code that uses the Company or Address classes so
they will work properly!
You may argue that changing part of the object hierarchy is bound to create a
need to change other parts of the code. This need not be the case, however. It
is possible to design the class from the outset so that this sort of change
does not mean reworking thousands of lines of code. Example 4 gives an
alternative to the implementation in Example 1. In Example 4 you use object
pointers at all times. When you find that more flexibility is needed in the
address storage, then all you do is change the internal implementation of
address to accommodate it. All the client objects are already using pointers.
This does lead to another problem: When designing the classes, you must
consider what is going to happen to all these new instances you are creating.
The strategy I use comes from considering the differences between Example 3 and
Example 4 and remembering that the changes should remain transparent to all
client objects.
To achieve this transparency, constructors and member functions that set
values or command changes from a receiving object must pass pointers to new
object instances, for which the receiver then becomes responsible. The only
exception is when a pointer is declared as const. In this case, I assume that
the function that constructs the object should keep hold of it.
The other element of the memory-management strategy involves querying
messages. Whenever an object's state is questioned, a pointer to a new object
containing the answer is returned, which the querying function is then
responsible for deleting.
This approach strengthens the data hiding that you achieve through the use of
objects, because you make no assumptions about any subclassing of the Address
class. Neither do you assume anything about the storage of the address within
instances of the Company class. Overall, this weakens the coupling between
object classes, which allows reuse of objects in the long term.
Unfortunately, there is a caveat to this strategy involving the number of
memory allocations: Every time an object wants to find or change the address
of a customer instance, a new instance of Address must be created on the heap.


The Memory Manager


The files mempool.h and mempool.cpp (available electronically; see
"Availability," page 3) implement a pooling memory manager. To provide
diagnostic services during code debugging that can be disabled during a
release compile, the mempool.cpp program uses conditional compilation. All
this debugging help is turned on by the MEMDIAG symbol.
The memory-pooling strategy itself is straightforward. MemoryPool, an array of
MemHeader pointers, is created by the PoolOn function. Due to the MemLink
member's constant presence in the MemHeader, you can chain MemHeaders together
into a simple list. Figure 2 shows the different forms of the structure.
MemoryPool is just an array of these lists.
When a request for memory is generated, it is passed to ARAlloc (see the
version implemented in mempool.cpp), which checks that the memory pool is
turned on and that there is a memory block of the correct size in the memory
pool. If both these conditions are met, the head of the list is removed and
that memory block will be used to satisfy the request. If the pool is not on
or there are no spare blocks of the correct size, one is taken from the
operating system. Either way, the size of the memory block is written to the
size part of the MemLink union just before the memory block is handed out.
When a block is freed, it is passed to ARFree. It finds the MemHeader
associated with the memory block, through which it finds the size of the
memory block. If the memory pool has been turned off, then the block is simply
freed back to the operating system; if the pool is on, it is placed at the
head of the correct list in the MemoryPool array.


Memory-Pooling Diagnostics


All of the diagnostics for the memory pool are controlled through the MEMDIAG
symbol, which should be defined project wide whenever you do a debug compile.
Table 1 describes the #defines at the beginning of mempool.cpp that control
aspects of the manager that are important for both debug and release compiles.
Table 2 gives descriptions of the symbols used only for debug compiles.
Switching between options is usually just a matter of commenting out lines in
mempool.cpp.
The header file is arranged so that all the debugging constructs automatically
turn themselves off when MEMDIAG is not defined. This means that you can use
the OUTP and ASSERT macros in your code and only have them produce compiled
code when debugging. You can include support for the memory system by just
including the header file in your implementation files.
Figure 2 shows the difference in the structures created by this conditional
compilation. When the diagnostics are disabled, you only get a small overhead
(usually four bytes on a 32-bit system) for each memory allocation.
To see how the conditional compilation works, look at one of the most useful
diagnostics that the memory manager produces--file and line numbers for all
leaked memory blocks.
To output this information, you need to first pass it on to the memory
manager. Second, you must keep a separate record of all the blocks passed out
and tick them off when they are returned. This will slow down each free
because you must find the freed block among all the outstanding blocks of
memory. Because of this performance hit, you usually use this diagnostic as a
last resort.
In turning on this diagnostic, you don't want to have to recompile all the
code after encountering a memory leak. Consequently, special macros in the
header file map to two versions of the ARAlloc function. One ARAlloc is passed
a pointer to the name of the file and line number of the allocation, as well
as the size of the requested block. The other ARAlloc is only given the size.
When you are not using the full memory tracking (controlled by the MEMTRACKING
symbol), the three-argument version of ARAlloc calls the single-argument
version. When MEMTRACKING is defined, it retrieves the start of the MemHeader
block from in front of the returned memory block so that the filename and the
line number will be stored in the header.
The single-argument version of ARAlloc and ARFree handle the other part of the
work. When full-memory tracking is turned on, a second MemHeader list store
(MemoryAlloced) is set up. Before the address of the requested memory block is
returned, the MemHeader is placed on the head of the correct MemoryAlloced
list and default values are given to the filename and line number; this is why
you need two next pointers in Figure 2(a).
At this point, ARFree walks down this list until it finds the correct memory
block, removes it from the list, and plugs the hole. ARFree must also check for
blocks of memory allocated before the memory pooling was turned on, because
they may not be in the list.
The final part of the story is in PoolOff, which must check through
MemoryAlloced when the memory pool is finally turned off. It can now report the
size, line number, and filename where each leaked memory allocation took place.


Conclusion



Although this pooling memory manager is reasonably complete, it can still be
improved upon. The use of the Windows GlobalAlloc is dangerous because of the
limited number of available handles (at least in the 16-bit version). Also,
there is no checking for failed memory allocations (the assumption is that a
memory exception would bypass our code altogether), which can cause problems
during stress testing. You can also add a lot more diagnostic and performance
information to the implementation.
For a more-detailed discussion of memory pooling with a look at performance
issues, I recommend Arthur Applegate's article "Rethinking Memory Management"
(DDJ, June 1994).
Figure 1: The memory report from a 96x128-pixel render in the raytracer. Note
the number of allocations handled (more than 2 million) and the amount of
memory taken from the system (less than 11 KB) compared with the reuse of that
memory (92.3 MB).
Pooling off 1. Alloced: 11222 (0.0Mb) Freed: 0 (0.0Mb)
Mem ARAllocs: 2283781 ARFrees: 2283781
bytes allocs hits frees left over
 14 4 47205 47209 0
 18 3 0 3 0
 22 75 71905 71980 0
 26 1 1 2 0
 30 51 73180 73231 0
 32 7 151042 151049 0
 38 34 1512071 1512105 0
 54 5 222752 222757 0
 82 4 140799 140803 0
 100 7 64537 64544 0
 134 38 60 98 0
Check total: alloc: 11222 (0.0Mb) hits: 96766114 (92.3Mb)
Figure 2: The layout of the memory created by the memory manager with (a) full
diagnostics; and (b) no diagnostics. The first part of the block is a union
called link, which contains the next and size elements. A pointer to the start
of memory block is returned in both cases.
Example 1: Initial design of the Company and Address classes.
class Address {
 ...
 String line1, line2, city;
 long zip_code;
 ...
};
class Company {
 public:
 // Constructor
 Company( String aname, Address anAddress );
 // Get/Set address
 Address GetAddress( void );
 void SetAddress( Address newAddress );
 protected:
 String name;
 Address address;
};
Example 2: Subclassing Address to handle international addresses.
class Address {
 ...
 String line1, line2, city;
 ...
};
class AddressUS : public Address {
 ...
 long zip_code;
 ...
};
class AddressUK : public Address {
 ...
 String post_code;
 ...
};
Example 3: Changing Company so that it uses a pointer to (rather than an
instance of) the Address class.
class Company {
 public:
 // Constructor/Destructor
 Company( String aname, Address *anAddress );
 ~Company( void );
 // Get/Set address
 Address *GetAddress( void );
 void SetAddress( Address *newAddress );
 protected:
 String name;
 Address *address;
};
Example 4: An alternative to Example 1.
class Address {
 ...
 String line1, line2, city;
 long zip_code;
 ...
};
class Company {
 public:
 // Constructor/Destructor
 Company( String aname, Address *anAddress );
 ~Company( void );
 // Get/Set address
 Address *GetAddress( void );
 void SetAddress( Address *newAddress );
 protected:
 String name;
 Address address;
};
Table 1: #defines used within mempool.cpp to control its execution in both
release and debug modes.
#define Description
MEMALLOC Defines how the memory is allocated whenever a new block
 is needed.
MEMNOFREENULL When defined, a NULL pointer is never freed or deleted.
 This removes a check that is made on every free.
MEMLARGEALLOC Allows allocation requests for blocks of memory larger
 than g_memory_size to be passed through the memory manager.
MEMALLNEW When defined, versions of the global new and delete
 operators are defined that pass all memory allocations
 through the memory pool. This option is not compatible
 with MFC, because MFC defines its own global new and
 delete operators.
Table 2: #defines used within mempool.cpp to control its diagnostic functions.
These are only activated when MEMDIAG is defined.
#define Description

MEMBOUNDS Adds an 8-byte block to the beginning and the end of the
 memory block. If any of the bytes within this checking area
 have been changed, then the entire block is discarded.
MEMFREENULL When not defined, reports every time a NULL pointer is
 freed.
MEMDOUBLEFREE Adds a check for each free to make sure that the memory
 block has not been freed before.
MEMMINE Adds a magic number in the memory header so that ARFree
 can check that the memory block was allocated by ARAlloc.
MEMTRACKING Turns on the full memory-tracking service. Stores line
 numbers and filenames when known. To get this information
 for all calls to new, add #define new MEMTRACKNEW after
 you have included all header files and defined your
 classes.







Implementing Bit Vectors in C


Creating arrays of Boolean values is the key




James Blustein


James is a PhD student in computer science at the University of Western
Ontario. He can be contacted at jamie@uwo.ca.


Bit vectors provide an extremely space- and time-efficient means of
implementing arrays of Boolean values. In the most common case, bit vectors
can represent data in one-eighth the space that a straightforward integer
representation uses.
Many programmers are familiar with the use of single integers as collections
of bit fields to flag error conditions and the like; see The C Programming
Language, Second Edition, by Brian W. Kernighan and Dennis M. Ritchie
(Prentice-Hall, 1988). Bit vectors are an extension of that concept. Treating
an array of integers as though it were a single integer allows the array to
encode even more data. Bit vectors are well suited to representing finite sets
and are useful in hashing and signature applications. In Programming Pearls
(Addison-Wesley, 1989), Jon Bentley showed that the complexity of many large
software problems can be greatly reduced by using bit vectors.
In this article, I'll present a portable bit-vector implementation in C. I
developed the code for use in two programs: a Bloom-filter program for hashing
on document signatures and a statistical-analysis program in which a number of
three-dimensional matrices of data must be analyzed. I use two bit vectors to
record which of the matrices have been selected and which matrices' elements
to analyze. This makes it simple to copy the submatrices of selected data,
which I pass to general matrix-manipulation routines. The functions work well
as part of the menu system from which the user selects data.


The Data Structure


At the heart of the implementation are arrays composed of the type bit (see
Listing Three). The type must be an unsigned integer--unsigned char, unsigned
short, unsigned int, or unsigned long--for the code to be portable; otherwise
sign extension of the bitwise operators will produce different results on
different machines.
Each element of a bit vector is represented internally by BITS_SZ bits. The
definition of BITS_SZ is in Listing One. If the integer type used is unsigned
char and CHAR_BIT is defined in the limits.h header file, then BITS_SZ is a
preprocessor definition (technically, a macro with no arguments). Otherwise,
the bits_size() function is used to count the number of bits in the chosen
type. This slight complication is necessary for a portable implementation and
requires a negligible amount of run time. To ensure that BITS_SZ has been
properly initialized, call ba_init() once before using any of the other
functions. In Listing Three, I have defined the type to be unsigned char. On
most machines, this means that BITS_SZ will be 8.
The number of elements needed to hold n bits is the number of full elements
needed and possibly an extra element for the overflow. The macro NELEM()
(Listing Four) is used to compute the exact number in many of the routines in
Listing Two.
The bit vectors imitate an array composed of single Boolean values that cannot
be subscripted in the usual way. The functions ba_assign() and ba_value() in
Listing Two provide ways of setting and reading the values, respectively. 
You may find it helpful to think of a bit vector as representing a set
containing only elements that have the value 1. The universe from which the
set draws its elements has the same size as the bit vector. Any set operation
can be performed using a combination of the four set operations given here. If
the two bit vectors have the same number of elements, then ba_union() can be
simulated using ba_complement() and ba_intersection() (and vice versa). Figure 1
shows how to compute a type of set difference.
If you prefer, you can think of the bit vectors as an extension of C's bitwise
operators. C operators are defined to work with single integers. The routines
presented here perform the equivalent operations, but with a synthetic integer
composed of an arbitrary number of bits. Although the names of some of the
functions come from the set perspective, you can think of them in terms of AND
(ba_intersection()), Inclusive OR (ba_union()), XOR (ba_diff()), and NEGATION
(ba_complement()).
Listing Two illustrates an implementation of bit vectors using C; Table 1
describes the functions.


How It Works


The bit vector is really a 2-D array of integers. To access the nth element of
a bit vector, the code first locates the integer that contains that bit, and
then it indexes into that integer to the exact bit. The ba_assign() function
masks the bit to be set or cleared.
To be as general as possible, most of the routines require the calling routine
to pass in the number of elements in the bit-vector array. This way the
routines can be used with dynamically created bit vectors, such as those
ba_new() provides. 
If the size of the bit vector bv is known at compile time, then sizeof bv /
sizeof bv[0] will be a compile-time constant representing the number of
elements.
If the bit vector is dynamically allocated, then you can keep track of its
size by wrapping it in a struct like that in Example 1(a). Example 1(b) is the
struct I use for the menu-driven statistical program mentioned previously.
The set operations ba_intersection(), ba_union(), and ba_diff() can be used
to combine bit vectors in the same ways that the &, |, and ^ bitwise operators
are used to combine single integers (see Kernighan and Ritchie). The
ba_complement() function can be used to apply the one's complement (~)
operator to every element of a bit vector. Figure 1 shows how the set
operations can make a new bit vector with only those bits that are 1 in
exactly one of the bit vectors A and B. This value can be contrasted with the
effects of the set_diff() function. The values shown were created with a
program written using the functions in Listing Two.
It may be more efficient for many of the functions in Listing Two to be
written as macros, but I prefer functions because they provide increased
modularity.
Most of the functions treat bit vectors as 1-D arrays of integers. The naive
implementation would be to loop through every bit in every integer. These
routines loop through every integer and set all the bits at once--providing a
speedup of BITS_SZ times. 
Listing Six contains alternative (slower, yet perhaps more intuitive) methods
for computing the union and intersection functions. Contrast these with the
implementations in Listing Two. 
The ba_intersection() function is based on the observation that the
intersection of two bit vectors contains as many bits as the shorter bit
vector (only those elements that are set in both vectors appear in the
intersection). ba_intersection() does a bitwise AND of the corresponding
integers in both vectors up to and including the last one for the smallest of
the vectors.
A bit-by-bit method of computing the intersection appears in Listing Six. It
relies on the numeric values of Boolean expressions in C. The mask select is
made up of all 0s except for a single 1 to mask the selected bit. 
The union operation assumes that if a bit is set in either of the two bit
vectors, then it is also set in their union. The ba_union() function first
sets all the bits that appear in the longest of the two vectors. Then it loops
through every integer in the shorter one, assigning BITS_SZ bits at a time to
the union. The alternative form in Listing Six loops through every bit in the
shorter vector. 
The ba_count() function is written to be portable but fast. In the general
case, it loops through every bit in the vector. If the BITS_SZ is known before
the file is compiled, then a lookup table can be included so that every
integer can be inspected at once. The code in Listing Two includes a lookup
table for the case where BITS_SZ is 8, as it would be for most unsigned char
implementations. If BITS_SZ is not 8, then a bit-looping method counts every
bit in every integer.
The space cost incurred by the table is well worth the speedup it provides.
Obviously, if you choose to use a type with a different number of bits, this
table should be replaced with a table for the size of the type you choose. The
table was computed using the bit-looping method.
The only advantage to using ba_dotprod() rather than calling ba_assign() and
ba_value() in a nested loop is that ba_dotprod() does not incur as much
calling overhead because it reads and sets the array elements directly. 
If not every element of the array is used, there can be internal fragmentation
of as much as BITS_SZ-1 unused bits. The ba_count() function is optimized to
examine every element of the array. For this function to work as optimized,
all unused bits must be 0, which I call the "canonical form." Bit vectors
created using ba_new() and changed only using functions in the listings will
always be in this form. The ba_all_assign() and ba_complement() functions
change their first argument to force it into canonical or standard form. The
size parameter passed to these functions must be exact, or some bits may be
changed inappropriately.
It would be more efficient to coerce the bit vector into canonical form in
ba_count() instead of many of the other functions. However, I feel it makes
more sense to have a function that is meant only to count bits do just that.
It would be an unpleasant side effect if the count function changed its
argument and ba_complement(), for example, did not create the true complement.
The cost is very small, and I feel it is worthwhile.
In Figure 2, the array bv is a bit vector composed of NumInts integers used to
represent NumElem elements (bits). The code bv[NumInts-1] &=
(~0<<(BITS_SZ-(NumElem%BITS_SZ))); affects the value of the last element of
the array bv. As you can see, ~0 is an integer composed entirely of 1 bits,
and ~0<<(BITS_SZ-(NumElem%BITS_SZ)) is a value used to mask off the unused
bits at the tail end of the array. By shifting the 1s left as many times as
there are unused bits, the 1s move left and are replaced by 0s, which clear
the rightmost bits and produce the canonical form. 
In Figure 2, BITS_SZ is 8 and NumElem is 19. Therefore, only three bits of the
eight in the last integer are used. Figure 2 shows how the mask is built and
what happens when it is applied.


Stylistic Decisions 



I chose to implement NELEM (Listing Four) and CANONIZE (Listing Five) as
macros rather than as static functions because they are used many times and
are very simple. The overhead of calling either of them as a function seemed
unjustified. The first_is_largest() function is more complicated, however. It
is declared static so that it cannot accidentally be called by functions in
any other files. Its name also serves to document its purpose. None of the
macros nor the static function can be called by functions outside of the file.
All the nonstatic functions' names are unique within six monocase characters
and begin with ba_ so they won't conflict with functions declared in other
files.
I chose to declare the type bool (Listing Three) as an enum rather than as a
preprocessor symbol or explicit integer type because many debuggers will
display symbolic names of enums.


Conclusion


Bit vectors provide a fast, space-efficient method of representing and
manipulating arrays of Boolean values. They are great for representing finite
sets and as part of other data structures. 
Figure 1: Set operations with bit vectors. (a) Union of two bit vectors; (b)
intersection of two bit vectors; (c) set difference of two bit vectors; (d)
using set operations to create a new set.
Figure 2: Converting 10111111 into canonical form.
Table 1: (a) Creation and initialization; (b) setting and clearing; (c)
conversion operations; (d) set operations; (e) miscellaneous function.
Function Descriptions
(a) ba_init() Determines word size of current
 platform. Makes code portable.
 ba_new() Dynamically allocates and
 initializes memory to hold a
 bit vector of given size.
 ba_copy() Copies a subset of one bit
 vector into another bit vector.
(b) ba_assign() Sets (or clears) an element of a
 given bit vector.
 ba_value() Returns the value of one element
 of a given bit vector.
 ba_toggle() Flips the state of an individual
 element in a given bit vector.
 ba_all_assign() Efficiently sets (or
 clears) all elements of a
 bit vector at once.
(c) ba_b2str() Produces a string version of a
 subset of a given bit vector.
 Can be used to display bit
 vector, as ba_print()
 demonstrates.
 ba_ul2b() Sets a bit vector to a value
 that matches an unsigned long.
(d) ba_count() Returns number of 1 elements in
 a given bit vector. The number
 of 0 elements is the difference
 between the total number of
 elements and this value.
 ba_intersection() Efficiently creates the
 set intersection of two
 bit vectors. Figure 1(b)
 shows how to filter only
 certain elements using
 ba_intersection().
 ba_union() Efficiently creates the set
 union of two bit vectors.
 Figure 1(a) shows how to
 merge two bit vectors using
 ba_union().
 ba_diff() Creates a new bit vector with
 bits set where exactly one of
 the two inputs has a 1 (XOR).
 ba_complement() Flips all elements of a
 given bit vector.
 ba_dotprod() Computes the scalar product of
 two bit vectors.
(e) ba_print() Prints a string representation

 of a bit vector.
Example 1: (a) A struct used to keep track of the size of a dynamically
allocated bit vector; (b) a struct used in a statistical-analysis program.
(a)
typedef struct {
 elem_t size; /* how many items in array */
 bit * array; /* bit vector recording which elements are selected */
} bitvector;

(b)
typedef struct {
 elem_t size; /* how many items in selected */
 bit * selected; /* bit vector recording which elements are selected */
 elem_t max; /* maximum possible size */
 char ** name; /* array of names of items */
 char * title; /* what data is represented by this struct? */
} chose_t;

Listing One
/* BITS_SZ -- the number of bits in a single object of type bits. */
/* Definition of BITS_SZ */
#ifdef CHAR_BIT
 /* assumes typedef unsigned char bits */
 #define BITS_SZ (CHAR_BIT)
#else
 static elem_t bits_size(void);
 elem_t BITS_SZ = 0; /* until it is initialized by ba_init() */
 static elem_t bits_size(void) {
 /* Adapted from the wordlength() function on page 54 (Exercise
 2-8) of _The C Answer Book_ (2nd ed.) by Clovis L. Tondo
 and Scott E. Gimpel. Prentice-Hall, Inc., 1989. */
 elem_t i;
 bits v = (bits)~0;
 for (i=1; (v = v >> 1) > 0; i++) 
 ; /* EMPTY */
 return (i);
 }
#endif

Listing Two
#include <stdio.h>
#include <stdlib.h> /* malloc(), calloc(), free() */
#include "types.h" /* Listing 3 */
#include "bitarr.h" /* exported prototypes */
/* Listing 1 (definition of BITS_SZ) goes here */
/* Listing 4 (NELEM macro) goes here */
/* Listing 5 (CANONIZE macro) goes here */
typedef struct {elem_t size; bit *vector;} BitVector;
static void first_is_biggest(BitVector bv[2], unsigned *, unsigned *);
/* ------ Initialization and Creation Code ------ */
elem_t ba_init(void) 
{
/* ba_init()
 PRE: Must be called before use of any other ba_ functions. Should
 only be called once.
 POST: Returns the number of values that can be stored in one variable of
 type bit. If <limits.h> does not define CHAR_BIT then the module
 global variable BITS_SZ has been set to the appropriate value. 
*/
 #ifndef BITS_SZ
 if (!BITS_SZ) {
 BITS_SZ = bits_size();
 }
 #endif
 return (BITS_SZ);
} /* ba_init() */
bit *ba_new(const elem_t nelems) 
{
/* ba_new()
 PURPOSE: dynamically allocate space for an array of nelems bits
 and initialize the bits to all be zero.
 PRE: nelems is the number of Boolean values required in an array
 POST: either a pointer to an initialized (all zero) array of bit or 
 space was not available and NULL was returned
 NOTE: calloc() guarantees that the space has been initialized to 0.
 Used by: ba_ul2b(), ba_intersection() and ba_union().
*/
 size_t howmany = NELEM(nelems,(BITS_SZ));
 return ((bit *)calloc(howmany, sizeof(bit)));
} /* ba_new() */
void ba_copy(bit dst[], const bit src[], const elem_t size) 
{
/* ba_copy()
 PRE: dst has been initialized to hold size elements. src 
 is the array of bit to be copied to dst.
 POST: dst is identical to the first size bits of src. src is unchanged.
 Used by: ba_union()
*/
 elem_t nelem = NELEM(size,(BITS_SZ));
 register elem_t i;
 for (i=0; i < nelem; i++) {
 dst[i] = src[i];
 }
} /* ba_copy() */
/* ------- Assigning and Retrieving Values ------ */
void ba_assign( bit arr[], elem_t elem, const bool value) 
{
/* ba_assign()
 PURPOSE: set or clear the bit in position elem of the array arr
 PRE: arr[elem] is to be set (assigned to 1) if value is TRUE, 
 otherwise it is to be cleared (assigned to 0). 
 POST: PRE fulfilled. All other bits unchanged.
 SEE ALSO: ba_all_assign()
 Used by: ba_ul2b()
*/
 if (value) {
  arr[elem / BITS_SZ] |= (1 << (elem % BITS_SZ));
 } else {
  arr[elem / BITS_SZ] &= ~(1 << (elem % BITS_SZ));
 }
} /* ba_assign() */
bool ba_value(const bit arr[], const elem_t elem) 
{
/* ba_value()
 PRE: arr must have at least elem elements
 POST: The value of the elemth element of arr has been returned
 (as though arr was just a 1-dimensional array of bit)
 Used by: ba_b2str() and ba_count()
*/
 return (0 != (arr[elem / BITS_SZ] & (1 << (elem % BITS_SZ))));
} /* ba_value() */
void ba_toggle( bit arr[], const elem_t elem) 
{
/* ba_toggle()
 PRE: arr must have at least elem elements
 POST: The value of the elemth element of arr has been flipped, 
 i.e. if it was 1 it is 0; if it was 0 it is 1.
 SEE ALSO: ba_complement()
*/
 arr[elem / BITS_SZ] ^= (1 << (elem % BITS_SZ));
} /* ba_toggle() */

void ba_all_assign( bit arr[], const elem_t size, const bool value) 
{
/* ba_all_assign()
 PRE: arr has been initialized to have *exactly* size elements.
 POST: All size elements of arr have been set to value. 
 The array is in canonical form, i.e. trailing elements are all 0.
 NOTE: The array allocated by ba_new() has all elements 0 and is 
 therefore in canonical form.
 SEE ALSO: ba_assign()
 Used by: ba_ul2b()
*/
 elem_t nelem = NELEM(size,(BITS_SZ));
 bit setval = (value) ?~0 :0;
 register elem_t i;
 for (i=0; i < nelem; i++) {
 arr[i] = setval;
 }
 /* force canonical form */
 CANONIZE(arr,nelem,size);
} /* ba_all_assign() */
/* ------- Conversion Routines ------- */
bit * ba_ul2b(unsigned long num, bit * arr, elem_t * size) 
{
/* ba_ul2b()
 PRE: Either arr points to space allocated to hold enough bits to
 represent num (namely the ceiling of the base 2 logarithm
 of num). size points to the number of bit to use.
 OR arr is NULL and the caller is requesting that enough
 space be allocated to hold the representation before the
 translation is made. size points to space allocated to
 hold the count of the number of bit needed for the
 conversion (enough for MAXLONG).
 POST: A pointer to a right-aligned array of bits representing the
 unsigned value num has been returned and size points to
 the number of bits needed to hold the value. 
 OR the request to allocate space for such an array could not be granted
 NOTES: - The first argument is unsigned. 
 - It is bad to pass a size that is too small to hold the
 bit array representation of num [K&R II, p.100].
 - Should the size be the maximum size (if size > 0) even
 if more bits are needed? The user can always use a filter
 composed of all 1s (see ba_all_assign()) intersected with
 result (see ba_intersection()).
*/
 register elem_t i;
 
 if (NULL != arr) {
 ba_all_assign(arr, *size, 0);
 } else {
 *size = NELEM(sizeof(num),sizeof(bit));
 *size *= BITS_SZ;
 if (NULL == (arr = ba_new(*size))) {
 return (arr);
 } 
 }
 /* usual base conversion algorithm */
 for (i=0; num; num >>= 1, i++) {
 ba_assign(arr, (*size - i - 1), (1 == (num & 01)));
 }

 return (arr);
} /* ba_ul2b() */
char * ba_b2str(const bit arr[], const elem_t size, char * dest)
{
/* ba_b2str()
 PRE: arr is a bit array with at least size elements. Either
 dest points to enough allocated space to hold size + 1
 characters or dest is NULL and such space is to be
 dynamically allocated.
 POST: Either dest points to a null-terminated string that
 contains a character representation of the first size
 elements of the bit array arr;
 OR dest is NULL and a request to dynamically allocate memory
 for a string to hold a character representation of arr was
 not be granted.
 Used by: ba_print()
*/
 register elem_t i;
 if ((NULL != dest) ||
  (NULL != (dest = (char *)malloc(size + 1)))) {
 for (i=0; i < size; i++) {
 dest[i] = (ba_value(arr,i) ?'1' :'0');
 }
 dest[size] = '\0';
 }
 return (dest);
} /* ba_b2str() */
/* ------- Mathematical Applications -------- */
unsigned long ba_count(const bit arr[], const elem_t size) 
{
/* ba_count()
 PRE: arr is an allocated bit array with at least size elements
 POST: The number of 1 bits in the first size elements of arr
 have been returned.
 NOTE: if arr is not in canonical form, i.e. if some unused bits
 are 1, then an unexpected value may be returned.
*/
 register unsigned long count; 
 register elem_t i; 
 elem_t nelem = NELEM(size,(BITS_SZ));
 static const unsigned bitcount[256] = {0, 1, 1, 2, 1, 2, 2, 3, 1, \
 2, 2, 3, 2, 3, 3, 4, 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, \
 4, 4, 5, 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5, 2, \
 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 1, 2, 2, 3, 2, \
 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5, 2, 3, 3, 4, 3, 4, 4, 5, 3, \
 4, 4, 5, 4, 5, 5, 6, 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, \
 5, 5, 6, 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7, 1, \
 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5, 2, 3, 3, 4, 3, \
 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 2, 3, 3, 4, 3, 4, 4, 5, 3, \
 4, 4, 5, 4, 5, 5, 6, 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, \
 6, 6, 7, 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 3, \
 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7, 3, 4, 4, 5, 4, \
 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7, 4, 5, 5, 6, 5, 6, 6, 7, 5, \
 6, 6, 7, 6, 7, 7, 8};
 if (8 == BITS_SZ) {
 /* lookup table will speed this up a lot */
 for (count = 0L, i = 0; i < nelem; i++) {
 count += bitcount[arr[i]];
 } 

 } else {
 for (count = 0L, i = 0; i < size; i++) {
 if (ba_value(arr, i)) {
 count++;
 }
 }
 }
 return (count);
} /* ba_count() */
bool ba_intersection( bit first[], bit second[], bit * result[],
 const elem_t size_first, const elem_t size_second) 
{
 /* ba_intersection()
 PRE: first is a bit array of at least size_first elements.
 second is a bit array of at least size_second elements.
 result points to enough space to hold as many elements
 as the smallest of size_first and size_second;
 OR result points to NULL and such space is to be dynamically allocated.
 POST: TRUE has been returned and 
 result points to a bit array containing the intersection
 of the two arrays up to the smallest of the two sizes;
 OR FALSE has been returned and
 result pointed to NULL (a request was made to allocate 
 enough memory to store the intersection) but the required 
 memory could not be obtained.
 NOTE: This runs faster if the first array is not smaller than second.
 */
 register elem_t i;
 elem_t numints; 
 unsigned largest=0, smallest=1;
 BitVector bv[2];
 bv[largest].size = size_first;
 bv[largest].vector = first;
 bv[smallest].size = size_second;
 bv[smallest].vector = second;
 first_is_biggest(bv, &largest, &smallest);
 /* allocate space if *result is NULL */
 if ((NULL == *result) && \
 (NULL == (*result = ba_new(bv[largest].size)))) {
 return(FALSE); /* can't get memory, so can't continue */
 } else {
 numints = NELEM(bv[smallest].size,(BITS_SZ));
 for (i=0; i < numints; i++) {
  (*result)[i] = (bv[smallest].vector[i] & bv[largest].vector[i]);
 }
 /* bits beyond the smaller size should be zero -- canonical form */
 CANONIZE(*result, numints, bv[smallest].size);
 return(TRUE);
 }
} /* ba_intersection() */
bool ba_union( bit first[], bit second[], bit * result[],
 const elem_t size_first, const elem_t size_second) 
{
 /* ba_union()
 PRE: first is a bit array of at least size_first elements.
 second is a bit array of at least size_second elements.
 result points to enough space to hold as many elements
 as the largest of size_first and size_second;
 OR result points to NULL and such space is to be dynamically allocated.
 POST: TRUE has been returned and result points to a bit array containing 
 union of two arrays (up to the size of the largest of the two sizes);
 OR FALSE has been returned and result pointed to NULL (a request was 
 made to allocate enough memory to store the union) but the required 
 memory could not be obtained.
 NOTE: This runs faster if the first array is not smaller than second.
 */
 register elem_t i;
 elem_t numints; 
 unsigned largest=0, smallest=1;
 BitVector bv[2];
 bv[largest].size = size_first;
 bv[largest].vector = first;
 bv[smallest].size = size_second;
 bv[smallest].vector = second;
 first_is_biggest(bv, &largest, &smallest);
 if ((NULL == *result) && \
 (NULL == (*result = ba_new(bv[largest].size)))) {
 return(FALSE); 
 } else {
 ba_copy(*result, bv[largest].vector, bv[largest].size);
 numints = NELEM(bv[smallest].size,(BITS_SZ));
 for (i=0; i < numints; i++) {
  (*result)[i] |= bv[smallest].vector[i];
 }
 numints = NELEM(bv[largest].size,(BITS_SZ));
 CANONIZE(*result, numints, bv[largest].size);
 return(TRUE);
 }
} /* ba_union() */
bool ba_diff( bit first[], bit second[], bit * diff[],
 const elem_t size_first, const elem_t size_second) 
{
 /* ba_diff()
 PRE: first is a bit array of at least size_first elements.
 second is a bit array of at least size_second elements.
 diff points to enough space to hold as many elements
 as the largest of size_first and size_second;
 OR diff points to NULL and such space is to be dynamically allocated. 
 POST: TRUE has been returned and 
 diff points to a bit array containing the symmetric difference of the
 two arrays (up to the size of the largest of the two sizes);
 OR FALSE has been returned and result pointed to NULL (a request was 
 made to allocate enough memory to store the result) but the required 
 memory could not be obtained.
 NOTE: This runs faster if the first array is not smaller than second.
 */
 register elem_t i;
 elem_t numints; 
 unsigned largest=0, smallest=1;
 BitVector bv[2];
 bv[largest].size = size_first;
 bv[largest].vector = first;
 bv[smallest].size = size_second;
 bv[smallest].vector = second;
 first_is_biggest(bv, &largest, &smallest);
 if ((NULL == *diff) && \
 (NULL == (*diff = ba_new(bv[largest].size)))) {
 return(FALSE); 

 } else {
 ba_copy(*diff, bv[largest].vector, bv[largest].size);
 numints = NELEM(bv[smallest].size,(BITS_SZ));
 for (i=0; i < numints; i++) {
  (*diff)[i] ^= bv[smallest].vector[i];
 }
 numints = NELEM(bv[largest].size,(BITS_SZ));
 CANONIZE(*diff, numints, bv[largest].size);
 return(TRUE);
 }
} /* ba_diff() */
void ba_complement( bit arr[], const elem_t size)
{
/* ba_complement()
 PRE: arr is a bit array composed of *exactly* size elements.
 POST: All bits in arr have been flipped and arr is in canonical form.
 SEE ALSO: ba_toggle()
*/
 elem_t nelem = NELEM(size,(BITS_SZ));
 register elem_t i;
 for (i=0; i < nelem; i++) {
 arr[i] = ~arr[i];
 }
 /* force canonical form */
 CANONIZE(arr, nelem, size); 
} /* ba_complement() */
unsigned long ba_dotprod(const bit first[], const bit second[],
 const elem_t size_first, const elem_t size_second) 
{
 /* ba_dotprod()
 PRE: first is an array of at least size_first bits. second
 is an array of at least size_second bits.
 POST: The scalar product of the two vectors represented by the
 first size_first elements of first and the first
 size_second elements of second have been returned.
*/
 register elem_t i;
 register unsigned long sum = 0L;
 elem_t size = (size_first < size_second) ?size_first :size_second;
 /* the dot product counts positions where both vectors have a 1 bit */
 for (i=0; i < size; i++) {
  if ((first[i/BITS_SZ] & (1 << (i % BITS_SZ))) &&
      (second[i/BITS_SZ] & (1 << (i % BITS_SZ)))) {
   sum++;
  }
 }
 return (sum);
} /* ba_dotprod() */
/* ------ Miscellaneous ------- */
static
void first_is_biggest(BitVector bv[2], unsigned * big, unsigned * small)
{
 if (bv[*big].size < bv[*small].size) {
 unsigned temp;
 temp = *big;
 *big = *small;
 *small = temp;
 }
} /* first_is_biggest() */
/* -------- Output --------- */
bool ba_print(const bit arr[], const elem_t size, FILE * dest) 
{
 char * to_print = ba_b2str(arr, size, NULL);
 if (NULL != to_print) {
 bool status = (EOF != fputs(to_print, dest) );
 free(to_print);
 return (status);
 } else {
 return (FALSE);
 }
} /* ba_print() */

Listing Three
/* "types.h" */
#include <stddef.h>
 typedef enum bool {FALSE, TRUE} bool;
 typedef size_t elem_t;
 typedef unsigned char bit; 

Listing Four
/* macro NELEM(). The number of elements, nelem, in an array of N bits can be
 computed using the formula:
 if (0 == (N % BITS_SZ))
 nelem = N/BITS_SZ
 else
 nelem = N/BITS_SZ + 1
 This can be represented in any of these ways:
 nelem = N/(BITS_SZ) + 1 - (0 == (N %(BITS_SZ)))
 nelem = N/(BITS_SZ) + !(0 == (N %(BITS_SZ)))
 nelem = N/(BITS_SZ) + (0 != (N %(BITS_SZ)))
 The macro NELEM uses this last form. 
*/
#define NELEM(N,ELEMPER) ((N) / (ELEMPER) + (0 != ((N) % (ELEMPER))))

Listing Five
/* macro CANONIZE(). Array is an array of NumInts elements of type bit
 representing NumElem bits. Forces Array into canonical form, i.e., all
 unused (trailing) bits are set to 0. With the layout used by ba_assign(),
 the used bits of the last element are the low NumElem % BITS_SZ bits. */
#define CANONIZE(Array,NumInts,NumElem) \
 if (0 != ((NumElem) % BITS_SZ)) \
  (Array)[(NumInts) - 1] &= ~(~0 << ((NumElem) % BITS_SZ))

Listing Six
void ba_Xintersection(const bit first[], const bit second[], bit * result[],
 elem_t size_first, elem_t size_second) 
{
 register elem_t i;
 register bit select;
 /* assumes the caller passes the larger array first */
 const bit * largest = first;
 const bit * smallest = second;
 elem_t size_smallest = size_second;
 for (i=0; i < size_smallest; i++) {
  select = (1 << (i % BITS_SZ));
  if ((smallest[i/BITS_SZ] & select) && (largest[i/BITS_SZ] & select)) {
   (*result)[i/BITS_SZ] |= select;
  } else {
   (*result)[i/BITS_SZ] &= ~select;
  }
 }
} /* ba_Xintersection() */
void ba_Xunion(const bit first[], const bit second[], bit * result[], 
 elem_t size_first, elem_t size_second) 
{
 register elem_t i;
 /* assumes the caller passes the larger array first */
 const bit * largest = first;
 const bit * smallest = second;
 elem_t size_smallest = size_second;
 /* any bits that are set in the longest vector are set in the union */
 /* all other bits are initially zero */
 ba_copy(*result, largest, size_first);
 /* bits that are set in the shortest vector are also set in the union */
 for (i=0; i < size_smallest; i++) {
  if (smallest[i/BITS_SZ] & (1 << (i % BITS_SZ))) {
   (*result)[i/BITS_SZ] |= (1 << (i % BITS_SZ));
  }
 }
} /* ba_Xunion() */


Alpha Blending Graphic Images


Combining images for special effects




Tim Wittenburg


Tim, who is a team leader at AmeriData Consulting, has developed
flight-simulator technology for the U.S. Air Force and is author of
Photo-Based 3D Graphics in C++ (John Wiley & Sons, 1995).


Many memorable movie scenes have been created using a graphics technique known
as "blending." In Jurassic Park, for example, computer-generated dinosaurs
were blended into existing live-action footage. 
This article describes a powerful graphics technique known as "alpha blending"
(sometimes referred to as "image compositing"). Careful application of this
algorithm permits two or more images to be composited in such a way that
viewers can't detect that the resulting image is a composite. I'll present an
abridged version of the Porter-Duff alpha-blending algorithm (described by T.
Porter and T. Duff in their paper "Compositing Digital Images," in the
SIGGRAPH '84 Proceedings) and show several applications, including how
blending can be used to create realistic shadows.
The fundamental idea behind blending is that a third channel or image can be
used to drive a blending process, which combines the image of the object to be
blended (the "cutout image") and background images. The blending techniques I
present here combine the cutout and background images using the equation
Bij=Cij Aij+(1-Aij) Bij, where i and j are image column and row indexes,
respectively, and Aij is a factor (called "alpha") that has a value between 0
and 1 inclusive. Bij is a pixel in the output image and Cij is a pixel in the
cutout image. As Figure 1 illustrates, you implement blending by applying the
blending equation to three image objects: the cutout image, the corresponding
alpha image, and the output image. Each pixel (i,j) of the cutout image is
assumed to be "lined up" or colocated with pixel (i,j) in the cutout's alpha
image. Each pixel in the alpha image contains a number that can be interpreted
as an alpha factor. The alpha factor acts as a translucency indicator: A value
of 0 implies transparency and a value of 1 implies complete opaqueness.
From another perspective, the blending equation replaces each background-image
pixel with a weighted sum of itself and the corresponding cutout-image pixel.
The weights are provided by alpha-image pixel values. It then follows that if
each alpha factor in the alpha image is set to 1, the cutout image will
replace the background pixel over which it is superimposed. If the alphas are
all 0, then blending the cutout image into the background image will have no
effect since each pixel in the cutout image will have effectively been made
transparent. Even more interesting things start happening when you use alphas
between 0 and 1. 


Making an Alpha Image from a Cutout


An alpha image can be generated during the process of making a cutout image.
The overall process of creating an alpha image is as follows:
1. Identify the object of interest.
2. Create a polygon "mask" over the object of interest and remove the
background.
3. Remove unnecessary borders to create the cutout image and mask.
4. Create the alpha image by softening the edges of the mask image.
For instance, the bottom portion of Figure 2 shows a mask image generated from
an example tree-cutout image. The mask image was created by setting each pixel
in the alpha image that corresponds to a pixel inside the cutout (tree) area
to the value 255 (white). Conversely, all pixels which are located outside the
cutout area are assigned the value 0 (black). The mask image could be called a
"binary image" because it only has two values: 0 and 255. 
To convert the mask image into an alpha image, the edges of the white area in
the mask image are smoothed. Smoothing gives the pixels along the edges of
the tree alpha factors that are greater than 0 and less than 1. When this
mask image is then used to blend the cutout and
background images together, an effect will be produced whereby the outermost
edges of the tree are made translucent. The end result is that the normal
effects of aliasing are reduced because pixels along the edges of the cutout
tree image (where one form of aliasing occurs) are being combined with the
background. 


Edge Smoothing


To get the proper effect, you need to be very particular about performing edge
smoothing in an alpha image. The top part of Figure 2 shows three sideview
plots of the intensities in one horizontal line of the mask image shown in the
bottom half of Figure 2. The lower-intensity plot is labeled "Unsmoothed."
This plot shows the edge profile of the mask image. If you smooth the edges of
the mask image by applying a simple, sliding-window average (or "block")
filter, the result is the situation diagrammed in the middle-intensity plot
labeled "Conventional Block Filter" in the upper part of Figure 2. In this
case, the block filter will "smear" the mask image into areas for which there
are no corresponding cutout-image pixels. The parts of the image affected are
indicated by the small white triangular areas that lie outside the vertical
lines denoting the edges of the cutout image. Nonzero alphas in these
triangular regions will cause 0-valued pixels from the cutout image to be
mixed into the background image; the result is that the blended image exhibits
a dark halo. What you want instead is to smooth the mask image so that the
alpha factors corresponding to 0 pixels in the cutout image remain 0. In other
words, you only want to calculate alpha factors where the mask-image pixels
are greater than 0. This approach will produce the result labeled "Desired" in
the uppermost portion of Figure 2, where the alpha factors along the edges of
the tree are less than 1 and all alphas outside the edges of the tree are 0.
Alphas lying in the interior of the cutout remain 255.
Figure 3, a visual-effect scene created by compositing two maple-leaf models
based on the same image, illustrates the blending effect. Figure 3(a) is
blended, while Figure 3(b) is not. Figure 4(a) is a close-up of the blended
leaf border composited using alpha blending. Figure 4(b), on the other hand,
is a close-up of the same portion of the leaf border, but unblended. The image
in Figure 4(a) exhibits less contrast than its unblended counterpart. This
lack of contrast will cause the model's edges to remain unnoticed by your eye.
In this case, what you don't see makes the difference. A cutout image and its
corresponding alpha image can be used during scene generation to composite
models into the final scene. Blending the cutout into the final scene
minimizes the effects of aliasing.
This edge-smoothing algorithm is implemented as a two-pass process in Listing
One. In particular, the smoothX3NN function is applied to the three nearest
neighboring pixels in the image's horizontal lines (which are oriented along
the x axis of the world-coordinate system); the smoothY3NN function is applied
to the three nearest neighboring pixels in vertical columns in the image. To
smooth the mask image in both the x and y directions, a call must be made to
each of the smoothing functions. The order in which the calls are made will
not appreciably change the outcome. 
The smoothX3NN function makes a pass over the image, line by line. As the
function traverses the pixels in each line, it calculates the average of the
current pixel and one pixel on either side of it. These three pixels make up
an "averaging window." Function smoothX3NN writes an output pixel into the
output alpha image (in the location corresponding to the center of the
averaging window) only if the pixel in the center of the averaging window has
a corresponding nonzero cutout-image pixel. Larger averaging windows make for
greater smoothing effects. For example, a smoothX4NN and smoothY4NN could
easily be constructed and applied to mask images. Listing One contains the
edge-smoothing functions that convert a mask image into an alpha image.


Opaqueness, Shadows, and the Alpha-Scale Factor


Suppose you want to uniformly vary the opaqueness of all the alpha-scale
factors in an alpha image. Instead of creating a new alpha image containing
the scaled alpha factors, you can incorporate a scale factor into the blending
equation as follows. First let f equal the new alpha factor, which
incorporates the alpha-scale factor s, as in f=sAij/m, where m is 255, the
maximum 8-bit pixel value. Since Aij ranges from 0 to 255, the ratio Aij/m
ranges from 0 to 1. The blending equation can then be rewritten
as Bij=f Cij+(1-f) Bij. The new equation enables the contribution of the alpha
image to the final, blended result to be scaled by varying s. By default, the
alpha-scale factor is set to a value of 1 in the scene file. Setting it to 0.5
would cause a cutout-image pixel with a corresponding alpha-image pixel value
of 255 (the maximum possible) to be combined with the output-image pixel in a
ratio of 1:1. In other words, the output pixel would be a 50 percent mixture
of both cutout-image pixel and background-image pixel. Figure 5 shows a
visual-effect scene in which the same model is blended into a background image
with four different alpha-scale factors. From left to right, the alpha scale
factors are: 0.8, 0.6, 0.4, and 0.2. Varying the alpha-scale factor during
generation of a sequence of images can result in morphing effects.
Suppose, however, that you wish to add a shadow to a model. Using a variation
of the alpha-scale factor idea, you can simply add to the scene file another
model that uses the same image file as the original model. To add shadows, you
first include another copy of each model in the scene file, renaming the model
to indicate that it is a shadow. Now you interactively rotate and position
each shadow model using the scene preview tool until it appears in the
desired perspective. 
Once the shadow models have been suitably positioned, how do you make the
shadows dark? Actually, you want to subtract an amount from the area of the
background image upon which the shadow is cast. The amount to be subtracted
can be determined from the alpha values and the alpha-scale value itself. You
can cause such a subtraction to occur by making the alpha scale negative. In
the case of a negative alpha-scale factor s, the blending equation is altered
to the form Bij=Bij+fCij. Since f will be negative if s is negative, a
subtraction is performed as desired. Obviously, a darker shadow requires a
more negative alpha-scale factor. A good starting point is to make alpha
scale -0.2; this will subtract 20 percent of the alpha-image value from the
background pixels. A lower bound is placed on Bij so that it cannot have a
calculated value of less than 1. 
The two leftmost shadows shown in Figure 6 were created by blending the alpha
image itself into the background. The shadow of the spruce tree on the right
side of Figure 6 was created by using the spruce-tree image itself as a source
of values to subtract from the background image. The result is a more complex
and interesting looking shadow. Since the shadow of an object can be occluded
by the model from which it is derived, the shadow model is always rendered
first.


The Blend Function


Listing Two implements the blending equation and incorporates the
modifications for positive and negative alpha-scale factors. Note that the
blend function accommodates the possibility that the blended output pixels may
be offset by an amount specified in the function arguments xOffset and
yOffset. Listing Two contains the function blend.


Summary



When you understand how alpha blending works, you can use it to smooth edges
of cutout images, add shadows to existing models, alter the opaqueness of any
model, and create other special effects. You can also combine alpha blending with
other graphics techniques, such as digital image warping ("morphing"); Figure
7 is an example of this. 
Figure 1: Implementing the blending equation. 
Figure 2: Generating a mask image from a cutout image.
Figure 3: (a) Blended model; (b) unblended model.
Figure 4: (a) Closeup of blended model (composited using alpha blending);
(b) closeup of unblended model.
Figure 5: Blending into a background image with different alpha-scale factors.

Figure 6: Creating shadows by blending the alpha image itself into the
background. 
Figure 7: Combining three-dimensional image warping with alpha blending for
special effects.

Listing One
void memImage::smoothX3NN(){
 int x, y;
 BYTE HUGE *myTemp = bytes;
 for (y = 1; y <= imageHeight; y++){
 for (x = 1; x <= imageWidth; x++){
 if(x == 1 && *myTemp > 0){
 *myTemp = (*(myTemp)+ *(myTemp+1))*0.5;
 }
 else
 if(x == imageWidth && *myTemp > 0){
 *myTemp = (*(myTemp-1) + *(myTemp))*0.5;
 }
 else
 if(x > 1 && x < imageWidth && *myTemp > 0){
 *myTemp = (*(myTemp-1) + *(myTemp)+ *(myTemp+1))*0.33333;
 }
 myTemp++;
 }
 myTemp+=pads;
 }
}
void memImage::smoothY3NN(){
 int x, y, y1, y2, y3, result;
 for (x = 1; x <= imageWidth; x++){
 for (y = 1; y <= imageHeight; y++){
 if(y > 1) y1 = getMPixel(x, y - 1);
 y2 = getMPixel(x, y);
 if(y < imageHeight) y3 = getMPixel(x, y + 1);
 result = 0;
 if(y == 1 && y2 > 0)
 result = (y2 + y3) * 0.5;
 if(y > 1 && y < imageHeight && y2 > 0)
 result = (y1 + y2 + y3) * 0.33333;
 if(y == imageHeight && y2 > 0)
 result = (y1 + y2) * 0.5;
 setMPixel(x, y, (BYTE)result);
 }
 }
}

Listing Two
short blend(memImage *inImage, memImage *maskImage, memImage *outImage,
 float alphaScale, short xOffset, short yOffset){
 //
 // Blend over the common area in input and mask images
 //
 short inputRows = inImage->getHeight();
 short inputCols = inImage->getWidth();
 short maskRows = maskImage->getHeight();
 short maskCols = maskImage->getWidth();
 short commonRows = min(inputRows, maskRows);
 short commonCols = min(inputCols, maskCols);
 //
 // each memImage is assumed to be opened for random access
 short x, y;
 BYTE maskPixel, inPixel, outPixel;
 float addedPixel; // float so shadow subtraction cannot wrap around in a BYTE
 float inWeight, outWeight;
 for(y = 1; y <= commonRows; y++){
 for(x = 1; x <= commonCols; x++){
 maskPixel = maskImage->getMPixel(x, y);
 if(maskPixel > 0){
 inPixel = inImage->getMPixel(x, y);
 outPixel = outImage->getMPixel(x + xOffset, y + yOffset);
 inWeight = (float)maskPixel / 255.0 * alphaScale;
 outWeight = 1.0 - inWeight;
 if(alphaScale > 0.0)
 addedPixel = (inWeight * (float)inPixel) + 
 (outWeight *(float)outPixel) + 0.5;
 else{
 addedPixel = (float)outPixel + (inWeight *(float)inPixel) + 0.5;
 // make certain shadows won't produce negative values
 if (addedPixel > outPixel) addedPixel = outPixel;
 }
 if (addedPixel < 1) addedPixel = 1; 
 if (alphaScale == 0.0) addedPixel = 0;
 outImage->setMPixel(x + xOffset, y + yOffset, (BYTE)addedPixel);
 }
 }
 }
 return 0;
}



Java and Internet Programming


Similar to C and C++, but much simpler




Arthur van Hoff


Arthur is a staff engineer at Sun Microsystems and has been involved in the
development of the Java language for two years. He is the author of the Java
compiler and one of the architects of the HotJava WWW browser. Arthur can be
contacted at avh@eng.sun.com.


In the early days of the Internet, most machines ran some dialect of UNIX.
Consequently, most software written in languages such as C or ANSI C was
relatively portable, and most programs were distributed with source, so they
could be compiled for the user's preferred operating environment.
While the exponential growth of the Internet has opened up new and exciting
opportunities, its newly heterogeneous nature hinders the distribution of
software in binary format. Also, as more PCs, Macs, and other non-UNIX
machines connect to the Internet, porting is becoming more difficult. But
porting is not the only barrier to distributing software on the Internet. What
about security? Have you ever downloaded a binary from a public ftp site and
executed it on your machine? There are no guarantees that such a program won't
steal your password or delete a critical file. Even when the software comes
from a respected vendor, it can be modified as it is transported over the
Internet.
In 1990, a small team of engineers at Sun, headed by James Gosling, started
developing software for the consumer-electronics market. Initially the team
used C++, but the wide variety of hardware architectures used in consumer
electronics--coupled with the requirement for robustness--made this
problematic. The team therefore developed a new language called "Java," which,
it turns out, also addresses many of the issues of software distribution on
the Internet.
Java is a simple, object-oriented, multithreaded, garbage-collected, secure,
robust, architecture-neutral, portable, high-performance, dynamic language.
The language is similar to C and C++ but much simpler. Java programs are
compiled into a binary format that can be executed on many platforms without
recompilation. The language contains mechanisms to verify and execute binary
Java programs in a controlled environment, protecting your computer from
potential viruses and security violations.
To demonstrate the capabilities of Java, Sun has developed a WWW browser
called "HotJava," written entirely in Java itself, which has the unique
capability to execute interactive content embedded in HTML pages. These
interactive programs are automatically downloaded with the HTML document and
enable the extension of the browser in a natural way. Existing examples of
embedded applications include animations, simulations, teaching tools,
spreadsheets, and the like. Both HotJava and Java are available at
http://java.sun.com/ (or by sending mail to java@java.sun.com).
Java and HotJava are free for noncommercial use. The source for the Java
interpreter, compiler, and HotJava browser are also freely available, but
there are some restrictions on incorporating Java into commercial products.
Check http://java.sun.com/ for licensing terms.
HotJava is an example of extensible Internet software. The browser downloads
code from the Internet, allowing the browser to extend its functionality
gradually. This is similar to Visual Basic's VBX extensions, except that Java
is a real programming language--it is not tied to a particular processor, and
it is secure.
The alpha version of Java/HotJava for Solaris was released this spring. Ports
for Solaris x86, SunOS 4.x, Windows 95, Windows NT, and Macintosh System 7.5
are underway at Sun. Since the release was announced, we have received
inquiries about porting Java to Windows 3.1, Linux, AIX, SCO, SGI, HP/UX, DEC
Alpha, NextStep, NetBSD, UnixWare, OS/2, Plan 9, OS9, Acorn, Taligent, Amiga,
Sega, and vt100s.


Java-Language Overview


The Java syntax and semantics for expressions and statements are almost
identical to those of ANSI C. This makes the language easy to learn for
someone familiar with C or C++. Listing One is a simple hash-table
implementation in Java.
Java's types and semantics are very well defined. Because Java code can run on
many platforms, it is important that small differences not be introduced by
the underlying hardware. Unlike C and C++, the precision of numeric types is
always the same (see Table 1), the order of evaluation of function arguments
is always left to right, and the rules for selecting overloaded methods are
clear. All of these are poorly defined in C and C++.
We've introduced features that make the language robust. The language supports
C++-style exceptions that can be thrown and caught by Java programs. An
exception is also generated when dereferencing a null pointer, accessing
outside the bounds of an array, or when running out of memory. Java is garbage
collected, which means that you never have to explicitly free heap-allocated
objects. All this makes it very easy to write robust code that can recover
safely from run-time errors.
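The error-recovery behavior described above can be sketched in a few lines. This example is not from the article; the class and method names (RobustDemo, safeGet) are illustrative, but the exception classes are the standard ones Java throws for the run-time checks just mentioned:

```java
// Minimal sketch: Java's mandatory run-time checks surface as exceptions
// that ordinary code can catch and recover from, instead of crashing.
public class RobustDemo {
    static int safeGet(int[] a, int i) {
        try {
            return a[i];                       // out-of-bounds index throws
        } catch (ArrayIndexOutOfBoundsException e) {
            return -1;                         // recover with a sentinel value
        }
    }
    public static void main(String[] args) {
        int[] data = {10, 20, 30};
        System.out.println(safeGet(data, 1));  // 20
        System.out.println(safeGet(data, 99)); // -1, not a crash
    }
}
```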
Java is object oriented. Except for numbers, everything is an object. The
language is statically typed, but every object has a type that can be examined
at run time. You can cast from one type to another, but an exception is thrown
if the run-time types are not compatible.
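A short, hypothetical sketch (the names CastDemo and asString are mine, not the article's) of how a cast is checked against the run-time type:

```java
// Every object carries its run-time type; an incompatible cast
// throws ClassCastException rather than silently misinterpreting memory.
public class CastDemo {
    // Returns the String if o really is one, or null if the
    // run-time check rejects the cast.
    static String asString(Object o) {
        try {
            return (String) o;
        } catch (ClassCastException e) {
            return null;
        }
    }
    public static void main(String[] args) {
        System.out.println(asString("hello"));            // hello
        System.out.println(asString(Integer.valueOf(42))); // null
    }
}
```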
Java supports a single-inheritance class hierarchy. However, a class can
implement multiple interfaces in a way similar to IDL. All Java methods are
virtual unless declared static, giving Java a much simpler method-invocation
model than C++. Java supports method overloading, but unlike C++, the rules
for method matching are much better defined. All this allows for an efficient
implementation of method invocations, with some of the flexibility of multiple
inheritance.
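The single-inheritance-plus-interfaces model can be illustrated with a small sketch (all type names here are invented for the example):

```java
// One superclass, any number of interfaces.
interface Drawable { void draw(); }
interface Storable { String serialize(); }

class Shape { }                                // the single parent class

class Circle extends Shape implements Drawable, Storable {
    int radius;
    Circle(int r) { radius = r; }
    public void draw()      { System.out.println("circle r=" + radius); }
    public String serialize() { return "circle:" + radius; }
}

public class ShapeDemo {
    public static void main(String[] args) {
        Circle c = new Circle(3);
        c.draw();                              // virtual dispatch by default
        System.out.println(c.serialize());     // circle:3
    }
}
```

A Circle can be passed anywhere a Shape, a Drawable, or a Storable is expected, which recovers much of the flexibility of multiple inheritance without its implementation cost.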
Java can be used for writing large programs. Java classes are arranged in a
modular way using packages. There are no header files! A class is compiled
into a binary representation used in further compilations. This means that
there are never any inconsistencies between the declared class and the actual
implementation. The access to type information, methods, and instance
variables is controlled using the public, protected, and private keywords
familiar from C++.
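As a minimal sketch of this access control (the class SimpleStack is hypothetical, and the package declaration is omitted so the example stands alone): the internal state is private, so only the public methods can touch it, and no separate header file ever has to be kept in sync.

```java
// Private state, public interface -- no header file to drift out of date.
public class SimpleStack {
    private Object[] items = new Object[16];   // invisible outside this class
    private int top = 0;

    public void push(Object o) {
        if (top == items.length) {             // grow when full
            Object[] bigger = new Object[items.length * 2];
            System.arraycopy(items, 0, bigger, 0, top);
            items = bigger;
        }
        items[top++] = o;
    }
    public Object pop() { return items[--top]; }
    public int size()   { return top; }

    public static void main(String[] args) {
        SimpleStack s = new SimpleStack();
        s.push("one");
        s.push("two");
        System.out.println(s.pop());           // two
    }
}
```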
It is, of course, possible to interface Java to other languages. This is
important when using existing libraries such as Xlib, Motif, or legacy
database software. To facilitate this, methods can be declared native and
implemented in C or some other native language. The Java run time can be
extended to include new native-method implementations by dynamically linking
in software libraries. Tools are provided to generate stub functions that
enable calls from Java to C and back. The interface between Java and C is
natural because the languages are so similar.


Multithreading


Java is fully multithreaded. Preemptive multithreading usually introduces a
whole range of gnarly problems, but Java has features that make multithreaded
programming straightforward.
For instance, synchronization is a primitive feature of Java. A method
declared to be synchronized automatically locks the object on which it is
invoked. When the method returns, the lock is automatically released. This
happens even when the method raises an exception or the thread running the
method is killed. This feature alleviates explicit locking, a source of many
programming errors. The locks are always reentrant, so a thread can grab the
same lock several times without causing deadlock.
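A tiny sketch of a synchronized method in action (Counter and SyncDemo are illustrative names, not classes from the article):

```java
// 'synchronized' locks the receiver for the duration of the call and
// releases the lock on return -- even if the method throws.
class Counter {
    private int value = 0;
    public synchronized void increment() { value++; }
    public synchronized int get()        { return value; }
}

public class SyncDemo {
    public static void main(String[] args) throws InterruptedException {
        final Counter c = new Counter();
        Runnable task = new Runnable() {
            public void run() {
                for (int i = 0; i < 10000; i++) c.increment();
            }
        };
        Thread t1 = new Thread(task), t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join();  t2.join();
        System.out.println(c.get());   // 20000: no lost updates
    }
}
```

Without the synchronized keyword the two threads could interleave their read-modify-write sequences and lose increments.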
The Hashtable class in Listing One uses synchronized methods to serialize
access to the hash table. Only one thread at a time can manipulate the
internal state. If two threads use the same Hashtable instance, they will not
corrupt the internal state of the Hashtable because all access is
automatically serialized by the synchronized methods.


Compilation and Performance


The Java compiler is written in Java itself. It compiles Java methods into
bytecodes, which are stored in a class file. The class file also contains all
the type information for a class. Bytecodes are essentially very simple
instructions for a virtual machine. As the name implies, most bytecodes are
only one byte long, so bytecode programs are usually a lot smaller than their
native equivalents.
There are several advantages to compiling Java into bytecodes. Bytecodes are
portable because they do not require a particular processor architecture or
other hardware support. They are byte-order independent so that they can be
executed on both Big- and Little-endian machines. And because the bytecodes
are so simple, it is possible to write a very efficient interpreter for Java
bytecodes.
Unlike ordinary machine instructions, Java bytecodes are typed. Each bytecode
specifies the exact type of its operands, which makes it possible to apply a
simple data-flow algorithm to verify that the bytecodes obey the language
constraints.
The Java virtual machine has both a small stack to store temporary expression
values and a set of registers for local variables. There are bytecodes for
creating new class instances, accessing arrays, invoking methods, handling
exceptions, and the like. Listing Two shows a typical method; Listing Three
shows what this method looks like when compiled to bytecodes.
The Java interpreter on SPARC is about 15 times slower than compiled C or C++.
This is acceptable for most applications, especially if you consider all the
benefits of array-bounds checking, null-pointer checking, garbage collection,
and so on. However, this is obviously not fast enough to satisfy the real
speed freaks, who want to write JPEG decompressors and 3-D rendering
algorithms in Java. To squeeze the last bit of performance out of your
machine, it is possible to convert the machine-independent bytecodes to
machine instructions. This makes Java almost as fast as C or C++ (there is
still some extra overhead for unavoidable run-time checks). Listing Four shows
the SPARC instructions generated by the example in Listing Three.
The bytecodes can be converted to machine instructions either in advance or
when the bytecodes are loaded. The translation is straightforward, but to make
full use of the processor's capabilities it may be necessary to do some
data-flow and register allocation. It is usually not necessary to convert all
bytecodes in your system to machine instructions. Machine instructions are
bulky, so the best performance is usually achieved by compiling only a few
performance-critical classes.


Garbage Collection



Garbage collection eliminates the need to free heap-allocated objects
explicitly. Memory used by objects that are no longer referenced is
automatically reclaimed. This significantly reduces the burden on the
programmer. Memory leaks are often hard to find and can affect the performance
of applications significantly. Java uses a simple, conservative,
mark-and-sweep algorithm for garbage collection.
All objects are internally referenced through a handle (a double indirection).
This allows the garbage collector to move objects in the heap to avoid
fragmentation. An exception is made for objects pointed to directly. In that
case, the object is not moved because a native method, machine code, or the
interpreter is accessing it directly.
Garbage collection of a 4-MB heap takes an average of 100 milliseconds on a
SPARCstation 10. To reduce its impact, garbage collection is done not only
when the system runs out of memory, but also periodically when the system is
idle. Since most garbage collection is done when the system is idle,
interactive applications and applications that use I/O are rarely affected.
Unless you are doing real-time computation, it is unlikely that you will ever
notice the garbage collector. The language is designed to support incremental,
multithreaded, and generational garbage collectors, as well.
Java also supports finalization. This is useful when a Java object is
associated with a resource not controlled by the garbage collector, such as a
file descriptor or a native UI component. Before the object is garbage
collected, the finalizer method of the object is called, giving the object a
chance to release its external resources. Finalization is used in the
implementation of the FileInputStream class in Listing Five. The file
descriptor associated with the stream must be closed before the
FileInputStream instance is garbage collected.


Security


Today it is common practice to download binaries from the Internet and execute
them on your machine. However, no tools are available to verify that the code
is well behaved and free of viruses. Java is designed to allow compiled code
to be downloaded without introducing a security hazard. As described earlier,
bytecodes contain additional type information that makes them verifiable. It is
possible to check whether the Java language constraints have been broken. For
example, it is illegal to treat an integer as a pointer or to load something
from the stack when the stack is empty. We have placed additional restrictions
on bytecodes to make sure that the verification is somewhat easier than
solving the halting problem.
Once a piece of downloaded code has been verified, it is safe to assume that
it does not break any of the language constraints. Thus, if a variable is
declared private, it really is private. In C++ the private and protected
modifiers are merely hints to the programmer and are easily defeated (just put
#define private public before the first #include). Not so in Java, where
private methods are really private. Bytecode verification makes sure that this
rule is observed.
Once it is possible to prove that downloaded bytecodes do not violate the
language constraints, it becomes possible to add the next layer of security.
At this level you can control what the downloaded code can and cannot do. The
checks for this can be written in Java itself because it is impossible for the
downloaded code to bypass the checks.
In the future, Java will also provide features for signing code using
public-key encryption. This will allow the secure exchange of Java code with
trusted partners over the Internet.


Conclusion


I have been coding in Java for nearly two years. I can't remember the last
time I was forced to use C++, but I vaguely remember how painful it was. Java
is one of the coolest languages to come along in a long time. 
Table 1: Simple Java types and their representations.

Type     Meaning
boolean  1 bit
byte     8 bits
char     16 bits (unsigned)
short    16 bits
int      32 bits
long     64 bits
float    32 bits IEEE-754
double   64 bits IEEE-754

Listing One
package containers;
/* Hashtable collision list. */
class HashtableEntry {
 int hash;
 Object key;
 Object value;
 HashtableEntry next;
}
/* Hashtable class. Maps keys to values. Any object can 
 * be used as a key and/or value. 
 * To successfully store and retrieve objects from a hash 
 * table the object used as the key must implement the hashCode() 
 * and equals() methods.
 * This example creates a hashtable of numbers. It uses the names of
 * the numbers as keys:
 * 
 * Hashtable numbers = new Hashtable();
 * numbers.put("one", new Integer(1));
 * numbers.put("two", new Integer(2));
 * numbers.put("three", new Integer(3));
 * 
 * To retrieve a number use:
 * Integer n = (Integer)numbers.get("two");
 * if (n != null) {
 * System.out.println("two = " + n);
 * }
 */

public final class Hashtable {
 private HashtableEntry table[];
 private int count;
 private int threshold;
 private final float loadFactor = 0.75f;
 /* Construct a new, empty hashtable. */
 public Hashtable() {
 table = new HashtableEntry[101];
 threshold = (int)(table.length * loadFactor);
 }
 /* Return the size of the hashtable */
 public int size() {
 return count;
 }
 /* Gets the object associated with a key in the hashtable. */
 public synchronized Object get(Object key) {
 HashtableEntry tab[] = table;
 int hash = key.hashCode();
 int index = (hash & 0x7FFFFFFF) % tab.length;
 for (HashtableEntry e = tab[index] ; e != null ; e = e.next) {
 if ((e.hash == hash) && e.key.equals(key)) {
 return e.value;
 }
 }
 return null;
 }
 /* Rehashes the content of the table into a bigger table. */
 private void rehash() {
 int oldCapacity = table.length;
 HashtableEntry oldTable[] = table;
 
 int newCapacity = oldCapacity * 2 + 1;
 HashtableEntry newTable[] = new HashtableEntry[newCapacity];
 threshold = (int)(newCapacity * loadFactor);
 table = newTable;
 for (int i = oldCapacity ; i-- > 0 ;) {
 for (HashtableEntry old = oldTable[i] ; old != null ; ) {
 HashtableEntry e = old;
 old = old.next;
 int index = (e.hash & 0x7FFFFFFF) % newCapacity;
 e.next = newTable[index];
 newTable[index] = e;
 }
 }
 }
 /* Puts the specified element into the hashtable, using the specified
 * key. The element may be retrieved by doing a get() with the same key.
 */
 public synchronized Object put(Object key, Object value) {
 // Make sure the value is not null
 if (value == null) {
 throw new NullPointerException();
 }
 // Makes sure the key is not already in the hashtable.
 HashtableEntry tab[] = table;
 int hash = key.hashCode();
 int index = (hash & 0x7FFFFFFF) % tab.length;
 for (HashtableEntry e = tab[index] ; e != null ; e = e.next) {
 if ((e.hash == hash) && e.key.equals(key)) {

 Object old = e.value;
 e.value = value;
 return old;
 }
 }
 if (count >= threshold) {
 // Rehash the table if the threshold is exceeded
 rehash();
 return put(key, value);
 } 
 // Creates the new entry.
 HashtableEntry e = new HashtableEntry();
 e.hash = hash;
 e.key = key;
 e.value = value;
 e.next = tab[index];
 tab[index] = e;
 count++;
 return null;
 }
}

Listing Two
int sum(int array[]) { 
 int sum = 0;
 for (int i = array.length ; --i >= 0 ; ) {
 sum += array[i];
 }
 return sum;
}

Listing Three
Method int sum(int [])
 0 iconst 0 ;; sum = 0
 1 istore R1
 2 aload R0 ;; i = array.length;
 3 arraylength
 4 istore R2
 5 goto 14
 8 iload R1 ;; sum += array[i]
 9 aload R0
 10 iload R2
 11 iaload
 12 iadd
 13 istore R1
 14 iinc R2 -1 ;; if (--i >= 0) then goto 8
 17 iload R2
 18 ifge 8
 21 iload R1 ;; return sum
 22 ireturn

Listing Four
Method int sum(int [])
 save %sp, -96, %sp 
 tst %i0 ;; is the array null? 
 mov 0, %l0 ;; set the sum to 0 
 teq %0, 16 ;; perform a hardware trap on null 
 ld [%i0 + 4], %g1 ;; get the array length field into %l1
 ba test 

 srl %g1, 5, %l1 
loop: 
 ld [%i0 + 4], %o0 ;; get the array length into %g4 
 srl %o0, 5, %g4 
 sll %l1, 2, %g3 ;; convert i into an offset 
 cmp %g4, %l1 ;; is the index out of range 
 tltu %0, 17 ;; perform a hardware trap if it is 
 ld [%g2 + %g3],%g1 ;; get the value 
 add %l0, %g1, %l0 ;; add it into the sum 
test: 
 addcc %l1, -1, %l1 ;; add -1 to the index 
 bge loop ;; loop if necessary 
 nop 
 jmpl %i7 + 8 ;; return the value now in %l0
 restore %g0, %l0, %o0

Listing Five
class FileInputStream extends InputStream {
 private int fd = -1;
 /** Open a file for reading. */
 private native int open(String name);
 /** Create an input file given a file name. */
 public FileInputStream(String name) {
 fd = open(name);
 }
 /** Read a byte. */
 public native int read();
 /** Close the input stream. */
 public native void close();
 /** Close the stream when the stream is finalized. */
 protected void finalize() {
 close();
 }
}





























JPEG-Like Image Compression, Part 2


CAL is a C++ class library that provides fast, efficient compression




Craig A. Lindley


Craig is a founder of Enhanced Data Technology and author of Practical Image
Processing in C, Practical Ray Tracing in C, and Photographic Imaging
Techniques for Windows (all published by John Wiley & Sons). Craig can be
contacted at edt@rmii.com. EDT also maintains a home page on the WWW at
www.mirical.com.


In last month's installment of this two-part article, I described the basic
techniques, algorithms, and vocabulary of JPEG image compression. In
particular, I examined the concepts of discrete-cosine transforms,
frequency-coefficient quantization, Huffman encoding, color-space conversion,
and image subsampling. I also presented the various considerations which
influenced the implementation of an image-compression technique called "CAL"
(my initials), which implements the same algorithms as JPEG, but encapsulates
the images in a simple, proprietary file format. This month, I'll focus on CAL
and how it differs from JPEG, then present the C++ classes on which CAL is
built.


JPEG versus CAL


CAL differs from JPEG in that it provides only a partial subset of the overall
functionality described in the JPEG specification. In particular, CAL:
Supports only the baseline-sequential mode of operation.
Supports only Huffman encoding for entropy coding.
Uses the default, static Huffman tables from the JPEG specification for
encoding/decoding of images.
Does not carry any Huffman code tables or symbol tables within the encoded CAL
image when it is stored in a file.
Uses a simple file format.
Supports only 8-bit gray-scale and 24-bit RGB true-color images.
CAL's image compression is greater than that of lossless techniques (100:1, in
some cases). CAL is also fast, small, easy to use, and royalty free. Unlike
standard JPEG images, however, CAL images can't be shared or exchanged with
non-CAL-aware applications. Still, you could use CAL image compression as an
internal format for an application program that requires JPEG-like
image-compression levels, and you could exchange images between two CAL-aware
applications. CAL could be especially useful for transmission of images via
modem. Because of CAL's small size, the receiving program that incorporates
CAL might even be sent over the modem.


The CAL Image File Format


JPEG and CAL differ greatly in the file format in which they store compressed
images. JPEG images are stored in JFIF or TIFF formats, whereas CAL images are
stored in a simple file format.
The CAL file format is simple because the format of compressed images is
fixed. In contrast, the JPEG specification allows for many different image
formats. The specification supports a varying number of image components, a
variety of color spaces, and various subsampling factors, in addition to
custom Huffman tables. To enable this kind of flexibility, the JPEG file
formats, by necessity, must be more complex.
A CAL image file is made up of a file header, followed by the compressed image
data. Example 1 defines a CAL file header (see also Listing One). The file
header for a CAL file is built by the CAL compression software. Part of the
information destined for the file header is passed into the CompressCALImage
class constructor as parameters, while other portions are calculated within
the constructor before being stored into the header. In either case, the
compression software fills in each of the header entries as the image is being
compressed. CAL expansion software in the ExpandCALImage class is driven
completely from the information contained in the CAL file header. This allows
the expansion software to be simple.
The image data following the header is stored in sequential, Huffman-encoded,
minimum coded unit (MCU) blocks. Each MCU of the image is compressed
separately and padded to a byte boundary with 1 bits. Four blocks of image
data make up an MCU for a gray-scale image, whereas six blocks are used for
true-color images (four Y, one Cb, and one Cr). The number of blocks per MCU
is contained in the file header.


The CAL Code


Table 1 lists the files that make up the CAL code. All of the CAL files
listed (along with sample images and programs) are available electronically;
see "Availability," page 3.
Although the CAL code was developed using Borland C++ 3.1 for 16-bit Windows,
it should easily port to other environments. Most of the code is standard C++
with a few Windows-specific exceptions. Areas that need attention during
porting are the file I/O code in fileio.cpp, which uses the Windows calls for
I/O, and the functions MyAlloc and MyFree (see Listing Two), which use the
Windows functions GlobalAlloc, GlobalLock, and GlobalFree. In addition, all
references to huge pointers will need attention in non-MS-DOS environments.
Finally, since true-color RGB pixel data is stored as B, G, and R in Windows'
DIB format, care must be taken in environments that store RGB data in the
conventional R, G, B order.
Porting the code to the Win32 environment will probably result in a
significant speed increase. 
As presented, the example Windows application (used only to exercise the CAL
code) in the file caltst.cpp (available electronically) is nonoperational; it
depends on a graphics file library that can't be distributed with the CAL code
in this article. (See my book Photographic Imaging Techniques for Windows for
information on acquiring this graphics library.) All of the CAL code itself,
however, is fully operational as presented.


Image-Expansion Code


An object of the ExpandCALImage class is used to expand a previously
compressed CAL image file. The single constructor for ExpandCALImage requires
a filename of a CAL image as its parameter. One of the first operations
performed within the constructor is the creation of a Huffman class object for
file I/O and Huffman decoding. The new Huffman object first reads in the
header of the specified image file. To verify that the specified file does
indeed contain a CAL image, the CALFileTag entry in the file header is checked
for the letters "CL." If the file contains a CAL image, a block of memory is
allocated to store the image as it is expanded; the block size is specified in
the file header. Next, an instance of the BufferManager class is created for
managing the decoded image data. The BufferManager, in conjunction with the
ColorConvert class it spawns, handles YCbCr-to-RGB color conversion on the
expanded image data as the image is made available. Finally, the quantization
tables are initialized to the quality factor specified in the image file
header, and the variables used in the DC-coefficient difference calculations
are initialized to 0. If all operations performed in the constructor are
successful, the class variable ErrorCode is set to a no-error indication. Many
member functions of this class check ErrorCode for the no-error indication
before proceeding with execution. 
The only purpose of the destructor for ExpandCALImage is to free the Huffman
and BufferManager objects along with the memory allocated for the image by the
constructor.
The ExpandImage member function expands images by organizing calls to
functions in other classes that perform the expansion process. The expansion
process is entirely driven by information contained in the CAL file header.
Specifically, the header tells the code the number of MCUs in the image, and
how many blocks make up each MCU. Since CAL uses a fixed subsampling
mechanism, the content of a block in an MCU can be determined by the block
number. This is important because Y blocks must be processed differently than
either Cb or Cr blocks. For gray-scale images, there are four blocks in an MCU
numbered 0..3, and they all contain Y information. For true-color images, there
are six blocks in an MCU numbered 0..5. Blocks 0..3 contain Y information,
block 4 contains Cb information, and block 5 contains Cr information. A switch
statement determines how to process each block.
All Y blocks are processed in exactly the same fashion. First, the
Huffman-encoded frequency coefficients must be decoded from the bit stream by
a call to the DecodeBlock function. Notice that DecodeBlock is instructed to
use the luminance tables for decoding Y data. With the block decoded, the
actual value of the DC coefficient for the block can be calculated from the
decoded value and the previous DC-coefficient value stored in the class
object. The new value of the DC coefficient is stored for use by the next
block of Y data. The next step is to subject the decoded frequency-coefficient
data to dequantization with a call to QuantizeBlock. Here, too, the function
is told to use the luminance dequantization table on the Y data block and to
perform the DEQUANT operation.
Operations on the Cb and Cr chrominance blocks are very similar to those
performed on the Y block with the exception that chrominance tables are used
in place of the luminance table for decoding and dequantizing the data. Also,
the DC-coefficient calculations are performed independently for the Cb and Cr
blocks.
After either a luminance or chrominance block is decoded, it is subjected to
an inverse zigzag reordering that places the ordered frequency-coefficient
values back into pixel order. This done, the data is processed by an inverse
cosine transform to retrieve the pixel data from the frequency data. The
PutNextBlock function in the BufferManager class object is finally called to
store the recovered pixel values back into the image buffer (after color
conversion to RGB, of course). 
The Huffman object's FlushInputStream member function is called between
processing MCUs of image data. Remember, each MCU is separate and is padded to
a byte boundary by the addition of 1 bits to the data stream. At the
conclusion of MCU processing, the FlushInputStream function resets the
Huffman-decoding software for processing the next MCU of data. This properly
disposes of any padding bits added to the bit stream.

The processing of compressed image-data blocks continues until there are no
more blocks to process. At that time ExpandImage returns a True, indicating
success.
After the image-expansion process completes successfully, the ExpandCALImage
object contains the image in memory (in DIB format). The application software
can then poll the object for the image parameters and data. 


Image-Compression Code


CAL image compression is handled by an object of the CompressCALImage class.
The constructor for CompressCALImage is passed the following: 
A filename to be given the compressed image.
An indication of whether the image is gray scale or true color.
A pointer to the image data in memory. 
A pointer to the image palette in memory (if required).
The dimensions of the image in pixels.
The number of bits per pixel for the image.
The number of colors considered important for the image.
A quality-factor compression control. 
As mentioned, some of these parameters are placed directly in the CAL file
header, while others are used in calculations that eventually become entries
in the header. For now, the header entries BitsPerPixel and NumberOfColors are
not used by the CAL code but are included for future capability expansion,
specifically to allow CAL to handle palettized color images.
The CompressCALImage constructor first creates a Huffman object for file I/O
and a BufferManager class object for extraction of pixel data from the image
being compressed. After these objects are instantiated, the header structure
within the class is filled with appropriate image information. Next, the
quantization tables are initialized by a call to SetQuality. The quantization
tables are parameterized using the quality factor passed in as a parameter to
the constructor. Finally, the previous DC-coefficient variables are set to 0
so that they can be used to encode DC-coefficient differences instead of
absolute values. The destructor for the CompressCALImage is simple; it serves
only to delete the Huffman and the BufferManager class objects, as they aren't
needed by the time the destructor runs.
Image compression occurs when CompressImage is called. After checking if the
initialization performed within the class constructor was successful, the
header (filled in by the constructor) is written to the output file. Following
this, two nested loops control the compression process. The outer loop is
traversed for each MCU in the image, and the inner loop is traversed for each
block of each MCU. Image compression proceeds in reverse order from image
expansion. First, a call to GetNextBlock retrieves the next block of image
data. The block number to be fetched is passed to this function as a
parameter. A block number of 0..3 will fetch a Y block, a block number of 4
will fetch a Cb block, and a block number of 5 will fetch a Cr block. Because
gray-scale images have only four blocks per MCU, blocks of nonexistent Cb or
Cr data will never be fetched. True-color images, however, utilize all block
varieties.
After a block of image data is fetched, it undergoes discrete cosine
transformation, followed by a forward zigzag. The frequency coefficients are
then in the proper, increasing-frequency order for the quantization and
encoding processes. The processing that follows depends upon the type of
block, which is determined by the block number. Block numbers 0..3 of Y data
are forward quantized with luminance-quantization tables. The difference in
DC-coefficient value replaces the DC coefficient in the data block, whereupon
EncodeBlock encodes the luminance data. Chrominance data is handled in a
similar fashion, except chrominance tables are used for quantization and
Huffman encoding. When a complete MCU of data has been processed, the Huffman
data stream is flushed (padded with 1s) and processing begins on the next MCU.
When there are no more MCUs to process, the CAL file is closed with a call to
CloseFile. A Boolean True is returned if image compression was successful.
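The DC-difference step described above is a simple form of DPCM: each block's DC coefficient is replaced by its difference from the previous block's DC value, and the decoder reverses the substitution. The following sketch (the function names are illustrative, not part of the CAL interface) shows the idea in isolation:

```cpp
// Hypothetical sketch of the DC-difference (DPCM) step: the encoder stores
// the delta from the previous block's DC value; the decoder accumulates
// deltas to recover the actual coefficients. A separate running value is
// kept per color component (Y, Cb, Cr), exactly as the CAL classes do.
int EncodeDCDelta(int dc, int &previousDC) {
    int delta = dc - previousDC;  // the difference replaces the DC coefficient
    previousDC = dc;              // remember the actual value for the next block
    return delta;
}

int DecodeDCDelta(int delta, int &previousDC) {
    previousDC += delta;          // accumulate to recover the actual DC value
    return previousDC;
}
```

Because nearby blocks usually have similar average intensity, the deltas cluster near zero and Huffman-encode more compactly than the raw values would.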


The BufferManager and ColorConvert Classes


During image compression, the buffer manager breaks up an image into blocks
appropriate for compression. Conversely, during image expansion, the buffer
manager reconstructs an image from blocks of data passed to it. The buffer
manager manages four internal buffers (Y1, Y2, Cb, and Cr) when true-color
images are processed and just two buffers (Y1 and Y2) for gray-scale images.
These buffers can accommodate 16 rows of full-width image data, which are
necessary to support the two-dimensional 4:2:2 subsampling performed on the
image's chrominance components. Because of the subsampling, each 16x16 group
of pixels in a true-color image results in four Y blocks and one block each of
Cb and Cr. Each 16x16 group of pixels in a gray-scale image results in four Y
blocks only.
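The bookkeeping implied by this layout is easy to state directly. The helper functions below are illustrative (not part of the BufferManager interface), but the MCU arithmetic matches the figures in the text: one MCU per 16x16 pixel group, rounding partial groups up, with six blocks per MCU for true color and four for gray scale:

```cpp
// Illustrative MCU bookkeeping for the two-dimensional chroma subsampling
// described above. Each 16x16 pixel group forms one MCU; partial edge
// groups still require a whole MCU, hence the round-up.
unsigned MCUCount(unsigned width, unsigned height) {
    return ((width + 15) / 16) * ((height + 15) / 16);
}

unsigned BlocksPerMCU(bool trueColor) {
    // 4 luminance (Y) blocks always; Cb and Cr blocks only for true color
    return trueColor ? 6 : 4;
}
```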
During image compression, BufferManager reads 16 rows of image data at a time
from the DIB image pointed to in memory, calls upon a ColorConvert class
object to convert the RGB data to YCbCr, and then stores the converted data in
its internal buffers. Every call to GetNextBlock retrieves a block of data
from the buffer identified by the requested block number. Blocks 0 and 1 are
fetched from Y1; blocks 2 and 3, from Y2; block 4, from Cb; and block 5, from
the Cr buffer. When there is no more data in the buffers, the next 16 rows of
image data are read in and the process repeats.
During image expansion, a call to PutNextBlock places a block of image data
into the buffer identified by the specified block number. Block numbers 0 and
1 are stored in the Y1 buffer; block numbers 2 and 3, in the Y2 buffer; block
number 4, in the Cb buffer; and block number 5, in the Cr buffer. When
PutNextBlock is called and the internal buffers are full, 16 rows of data will
be retrieved from the buffers, converted back to RGB format, and stored in the
memory allocated for the image at the appropriate location (determined by the
row number).
The ColorConvert class is also complex because it converts colors using scaled
long integers instead of the more traditional floating-point numbers. This was
done for performance. (Refer to last month's installment for the equations
used for RGB-to-YCbCr color-space conversions.) Note that all of the possible
terms in the color equations in the ColorConvert code are precomputed within
the class constructor, so only array lookups and adds are necessary when the
code needs to convert colors. This was also done to enhance performance. It's
also important to offset the Cb and Cr values that occupy the range -0.5 to
+0.5 when the scale is from 0.0 to 1.0 so that they reside in the range of
byte values from 0 to 255. This is done by mapping a chrominance value of 0 to
code 127. Chrominance values above 127 are then considered positive, and values
below 127 are negative. The code for the ColorConvert class (see color.hpp and
color.cpp) shows how this offset is managed, how the 2-D chrominance
subsampling is performed, and how long integers and scaling are used in place
of floating-point numbers during the color calculations. 
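The scaled-integer technique is worth a small illustration. The sketch below is not the actual ColorConvert code; it assumes the standard luminance equation Y = 0.299R + 0.587G + 0.114B and shows how precomputing every possible product into tables (scaled by 2^16) reduces each conversion to lookups, adds, and a shift:

```cpp
// Illustrative fixed-point RGB-to-Y conversion in the style the text
// describes (assumed coefficients; not the actual ColorConvert code).
// Each coefficient*component product is precomputed once, scaled by 2^16.
// Cb and Cr would be handled the same way, with the 127 offset applied.
const int SCALE = 16;  // fixed-point fraction bits

struct YLookup {
    long r[256], g[256], b[256];
    YLookup() {
        for (int i = 0; i < 256; i++) {
            r[i] = (long)(0.299 * (1L << SCALE)) * i;
            g[i] = (long)(0.587 * (1L << SCALE)) * i;
            b[i] = (long)(0.114 * (1L << SCALE)) * i;
        }
    }
};

int RGBToY(int R, int G, int B) {
    static YLookup t;  // tables built once, in the spirit of the constructor
    // Sum the precomputed products, round, and drop the fraction bits
    return (int)((t.r[R] + t.g[G] + t.b[B] + (1L << (SCALE - 1))) >> SCALE);
}
```

On the 16-bit compilers of the day, avoiding floating point in this inner loop was a substantial win; the same trade-off still applies on hardware without fast FPUs.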


Using CAL in an Application


Example 2 compresses a true-color, in-memory Windows DIB image and stores it
in a file called "cal50.cal." Once the CAL image has been compressed, the code
expands the CAL image back into memory for processing/display. This code
illustrates both CAL file compression and expansion. For simplicity, error
detection and reporting are missing from this example. 


Conclusions


JPEG compression is no more than the logical application of various
image-processing algorithms to image data. As CAL illustrates, when JPEG
compression is broken down into its constituent algorithms, the pieces are
easily understood.
Table 1: CAL file/class breakdown.
FILENAME:buffman.hpp
CONTENTS:BufferManager class-interface definition.
PURPOSE:This and buffman.cpp break images into blocks for encoding and
reconstruct images from blocks during decoding.

FILENAME:buffman.cpp
CONTENTS:BufferManager class-member functions.

FILENAME:cal.hpp
CONTENTS:ExpandCALImage/CompressCALImage class-interface definitions.
PURPOSE:This and cal.cpp form the high-level interface into the CAL code.

FILENAME:cal.cpp
CONTENTS:ExpandCALImage/CompressCALImage class-member functions.

FILENAME:cal.prj
CONTENTS:Project file for Borland C++ Version 3.1.
PURPOSE:Used to build the CAL code in the IDE.


FILENAME:caltst.cpp
CONTENTS:Example of CAL usage--nonfunctional.
PURPOSE:CAL code example.

FILENAME:caltst.def
CONTENTS:Module-definition file for Windows.
PURPOSE:Required for example Windows application.

FILENAME:color.hpp
CONTENTS:ColorConvert class-interface definition.
PURPOSE:Performs RGB-to-YCbCr and YCbCr-to-RGB color-space conversions.

FILENAME:color.cpp
CONTENTS:ColorConvert class-member functions.
PURPOSE:Tightly coupled with BufferManager class code.

FILENAME:dct.hpp
CONTENTS:DCT class-interface definition.
PURPOSE:Brute-force attempt at a DCT.

FILENAME:dct.cpp
CONTENTS:DCT class-member functions.
PURPOSE:Brute-force attempt at a DCT.

FILENAME:dct1.hpp
CONTENTS:Improved DCT class-interface definition.
PURPOSE:Improved DCT from Independent JPEG group's JPEG software.

FILENAME:dct1.cpp
CONTENTS:Improved DCT class-member functions.
PURPOSE:Improved DCT from Independent JPEG group's JPEG software.

FILENAME:errors.h
CONTENTS:Miscellaneous error-code definitions.
PURPOSE:Defines errors that can occur during reading and writing of CAL files.

FILENAME:fileio.hpp
CONTENTS:AFile class interface definition.
PURPOSE:C++ class for reading and writing files.

FILENAME:fileio.cpp
CONTENTS:AFile class member functions.
PURPOSE:Code is Windows specific.

FILENAME:huffman.hpp
CONTENTS:Huffman class-interface definition.
PURPOSE:Contains all the entropy-encoding/decoding code.

FILENAME:huffman.cpp
CONTENTS:Huffman class-member functions.
PURPOSE:Contains all the entropy-encoding/decoding code.

FILENAME:misc.hpp
CONTENTS:Miscellaneous data-structure and type definitions.

FILENAME:quant.hpp
CONTENTS:Quantize class-interface definition.
PURPOSE:Performs quantization and dequantization of image data.


FILENAME:quant.cpp
CONTENTS:Quantize class-member functions.
PURPOSE:Performs quantization and dequantization of image data.

FILENAME:tables.hpp
CONTENTS:Miscellaneous table-prototype definitions.
PURPOSE:Tables used throughout the CAL code, including quantization, zigzag,
and Huffman encoding and decoding tables.

FILENAME:tables.cpp
CONTENTS:The tables.
Example 1: CAL file-header definition.
typedef struct {
WORD StructureSize; // Size of structure for version control
WORD CALFileTag; // Tag should be "CL"
IMAGETYPE ImageType; // Type of image - gray scale or true color
WORD ImageWidth; // Image width in pixels
WORD ImageHeight; // Image height in pixels
DWORD RasterSize; // DIB raster size including padding
WORD BitsPerPixel; // Number of bits per pixel for the image.
 // 8 for gray scale and 24 for true color images
WORD NumberOfColors; // Number of important colors in image usually 256 for
 // gray scale images and always 0 for true color images
WORD QualityFactor; // Quality factor image compressed with. Range 10..100
WORD NumberOfMCUs; // Total number of MCUs in image
WORD BlocksPerMCU; // Number of blocks in a single MCU.
 // 4 for gray scale and 6 for true color images
RGBCOLOR Palette[256]; // Palette for image display. Required only for
 // gray-scale images
DWORD Unused1; // Whatever you want you can put here
} CALFILEHEADER;
Example 2: Storing a true-color, in-memory Windows DIB image.
CompressCALImage *C; // Create an instance of class for CAL compression
ExpandCALImage *E; // Create an instance of class for CAL expansion
// Attempt to compress a CAL image from a true-color DIB image in memory.
// Parameters passed to the CompressCALImage object refer to the image
// in memory. A quality factor of 50 is used.
C = new CompressCALImage("cal50.cal", TRUECOLORTYPE,
 Ptr to DIB data, NULL,
 ImageWidth, ImageHeight,
 24, 0, 50);
// Now compress the image
C->CompressImage();
delete C; // Compression object is no longer needed
// Now attempt to expand the previously compressed CAL image
E = new ExpandCALImage("cal50.cal");
// Now expand the image
E->ExpandImage();
// At the conclusion of image expansion, the E object contains the DIB image
// and its associated specifications. Make sure to copy the image data out of
// the E object before the object is deleted; otherwise the image will be
// destroyed along with the object.
BYTE huge *lpImage = E->GetDataPtr(); // Get ptr to DIB image data
WORD ImageWidth = E->GetWidth();
WORD ImageHeight = E->GetHeight();
// Process and/or display the image here
delete E; // Delete object when it and the image are no longer needed

Listing One
// Compress and Expand CAL Files Class Interface Definition
#ifndef CAL_HPP

#define CAL_HPP
#include "dct1.hpp"
#include "quant.hpp"
#include "huffman.hpp"
#include "buffman.hpp"
#ifndef __RGBCOLOR
#define __RGBCOLOR
typedef struct {
 BYTE Red;
 BYTE Green;
 BYTE Blue;
} RGBCOLOR;
#endif
// Define the CAL file header
typedef struct {
 WORD StructureSize; // Size of structure for version control
 WORD CALFileTag; // Tag should be CL
 IMAGETYPE ImageType; // Type of image
 WORD ImageWidth; // Image width in pixels
 WORD ImageHeight; // Image height in pixels
 DWORD RasterSize; // DIB Raster size including padding
 WORD BitsPerPixel; // Number of bits per pixel
 WORD NumberOfColors; // Number of important colors in image
 WORD QualityFactor; // Quality factor image compressed with
 WORD NumberOfMCUs; // Total number of MCUs in image
 WORD BlocksPerMCU; // Number of blocks in a single MCU
 RGBCOLOR Palette[256]; // Palette for image display
 DWORD Unused1; // TBD
} CALFILEHEADER;
// The Expand Class Definition
class ExpandCALImage {
 private:
 int ErrorCode;
 CALFILEHEADER Header;
 BYTE huge *lpImageData;
 BufferManager *BM;
 DCT InvTransform;
 Quantize InvQuant;
 Huffman *Decoder;
 int PreviousYBlockDCValue; // DC values of previously decoded blocks
 int PreviousCbBlockDCValue;
 int PreviousCrBlockDCValue;
 public:
 ExpandCALImage(LPSTR FileName);
 virtual ~ExpandCALImage(void);
 BOOL ExpandImage(void);
 WORD GetWidth(void) { return Header.ImageWidth; }
 WORD GetHeight(void) { return Header.ImageHeight; }
 WORD GetColors(void) { return Header.NumberOfColors; }
 WORD GetBitsPerPixel(void) { return Header.BitsPerPixel; }
 DWORD GetRasterSize(void) { return Header.RasterSize; }
 BYTE huge * GetDataPtr(void) { return lpImageData; }
 RGBCOLOR * GetPalettePtr(void) { return Header.Palette; }
 int GetError(void);
};
// The Compress Class Definition
class CompressCALImage {
 private:
 int ErrorCode;

 CALFILEHEADER Header;
 BYTE huge *lpImageData;
 WORD BlocksPerMCU;
 WORD NumberOfMCUs;
 int PreviousYBlockDCValue; // DC values of previously encoded blocks
 int PreviousCbBlockDCValue;
 int PreviousCrBlockDCValue;
 BufferManager *BM;
 DCT FwdTransform;
 Quantize FwdQuant;
 Huffman *Encoder;
 public:
 CompressCALImage(LPSTR FileName, IMAGETYPE Type,
 BYTE huge *lpImage, RGBCOLOR *lpPalette,
 WORD Width, WORD Height,
 WORD BitsPerPixel, WORD NumOfColors,
 WORD QualityFactor);
 virtual ~CompressCALImage(void);
 BOOL CompressImage(void);
 int GetError(void);
};
#endif

Listing Two
// Compress and Expand CAL Files Class Member Functions
#include "string.h"
#include "cal.hpp"
#include "errors.h"
// The following functions deal with CAL file expansion
// Class Constructor
ExpandCALImage::ExpandCALImage(LPSTR FileName) {
 ErrorCode = NoError;
 // Clear header storage
 memset(&Header, 0, sizeof(CALFILEHEADER));
 Decoder = NULL; // Initialize object ptrs to NULL
 BM = NULL;
 // Instantiate a Huffman object in order to read file header
 Decoder = new Huffman(FileName, HUFFMANDECODE);
 if (!Decoder) { // Memory problem if object not created
 ErrorCode = ENoMemory;
 return;
 }
 // Now read the file header
 Decoder->FileObject.ReadMBytes((BYTE huge *) &Header, sizeof(CALFILEHEADER));
 // Check header tag to verify file type
 if (strncmp((char *) &(Header.CALFileTag), "CL", 2) != 0) {
 ErrorCode = ENotCALFile;
 return;
 }
 // Now allocate a block of memory to contain the expanded image
 lpImageData = (BYTE huge *) MyAlloc(Header.RasterSize);
 if (!lpImageData) {
 ErrorCode = ENoMemory;
 return;
 }
 // Now instantiate a Buffer Manager object to manage the image data
 BM = new BufferManager(Header.ImageType, Header.ImageWidth, 
 Header.ImageHeight, lpImageData);
 if (!BM) { // Memory problem if object not created

 ErrorCode = ENoMemory;
 return;
 }
 // Build quantization tables for decoding image
 InvQuant.SetQuality(Header.QualityFactor);
 // Initialize previous DC values for the various image color components
 // to zero. Used in computing and decoding the DC difference values.
 PreviousYBlockDCValue = 0;
 PreviousCbBlockDCValue = 0;
 PreviousCrBlockDCValue = 0;
}
ExpandCALImage::~ExpandCALImage(void) {
 // Release any objects and/or memory used
 if (Decoder) delete Decoder;
 if (BM) delete BM;
 if (lpImageData) MyFree(lpImageData);
}
// The call to this function performs image expansion from CAL file to DIB.
BOOL ExpandCALImage::ExpandImage(void) {
 INTBLOCK iBlock, iBlock1;
 BYTEBLOCK bBlock;
 // Make sure no errors have occurred before proceeding
 if (ErrorCode != NoError)
 return FALSE;
 // For each MCU of image do
 for (register int MCU = 0; MCU < Header.NumberOfMCUs; MCU++) {
 // For each block of MCU
 for (register int Block = 0; Block < Header.BlocksPerMCU; Block++) {
 // Determine what to do from block count
 switch(Block) { // Blocks 0..3 are luma samples
 case 0:
 case 1:
 case 2:
 case 3: // Decode the luma samples
 // Decode the luma samples
 Decoder->DecodeBlock((int *) iBlock, USELUMATABLE);
 // Decode the actual DC coefficient value from the encoded delta
 *((int *) iBlock) += PreviousYBlockDCValue;
 PreviousYBlockDCValue = *((int *) iBlock);
 // Dequantize the block
 InvQuant.QuantizeBlock((int *) iBlock, LUMA, DEQUANT);
 break;
 case 4: // Decode the chroma samples
 case 5:
 // Decode the chroma samples
 Decoder->DecodeBlock((int *) iBlock, USECHROMATABLE);
 // Decode the actual DC coefficient value from the encoded delta
 if (Block == 4) { // If the Cb block
 *((int *) iBlock) += PreviousCbBlockDCValue;
 PreviousCbBlockDCValue = *((int *) iBlock);
 } else { // If the Cr block
 *((int *) iBlock) += PreviousCrBlockDCValue;
 PreviousCrBlockDCValue = *((int *) iBlock);
 }
 // Dequantize the block
 InvQuant.QuantizeBlock((int *) iBlock, CHROMA, DEQUANT);
 break;
 }
 // Zigzag reorder block

 InvTransform.ZigZagReorder((int *) iBlock, (int *) iBlock1, INVERSEREORDER);
 // Now perform the inverse DCT on the image data
 InvTransform.IDCT(&iBlock1, &bBlock);
 // Store the recovered image data into the DIB memory
 BM->PutNextBlock((BYTE *) bBlock, Block);
 }
 Decoder->FlushInputStream();
 }
 // Flush the DIB image data to the buffer
 BM->FlushDIBData();
 // Close the file
 Decoder->FileObject.CloseFile();
 return TRUE;
}
// Return error code if any for last operation
int ExpandCALImage::GetError(void) {
 int Code = ErrorCode;
 ErrorCode = NoError;
 return Code;
}
// The following functions deal with CAL file compression
CompressCALImage::CompressCALImage(
 LPSTR FileName, IMAGETYPE Type,
 BYTE huge *lpImage, RGBCOLOR *lpPalette,
 WORD Width, WORD Height,
 WORD BitsPerPixel, WORD NumberOfColors,
 WORD QualityFactor) {
 ErrorCode = NoError; // Assume no errors have occurred
 // Clear header storage
 memset(&Header, 0, sizeof(CALFILEHEADER));
 Encoder = NULL; // Initialize object ptrs to NULL
 BM = NULL;
 // Instantiate a Huffman object in order to write file header
 Encoder = new Huffman(FileName, HUFFMANENCODE);
 if (!Encoder) { // Memory problem if object not created
 ErrorCode = ENoMemory;
 return;
 }
 // Now instantiate a Buffer Manager object to manage the image data
 BM = new BufferManager(Type, Width, Height, lpImage);
 if (!BM) { // Memory problem if object not created
 ErrorCode = ENoMemory;
 return;
 }
 // Fill in the header entries from the parameters passed in
 Header.StructureSize = sizeof(CALFILEHEADER); // Write structure size
 memcpy(&Header.CALFileTag, "CL", 2); // Write CAL tag (2 bytes, no terminator)
 Header.ImageType = Type; // Type of image
 Header.ImageWidth = Width; // Image width in pixels
 Header.ImageHeight = Height; // Image height in pixels
 // Calculate DIB Raster size including appropriate padding
 DWORD BytesPerLine = (Type == TRUECOLORTYPE) ? Width * 3:Width;
 BytesPerLine = ALIGN_DWORD(BytesPerLine);
 Header.RasterSize = BytesPerLine * Height;
 Header.BitsPerPixel = BitsPerPixel; // Number of bits per pixel
 Header.NumberOfColors = NumberOfColors; // Number of colors in image
 Header.QualityFactor = QualityFactor; // Quality factor 
 // Now calculate image statistics for two dimensional color subsampling
 if (Type == TRUECOLORTYPE) // If image is true color there are

 BlocksPerMCU = 6; // 6 blocks / MCU. 4 luma and 2 chroma
 else // If image is black/white there are
 BlocksPerMCU = 4; // 4 blocks / MCU. 4 luma
 // Store results in image header
 Header.BlocksPerMCU = BlocksPerMCU;
 WORD NumberOfHorzBlocks = ((Width + 15) / 16) * 2; // Horizontal 8x8 blocks 
 WORD NumberOfVertBlocks = ((Height + 15) / 16) * 2; // Vertical 8x8 blocks 
 NumberOfMCUs = (NumberOfHorzBlocks / 2) * (NumberOfVertBlocks / 2);
 // Store results in image header
 Header.NumberOfMCUs = NumberOfMCUs;
 // Copy palette if required
 if ((Type == PALETTECOLORTYPE) || (Type == GRAYSCALETYPE))
 memcpy(&Header.Palette, lpPalette, NumberOfColors * sizeof(RGBCOLOR));
 lpImageData = lpImage; // Copy ptr to DIB image data
 // Build quantization tables for encoding image
 FwdQuant.SetQuality(QualityFactor);
 // Initialize previous DC values for the various image color components
 // to zero. Used in computing and encoding the DC difference values.
 PreviousYBlockDCValue = 0;
 PreviousCbBlockDCValue = 0;
 PreviousCrBlockDCValue = 0;
}
// Class Destructor
CompressCALImage::~CompressCALImage(void) {
 // Release any objects used
 if (Encoder) delete Encoder;
 if (BM) delete BM;
}
BOOL CompressCALImage::CompressImage(void) {
 BYTEBLOCK bBlock;
 INTBLOCK iBlock, iBlock1;
 int TempInt;
 // Make sure no errors have occurred before proceeding
 if (ErrorCode != NoError)
 return FALSE;
 // First write the initialized header to the specified file
 Encoder->FileObject.WriteMBytes((BYTE huge *) &Header,sizeof(CALFILEHEADER));
 // For each MCU of image
 for (register int MCU = 0; MCU < NumberOfMCUs; MCU++) {
 // For each block of an MCU
 for (register int Block = 0; Block < BlocksPerMCU; Block++) {
 // Get a block of image data to process
 BM->GetNextBlock((BYTE *) bBlock, Block);
 // Do DCT on the block
 FwdTransform.FDCT(&bBlock, &iBlock);
 // Zigzag reorder block
 FwdTransform.ZigZagReorder((int *)iBlock,(int *) iBlock1,FORWARDREORDER);
 // Determine what to do next from block count
 switch(Block) {
 case 0:
 case 1:
 case 2:
 case 3: // Process luma samples
 // Quantize the block
 FwdQuant.QuantizeBlock((int *) iBlock1, LUMA, QUANT);
 // Calculate and encode the differential DC value. First get the
 // DC coefficient from the block (the first element) and save it.
 // Next subtract the previous DC value from it. Finally store
 // the actual DC coefficient value for encoding the next block.

 TempInt = *((int *) iBlock1);
 *((int *) iBlock1) -= PreviousYBlockDCValue;
 PreviousYBlockDCValue = TempInt;
 // Encode the block into the Huffman bit stream.
 Encoder->EncodeBlock((int *) iBlock1, USELUMATABLE);
 break;
 case 4: // Process Cb and Cr samples
 case 5:
 // Quantize the block
 FwdQuant.QuantizeBlock((int *) iBlock1, CHROMA, QUANT);
 // Calculate and encode differential DC value. See comments above.
 TempInt = *((int *) iBlock1); // Get the DC coefficient of block
 if (Block == 4) { // If the Cb block
 *((int *) iBlock1) -= PreviousCbBlockDCValue;
 PreviousCbBlockDCValue = TempInt;
 } else { // If the Cr block
 *((int *) iBlock1) -= PreviousCrBlockDCValue;
 PreviousCrBlockDCValue = TempInt;
 }
 // Encode the block into the Huffman bit stream.
 Encoder->EncodeBlock((int *) iBlock1, USECHROMATABLE);
 break;
 }
 }
 Encoder->FlushOutputStream();
 }
 // Now close the output file
 Encoder->FileObject.CloseFile();
 // Signal all is well
 return TRUE;
}
// Return error code if any for last operation
int CompressCALImage::GetError(void) {
 int Code = ErrorCode;
 ErrorCode = NoError;
 return Code;
}
// The following miscellaneous functions are used throughout the code.
// Allocate a block of memory from global heap. Store handle within block.
void far * MyAlloc(DWORD Size) {
 HGLOBAL hMem;
 // Attempt to allocate the desired size block of memory
 if ((hMem = GlobalAlloc (GHND, Size + sizeof(HGLOBAL)))== NULL)
 return (void far *) NULL;
 void far *pMem = GlobalLock(hMem); // Get a pointer to the memory block
 *((HGLOBAL far *) pMem) = hMem; // Store handle in block
 // Return pointer that points past handle
 return ((LPSTR) pMem + sizeof(HGLOBAL)); 
}
// Free a block of global memory. Handle is stored within block.
void MyFree(void far * pMem) {
 LPSTR HandlePtr = (LPSTR) pMem - sizeof(HGLOBAL);
 HGLOBAL hMem = *((HGLOBAL far *) HandlePtr);
 GlobalUnlock(hMem);
 GlobalFree(hMem);
 pMem = NULL; // Note: this zeros only the local copy of the pointer
}





The C++ Standard Library


An extensible collection of software components




Michael J. Vilot


Michael, who is president of ObjectCraft and a columnist for The C++ Report,
chairs the Library Working Group of ANSI X3J16. He can be contacted at
mjv@objects.mv.com.


After five years of discussion, the ANSI and ISO C++ Committees have released
the Committee Draft (CD), their first official document. The CD has recently
been circulated for review and comment by the international C++ development
community. The CD, which X3J16/WG21 released, can be found at the following
addresses: ftp://research.att.com/dist/stdc++/WP,
http://www.cygnus.com/~mrs/wp-draft/~mrs, http://www.maths.warwick.ac.uk/c++,
and ftp://maths.warwick.ac.uk/pub/c++/std/wp.
If you're familiar with Ellis and Stroustrup's The Annotated C++ Reference
Manual (ARM), you'll find few surprises. While somewhat wordier, the CD
describes the C++ language using essentially the same organization and format
as the ARM. Namespaces, run-time type identification, and the new cast
notation are probably the most significant additions to the language not
anticipated in the ARM.
The biggest differences between "ARM C++" and "Standard C++" are not in the
language, however, but in the available library facilities. Over half the CD
text explains the many components in the C++ Standard Library.
Instead of trying to cover every aspect of the library in this article, I'll
focus on the most commonly used components: iostreams, strings, and some of
the containers, iterators, and algorithms included in the C++ Standard Library
from the HP C++ Standard Template Library (STL).


Library Overview


Unlike GUI components or application frameworks, the C++ Standard Library
provides general-purpose components for common programming tasks. Its main
value is in providing efficient and reliable templates, classes, and functions
that eliminate the need to handcraft low-level data structures and algorithms.
As such, it can be used both by nontrivial C++ programs and as the foundation
for more-ambitious libraries.
The essential structure of the C++ Standard Library can be represented by
Figure 1. Each of the ten categories provides definitions for the following
types of Standard entities: macros, values, types, templates, classes (and
structs), functions (including operators), and objects. Since many of the
nonmember functions operate upon instances of the specified classes, the
collection of such functions and their associated class is often referred to
as a "component."
The Language Support components are required by certain parts of the C++
language, such as dynamic-memory allocation and exception processing. The
Diagnostics category includes the definition of the standard exceptions thrown
by other library components, providing support for uniform error reporting by
the library. General Utilities components are used by other standard-library
components (including the memory allocator used throughout the STL components)
but can also be used directly by C++ programs.
The Strings category provides basic text representation and manipulation,
while Localization components provide locale-dependent formatting facilities.
The Containers, Iterators, and Algorithms categories incorporate the essential
elements of the STL library. The Numerics category includes the STL
generalized numeric algorithms, complex numbers, and support for array-based
(n-at-a-time) computations. The Input/Output category is the largest, and it
contains the iostreams components.
The C++ Standard Library also provides the facilities of the amended ISO
Standard C library, suitably adjusted to ensure static type safety. The
components of this library are referenced, as needed, from the various
categories.


Headers


As with C, a C++ program gains access to standard-library facilities by
#includeing the appropriate header(s). The C++ Standard defines 68 headers,
the result of reconciling two conflicting goals: C++ namespace organization
and compatibility with existing C++ source code. Table 1 lists the 32 new C++
headers.
In Standard C++, all library entities (except preprocessor macros) are defined
within the namespace std. In C, all library entities are defined in the global
namespace. To retain the advantages of an organized namespace, yet preserve
the meaning of existing C library #include directives, the facilities of the
Standard C Library are provided in additional headers; see Table 2.
The contents of each header cname are essentially the same as the
corresponding C header name.h, as specified in the ISO C standard. In the C++
Standard Library, however, these declarations and definitions are within the
scope of the std namespace.
Therefore, the C++ Standard Library also supports the 18 name.h (C header)
forms. Each of these headers #includes the corresponding cname header and the
appropriate using declarations in order to place all of its declarations and
definitions into the global namespace.
Thus, a C++ program can be ported quickly from its C version, as shown in
Listing One. Conversely, the program could be written as a C++ program that is
meticulous about its namespaces; see Listing Two.
These programs illustrate some of the most common tasks performed in C and C++
programs: handling string text to and from files in secondary storage. The C++
Standard Library provides components that simplify these tasks.


Strings and iostreams


Consider the relatively straightforward programming task of displaying the
contents of a text file. Listing Three uses the facilities of the Standard C
library to open the file named in its command-line argument, read the file a
line at a time, and print out each line on the standard output (preceded by
the number of characters in the line). The essential part of this program
involves only four lines: those containing fopen(), feof(), fgets(), and
printf(). The rest is error handling and recovery. Given a simple text file
(such as a daily "to-do" list), the program produces the output in Listing
Four. Listing Five performs the same task, using the string and iostream
components from the C++ Standard Library. It is noticeably shorter, mostly
because the iostream components encapsulate much of the file-related error
checking. iostreams are the preferred mechanism for C++ program input and
output. The components in the C++ Standard Library (such as strings, locales,
and complex numbers) overload the >> and << operators to provide formatted
input and output using iostreams. The Standard iostreams are much the same as
those that C++ developers have been using for the past ten years. Table 3
summarizes only the components provided in the standard headers <iosfwd>,
<iostream>, <ios>, <streambuf>, <istream>, <ostream>, <iomanip>, <sstream>,
and <fstream>.
The iostream facilities have been generalized to accommodate both char- and
wchar_t-based character sequences (and others). The older C library facilities
are also available, in <cstdio> (for chars) and <cwchar> (for wchar_ts).
The C++ Standard Library also provides components for manipulating sequences
of characters, where characters may be of type char, wchar_t, or of a type
defined in a C++ program. It provides both the C++ string classes and
null-terminated sequence utilities from the C library. Table 4 summarizes the
string components from the standard header <string>.
The library provides a basic_string template, which defines the semantics of
strings. The string and wstring types are predefined template instantiations
(of basic_string<char> and basic_string<wchar_t>, respectively) provided by
the library.
The older C library facilities are also available, provided by the standard
headers <cctype>, <cwctype>, <cstring>, <cwchar>, and the multibyte
conversions from <cstdlib>.


Containers, Iterators, and Algorithms


A more interesting application involving to-do lists might sort them according
to certain criteria. It would be handy to first read the list in from the file
and then keep it around in memory. The C++ program in Listing Six uses the
most obvious representation: a simple list of strings. 
This program uses two more components from the C++ Standard Library: an
instance of the list template to hold the strings describing things to do, and
an iterator type to step through the list and print each item.

Lists are just one component that C++ programs may use to organize collections
of information. Table 5 summarizes the standard sequences and associative
containers. The < and == relations are defined so that these containers can be
used with the standard iterator and algorithm components. A C++ program may
define additional container components. With suitably defined relational
operations, they can also be used with the standard iterator and algorithm
components.
Iterators are the "glue" between the standard containers and the algorithms
that operate upon them. Table 6 summarizes the components for iterator tags,
predefined iterators, stream iterators, and streambuf iterators from the
standard header <iterator>. 
C++ programs are not restricted to using the specific predefined iterator
components. For example, pointers and pointer arithmetic can be used as
iterators over many sequences. The standard algorithms work equally well with
both kinds of iterators. Table 7 lists the algorithm components from the
standard header <algorithm>.
Listing Seven uses iterators to invoke the standard sort() algorithm on the
to-do list. Since sort() requires random access to the elements to be sorted,
this version of the program uses a vector instead of a list. The rest of the
program is the same as before. This program uses the default ordering relation
on strings (operator<) to sort the to-do list. While this works well as a
default, it is subject to the vagaries of local string representation. For
example, Listing Eight shows how this program can be led into making a mistake
when the input has some unfortunate spaces (in the default ASCII collation,
spaces sort before other characters).
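The sorting step Listing Seven describes can be sketched in a few lines (the listing itself is not reproduced here, and sortedItems is an illustrative name): sort() requires random-access iterators, which is why the items move from a list into a vector, and the default operator< exhibits exactly the leading-space pitfall noted above:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Illustrative sketch of the Listing Seven approach: sort() needs
// random-access iterators, so a vector holds the to-do items. The default
// operator< on strings compares character codes, so in ASCII a leading
// space sorts an item ahead of everything alphabetic.
std::vector<std::string> sortedItems(std::vector<std::string> items) {
    std::sort(items.begin(), items.end());
    return items;
}
```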
The next version of the program takes a more comprehensive approach to
prioritizing to-do items. In addition to a numerical ranking, each item can
have a due date. Listing Nine contains the necessary scaffolding.
With these definitions in place, it requires very little change to support
relatively sophisticated to-do-list processing. Listing Ten is essentially the
same as the previous version, requiring only the substitution of the new
representation of the items on the list.
Given the correct handling of date input and ordering (via struct when), the
program will now correctly sort the to-do list items. Listing Eleven shows the
result, sorting by date and ordinal priority within each date.
The final version of this program is only slightly more clever. Rather than
waiting until it reads all the to-do items to explicitly sort the list, it
sorts the list as it reads each item. Listing Twelve uses a self-organizing
data structure, the standard priority_queue, to keep the list sorted.


Conclusion


The C++ Standard Library is large, providing dozens of classes and hundreds of
functions--most of them templates. Although larger than the Standard C
Library, it is still smaller than many commercial C++ class libraries in
widespread use today. The C++ Standard Library provides an extensible
collection of flexible, general-purpose software components that can be used
directly by a C++ program or extended and combined as the basis for even more
ambitious C++ class libraries.
The library has a clear organization, which makes it easier to describe,
understand, extend, and use its components. In this article, I've provided a
glimpse of the powerful capabilities contained in the library. C++ developers
will be finding creative uses for these components--especially the standard
data structures and algorithms--for many years to come.
Figure 1: The major functional areas of the C++Standard Library.
Table 1: C++ Standard Library headers.
<algorithm> <new>
<bitset> <numeric>
<complex> <ostream>
<deque> <queue>
<exception> <set>
<fstream> <sstream>
<functional> <stack>
<iomanip> <stdexcept>
<ios> <streambuf>
<iosfwd> <string>
<iostream> <typeinfo>
<istream> <utility>
<iterator> <valarray>
<limits> <vector>
<list>
<locale>
<map>
<memory>
Table 2: C++ headers for C library facilities.
<cassert>
<cctype>
<cerrno>
<cfloat>
<ciso646>
<climits>
<clocale>
<cmath>
<csetjmp>
<csignal>
<cstdarg>
<cstddef>
<cstdio>
<cstdlib>
<cstring>
<ctime>
<cwchar>
<cwctype>
Table 3: iostream synopsis. (a) Template classes; (b) classes; (c) types; (d)
objects.
(a)
basic_ios
ios_traits
basic_streambuf
basic_istream
basic_ostream
basic_stringbuf
basic_istringstream
basic_ostringstream
basic_filebuf
basic_ifstream
basic_ofstream
(b)
ios_base
(c)
ios
wios
streambuf
istream
wistream
wstreambuf
ostream
wostream
stringbuf
istringstream
wistringstream
wstringbuf
ostringstream
wostringstream
filebuf
ifstream
wifstream
wfilebuf
ofstream
wofstream
(d)
cin
cout
wcin
cerr
clog
wcout
wcerr
wclog
Table 4: String synopsis. (a) Template class; (b) types; (c) operations; (d)
iterator access; (e) capacity; (f) element access; (g) modifiers; (h) string
operations.
(a) basic_string
(b) string
 wstring
(c) Construct, copy, assign
(d) begin(), end()
 rbegin(), rend()
(e) empty(), size(), length()
 max_size(), capacity()
 resize(), reserve()
(f) at(), operator[]

 c_str(), data()
(g) append(), operator+=
 assign(), insert()
 remove(), replace()
 copy(), swap()
(h) find()
 rfind()
 find_first_of()
 find_last_of()
 find_first_not_of()
 find_last_not_of()
 compare()
 getline()
Table 5: Container synopsis.
Header Component(s)
<bitset> bitset
<deque> deque
<list> list
<map> map, multimap
<queue> queue, priority_queue
<set> set, multiset
<stack> stack
<vector> vector, vector<bool>
Table 6: Iterator synopsis. (a) Template classes; (b) template structs.
(a)
istream_iterator
back_insert_iterator
ostream_iterator
front_insert_iterator
reverse_bidirectional_iterator
insert_iterator
reverse_iterator
istreambuf_iterator
ostreambuf_iterator
(b)
bidirectional_iterator
input_iterator
forward_iterator
random_access_iterator
Table 7: Algorithm synopsis.
adjacent_find
pop_heap
binary_search
prev_permutation
copy
push_heap
copy_backward
random_shuffle
count
remove
count_if
remove_copy
equal
remove_copy_if
equal_range
remove_if
fill
replace
fill_n
replace_copy
find
replace_copy_if
find_if
replace_if
for_each
reverse
generate
reverse_copy
generate_n
rotate

includes
rotate_copy
inplace_merge
search
lexicographical_compare
set_difference
lower_bound
set_intersection
make_heap
set_symmetric_difference
max
set_union
max_element
sort
merge
sort_heap
min
stable_partition
min_element
stable_sort
mismatch
swap
next_permutation
swap_ranges
nth_element
transform
partial_sort
unique
partial_sort_copy
unique_copy
partition
upper_bound

Listing One 
#include <string.h> // C library functions
#include <stdio.h> // are in the global namespace
int main()
{
 char h[] = "hello, ";
 char w[] = "world\n";
 char hw[sizeof h + sizeof w];
 strcpy(hw, h); // catenate strings
 strcat(hw, " ");
 strcat(hw, w);
 printf(hw); // output
 return 0;
}

Listing Two 
#include <cstring> // same C library functions
#include <cstdio> // in namespace std
int main()
{
 char h[] = "hello,";
 char w[] = "world\n";
 char hw[sizeof h + sizeof w];
 std::strcpy(hw, h);
 std::strcat(hw, " ");
 std::strcat(hw, w);

 std::printf(hw);
 return 0;
}

Listing Three
#include <string.h>
#include <stdio.h>
#include <stdlib.h> // for EXIT_FAILURE
int main(int argc, char* argv[]) // copy file to stdout
{
 FILE* file = fopen(argv[1], "r");
 char errbuf[80]; // just in case
 if (!file) {
 sprintf(errbuf, "%s no good", argv[1]);
 perror(errbuf);
 return EXIT_FAILURE;
 }
 while ( !feof(file) ) {
 char line[256]; // hopefully large enough
 char* ok = fgets(line, sizeof line, file);
 if (!ok) {
 if (feof(file)) // normal end of file, not a read error
 break;
 sprintf(errbuf, "read error on %s", argv[1]);
 perror(errbuf);
 return EXIT_FAILURE;
 }
 printf("[%2d]\t%s", (int)strlen(line), line); // has '\n' already
 }
 return 0;
}

Listing Four
[18] End the arms race
[ 6] Floss

Listing Five
#include <string>
#include <fstream>
#include <iostream> // for cout
#include <iomanip> // for setw
int main(int argc, char* argv[]) // copy file to cout
{
 using namespace std;
 ifstream in(argv[1]);
 while (in.good()) {
 string line;
 if (getline(in, line))
 cout << '[' << setw(2) << line.size() << ']' << '\t' << line << endl;
 }
 return 0;
}

Listing Six
#include <string>
#include <fstream>
#include <iostream> // for cout
#include <list>
int main(int argc, char* argv[])
{
 using namespace std;
 ifstream in(argv[1]);
 typedef list<string> ToDo_List;

 ToDo_List to_do;
 while (in.good()) {
 string buf; getline(in, buf); to_do.push_back(buf);
 }
 typedef ToDo_List::iterator iter;
 for (iter i = to_do.begin(); i != to_do.end(); ++i)
 cout << *i << endl;
 return 0;
}

Listing Seven
#include <string>
#include <fstream>
#include <iostream> // for cout
#include <vector> // for random access via operator[]
#include <algorithm> // for sort()
int main(int argc, char* argv[])
{
 using namespace std;
 ifstream in(argv[1]);
 typedef vector<string> ToDo_List;
 ToDo_List to_do;
 while (in.good()) {
 string buf; getline(in, buf); to_do.push_back(buf);
 }
 typedef ToDo_List::iterator iter;
 for (iter i = to_do.begin(); i != to_do.end(); ++i)
 cout << *i << endl;
 sort(to_do.begin(), to_do.end());
 cout << "\nSorted:" << endl;
 for (iter j = to_do.begin(); j != to_do.end(); ++j)
 cout << *j << endl;
 return 0;
}

Listing Eight
Input:
1. End the arms race
 2. Floss
Output:
 2. Floss
1. End the arms race

Listing Nine
// todo.h
#ifndef TODO_H
#define TODO_H
#include <string>
#include <iosfwd> // for istream&, ostream&
struct when {
 int month; // range 1..12
 int day; // range 1..31
};
struct to_do {
 when date;
 int priority;
 std::string what;
};
int operator<(const to_do& td1, const to_do& td2);
int operator>(const to_do& td1, const to_do& td2);

std::istream& operator>>(std::istream& in, to_do& td);
std::ostream& operator<<(std::ostream& out, const to_do& td);
#endif

Listing Ten
#include <string>
#include <fstream>
#include <iostream> // for cout
#include <vector>
#include <algorithm> // for sort()
#include "todo.h"
int main(int argc, char* argv[])
{
 using namespace std;
 ifstream in(argv[1]);
 typedef vector<to_do> ToDo_List;
 ToDo_List td;
 while (in.good()) {
 to_do buf; in >> buf; td.push_back(buf);
 }
 typedef ToDo_List::iterator iter;
 for (iter i = td.begin(); i != td.end(); ++i)
 cout << *i << endl;
 sort(td.begin(), td.end());
 cout << "\nSorted:" << endl;
 for (iter j = td.begin(); j != td.end(); ++j)
 cout << *j << endl;
 return 0;
}

Listing Eleven
Input:
Mar 31 3 Get April Fool's jokes
Mar 31 1 Pick up laundry
Jun 3 2 Dentist appointment
Jan 1 1 Stop the arms race
Jun 2 1 Floss
Output:
Jan 1 1 Stop the arms race
Mar 31 1 Pick up laundry
Mar 31 3 Get April Fool's jokes
Jun 2 1 Floss
Jun 3 2 Dentist appointment

Listing Twelve
#include <string>
#include <fstream>
#include <iostream> // for cout
#include <vector>
#include <queue> // for priority_queue
#include <functional> // for greater
#include "todo.h"
int main(int argc, char* argv[])
{
 using namespace std;
 ifstream in(argv[1]);
 typedef
 priority_queue< to_do, vector<to_do>, greater<to_do> >
 ToDo_List;
 ToDo_List td;
 while (in.good()) {

 to_do buf; in >> buf;
 td.push(buf); // sorts as it stores
 }
 while (td.size()) { // get each item, in sorted order:
 cout << td.top() << endl;
 td.pop();
 }
 return 0;
}
DDJ





















































68HC05-Based System Design


Antilock brake systems are real-world embedded systems




Willard J. Dickerson


Willard is a senior design engineer with Motorola's Advanced Microcontroller
Technologies Group. He can be contacted at wild@amcu-tx.sps.mot.com.


Adverse road conditions, such as wet or icy pavement, are a major factor in
automobile accidents. Between 1978 and 1983, for instance, auto-related deaths
averaged more than 50,000 per year in the United States alone. A substantial
number of those accidents were directly attributed to wet pavement, which
significantly reduced tire traction, leading to poor braking efficiency.
Consequently, maintaining braking efficiency regardless of road
conditions--thereby saving lives--has been a major goal of automotive
engineers over the last couple of decades. As a result, antilock braking
systems are now commonplace in most passenger cars. 
Antilock brake systems (ABSs) prevent car wheels from locking when you
abruptly slam the brakes on a slippery or wet road by applying consistent and
even pressure to all four wheels, assuring that each wheel rotates at the same
speed during braking. This prevents the loss of vehicle control that usually
follows brake lock. 
From a system designer's perspective, an ABS is a complex, embedded-control
application requiring control units, sensors, modulators, and software that
ties everything together. In this article, I'll describe an ABS designed
around Motorola's 68HC05B6 (B6) microcontroller. This article is not intended
as a blueprint for ABS implementation, but rather as a set of concepts that
will enable you to use the B6 microcontroller in other embedded-system
applications. 


An Overview of Antilock Brakes


An ABS can be a separate subsystem (see Figure 1) or it can be integrated into
a conventional braking system. In either case, ABSs consist of four major
components: the brake modulator, the hydraulic pressure unit, wheel sensors,
and the electronic control unit (microcontroller).
A wheel sensor at each wheel continually monitors the wheel's speed during
braking. Mismatched wheel speeds are corrected by the brake modulator so that
all four wheels continue rotating at the same speed during braking. The
modulators typically consist of hydraulic valves, each with a
displacement-measurement device, oil reservoir, piston pump (mounted with an
electronic motor), and power source.
Wheel sensors are typically either electro-optical semiconductors or
electromagnetic mechanical devices. Electro-optical semiconductors track the
wheel speed by detecting a light transmission from an LED through a hole at
each rotation or partial rotation of the wheel. This light is received by a
photo diode that generates a train of pulses to the control unit. Finally, the
control unit determines the rate and number of pulses to calculate the speed
of a given wheel. 
Although electro-optical sensors are low in cost and resistant to
electro-magnetic radiation, they are vulnerable to mud, temperature, and high
humidity. Therefore, most ABSs use electromechanical devices, which offer high
environmental resistance. The electromagnetic device determines wheel velocity
by measuring variations in the induced frequencies caused by road-speed
variations. Therefore, wheel speed is determined through the relationship
between voltage, frequency, and the rate at which the wheel is turning. As the
rate at which the wheel turns increases, so does the frequency of the induced
signal and hence the magnitude of the induced voltage. A side effect of
inductive sensors is that output voltage varies with input frequency, and
thus with road speed. This can be compensated for by placing conditioning
circuits between the sensor and the microcontroller.
Typical ABS pressure modulation takes place in the auxiliary circuit through a
control valve. In this example, a 3-way/3-position valve with functions of
pressure build-up, pressure holding, and pressure reduction can apply metered
pressure to the auxiliary piston from a hydraulic accumulator. If a wheel is
about to lock, pressure is initially supplied to the auxiliary piston so that
the ball-seat valve closes. Pressure in the wheel-brake cylinder is thus kept
constant. If it is necessary to reduce pressure in the wheel brake to avert
wheel locking, further pressure can be applied to the auxiliary piston. This
causes the smaller plunger to move to the right so that the plunger-space
volume is increased. As a result, wheel-brake pressure decreases. The energy
supply and the auxiliary piston space are separate from the plunger space. If
the energy supply fails, ABS operations are no longer possible, and a warning
lamp must indicate this.


ABSs and the B6 Microcontroller


The ABS I'll describe here consists of an electronic control unit
(microcontroller), wheel sensors, brake modulators, and hydraulic unit. The
electronic control-unit design can be adapted to most sensors, modulators, and
hydraulic units. Usually the differences are found in the unit's interfaces
and/or software. The controller's signal values can usually be modified
through software.
At the heart of this ABS is the 68HC05B6 microcontroller typical of Motorola's
8-bit CSIC ("customer-specified integrated circuit") family of
microcontrollers. The B6 is an 8-bit CPU surrounded by memory and internal
peripherals and ports; see Figure 2. The B6 is a self-contained computer
needing only a keyboard, external RS-232 translator chip, and a monitor to
view its contents if monitor code is placed in the B6's ROM. Otherwise,
additional external memory chips are needed to store the monitor code. 
Through software, the B6 CPU orchestrates communication between internal
devices and the outside world. The B6 in Figure 2 has both analog peripherals,
including an A/D converter, and digital peripherals, including ROM, EEPROM,
and RAM. Other parts include a timer, serial-communications interface,
external interrupt, watchdog timer, and two pulse-length D/A converters (PLMA
and PLMB). To effectively use a microcontroller such as the B6 as a driver and
sensor for an ABS, the B6 ports, timer, and A/D converter are necessary. 
Interestingly, the B6 is commonly used in a variety of automotive applications
because of its broad complement of sensor-oriented features (A/D, ports, D/A,
timer capture, and the like). Other automotive applications for the B6 include
joystick control for mirrors, cruise control, driver's-side door-switch
console control for windows or other apparatus, digital-display control, spark
control in carburetted engines, fuel injection, and storing both odometer and
diagnostic information. The cruise control, joystick, and driver's-side
door-switch console-control applications use the A/D converter. The digital
display is accomplished through the general I/O ports. The spark control and
fuel injection is accomplished through a complicated closed-loop system that
includes the speed from A/D measurements, the timer, the capture, and a PWM
(D/A converter). Finally, odometer and diagnostic information can be stored
with the EEPROM.
Non-automotive B6 applications include modem control through the SCI and
microwave control. Since the B6 feature set is similar to Motorola's HC11-A8,
it lends itself to programmers familiar with either controller. In addition,
the latest features in the Bx family of microcontrollers include a
controller-area network (CAN) (B16 and B32) serial-communications module found
in both automotive and industrial applications.
For software development, you can use the Motorola Modular Development System
(MMDS-05), which consists primarily of software and a base unit with an
adapter that accepts PC-compatible daughterboards for different '05 parts.
Breakpoints can be placed on keywords in the assembly code. In addition, the
IASM05 assembler is provided with this package. Likewise, Byte Craft (North
Waterloo, ON) provides its C-based C6805 Code Development System for the
68HC05 family. Because some MC68HC05 microcontroller instructions have no
counterpart in C, special directives identify unique microcontroller
characteristics to the compiler. Still, the C6805 system generates
source-level debugging information consistent with most available debugging
environments. For more information on programming the 68HC05, refer to "C
Programming for the 68HC05 Microcontroller," by Truman T. Van Sickle (DDJ,
August 1991) and Motorola's Data Book #DL139, Microprocessor, Microcontroller,
and Peripheral Data, Vol. 1. 


B6 System Integration 


The B6 is used as an integrated ABS controller. A typical scenario for ABS
operation might be as follows: 
You press the brake pedal. The B6 (which is constantly monitoring the brake
pressure through polling routines) determines whether or not the ABS brake
modulation should be employed, depending on the relative speed of each wheel.
When brake pressure exceeds the safe threshold and a mismatch in wheel speed
is encountered, the B6 takes control of the brake-pressure valves by
modulating the pressure so that each wheel returns to the same speed, and the
car comes to a smooth stop.
In addition to monitoring the brake pressure and wheel speed, the B6 also
monitors the fluid level, brake-pedal switch, control-module voltage,
brake-valve level, and ignition voltage. If failure is detected in any of
these elements, the B6 directs the ABS to shut down. The brake system will
behave conventionally until the failure is corrected. (In some sophisticated
systems, fault codes corresponding to the problem detected are stored in
nonvolatile EEPROM. A trained mechanic can then access these codes and fix the
problem.)
The B6 will control and detect a variety of elements in the ABS. However,
another microcontroller is often used to monitor system faults and gracefully
bring down the system if a problem is encountered. The model control unit
discussed here does not use an additional fault-detection microcontroller.
Other B6-controlled devices in the ABS include the brake modulator, indicator
lamps, and a relay that controls the power to a hydraulic solenoid. The B6
also detects switch closures, wheel-speed sensors, and the voltage level of
the power supply used for the hydraulic solenoid. Since the B6 does not
include high-current or high-voltage output drivers, buffer units are included
between the B6 and automobile devices controlled or sensed by the B6; see
Figure 3. Although many functions must be simultaneously controlled by the B6
(sensing the wheel speed and modulating brake valves, for instance), only one
function can be performed at a time.
In Figure 3, the output drivers typically consist of operational amplifiers
(op amps) and high-power output transistors to increase current from the low
milliampere range to ranges between 0.5 and 1 amps, for relays and the
high-current brake modulator. The input network consists primarily of scaled
voltages compatible to the B6 inputs. The scaling can also be performed by op
amp circuits.
The B6 timer consists primarily of one 16-bit, software-programmable, binary,
free-running upcounter driven by a fixed, divide-by-four prescaler and a
control register. Two output-compare registers, two input-capture registers,
and the oscillator clocks are also included. The clocks are divided by four,
then applied to the counter's clock input. The control register includes eight
control bits. 
The timer can be used for many purposes, including waveform measurement of two
input signals and generation of two output waveforms. These functions can also
be performed simultaneously. In the waveform measurement, the output of a
given waveform is applied to the timer capture (TCAP) input. After each edge
from the waveform is generated, the value in the counter is acquired. The
pulse width is the time elapsed between a high edge and the following low
edge. The period of a waveform is determined by acquiring the counter value
after two consecutive high edges.
Since this is a free-running counter, you must know the counter's value when
the measurement process begins. Therefore, on the first high edge, the value
can be acquired from the timer register and placed on the microcontroller's
main data bus, to be fetched by the CPU. This first value is a reference. On
the second rising edge, the value in the timer register at that instant is
placed on the microcontroller's main data bus, fetched by the CPU, and
subtracted from the value found after the first rising edge. The difference is
the period of the waveform (or, for alternating high and low edges, the pulse
width).


Measuring Wheel Speed


The B6 timer peripheral measures wheel speed. The signal generated by the
wheel sensors is not applied directly to the B6. It is first applied to a
conditioning circuit, where its high-frequency spikes are removed and its
voltage level is brought to a clean 0-5V frequency-modulated, pulsed signal.
This fairly clean signal is applied to one of the TCAP input pins on the B6.
In this model, the four signals from the sensor outputs are applied through a
multiplexer. The output from the multiplexer is applied to both TCAP1 and
TCAP2, so that both high- and low-edge transitions can be detected from the
wheel sensor. The multiplexer is used so that the timer program can
distinguish between the four wheel sensors. PortA bit 0 (PA0) and PortA bit 1
(PA1) control the multiplexer.

The wheel-sensor frequency is directly proportional to the wheel speed. For
example, if the maximum frequency produced from the wheel sensor is about 1800
Hz, then the software can be scaled so that an automobile's speed of 100 miles
per hour (mph) would be detected as the maximum frequency from the sensor.
Conversely, if the vehicle were traveling at about 35 mph, only about 500 Hz
would be produced. The wheel-sensor manufacturer specifies output frequency
versus wheel speed or revolutions per minute (rpm). From this data, scaling
factors can be derived.
If rapid wheel deceleration, excessive wheel slip (the difference between the
circumferential speed of the tire and the car's forward speed), or
incompatible wheel speeds are detected, the hydraulic system's fluid is
activated within milliseconds by the control unit. The brake modulator then
corrects the differences in wheel speeds. When wheel speeds are found
incompatible, the software directs the computer to rectify the problem.


Correcting Wheel Speed


Wheel-speed correction is performed through the brake modulator. By pulsing
the brakes of those wheels rotating beyond a common threshold, the wheels
reduce their speed. Like the wheel-speed sensor, the brakes are modulated one
wheel at a time. Again, the speed of each wheel is examined. If the speed of
all four wheels match within a specified tolerance, the modulation will cease;
otherwise additional modulation is performed. For example, the software could
specify that if any of the wheel-sensor inputs differs from the highest by
more than 10 percent, then the ABS should activate the modulator to obtain
speeds within this range.
PortA, pins 2, 3, 4, and 5, provide the signals for the brake modulators.
Buffers, as seen in Figure 3, are used to raise port-output levels high enough
for each modulator input. A modulated pulse train is produced at the output of
each of these port pins when needed. For high-frequency damping, each pulse
follows the other closely. Conversely, for low-frequency clamping of the wheel
through the brake modulator, pulses are more widely spaced. Figure 4 is the
algorithm for the software that governs wheel-speed correction.


B6 A/D Conversion


The B6 A/D converter consists of an 8-bit, successive-approximation converter
and a 16-channel multiplexer. Eight of these channels are for input; the
remainder are used for internal test functions. The eight analog inputs are
available at Port D, bits 0-7. Port D becomes an I/O port if the A/D converter
is disabled.
The A/D converter also includes a control-status register and a data-results
register. The analog inputs are selected through bits 0-3 of the A/D's
control/status register. Results of each conversion are available in the data
register.
An external voltage reference is used for the A/D because drops caused by
loading in the power-supply lines would degrade the accuracy of the A/D
conversion.
For ratiometric conversions, the source of each analog input should use VRH
as the supply voltage and be referenced to VRL, the lowest voltage converted.
An input voltage equal to or exceeding VRH, the highest voltage converted,
converts as $FF (full scale) with an overflow indication. Input voltages
equal to VRL convert as $00.


Checking Faults or Errors


Faults such as low fluid levels or low voltage can be determined through
level-sensitive inputs--that is, through the A/D converter. Each monitored
device is applied to a dedicated A/D input pin.
A minimum threshold level is specified for each device monitored. If an input
falls below a given threshold, a fault code associated with that device can be
written to EEPROM. Since a fault code was detected, the microcontroller can
direct the ABS to be removed from the brake system until a trained mechanic
can rectify the problem. Figure 5 presents the algorithm for the software that
checks for faults.


Conclusion


Many automobile companies are currently focusing their attention on the
development of systems more advanced than ABS, such as traction control. In
the meantime, ABSs continue to provide a smoother, safer ride.
Figure 1: Four-wheel antilock brake system.
Figure 2: 68HC05B6 block diagram.
Figure 3: System view of ABS A/D interface.
Figure 4: Monitoring and correcting wheel speed.
Figure 5: Guidelines for checking faults.



























Implementing Distributed Objects


Doing it the easy way with NeXT's PDO




Ernest N. Prabhakar


Ernest is president of NextStep/OpenStep User Groups International, and
currently working on his PhD in experimental particle physics at the
California Institute of Technology. He can be reached at ernest@caltech.edu.


Creating distributed applications is generally considered difficult. While
object-oriented programming promises to make the task more tractable, many
programmers still shudder when subjects such as CORBA, OLE, SOM, and OpenDoc
arise. However, programming with distributed objects does not have to be
difficult, if you start with the right foundation.
My first distributed application--a client-server system developed from a
small legacy C program--took just 45 minutes to write and debug. And that,
using only a DDJ article for reference. Let me explain.
The recently published Dr. Dobb's Special Report on Interoperable Objects
(Winter 1994/95) examined the major distributed-object technologies:
Microsoft's Object Linking and Embedding (OLE), IBM's System Object Model
(SOM), CI Labs' OpenDoc, Novell's AppWare Data Bus (ADB), Taligent's
CommonPoint, and NeXT's Portable Distributed Objects (PDO). The issue ends
with a challenge to vendors of distributed-object technologies to implement a
simple client-server application which consists of packaging an existing C
program (the "legacy app") into an interoperable object and its client. For
whatever reasons, only Microsoft and IBM responded to this challenge. The
results are presented in the article "Implementing Interoperable Objects," by
Ray Valdés.
After puzzling over the pages of code from Microsoft and IBM, I read the
article "Distributed Applications and NeXT's PDO," by Dennis Gentry. It seemed
to me the job would be trivial with the PDO technology. Consequently, I
created my version of the DDJ challenge--in about 20 minutes. Not having
access to PDO at that time, my original version was written using the native
Distributed Objects (DO) facility under NextStep 3.2. Thanks to Joakim
Johansson (jocke@rat.se), my application was then tested under PDO 2.0.
In this article, I'll describe how to use PDO and Objective-C to write
distributed applications. I'll focus on concepts rather than on a detailed
walk-through of my code. Once you understand the concepts, the code is
trivial.


Three Steps to Distributed Objects


Applications that use distributed objects rely on an application-enabling
foundation that provides the mechanisms for object distribution. This
foundation is part of the system platform. To implement a distributed-object
foundation, all that's needed are objects, a means for finding them, and a
mechanism for distributing messages. While this may seem obvious, it is
amazing how many companies attempt to create so-called interoperable-object
frameworks without a proper object foundation. If you choose smart enough
objects, you can even get a lot of the distribution thrown in for free.
Portable Distributed Objects (PDO) is a distributed-object facility intended
to run on a number of platforms (such as Solaris and HP-UX) and interoperate
with the native Distributed Objects (DO) facility in NextStep. First, NeXT
starts with Objective-C, which adds Smalltalk-like objects and run time to C
(see the accompanying text box entitled "Objective-C and Distributed
Objects"). Second, to locate objects, NeXT uses the Mach nmserver (originally
part of NextStep, but later shipped separately as part of PDO). Finally, to
forward messages across a network, NeXT implements a proxy system. That's all
it takes.


Implementing a Distributed Application


In writing my application, I spent most of my time creating ordinary
Objective-C objects, a process I already knew how to do. After defining the
ordinary object MyServer, you simply register it with the nmserver to make it
distributed, as in Example 1.
When this program runs, it instantiates an object of class MyServer, registers
this object under a particular name, then starts an event loop in that thread
to service requests. Variants of this technique allow you to start up a server
object as a separate thread, or even as a multithreaded object.
The client side is equally simple; see Example 2. Instead of explicitly
allocating a server object, you merely ask for a connection to it. The server
object can be on the same machine or anywhere on the network--the NXConnection
facility will find it. This facility returns an NXProxy object, to which one
can send messages, just as if it were actually an instance of MyServer. As you
can see in the example, you connect to a computation server that multiplies
two floating-point numbers and returns a result. The code also shows how a
server vends other types of objects, such as strings and object references.
Using the type id, you can take advantage of Objective-C's dynamic typing and
send the proxy a message regardless of the object's actual class. The NXProxy
takes care of all the work of translating and transporting the arguments and
return types. It all just works!
Well, not exactly. Up to this point, the code does work, but it is inefficient
and leaks memory. To see how to make a distributed, heterogeneous system work
properly, you must look inside NXProxy.


Protocols in Objective-C


The basic problem is how to deal with potential architectural differences
between client and server machines. For example, you might be running an Intel
(Little-endian) client, accessing a RISC (Big-endian) server. Converting
between the two memory formats is simple, but how do you know when to do it?
This is complicated by the fact that Objective-C selectors do not have the
static typing of C++ member-function calls. That is, the add: selector can
equally accept an integer argument for the Integer class, or a float argument
for the Float class. Which method gets invoked is determined only at run time,
by the actual object called.
To address this, NXProxy has to make a round-trip inquiry of the remote object
to find out the effective signature (argument types) for the actual method. It
then uses this information to package and send the arguments in a
machine-independent format across an NXConnection, where they are unpacked and
delivered to the object. Return values are handled in the same way.
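The details of NXProxy's wire format aren't published, but the underlying idea--packaging each argument in a fixed, machine-independent byte order so that Little- and Big-endian machines agree--can be sketched in a few lines of C. The function names here are illustrative, not NeXT API:

```c
#include <stdint.h>

/* Pack a 32-bit value into big-endian ("network") order one byte at a
 * time, so the result is identical on Little- and Big-endian hosts. */
void pack_u32(uint32_t value, unsigned char buf[4])
{
    buf[0] = (unsigned char)(value >> 24);
    buf[1] = (unsigned char)(value >> 16);
    buf[2] = (unsigned char)(value >> 8);
    buf[3] = (unsigned char)(value);
}

/* Reassemble the value on the receiving side, whatever its endianness. */
uint32_t unpack_u32(const unsigned char buf[4])
{
    return ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16)
         | ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];
}
```

Because each byte is extracted by shifting rather than by reinterpreting the pointer, neither side ever needs to know the other's memory layout.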
These round trips involve a large amount of overhead. The usual cure is
static typing, but that doesn't work with a proxy, which has a different data
type than the object it stands in for. Consequently, NeXT introduced the concept of protocols
into the Objective-C language. Protocols are pure interface, similar to
abstract base classes in C++. They follow their own hierarchy and can use
multiple inheritance. This increased separation of interface from
implementation is a powerful tool in its own right, but here I'm concerned
only with its implications for distributed-object applications.
A protocol declaration looks just like an interface declaration in
Objective-C, with inherited protocols in place of a superclass definition.
Angle brackets indicate that a class or instance variable adopts a given
protocol. When a message is sent to a proxy with a protocol indication, it
uses the (local) protocol information to determine the method signature,
rather than making a round trip to the remote object.
Protocols also provide a form of typechecking, in that any attempt to send a
message to a variable that is not in the message's supported protocols results
in a compile-time warning. The compiler will also warn you if you assign an
incompatibly typed object to such a variable. You can use the conformsTo: method to manually
verify that a vended object supports the given protocol. Example 4(a)
specifies a protocol; the application code that uses this protocol is in
Example 4(b).


Memory Allocation


Since you obviously can't pass pointers across different address spaces, there
has to be some mechanism to repackage out-of-line data for transmission across
the network. By using the method signatures just described, NXProxy can
determine when a pointer is being passed. If it is a pointer to an object, it
simply sends another proxy instead. If it is a pointer to a normal variable,
it just sends a copy of the information being pointed to over the network. The
connection on the other side recreates the appropriate pointer, sends back any
modifications, then destroys the remote copy after the remote-procedure call
(RPC) finishes.
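What the connection does with a pointer to a normal variable can be modeled in a few lines of C; rpc_inout and remote_increment below are illustrative stand-ins for the machinery, not NeXT functions:

```c
/* Model of "inout" pointer semantics: the pointed-to data is copied
 * into the message, the remote side works on its own copy, and the
 * modified copy is written back when the RPC finishes. */
static void remote_increment(int *copy)
{
    (*copy)++;                      /* the server only ever sees the copy */
}

void rpc_inout(int *callers_ptr)
{
    int wire_copy = *callers_ptr;   /* package the pointed-to data */
    remote_increment(&wire_copy);   /* "send" it across the connection */
    *callers_ptr = wire_copy;       /* send back any modifications */
}
```

The in and out qualifiers described next simply let you drop one of the two copy steps when it isn't needed.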
In general, this solution works pretty well. The system is smart enough to
realize that not all information has to go both ways. For example, a const
char* can't be modified by the remote process, and hence needn't be returned.
To fine-tune the behavior, you can use the special keywords in, out, and inout
in the protocol declaration. You can also use the keyword bycopy to indicate
that an object satisfying the NXEncoding and NXDecoding protocols should be
encoded and passed over the wire, rather than sending a proxy. The specifier
oneway is used for void methods that do not need to return, and hence can be
called asynchronously. 
The problem with this behavior is that sometimes you want a passed parameter
to persist--say, when you're adding a string to a lookup table. Or sometimes
you are returning a string to the calling routine, so there's no clearly
appropriate lifetime. To allow for this, NeXT introduced a hack into its
distributed-object run time, preventing it from destroying strings created in
an RPC process. Thus, using objects in the normal fashion results in a memory
leak each time a string is passed back or forth.
Because of this, you need to know whether an object will be used locally or
remotely, and be able to add the code to manually free any string pointers
created in consequence. This is a rather ugly situation, destroying the
otherwise beautiful symmetry of NeXT's Distributed Object system.
Rather than just undo the hack, NeXT solved the more-general problem of
temporary objects in the FoundationKit (see the text box "NeXT's
FoundationKit"), used in PDO 3.0. At the time of this writing, PDO 3.0 was
just about to go into beta, so my example uses the shipping version, PDO 2.0. 



The PhoneDir Example 


At this point, you should be able to read and understand the code for the DDJ
interoperable-object challenge. As you may recall, the requested application
was the simplest possible client-server application, an example called the
"One-Minute Phone Directory" (so-called because it is so small, implemented in
less than 200 lines of C). Listings One through Four present my implementation
of the PhoneDir example. I won't repeat the original C code here, but it is
available electronically; see "Availability," page 3.
The goal of the exercise is not to show off application functionality, but to
focus on the machinery needed to turn a piece of legacy C code into an
application that uses distributed objects. To this end, I did not rewrite the
legacy code, even though it would have been trivial. Rather, the legacy code
is called from inside PhoneDir, an object I defined that adopts the
PhoneDirectory protocol. A simple server application vends the remote object,
and the client application is virtually identical to the nonobject case. The
total Objective-C code is the same size as the nondistributed C program. A
prototype version written entirely using the FoundationKit was actually
shorter than the C version.
Objective-C and Distributed Objects
Objective-C is a hybrid, object-oriented programming language (OOPL)
originally developed by Brad Cox of Stepstone. It consists of a few Smalltalk
features layered on top of ANSI C: messaging, class definition, and the id
data type. Smalltalk syntax and semantics make it much simpler and easier to
learn than C++, at the price of a look-and-feel that is slightly foreign (at
least to C programmers). 
Objective-C is a dynamically bound, single-inheritance OOPL using both static
and dynamic typing. Its principal difference from C++ is that Objective-C
handles most decisions at run time, rather than compile time. This requires a
great deal of run-time information, which, while incurring some overhead,
allows the use of more-powerful programming paradigms. 
A good caching strategy results in messaging overhead around three times that
of a function call. Of course, that is still far slower than an inline
function call or pointer dereference; for time-critical sections, you can drop
back into straight C.
Objective-C objects and messages are "self-conscious," that is, you can ask an
object its name, whether it responds to certain messages, and what its method
signatures are (that is, the types of their arguments). Objects can
also manipulate messages as first-class objects, using the @selector()
directive. For example, a List object can use the -perform: method with a
message argument to make its constituent objects perform the appropriate
method. It can even check first to send the message only to objects that know
how to respond to it!
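The machinery behind this introspection can be modeled in plain C: a selector is essentially a key into a per-class method table consulted at run time. The table and names below are illustrative only, not the actual Objective-C run time:

```c
#include <stddef.h>
#include <string.h>

typedef int (*IMP)(int);   /* an Objective-C-style implementation pointer */

static int double_it(int x) { return 2 * x; }
static int negate_it(int x) { return -x; }

/* A per-class method table mapping selector names to implementations. */
static const struct { const char *sel; IMP imp; } methods[] = {
    { "double:", double_it },
    { "negate:", negate_it },
};

/* respondsTo: analogue -- look a selector up at run time; NULL if the
 * "object" has no implementation for it. */
IMP lookup(const char *sel)
{
    for (size_t i = 0; i < sizeof methods / sizeof methods[0]; i++)
        if (strcmp(methods[i].sel, sel) == 0)
            return methods[i].imp;
    return NULL;
}
```

An object "responds to" a selector exactly when the lookup succeeds; -perform: then simply calls through the returned pointer.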
The key feature that makes Objective-C useful for distributed objects is its
ability to "forward" messages; see Example 3. Dynamic typing allows you to
send any message to an object, even a message that's not part of the object's
defined interface. While this ability should be used sparingly (it is optional
in Objective-C), it allows for a great deal of flexibility when prototyping,
or dealing with objects from an outside source. If the object cannot respond
to a message, the default behavior is to return a run-time error. However, if
you implement a forward:: method, you can choose to forward that message to
another object, known as a "delegate." 
Objective-C has been extended by NeXT in a variety of ways, including
protocols and better integration with C++. These extensions are tracked by the
Free Software Foundation in the GNU Objective-C compiler, part of the current
GCC distribution. Sun is also integrating Objective-C with its C++ compiler as
part of OpenStep for Solaris, scheduled for release later this year.
--E.N.P.
NeXT's FoundationKit
NeXT first introduced Distributed Objects (DO) as part of NextStep 3.0 in
1992, followed a year later by Portable Distributed Objects (PDO) for HP-UX.
Since then, it has become clear that one of the main motivations for the use
of object-oriented technology is to simplify distributed client/server
computing. NeXT has therefore extended the foundations of Objective-C to
optimize its support for object distribution over a network.
The resulting technology is called "FoundationKit" (FK) and consists of
numerous classes, plus a new paradigm for memory allocation called
"autorelease." Since most action takes place in response to an event (user
interaction or RPC), this provides a natural lifetime for temporary objects.
Instead of being freed, an object can be autoreleased, meaning that it will be
cleaned up the next time through the event loop. If you wish to hold on to an
object, you send it a retain message; when you no longer need it, you send a
release message.
Autorelease is based on reference counting and is not as powerful as true
garbage collection: It requires programmer intervention and does not handle
cyclic references. However, it has lower overhead than any other distributed
garbage-collection scheme and the overriding virtue of being easy to implement
over a network. Most importantly, it provides a uniform solution to the
temporary-object problem, which crops up whenever you pass pointers across a
network or return an object from within a method. The freeing that takes place
between user events also helps minimize its impact on response time.
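A toy version of the scheme, assuming that a release at count zero frees the object, might look like this in C (the names are illustrative, not the FoundationKit API):

```c
#include <stdlib.h>

/* Reference-counted object plus an autorelease pool: autoreleased
 * objects give up one reference when the pool drains, as happens at
 * the end of each trip through the event loop. */
struct obj  { int refcount; };
struct pool { struct obj *deferred[32]; int count; };

struct obj *retain(struct obj *o) { o->refcount++; return o; }

void release(struct obj *o)
{
    if (--o->refcount == 0)
        free(o);                  /* last reference gone */
}

/* Defer the release instead of performing it now. */
struct obj *autorelease(struct pool *p, struct obj *o)
{
    p->deferred[p->count++] = o;
    return o;
}

/* Called once per event-loop cycle: temporary objects die here. */
void drain(struct pool *p)
{
    for (int i = 0; i < p->count; i++)
        release(p->deferred[i]);
    p->count = 0;
}
```

A method can return a temporary object after autoreleasing it: the caller gets a valid pointer for the rest of the event, retains it if it wants to keep it, and otherwise the drain cleans it up with no leak.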
FoundationKit provides numerous other classes to aid in developing
applications and frameworks. Network-savvy object wrappers for primitive data
types (numbers, points, buffers, and collections) aid in passing them over the
network. Objects manage Objective-C information (method signatures and
exceptions). Most importantly, a Unicode-supporting NSString object allows
NeXT to merge the release of NextStep for European and Asian languages, which
were previously separate. However, there is still no official word on a
revised Text object which would support alternate layouts (right-to-left,
vertical).
FoundationKit ships as part of NextStep 3.3 and will be in PDO 3.0, slated for
the middle of this year. NextStep 3.3 is the first release of NextStep (and
possibly any other operating system) that will run transparently on four
different CPU architectures: Motorola, Intel, PA-RISC, and SPARC. Applications
compiled on any one platform using NeXT's MAB (Multiple Architecture Binary)
technology, which uses GNU's cross-compilation tools, will run identically on
any of the others. FoundationKit also forms the basis of OpenStep, NeXT's
OS-independent API, which will be available on Sun's Solaris and Digital's
OSF/1 for Alpha near the end of 1995. NeXT recently confirmed that OpenStep
will be ported to Windows 95 and Windows NT. FoundationKit is also considered
the starting point for "Mecca," NeXT's next-generation technology, which is
intended to compete against Taligent and Cairo in 1996 or 1997.
--E.N.P.
Example 1: A simple server application.
main()
{
 static const char* serverName = "ServerName";
 // create and initialize the Server object
 MyServer* mine = [[MyServer alloc] init];
 // register the Server with the nameserver facility
 NXConnection* conn = [NXConnection
 registerRoot:mine
 withName:serverName];
 // start servicing requests 
 [conn run]; 
}
Example 2: Client for the simple server.
main()
{
 static const char* serverName = "ServerName";
 // Get the proxy for the server - scan the entire subnet
 id serverProxy = [NXConnection
 connectToName:serverName
 onHost:"*"];
 // Pass two floating-point numbers to server and get a result
 float result = [serverProxy multiply:7.0 with:5.0];
 // Get a proxy for another object vended by the server
 id objectFromServer = [serverProxy getAnotherObject];
 // Pass a string, and get one back
 char* string = [serverProxy sendAndReturnString:"string pointer"];
}
Example 3: Forwarding in Objective-C.
// Called when this object receives a message that it cannot respond to.
// Messages consist of a "selector" (i.e. "forward::") plus arguments.
// If forward:: is not defined, then "doesNotRecognize:" is automatically
// called. The variable "delegate" is an instance variable of this class.
- forward:(SEL)aSelector :(marg_list)argFrame
{
 // if the delegate can handle it, let it.
 if ( [delegate respondsTo:aSelector] )
 return [delegate performv:aSelector :argFrame];
 // otherwise, raise the error

 [self doesNotRecognize:aSelector];
}
Example 4: (a) Specifying a protocol; (b) using the specified protocol.
(a)
@protocol ServerProtocol : InheritedProtocol
- multiply:(float)a with:(float)b;
- (char*)sendAndReturnString:(const char*)ptr;
- (id<AnotherProtocol>)getAnotherObject;
@end
@interface MyServer : Object <ServerProtocol>
//...
@end
(b)
// Get the proxy for the server
id<ServerProtocol> serverProxy = [NXConnection connectToName:server];
// Get a proxy for another object vended by the server
id<AnotherProtocol> another = [serverProxy getAnotherObject];

Listing One
//---------------------------------------------------------------
// PhoneClient.m by Ernest Prabhakar, 1995
// Transliteration of test suite into Objective-C
// The only addition is to free strings for the remote case
//---------------------------------------------------------------
#import "PhoneDir.h"
#ifdef REMOTE
#import <remote/NXProxy.h>
#endif
main(int argc, char* argv[])
{
 const char* name;
 const char* number;
#ifdef REMOTE // scan subnet for object
 id<PhoneDirectory> theDir = 
 [NXConnection connectToName:dirName onHost:"*"];
 if (!theDir) {
 printf("Server not present.\n");
 exit(1);
 } else {
 printf("Connected to remote server.\n");
 }
#else
 PhoneDir<PhoneDirectory>* theDir = [[PhoneDir alloc] init];
#endif
 // Call server to get number for name, if any.
 name = "John Doe";
 number = [theDir number:name];
 // Print out result
 if (number) {
 printf("%s's number is %s.\n", name, number);
#ifdef REMOTE 
 // For remote case, free returned string
 free((char*)number);
#endif
 } else {
 printf("%s does not have a number listed.\n",name);
 }
 // do a lookup by number
 number = "408-555-1212";
 name = [theDir name:number];
 if (name) {
 printf("%s's number is %s.\n", name, number);
#ifdef REMOTE 
 // For remote case, free returned string
 free((char*)name);
#endif
 } else {
 printf("The phone number %s has not been assigned.\n",number);
 }

 exit(0);
}

Listing Two
//---------------------------------------------------------------
// ** PhoneDir.h ** by Ernest Prabhakar, 1995
// The protocol and interface declaration for the PhoneDir class.
//---------------------------------------------------------------
#import <objc/Object.h>
static const char* dirName = "PhoneDirectory";
//--------------------Protocol Declaration----------------------
@protocol PhoneDirectory
 // Return number given name
 - (const char*) number:(const char*) name;
 // Return name given number
 - (const char*) name:(const char*) number;
@end
//-------------------Interface declaration----------------------
// Inherits name: and number: from protocol
@interface PhoneDir : Object <PhoneDirectory>
 {
 }
 // initialization and destruction
 - init;
 - free;
@end

Listing Three
//----------------------------------------------------------------------------
// PhoneDir.m by Ernest Prabhakar, 1995
// Wrapper class for the C phonedir functions. It uses the original C code
// unchanged to prove a point. Otherwise, the code for the PDO implementation
// would be about the same size as original non-distributed code!
//----------------------------------------------------------------------------
#import "PhoneDir.h"
#include "phonedir.h"
@implementation PhoneDir : Object
{
}
//---------------------------------------------------------------
- init
{
 [super init];
 phonedir_Initialize();
 return self;
}
//---------------------------------------------------------------
- free
{
 phonedir_Terminate();
 return [super free];
}
//---------------------------------------------------------------------------
// When the object is invoked over the wire with a string argument, a copy of
// the passed string is made. In that case, we have to explicitly free the 
// string. On a local invocation, the string would be a reference from the 
// caller, and we should not free it. This dichotomy between local and remote
// calls--and the different treatment of char* and other pointers--is due
// to the limitations of PDO 2.0, and will not be necessary in PDO 3.0.

//---------------------------------------------------------------------------
//
// Return the phone number, given a customer name.
- (const char*) number:(const char*) name
{
 const char* result = phonedir_LookupByName(name);
#ifdef REMOTE
 free((char*)name);
#endif
 return result;
}
//---------------------------------------------------------------
// Return the customer name, given the phone number.
- (const char*) name:(const char*) number
{
 const char* result = phonedir_LookupByNumber(number);
#ifdef REMOTE
 free((char*)number);
#endif
 return result;
}
@end

Listing Four
//---------------------------------------------------------------
// PhoneServer.m by Ernest Prabhakar, 1995
// Creates the PhoneDir object & sets it up to service requests.
//---------------------------------------------------------------
#import "PhoneDir.h"
#import <remote/NXProxy.h>
main(int argc, char* argv[])
{
 id myDir = [[PhoneDir alloc] init];
 id myConn = [NXConnection registerRoot:myDir withName:dirName];
 printf("Starting server...\n");
 [myConn run];
}


























Examining Symantec C++ 7.0


Fast linking, 32-bit support, and distributed builds top the features list




Ira Rodens


Ira is a software developer who specializes in Windows and Motif. He can be
contacted on CompuServe at 70711,2570 or at 301-924-0596.


Without question, Windows developers have benefited from the so-called C++
wars, which have produced compilers that generate highly optimized code,
provide robust tools and utilities, and sport sophisticated development
environments--all at a reasonable cost. The most recent release of Symantec
C++ (Version 7.0) continues this trend, adding a visual programming
environment, class and hierarchy editors, a new feature that allows you to
distribute build tasks over a LAN, upgraded Microsoft Foundation Classes, a
32-bit multithreaded linker, better integration of the Multiscope debugger,
support for Windows 95 resources, and a set of visual tools called "Express
Agents." And with 7.0, Symantec has also enhanced its implementation of the
C++ language, adding support for exceptions and run-time type identification.


Integrated Development Environment


The Symantec C++ 7.0 Integrated Development and Debugging Environment (IDDE)
takes a Visual Basic-like approach to placing its windows on the Windows
desktop (as opposed to the conventional Multiple Document Interface approach).
This makes it easier for you to use Windows Help and other applications in
conjunction with the IDDE. The IDDE also features a set of
workspaces--accessed through customizable tabs in the main control
region--that let you individualize your work environment. Drag-and-drop is
used extensively to allow different views of code and data. Key components of
the IDDE include the following: 
Editor/browsers. Two of the most innovative features of the development
environment are the hierarchy and class editors, which work together to
provide a smooth visual method for program design and coding. These are true
editors, not just browsers. The Symantec approach is unique in that it uses a
parser to extract information about code structure without compiling and
linking the project first. Code is parsed as it is developed, so the
information in the hierarchy and class editors is always up to date. This
makes the development environment a pleasure to work in, compared to the
typical C++ compiler hierarchy and class browsers, which extract their
information during compilation and link times. This information is most useful
during initial program development, before the project reaches error-free
compilation and linking. 
The hierarchy editor can be used to build a complete class hierarchy for the
application; the class editor can then fill in the class members and provide
the required program functionality. You can add classes to the hierarchy, then
connect them in the desired hierarchy by dragging the connecting lines. As the
connections are changed, the source code and header files change automatically
and the class editor fills in the necessary code. This editor is a three-paned
window: The left pane contains a list of classes in the application; the right
pane shows the members, data, and typedefs for the class; and the bottom pane
shows the source. When you select a class in the left pane, the members of the
class appear in the right pane. Clicking on a member name brings up the member
source code, which can then be edited directly in its pane. Adding a member is
easy: Just click in the right pane with the right-mouse button and select
"Add" to bring up the necessary dialog box. 
The class editor's only downside is that it can't handle the mapping between
Windows messages and member functions; this requires the Class Express tool.
It would have been much more convenient to do this directly from the class
editor. In any case, the ability to develop code on a class and function level
instead of dealing with individual source and header files is the most
innovative feature of Symantec C++ 7.0.
ResourceStudio. This program includes a set of resource-editing tools for
creating dialog boxes, icons, bitmaps, and the like. Also included are a
version editor, which provides version information for install programs, and a
string-table editor. Together with the Class Express tool, the ResourceStudio
integrates the creation of dialogs and the subsequent building of the
necessary code. 
The project facility. The project facility utilized by Symantec 7.0 provides a
complete set of features and has links to Intersolv's PVCS version-control
system, allowing check-in/out of files from the IDDE. By adding subprojects to
a project, you can maintain all of your libraries and DLLs within the same
project as your .EXE files and have them automatically rebuilt when changes
are made. The project facility includes automatic dependency checking for
inclusion of header files. You can process files using external makefiles
based on processors such as lex and yacc. You can build either a debug or
release version of the application by checking a box in its settings dialog.
The debug version embeds debug information in the object files and enables
assertions and the additional error checking desirable for debugging.
Another useful feature of the project facility is option sets, which save all
of the options associated with a project and can be switched easily.
Linker. Symantec C++'s Optlink linker (which was always fast) is now a native,
32-bit application. Consequently, it can take advantage of multithreading to
improve link time and provide support for both 16- and 32-bit applications.
(To further improve the linker's performance, Symantec has eliminated CVPACK.)
Compiling and linking proceed in the background, even under Windows 3.1, and
system response for other tasks performed during a build is surprisingly
good. 
NetBuild. This utility allows distributed builds over networks with a NetBIOS
interface. This lets you off-load parts of the build onto cooperating
machines, thus speeding up builds on large development projects. Symantec
claims that even if one of the remote machines goes down during build
processing, the build will still complete successfully.


Creating an Application


The Symantec compiler can target DOS, Extended DOS, and 16- and 32-bit
Windows. To explore the Symantec tools, I developed an OLE 2.0 container
application under Win32s. (Complete source code for this project is available
electronically; see "Availability" on page 3.) This option produces a 32-bit
application that can run under either Windows 3.1 (with Win32s DLLs), Windows
95, or Windows NT. I first used AppExpress to automatically generate much of
the framework code using MFC 3.0. AppExpress guided me through a few dialogs
and then created five source files, six classes, associated header files, and
a resource file.
Building the project produced a reasonable-looking SDI application with a
dockable toolbar. I was able to insert a Microsoft Graph object and then save
and retrieve the file. It even created a skeleton About box. All of this was
accomplished without writing a single line of code--not bad for 15 minutes of
work. However, the window wasn't scrollable and there was not yet a way to
select objects, size them, or move them about.
To make the window scrollable, I made CScrollView the base class of
CcontainView (the class representing the client-view window) instead of CView;
see Listing One. I did this by dragging the connecting line in the hierarchy
editor from CView to CScrollView. The header files were then automatically
updated. I added a bit of code to create and initialize the scroll bars and
handle resize events, and I was off and running. The scroll bars came up
beautifully, but did not respond to the mouse. I figured out that when I had
replaced the base class for CcontainView, the hierarchy editor had not changed
the message-mapping macros accordingly. MFC normally handles messages in the
derived class. If a message handler is not found there, the message gets
passed up to the base class for processing. The message is passed up the class
hierarchy until either it is processed or it reaches the top-most class,
whereupon it is processed by the default window procedure. To make this scheme
work, the BEGIN_MESSAGE_MAP macro associates a derived class with its base. If
BEGIN_MESSAGE_MAP does not point to the correct base, messages will not be
handled properly.
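The routing scheme just described--and the failure mode when the recorded base class is wrong--can be modeled in a few lines of C. This is a sketch of the idea only, not MFC's actual macro expansion:

```c
#include <stddef.h>

typedef int (*handler_fn)(void);

/* What BEGIN_MESSAGE_MAP effectively records: the handled messages
 * (one per map here, for brevity) plus a pointer to the base class's
 * map, so unhandled messages can be passed upward. */
struct message_map {
    const struct message_map *base;
    int handled_msg;
    handler_fn handler;
};

static int on_paint(void)  { return 1; }   /* handled by the base view */
static int on_scroll(void) { return 2; }   /* handled by the derived view */

static const struct message_map view_map   = { NULL,      100, on_paint  };
static const struct message_map scroll_map = { &view_map, 200, on_scroll };

/* Walk up the chain until some class handles the message. */
int dispatch(const struct message_map *map, int msg)
{
    for (; map != NULL; map = map->base)
        if (map->handled_msg == msg)
            return map->handler();
    return 0;   /* falls through to the default window procedure */
}
```

If scroll_map recorded the wrong base (say, NULL), message 100 would silently fall through to the default procedure--exactly the symptom of the unresponsive scroll bars.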
Once this problem was solved, the scroll bars worked properly. AppExpress
added some drawing code for the view, but it could draw only one item, which
always appeared at a hardcoded position within the document. MFC provides C++
classes to serve as wrappers for most of the Windows structures. A CRect class
item, rctPos, was added as a member of the client-item class,
CcontainCntrItem, to allow the objects to be moved around in the window.
Another problem with scroll bars is that the item's position in the window
does not correspond to its position within the document. Therefore, I added
the member function GetViewRect, which converts document position to window
position given the position of the scroll bars. Dummy functions were created
to do this for all of the methods that dealt with position and to pass the
information to the corresponding function in the base class. The item's
position within the document was saved into the file by modifying the
Serialize member of CcontainCntrItem.
Once the scrolling and drawing code was working correctly, I turned my
attention to moving and sizing the objects. MFC provides a CRectTracker class
that has most of the necessary code. CRectTracker lets you draw several styles
of bounding box, plus sizing handles. It also sets the cursor, showing the
move cursor or sizing cursors, as appropriate. Deriving a CRectSelect class
from CRectTracker provided the required functionality for the application
(CRectTracker even tracks the cursor).


Debugging the Code


When I tried to build and link the application, several error messages
scrolled by in the output window. Clicking on the error message took me to the
offending spot in the source code, and after a few correct-and-build cycles I
had a clean build. Full of enthusiasm, I clicked on the
Execute Program menu item. The Symantec IDDE minimized itself to an icon, and
the program began running. Then I used the program to insert a Paintbrush
object. This worked fine. 
However, although I clicked all around the object, I still couldn't get the
bounding box to appear. Finally, I clicked on another part of the window and a
bounding box appeared. Clearly it was time to fire up the debugger, which uses
Multiscope's Windows-based debugging technology. Like any Windows-based tool,
Multiscope lets you open a plethora of windows and simultaneously look at lots
of code, data, and debug information. However, since the debugger windows are
just like any other windows under Windows, the debugger introduces paint
events, mouse-move events, and so on, which may interfere with the very events
you are trying to debug. In this respect, previous-generation debuggers (such
as Borland's Turbo Debugger) are easier to use.
Nevertheless, the Symantec debugger was full featured, providing many ways to
view and modify data and the ability to easily step through the code and set
both absolute and conditional breakpoints. MFC uses assertions and C++
exception handling extensively, and this provides valuable clues when the
program blows up during debug. These debug features are conditionally compiled
into the code and turned off when a production build is selected through the
Project Settings dialog. The debugger allows full control of threads and adds
a Threads View to inspect multiple threads in your application. From the
Threads view, you can drag a thread over to the source view to display the
code for that thread. You can also drag C/C++ code from the Source View to an
Assembly View. The code is disassembled on the fly. Finally, the new hardware
watchpoints include Pentium support for Windows NT and Windows 95.


Benchmarking with the Migration Tool


Microsoft and Symantec have jointly developed the "MFC Migration Tool," a
utility that helps C programmers migrate existing Windows code to MFC. As
preparation for the migration process, you must first compile your project
with all compiler warnings turned on and ensure that the code is as clean as
possible. Next, run your code through the Migration Tool, which compares your
code to a set of migration guidelines and reports on potential migration
problems within your code. The tool will step you through the code, and even
let you edit your source. Finally, you must create an MFC skeleton (using
AppExpress) and move the C code into MFC classes. Tasks here include moving
your WinMain() code to MFC's WinMain(), dropping your WM_PAINT code into the
OnDraw member function of the appropriate view class, converting your
WndProc's switch statements to a view-class member function, and moving the
WM_COMMAND code into a member function. Message handlers must also be
converted from switch statements to MFC handler functions. The tool comes with
complete help files on steps to further integrate your application with MFC.
Symantec provides a set of timing benchmarks, one of which uses the source
code from the MFC Migration Tool. The timer test measures the time it takes to
build an executable. This test was performed on a 486/66 with 16 MB RAM under
Windows NT 3.5. The source code to the MFC Migration Tool was compiled using
four scenarios: compiled for speed, compiled for size, debug information
included, and debug information included but no .MAP file generated. Table 1
presents the results of the timings along with the size of the generated .exe
files. The entire project, complete with source code and makefiles, is
available electronically. If you're interested in how other compilers stack
up, you should be able to compile this code with any development platform that
supports MFC, including Visual C++, Watcom C/C++, and MetaWare's High C/C++.


Conclusion



MFC provided a quick start in writing the code and allowed the use of
thousands of lines of prewritten code. The result: I produced an OLE
application with minimal knowledge of the OLE API, since most of the
difficult coding work was embedded within the MFC framework.
Symantec C++ 7.0 is a well-integrated set of tools that maximizes programming
productivity by letting you work in the realm of classes and functions in a
hierarchical arrangement, rather than flipping through source files trying to
make sense of a dizzying array of classes and functions. While some of the
tools that come with it have a few rough edges, the package provides a
powerful environment for developing C++ applications.


For More Information


Symantec C++ 7.0
Symantec
10201 Torre Avenue
Cupertino, CA 95014-2132
408-253-9600


Optimizing C++ Code


Walter Bright


Walter, the original author of the Symantec compiler, can be contacted at
wbright@symantec.com.
It's generally accepted that the more C++ features you use in a class, the
slower your code will be. Fortunately, you can do a few things to tip the
scales in your favor. First, don't use virtual functions, virtual base
classes, destructors, and the like, unless you need them. Some compilers will
return structs that fit into 1, 2, 4, or 8 bytes in the general-purpose
registers, provided that no constructor is declared. Compile Listing Two with
and without the constructor and see the difference in the code your compiler
generates. 
C++ exception handling is another feature to be wary of. The jury is still out
on whether it makes your code faster, but just using it adds overhead that is
roughly proportional to the number of automatic objects of classes with
destructors in the program. So if you need exception handling, try to cut down
on automatic objects. Instead, use referenced objects and design them so that
they don't need a destructor. Objects without destructors needn't be cleaned
up by the exception-handling code, so no overhead will be added to keep track
of them. Note that an empty destructor (such as ~X() { }) is not good
enough--there must be no destructor to avoid the overhead.
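A minimal sketch of that rule (the class names are invented for illustration): only objects of the class with a destructor must be registered for unwinding; the plain struct adds no such record.

```cpp
#include <cassert>

// Sketch of the destructor rule above. Tracked has a destructor, so every
// automatic Tracked must be registered for unwinding when exceptions are
// enabled; Plain has no destructor and needs no cleanup record. Class
// names are invented for illustration.
struct Tracked {
    static int live;
    Tracked()  { ++live; }
    ~Tracked() { --live; }  // even an empty ~Tracked() would still count
};
int Tracked::live = 0;

struct Plain { int value; };  // no destructor: nothing for unwinding to do

int demo() {
    try {
        Tracked t;       // must be destroyed while the throw unwinds
        Plain p{42};     // imposes no unwinding overhead
        throw p.value;
    } catch (int v) {
        return v;        // by here, unwinding has run ~Tracked()
    }
}
```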
Another source of bloat is multiple inheritance, especially when using virtual
base classes. For faster code, stick with single inheritance.
Virtual functions add bloat because they need extra code to be called and
their class data contains an extra pointer, which guarantees you'll have a
constructor. So for a complex class hierarchy with only one or two virtual
functions, consider removing the virtual aspect, and maybe do the equivalent
with a test and branch.
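A sketch of that test-and-branch replacement (Shape and its members are hypothetical): a small type tag stands in for the vtable pointer, so the class carries no hidden pointer and needs no compiler-generated constructor.

```cpp
#include <cassert>

// Sketch of the test-and-branch alternative to a virtual function.
// Shape is a hypothetical example: a small tag plus a branch replaces
// the vtable pointer a virtual area() would force into every object.
struct Shape {
    enum Kind { CIRCLE, SQUARE } kind;
    double dim;  // radius for CIRCLE, side length for SQUARE
    double area() const {
        // the "test and branch" in place of a virtual dispatch
        return kind == CIRCLE ? 3.14159265 * dim * dim : dim * dim;
    }
};
```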
Memory Allocation
Most complex programs spend a lot of time allocating and freeing large numbers
of small objects. The usual storage allocators (malloc, free, new, and delete)
are general purpose and rather slow. Constructing a storage manager for your
specific needs can speed things up a lot.
For instance, if your program allocates a bunch of objects, but never frees
them, malloc's extra bookkeeping is unnecessary and expensive. Instead, you
can use a simple heap allocator; see Listing Three. If you are allocating and
freeing a large number of identically sized objects, a custom allocator that
deals with fixed-size objects can be a lot faster.
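A sketch of such a fixed-size allocator (not one of the article's listings; a minimal intrusive free list under the same no-threading assumptions): freed blocks are threaded onto a list and recycled, so the common case is a pointer pop rather than a trip through general-purpose malloc.

```cpp
#include <cassert>
#include <cstdlib>

// Minimal fixed-size allocator sketch in the spirit of Listing Three.
// Freed blocks are pushed onto an intrusive free list and reused, so
// allocation is usually a pointer pop. Not thread safe, and memory is
// never returned to the system.
template <std::size_t SIZE>
class FixedPool {
    union Block { Block *next; char data[SIZE]; };
    Block *freelist = nullptr;
public:
    void *alloc() {
        if (freelist) {            // fast path: recycle a freed block
            Block *b = freelist;
            freelist = b->next;
            return b;
        }
        return std::malloc(sizeof(Block));  // slow path: grow the pool
    }
    void free(void *p) {           // O(1): push onto the free list
        Block *b = static_cast<Block *>(p);
        b->next = freelist;
        freelist = b;
    }
};
```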
Virtual Memory
Most modern programs for any system more advanced than 16-bit DOS run under
virtual memory. But a program cannot execute or read data directly from disk;
it must be loaded into memory first. If the program accesses many different
areas of its address space, the operating system must continually exchange
memory and disk, and it is fairly easy to construct a program that thrashes
itself to the point where the system slows to a crawl. Optimization minimizes
this thrashing so that whatever is needed is nearly always already in memory.
There are two aspects to this: code and data. 
The worst case is a heavily used loop that calls a function in each page of
the program, forcing page faults throughout. You need to do the reverse:
Control the ordering and location of functions in the program so that strongly
connected functions (that call each other a lot) are grouped together,
hopefully in the same virtual-memory page. This can have a side benefit of
localizing the code into a fast memory cache. The problems in doing this are:
Code tends to be written so that statically--not dynamically--related
functions are grouped together (like all the member functions of a particular
class).
Modern code projects tend to be extremely large, and it can be next to
impossible to manually group functions in an optimal way. Unfortunately, very
large projects are most likely to benefit from function-order optimization! 
The solution, of course, is to use a profiling tool which finds the dynamic
relationship among functions by gathering statistics at run time, then
computes a reasonably optimal link order.
Note that modern operating systems do not actually read your whole program
into memory before executing it. They simply add the disk image of the program
into the virtual-memory system, and the program is then swapped into memory as
it is executed. So, the speed at which a program loads strongly depends on the
number of pages that must be swapped in to execute the initialization code.
Grouping all the initialization code, therefore, will speed up program loads.
Grouping rarely used routines together means that only in rare cases will they
actually ever be read off the disk, so you can bloat your programs with
impunity!
Optimizing virtual-memory performance for data is not quite as easy. If your
program has to regularly search a large, in-memory data structure, an
algorithm that accesses many of the pages to perform a common operation will
be terribly slow. The data structures need to be organized so that lookups and
other accesses are done directly with as little searching as possible. Think
of the file-system structure on a floppy disk: If, to read a bit of the
floppy, your program had to read the floppy each time, it would be unusably
slow. The file-system data structure on the disk minimizes the amount of disk
that must be read for any particular access. Organize your data structures
with the same thought. Keep in mind that address space is not the same as
real-memory use. Address space is essentially free, so burn it in favor of
reducing page accesses. Address space that is allocated but never accessed
costs you neither memory nor hard disk. Also, with virtual memory, a
read/write to memory will cost you twice what a read would. This is because
when the virtual-memory system is swapping, if a page was not written to, the
system can discard it; otherwise, the system must write that page back to
disk. So don't write to your data structure unless you have to, and design it
so that modifying the structure involves writes to as few pages as possible.
Table 1: Timings (in seconds) and .exe file sizes for the MFC Migration Tool.
Configuration Build Time Executable Size
With Debug Info 283.72 298,436
Debug, No map 260.05 298,436
Speed 333.42 106,496
Size 318.54 108,544

Listing One
// contaivw.cpp : implementation of the CcontainView class
// Copyright (c) Ira Rodens, 1995. All Rights Reserved.
#include "stdafx.h"
#include "contain.h"
#include "contadoc.h"
#include "cntritem.h"
#include "contaivw.h"
#include "crectsel.h"
#ifdef _DEBUG
#undef THIS_FILE
static char BASED_CODE THIS_FILE[] = __FILE__;
#endif
//////// CcontainView ////////
IMPLEMENT_DYNCREATE(CcontainView, CScrollView)
BEGIN_MESSAGE_MAP(CcontainView, CScrollView)
 //{{AFX_MSG_MAP(CcontainView)
 ON_WM_SETFOCUS()
 ON_WM_SIZE()
 ON_COMMAND(ID_OLE_INSERT_NEW, OnInsertObject)
 ON_COMMAND(ID_CANCEL_EDIT, OnCancelEdit)

 ON_WM_LBUTTONDOWN()
 ON_WM_SETCURSOR()
 ON_WM_INITMENUPOPUP()
 ON_COMMAND(ID_EDIT_CUT,OnEditCut)
 ON_COMMAND(ID_EDIT_COPY,OnEditCopy)
 ON_COMMAND(ID_EDIT_PASTE,OnEditPaste)
 ON_UPDATE_COMMAND_UI(ID_EDIT_CUT,OnUpdateEditCut)
 ON_UPDATE_COMMAND_UI(ID_EDIT_COPY,OnUpdateEditCopy)
 ON_UPDATE_COMMAND_UI(ID_EDIT_PASTE,OnUpdateEditPaste)
 //}}AFX_MSG_MAP
 // Standard printing commands
 ON_COMMAND(ID_FILE_PRINT, CView::OnFilePrint)
 ON_COMMAND(ID_FILE_PRINT_PREVIEW, CView::OnFilePrintPreview)
END_MESSAGE_MAP()
////// CcontainView construction/destruction ////// 
CcontainView::CcontainView()
 : CScrollView()
{
 m_pSelection = NULL;
 m_pSelectRect = NULL;
}
CcontainView::~CcontainView()
{
 if (m_pSelectRect) delete m_pSelectRect;
}
/////// CcontainView drawing /////// 
void CcontainView::OnDraw(CDC* pDC)
{
 CcontainCntrItem* pItem;
 CcontainDoc* pDoc = GetDocument();
 ASSERT_VALID(pDoc);
 POSITION pos = pDoc->GetStartPosition();
 CPoint ptScroll = -GetScrollPosition();
 while ((pItem = (CcontainCntrItem*)pDoc->GetNextClientItem(pos)) != NULL) {
 pItem -> Draw(pDC,ptScroll);
 }
 if (m_pSelectRect != NULL) {
 m_pSelectRect->Draw(pDC, ptScroll);
 }
}
void CcontainView::OnInitialUpdate()
{
 SIZE size;
// Set size for 8.5 X 11 inch paper 
 const double width = 8.5 ;
 const double height = 11 ;
 CView::OnInitialUpdate();
 m_pSelection = NULL; // initialize selection
 if (m_pSelectRect) {
 delete m_pSelectRect;
 m_pSelectRect = NULL;
 } 
 ShowScrollBar(SB_BOTH,TRUE);
// set for standard 8.5 X 11 inch paper 
 CWindowDC cDC(NULL);
 size.cx = (int)(width * (double)cDC.GetDeviceCaps(LOGPIXELSX));
 size.cy = (int)(height * (double)cDC.GetDeviceCaps(LOGPIXELSY));
 SetScrollSizes(MM_TEXT,size);
 

}
/////// CcontainView printing /////// 
BOOL CcontainView::OnPreparePrinting(CPrintInfo* pInfo)
{
 // default preparation
 return DoPreparePrinting(pInfo);
}
void CcontainView::OnBeginPrinting(CDC* /*pDC*/, CPrintInfo* /*pInfo*/)
{
 // TODO: add extra initialization before printing
}
void CcontainView::OnEndPrinting(CDC* /*pDC*/, CPrintInfo* /*pInfo*/)
{
 // TODO: add cleanup after printing
}
/////// OLE Client support and commands /////// 
BOOL CcontainView::IsSelected(const CObject* pDocItem) const
{
 // The implementation below is adequate if your selection consists of
 // only CcontainCntrItem objects. To handle different selection
 // mechanisms, the implementation here should be replaced.
 return pDocItem == m_pSelection;
}
void CcontainView::OnInsertObject()
{
 // Invoke the standard Insert Object dialog box to obtain information
 // for new CcontainCntrItem object.
 COleInsertDialog dlg;
 if (dlg.DoModal() != IDOK)
 return;
 BeginWaitCursor();
 CcontainCntrItem* pItem = NULL;
 TRY
 {
 // Create new item connected to this document.
 CcontainDoc* pDoc = GetDocument();
 ASSERT_VALID(pDoc);
 pItem = new CcontainCntrItem(pDoc);
 ASSERT_VALID(pItem);
 // Initialize the item from the dialog data.
 if (!dlg.CreateItem(pItem))
 AfxThrowMemoryException(); // any exception will do
 ASSERT_VALID(pItem);
 // If item created from class list (not from file) then launch
 // the server to edit the item.
 if (dlg.GetSelectionType() == COleInsertDialog::createNewItem)
 pItem->DoVerb(OLEIVERB_SHOW, this);
 ASSERT_VALID(pItem);
 pDoc->UpdateAllViews(NULL);
 }
 CATCH(CException, e)
 {
 if (pItem != NULL)
 {
 ASSERT_VALID(pItem);
 pItem->Delete();
 }
 AfxMessageBox(IDP_FAILED_TO_CREATE);
 }

 END_CATCH
 EndWaitCursor();
}
// The following command handler provides the standard keyboard
// user interface to cancel an in-place editing session.
void CcontainView::OnCancelEdit()
{
 // Close any in-place active item on this view.
 COleClientItem* pActiveItem=GetDocument()->GetInPlaceActiveItem(this);
 if (pActiveItem != NULL)
 {
 pActiveItem->Close();
 }
 ASSERT(GetDocument()->GetInPlaceActiveItem(this) == NULL);
}
// Special handling of OnSetFocus and OnSize are required for a container
// when an object is being edited in-place.
void CcontainView::OnSetFocus(CWnd* pOldWnd)
{
 COleClientItem* pActiveItem= GetDocument()->GetInPlaceActiveItem(this);
 if (pActiveItem != NULL &&
 pActiveItem->GetItemState() == COleClientItem::activeUIState)
 {
 // need to set focus to this item if it is in the same view
 CWnd* pWnd = pActiveItem->GetInPlaceWindow();
 if (pWnd != NULL)
 {
 pWnd->SetFocus(); // don't call the base class
 return;
 }
 }
 CView::OnSetFocus(pOldWnd);
}
void CcontainView::OnSize(UINT nType, int cx, int cy)
{
 CView::OnSize(nType, cx, cy);
 UpdateBars();
 COleClientItem* pActiveItem= GetDocument()->GetInPlaceActiveItem(this);
 if (pActiveItem != NULL)
 pActiveItem->SetItemRects();
}
////// CcontainView diagnostics /////// 
#ifdef _DEBUG
void CcontainView::AssertValid() const
{
 ASSERT (!((m_pSelection && !m_pSelectRect) ||
 (!m_pSelection && m_pSelectRect)));
 CView::AssertValid();
}
void CcontainView::Dump(CDumpContext& dc) const
{
 CView::Dump(dc);
}
CcontainDoc* CcontainView::GetDocument() // non-debug version is inline
{
 ASSERT(m_pDocument->IsKindOf(RUNTIME_CLASS(CcontainDoc)));
 return (CcontainDoc*)m_pDocument;
}
#endif //_DEBUG

/////// CcontainView message handlers /////// 
void CcontainView::OnLButtonDown(UINT nFlags, CPoint point)
{
 CcontainCntrItem* pNewSelection;
 CcontainCntrItem* pItem;
 COleClientItem * pActive;
 CcontainDoc* pDoc = GetDocument();
 if ((pActive = pDoc->GetInPlaceActiveItem(this)) != NULL) {
 pActive -> Close();
 ASSERT (pDoc->GetInPlaceActiveItem(this) == NULL);
 return;
 }
 CPoint ptScroll = GetScrollPosition();
 CPoint ptMouse ;
 ptMouse.x = point.x + ptScroll.x;
 ptMouse.y = point.y + ptScroll.y;
 
 if (m_pSelectRect) {
 if (m_pSelectRect->HitTest(point) != CRectTracker::hitNothing) {
 CRect rctOld;
// Find original rectangle 
 m_pSelectRect->GetTrueRectangle(rctOld,-ptScroll);
// Resize/move object 
 m_pSelectRect->Track(this,point,ptScroll);
// Redraw original location 
 InvalidateRect(&rctOld);
// Draw object in new location 
 {
 CClientDC dc(this); 
 m_pSelection -> Draw(&dc,ptScroll);
 m_pSelectRect -> Draw(&dc,ptScroll);
 } 
 return;
 }
 }
 pNewSelection = NULL;
 POSITION pos = pDoc->GetStartPosition();
 while ((pItem = (CcontainCntrItem*)pDoc->GetNextClientItem(pos)) != NULL) {
 if ((pItem->GetRect()).PtInRect(ptMouse)) {
 pNewSelection = pItem;
 break;
 }
 }
 ChangeSelection(pNewSelection);
}
BOOL CcontainView::OnSetCursor(CWnd* pWnd, UINT nHitTest, UINT message)
{
 BOOL bAns = FALSE;
 if ((pWnd != this) || (nHitTest != HTCLIENT)) {
 ::SetCursor(::LoadCursor(NULL,IDC_ARROW));
 } 
 else if (m_pSelectRect) {
 CPoint ptScroll = GetScrollPosition();
 bAns = m_pSelectRect->SetCursor(pWnd,nHitTest,ptScroll);
 }
 if (!bAns) {
 ::SetCursor(::LoadCursor(NULL,IDC_CROSS));
 }
 return TRUE;

}
void CcontainView::OnEditCut()
{
 OnEditCopy();
 DeleteSelection();
}
void CcontainView::OnEditCopy()
{
 m_pSelection -> CopyToClipboard();
}
void CcontainView::OnEditPaste()
{
 CcontainCntrItem *pObject = new CcontainCntrItem(GetDocument());
 if (!pObject -> CreateFromClipboard()) delete pObject;
}
void CcontainView::DeleteSelection()
{
 if (m_pSelection != NULL) {
 CRect rctObj;
 m_pSelectRect -> GetTrueRectangle(rctObj,-GetScrollPosition());
 delete m_pSelection;
 m_pSelection = NULL;
 delete m_pSelectRect;
 m_pSelectRect = NULL;
 InvalidateRect ((LPRECT)&rctObj,TRUE);
 } 
}
void CcontainView::ChangeSelection(CcontainCntrItem *pNewSelection)
{
 if (pNewSelection != m_pSelection) {
 CPoint ptScroll = GetScrollPosition();
 if (m_pSelectRect) {
 CRect rctOld;
 m_pSelectRect->GetTrueRectangle(rctOld,ptScroll);
 InvalidateRect(&rctOld);
 delete m_pSelectRect;
 m_pSelectRect = NULL;
 }
 m_pSelection = pNewSelection;
 if (m_pSelection) {
 m_pSelectRect = new CRectSelect(m_pSelection);
 {
 CClientDC dc(this);
 m_pSelectRect->Draw(&dc,ptScroll);
 }
 }
 } 
}
void CcontainView::OnUpdateEditCut(CCmdUI *pCmdUI)
{
 pCmdUI -> Enable(m_pSelection != NULL);
}
void CcontainView::OnUpdateEditCopy(CCmdUI *pCmdUI)
{
 pCmdUI -> Enable(m_pSelection != NULL);
}
void CcontainView::OnUpdateEditPaste(CCmdUI *pCmdUI)
{
 BOOL bFlag=CcontainCntrItem::CanPaste() || CcontainCntrItem::CanPasteLink();

 pCmdUI -> Enable(bFlag);
}

Listing Two
extern struct X { int member;
 X(); // try commenting out this line
} x;
struct X test() { return x; }

Listing Three
#include <stdlib.h>   /* malloc, size_t */
#include <string.h>   /* memset */
class Heap {
static void *heap;
static size_t heapleft;
public:
static inline void *malloc(size_t size)
{ void *p;
 if (heapleft < size)
 { size_t newsize = (size + 4095) & ~4095;
 p = ::malloc(newsize); // the global malloc, not Heap::malloc
 if (!p) return p;
 heapleft = newsize;
 heap = p;
 }
 p = heap;
 *(char **)&heap += size;
 heapleft -= size;
 return p;
}
static inline void *calloc(size_t size)
{ void *p;
 p = malloc(size);
 return p ? memset(p,0,size) : p;
}
static inline void free(void *p) { }
};
// definitions of Heap's static members
void *Heap::heap;
size_t Heap::heapleft;




























Developing C++ NLMs


Walking the NDS tree--in C++




W. Dale Cave


Dale, a software engineer for Novell, is currently working on NetWare
Directory Services. He can be contacted at dcave@novell.com.


NetWare Loadable Modules (NLMs) are 32-bit utilities that, when run on a
NetWare server, are dynamically linked into NetWare, thereby extending the
network operating system. Many NetWare features--disk drivers, LAN drivers,
and libraries such as CLIB, DSAPI, and NWSNUT--are NLMs. Even NetWare
Directory Services (NDS) is an NLM, DS.NLM. These NLMs take advantage of
NetWare's multitasking, multithreaded, and resource-management capabilities.
DSBROWSE, the utility I present here, allows you to view (or "walk") the
NetWare Directory Services tree. What's atypical about it is that it's written
in C++.
NLM internals are generally straightforward and have been discussed in many
books and magazines, including Dr. Dobb's Journal (see "Implementing NLM-based
Client/Server Architectures," by Michael Day, October 1992). Until now, most
NLM development has been done in C. C++ introduces more complexity.
Consequently, instead of describing how the DSBROWSE NLM works, I'll focus on
the tools and process of writing C++-based NLMs. DSBROWSE is provided in both
source and binary form; see "Availability," page 3.


Looking at the Workbench


For starters, NLM development is best done with two computers: a client
workstation, for editing, building, and remote debugging; and a NetWare
server, for running and testing the NLM.
The C/C++ compiler of choice for most NLM developers is Watcom C/C++ 10.0.
Watcom introduced C++ support in Version 9.5 and has updated it twice with
patch A and patch B. I used Version 10, patch A for developing DSBROWSE. This
compiler supports many features expected from a C++ compiler, including
templates, exception handling, I/O streams, and source-level debugging. You
can get patch A directly from Watcom by downloading c_a.zip and ch_a.zip
from file area 17, the "Watcom C/C++ 10.0 Problems and Fixes" forum on
Watcom's BBS. The files are also available via anonymous ftp from
ftp.watcom.on.ca in directory /pub/bbs/lang_v10.0/c.
It is necessary to install the Watcom compiler first, then the Novell NLM SDK.
Next, apply patch A (unless you are installing from an already patched
version). If the compiler is already installed, TECHINFO.EXE located in the
WATCOM\BINB directory will analyze your development environment and display
critical files and the level of patch applied to them.
Although Watcom claims to include the Novell NLM SDK 4.0 in its package,
Novell's NLM SDK is still a necessity. In particular, the CD-ROM version of
the Novell SDK provides documentation and messaging tools required for using
the NetWare console text-based interface standard (NWSNUT). I developed
DSBROWSE using the NetWare NLM SDK 3.5 (the current version is 4.1).
Although I created DSBROWSE on an Intel-based PC running Novell DOS 7, DOS 3.3
or higher should work. You also need an 80386 (or higher) PC with at least
8 MB of memory. My installation for NLM development (including Watcom and
Novell tools) consumes 45 MB of disk space, but other installations may vary.
The NetWare server used for testing the NLM should be dedicated for testing. A
production server should not be used.
Finally, C++ NLMs are supported for the NetWare 3.x and 4.x environments.
However, DSBROWSE will only run on NetWare 4.x because it uses NDS.


Client Configuration


After the Watcom and Novell tools have been installed, check your environment
variables and DOS path. If Watcom is installed to c:\watcom, your environment
should have two Watcom-related variables: include=c:\watcom\novh;c:\watcom\h
and watcom=c:\watcom. The order of include locations is significant. It is
important to place the Novell header-file area (c:\watcom\novh) before the
Watcom header-file area (c:\watcom\h) to ensure the proper header-file search
order. You must also have the c:\watcom\bin, c:\watcom\binb, and
c:\novsdk\msgtools directories in your path or have them mapped as search
drives. The first two directories contain Watcom executables, and both the bin
and binb directories are necessary. The Novell NLM "messaging" tools are used
for message enabling (internationalization). Although message enabling is
optional for developers and DSBROWSE is not completely enabled,
DSBROWSE.CPP (Listing One) shows one way of using message strings. (DSBROWSE.H
is available electronically.) DSBROWSE's makefile, MAKEFILE (also available
electronically), demonstrates the management of the message database.


Server Configuration


No special server "set" parameters are required for debugging NLMs. You simply
copy the Watcom debugger to a directory in the server's search path, usually
the SYS:\SYSTEM directory. The Watcom debugger filename varies depending on
the NetWare version and other debugging options. Watcom NLM debugging will be
discussed in more detail later.
To source-level debug an NLM, you need to load the Watcom NLM debugger on the
server. In MAKEFILE, a copy dsbrowse.nlm n:\system is executed after a
successful NLM build to place DSBROWSE .NLM in the test server's SYS:\SYSTEM
directory. Note that drive n: must be previously mapped to the test server's
SYS volume.


Compile


DSBROWSE is compiled by Watcom's WPP386.EXE. Although WCL386.EXE is available
to both compile and link, I have chosen to use the separate compiler and
linker utilities for demonstration purposes.
The command to build DSBROWSE is: wpp386 dsbrowse.cpp /3s /d2 /xs /bt=netware.
The /3s parameter is required, and it tells the compiler to generate 386 code
using a stack-calling convention. 
The /d2 switch enables full symbolic debugging, which allows source-level
debugging. This switch should normally be omitted for a released NLM product. 
The /bt=netware option is required and causes the compiler to build an NLM
object module referencing __WATCOM_Prelude, Watcom's version of Novell's
PRELUDE.OBJ, which is required because it contains necessary startup code for
C++.
The /xs parameter is optional in a sense. If your C++ source code contains
exception-handling code, then the compiler must be told to allow it;
otherwise, the exception parameter is not required. Exception handling is
disabled by default. If the parameter is omitted and the source code has
exception code, the compiler will indicate that your source code has exception
handling that you neglected to mention and will abort the compilation.
Although there are various ways of implementing exception handling (controlled
by other /x switches), I would expect the compiler to enable it by default.
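For instance, a translation unit containing code like this sketch (the function is invented for illustration) would need /xs, since any throw or try/catch counts as exception-handling code:

```cpp
#include <cassert>
#include <stdexcept>

// Invented example: the presence of a throw (or try/catch) in a
// translation unit means the Watcom compiler must be told to allow
// exceptions, e.g. with the /xs switch described above.
int parse_positive(int n) {
    if (n < 0)
        throw std::runtime_error("negative value");
    return n;
}
```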


Link


WLINK.EXE is Watcom's linker. The DSBROWSE project has a link file named
DSBROWSE.LNK (available electronically), which simply has a list of linker
commands. The command to link DSBROWSE with its link file is: wlink
@dsbrowse.lnk. 

Referring to the listing of DSBROWSE.LNK, some important options must be
included. The format statement form nov nlm... tells the linker to generate a
Novell NLM. The debug all and debug novell statements tell it to include all
debug information, including source-level debugging information. Both debug
statements are required to source-level debug the NLM.
Since DSBROWSE is partially enabled with messaging, option
messages=dsbrowse.msg will include the default English messages found in the
DSBROWSE file MESSAGES.H (available electronically). In addition to the import
and module statements, which indicate where external references should be
searched and which corresponding NLMs to autoload, the library path should be
searched as indicated: libpath c:\watcom\lib386\netware;c:\watcom\lib386. (The
Novell NetWare library directory should be searched before the Watcom library
directory.)


Debug


The compiler and linker options previously mentioned will build a debug
version of DSBROWSE. To better understand what you need to do to source-level
debug an NLM, I'll briefly discuss Watcom's debugging architecture.
Watcom only supports remote debugging, so the machine running your application
must be controlled by a separate machine, which can be connected by a
parallel, serial, or SPX connection.
Watcom's server debugger, Watcom Novell Debugger v2.5, is an NLM that runs on
the server and relays requests and information to and from the client
debugging workstation; see Figure 1. Watcom supplies six versions of its debug
NLM to allow for each combination of parallel (par), serial (ser), and SPX
(nov) transport type to each major NetWare version (3.x and 4.x). Because I
debug DSBROWSE on a NetWare 4.x server and through an SPX connection, I load
NOVSERV4.NLM (refer to Watcom's documentation for more details).
The server console command to load Watcom Novell Debugger v2.5 is: load
novserv4 nvi, where nvi is the NetWare Service Advertising Protocol (SAP)
name. This is an arbitrary name posted on the network that the client
debugging workstation will search. If no SAP name is given, the default
novlink is used.
Once the server debugger is loaded (there is no need to load DSBROWSE.NLM),
the client needs to be set up. Watcom's client debugger, WD.EXE v4.0, is
executed at the DOS command line by: wd /vga50 -trap=nov;nvi, where /vga50 is
optional and puts the video in 50-line mode if the hardware supports it.
The -trap=nov identifies the transport type just like on the server side.
Since I'll use the network SPX connection to link the two machines, nov
specifies that request.
Since I am using a network SPX connection, the client debugger needs to find
the debug server on the network. It does this by searching SAP for the name
nvi, which is specified on the server. Once again, if ;nvi is omitted, the
default SAP name searched is novlink.
When the client debugger is loaded and finds the debug server, the server
debugger will automatically load DSBROWSE.NLM. The client debugger will then
display DSBROWSE.CPP and highlight the first executable line of code.


DSBROWSE 


No login capability is included when you run DSBROWSE.NLM. NetWare Directory
Services does not require you to log in to "browse" the tree. Although this is
the default NDS behavior, it is possible for browse rights to be revoked at
specific locations in the tree by users of proper authority. In this case, a
login ability might be useful, and it would be a good enhancement to DSBROWSE.

Be aware that no unload procedure is included in DSBROWSE. Every production
NLM should register an unload function that will free resources when the NLM
is unloaded from the console.
MAKEFILE will build DSBROWSE and will work with Watcom's WMAKE.EXE. The
environment variables and path (or search drives) must be set up prior to
running the make. Assuming all DSBROWSE files are in the default directory,
the command to build DSBROWSE is wmake. WMAKE.EXE will search the default make
filename of MAKEFILE.


Concerns


My client workstation consistently crashes after a debugging session. Although
irritating, source-level debugging DSBROWSE has been effective and worth the
nuisance. 
C++ comes with its peculiarities, one of which is name mangling. Name mangling
supports function overloading (functions with the same name) by encoding the
number and types of a function's arguments into its name. This can be a
problem.
With DSBROWSE, for instance, if a variable is to be made public in the NetWare
environment, the variable symbol will have to be exported. DSBROWSE.CPP shows
how two variables (exported in DSBROWSE.LNK) are handled. The first symbol,
ExportedSymbolMangled, actually becomes W?ExportedSymbolMangled$ni because of
name mangling. If ExportedSymbolMangled is exported, the linker cannot resolve
it. But the mangled name, W?ExportedSymbolMangled$ni, can be resolved.
Another variable, ExportedSymbolUnMangled, is included with extern "C"
linkage. This forces a C naming convention on the variable, preventing it from
being mangled, so it can be exported as is. Note that C linkage
is not available for C++ constructs. Although a class can be exported with the
__export declaration, the Novell Internal Debugger will not recognize names
with parentheses in them. This is a problem because all mangled methods have
parentheses.
The last concerns are not obvious. The first is the use of Watcom's prelude
object module __WATCOM_Prelude in place of Novell's _Prelude. The second is
that of CLIB3S.LIB, Watcom's static-library version of Novell's CLIB.NLM. To
build C++ NLMs, you need to use this static library. So even though you think
you might be getting some functions from Novell's CLIB.NLM, you could be
getting them from Watcom's CLIB version. These replacements may cause
difficulty in obtaining technical support and identifying which product has
the problem. 


Summary


The tools exist today to develop C++ NLMs. Because there are some caveats, a
C++ NLM project might require some research. The readme.log for the compiler
is a good place to start. Watcom technical support can be contacted for
product and development questions, while Novell can be reached for SDK and
NetWare licensing assistance. Whatever you decide, developing C++ NLMs can be
fun and Watcom's source-level debugging capabilities make NLM debugging easy
and effective.


For More Information


Novell Inc.
Developer Relations
800-733-9673
ftp.novell.com
http://www.novell.com
Watcom Inc.
415 Phillip Street
Waterloo, ON
Canada N2L 3X2
519-886-3700
BBS: 519-884-2100 
ftp.watcom.on.ca 
Figure 1: Watcom remote-debugging architecture.

Listing One
/* DSBROWSE.CPP -- Author: W. Dale Cave. DSBROWSE is a NetWare Directory 

Services tree browser. It will display and allow traversal of the Directory 
tree. C++ source code for browsing functionality. Unloading this NLM from the 
console with the "unload" command will report resources freed by the OS 
because the NLM did not free them. An exit procedure would need to be added 
to release these resources and prevent the messages. */
#include <iostream.h>
#include <nwsnut.h>
#include <advanced.h>
#include <nwenvrn.h>
#include <process.h>
#include <conio.h>
#include <nwdsapi.h>
char **messageTable;
#include "messages.h"
#include "dsbrowse.h"
NUTInfo *handle;
NWDSCCODE nwCode;
int CLIBScreenID; 
int (*defaultCompareFunction)(LIST *el1, LIST *el2);
/* "ExportedSymbolMangled" and "ExportedSymbolUnMangled" are included only for
examples of exporting symbols in the C++ name-mangling environment. See
DSBROWSE.LNK for the export statement syntax. This symbol will be mangled 
in the C++ environment. With debug information on, a scan of the symbols 
(NetWare internal debugger command "n") shows this symbol 
"ExportedSymbolMangled" is mangled to "W?ExportedSymbolMangled$ni". */
int ExportedSymbolMangled = 0xF1;
#ifdef __cplusplus
extern "C" {
#endif
/* This symbol name will not be mangled because it is enclosed within an
extern "C" construct. The Watcom compiler defines __cplusplus by default when
compiling a .CPP file. */
int ExportedSymbolUnMangled = 0xF0;
#ifdef __cplusplus
}
#endif
// ************************************************************************
// ************************************************************ NI_Context
// ************************************************************************
NI_Context::NI_Context() 
 {
 nwContext = NWDSCreateContext();
 nwStatus = nwContext;
 bLoggedIn = FALSE;
 }
NI_Context::~NI_Context()
 {
 // logout before freeing of the context
 if (bLoggedIn == TRUE)
 NWDSLogout();
 if (nwContext != ERR_CONTEXT_CREATION) 
 {
 nwCode = NWDSFreeContext(nwContext);
 nwStatus = nwCode;
 }
 }
NWDSCCODE NI_Context::SetFlags(int nFlags)
 {
 NWDSCCODE nwReturnValue;

 /* When using NWDSSetContext() to set the context, its third parameter is
 either an integer or a character string and is prototyped as a void *.
 Hence, we need to type cast the parameter to a void *. */
 nwReturnValue = NWDSSetContext(nwContext, DCK_FLAGS, (void *) &nFlags);
 nwStatus = nwReturnValue;
 return nwReturnValue;
 }
NWDSCCODE NI_Context::SetConfidence(int nConfidence)
 {
 NWDSCCODE nwReturnValue;
 nwReturnValue = NWDSSetContext(nwContext, DCK_CONFIDENCE, 
 (void *) &nConfidence);
 nwStatus = nwReturnValue;
 return nwReturnValue;
 }
NWDSCCODE NI_Context::SetNameContext(char *charNameContext) 
 {
 NWDSCCODE nwReturnValue;
 nwReturnValue = NWDSSetContext(nwContext, DCK_NAME_CONTEXT, 
 (void *) charNameContext);
 nwStatus = nwReturnValue;
 return nwReturnValue;
 }
NWDSCCODE NI_Context::SetTransportType(int nTransportType)
 {
 NWDSCCODE nwReturnValue;
 nwReturnValue = NWDSSetContext(nwContext, DCK_TRANSPORT_TYPE, 
 (void *) &nTransportType);
 nwStatus = nwReturnValue;
 return nwReturnValue;
 }
NWDSCCODE NI_Context::SetReferralScope(int nReferralScope) 
 {
 NWDSCCODE nwReturnValue;
 nwReturnValue = NWDSSetContext(nwContext, DCK_REFERRAL_SCOPE, 
 (void *) &nReferralScope);
 nwStatus = nwReturnValue;
 return nwReturnValue;
 }
NWDSCCODE NI_Context::NWDSLogin(NWDS_FLAGS optionsFlag, char *objectName,
 char *password, NWDS_VALIDITY validityPeriod)
 {
 NWDSCCODE nwReturnValue;
 nwReturnValue = ::NWDSLogin(nwContext, optionsFlag, objectName, password,
 validityPeriod);
 nwStatus = nwReturnValue;
 if (nwStatus == 0)
 bLoggedIn = TRUE;
 return nwReturnValue;
 }
NWDSCCODE NI_Context::NWDSLogout()
 {
 NWDSCCODE nwReturnValue;
 nwReturnValue = ::NWDSLogout(nwContext);
 nwStatus = nwReturnValue;
 if (nwStatus == 0)
 bLoggedIn = FALSE;
 return nwReturnValue;
 }

// ************************************************************************
// ************************************************************* NI_Buffer
// ************************************************************************
NI_Buffer::NI_Buffer()
 {
 nwCode = NWDSAllocBuf(DEFAULT_MESSAGE_LEN, &nwpBuffer);
 nwStatus = nwCode;
 }
NI_Buffer::~NI_Buffer()
 {
 // NWDSFreeBuf always returns 0
 nwCode = NWDSFreeBuf(nwpBuffer);
 nwStatus = nwCode;
 }
NWDSCCODE NI_Buffer::InitBuf(NI_Context &niContext, int nOperation)
 {
 NWDSCCODE nwReturnValue;
 nwReturnValue = NWDSInitBuf(niContext, nOperation, nwpBuffer);
 nwStatus = nwReturnValue;
 return nwReturnValue;
 }
// ************************************************************************
// ************************************************* DSNameCompareFunction
// ************************************************************************
/* This compare function is used for the NWSNut interface. Its purpose is to
cause "[Root]" to be first in the directory list and ".." to be second. All
other objects will be sorted alphabetically with NWSNut's default compare
function. */
int DSNameCompareFunction(LIST *el1, LIST *el2)
 {
 int nReturnValue;
 if (strcmp((char const *)el1->text, GETMSG(MSGID_ROOT)) == 0 ||
 strcmp((char const *)el2->text, GETMSG(MSGID_ROOT)) == 0)
 {
 // "[Root]" is being compared, make it always first in the list
 // i.e. it will always compare less than the other strings
 if (strcmp((char const *)el1->text, GETMSG(MSGID_ROOT)) == 0)
 nReturnValue = -1;
 else
 nReturnValue = 1;
 }
 else if (strcmp((char const *)el1->text, GETMSG(MSGID_DOTDOT)) == 0)
 nReturnValue = -1;
 else if (strcmp((char const *)el2->text, GETMSG(MSGID_DOTDOT)) == 0)
 nReturnValue = 1;
 else
 nReturnValue = defaultCompareFunction(el1, el2);
 return nReturnValue;
 }
// ************************************************************************
// ************************************************************** MainMenu
// ************************************************************************
int MainMenu()
 {
 int nSelection;
 // create menu
 NWSInitMenu (handle);
 // set function for default compare
 // this will cause the strings placed in the menu to be ordered according
 // to our insertion order (and ignore the default sorting function)

 NWSSetDefaultCompare(handle, NULL);
 NWSAppendToMenu(MSGID_MENUOPT_SERVERINFO, 1, handle);
 NWSAppendToMenu(MSGID_MENUOPT_BROWSETREE, 2, handle);
 NWSAppendToMenu(MSGID_MENUOPT_EXIT, 3, handle);
 nSelection = NWSMenu(MSGID_MENU_MAIN, 10, 40, NULL, NULL, handle, NULL);
 NWSDestroyMenu (handle);
 return nSelection;
 }
// ************************************************************************
// ************************************************************ ServerInfo
// ************************************************************************
int ServerInfo()
 {
 char strCompanyName[80];
 char strRevision[80];
 char strRevisionDate[24];
 char strCopyrightNotice[80];
 int maxChars = 80+80+24+80+3;
 char *bigBuff = new char[maxChars]; // allocate string for all strings
 nwCode = GetFileServerDescriptionStrings(strCompanyName, strRevision,
 strRevisionDate, strCopyrightNotice);
 
 // copy all strings to big string
 strcpy(bigBuff, strCompanyName);
 strcat(bigBuff, "\n");
 strcat(bigBuff, strRevision);
 strcat(bigBuff, "\n");
 strcat(bigBuff, strRevisionDate);
 strcat(bigBuff, "\n");
 strcat(bigBuff, strCopyrightNotice);
 // display big string
 NWSViewText(10, 40, 5, 50, MSGID_SERVERINFO, (unsigned char *)bigBuff,
 maxChars, handle);
 delete [] bigBuff;
 return 0;
 }
// ************************************************************************
// ****************************************************** BrowseTreeAccess
// ************************************************************************
int BrowseTreeAccess(char *strObject, char *charObjectSelected)
 {
 int err;
 LIST *selectionElement;
 NWSInitList(handle, NULL);
 // save NWSNut's default compare function
 NWSGetDefaultCompare(handle, &defaultCompareFunction);
 // set this object list compare function to our own "DSNameCompareFunction"
 NWSSetDefaultCompare(handle, DSNameCompareFunction);
 NWSAppendToList(NWSGetMessage(MSGID_ROOT, &(handle->messages)), 
 (void *) 0, handle);
 // selectionElement identifies the default selection for NWSList
 selectionElement = 
 NWSAppendToList(NWSGetMessage(MSGID_DOTDOT, &(handle->messages)), 
 (void *) 0, handle);
 // read objects
 NI_Context niContext;
 NI_Buffer niBufResults;
 // set DS context to root; object name will be a Relative Distinguished Name
 niContext.SetNameContext("[Root]");

 int32 iterHandle=-1L;
 int i;
 uint32 totObjects, totAttrs;
 Object_Info_T objectInfo;
 char objName[MAX_DN_CHARS];
 char listObj[MAX_DN_CHARS];
 char strBuf[MAX_DN_CHARS+2];
 // Call to NWDSList and check for errors. iterHandle must be initialized
 // to -1. NWDSList should be called until iterHandle is -1. In the case
 // that the result spans multiple output buffers, iterHandle will be
 // something other than -1. Do not change the value returned. Just send
 // it back with the next call to NWDSList.
 iterHandle = -1L;
 do
 { 
 err = NWDSList(niContext, strObject, &iterHandle, niBufResults);
 if(err<0)
 throw("NWDSList Error");
 // Pull information from output buffer. Check for errors.
 // First get the number of objects in the buffer. Check for errors.
 err = NWDSGetObjectCount(niContext, niBufResults, &totObjects);
 //printf("Total objects = <%d>\n",totObjects);
 if(err<0)
 throw("NWDSGetObjectCount Error");
 for(i=0;i<totObjects;i++)
 {
 // Pull object name, total number of attributes associated with
 // object, and an object info structure from buffer. Check
 // for errors.
 err = NWDSGetObjectName(niContext, niBufResults, objName, &totAttrs, 
 &objectInfo);
 if(err<0)
 throw("NWDSGetObjectName Error");
 //printf("Object found --> %s\n",objName);
 if (objectInfo.objectFlags & DS_CONTAINER_ENTRY)
 {
 strcpy(strBuf, "+");
 }
 else
 {
 strcpy(strBuf, "");
 }
 strcat(strBuf, objName);
 NWSAppendToList((unsigned char *)strBuf, (void *) 0, handle);
 }
 } while(iterHandle != -1);
 nwCode = NWSList(MSGID_MENU_DSBROWSE, 14, 40, 11, 40, 
 M_ESCAPE | M_SELECT, &selectionElement,
 handle, NULL, NULL, 0);
 strcpy(charObjectSelected, (const char *)selectionElement->text);
 NWSDestroyList(handle);
 return nwCode;
 }
// ************************************************************************
// ************************************************************ BrowseTree
// ************************************************************************
int BrowseTree()
 {
 int err;

 char strObject[MAX_DN_CHARS];
 char strOldObject[MAX_DN_CHARS];
 LIST *selectionElement;
 char strTemp[MAX_DN_CHARS];
 BOOL bTraversalRequest;
 LONG lContextDisplayPortal;
 strcpy(strObject, GETMSG(MSGID_ROOT)); //"[Root]");
 do
 {
 strcpy(strOldObject, strObject);
 lContextDisplayPortal = NWSDisplayInformation(MSGID_CONTEXT, 0, 2, 40, 
 NORMAL_PALETTE, VNORMAL, (unsigned char *)strObject, handle);
 err = BrowseTreeAccess(strObject, strObject);
 NWSDestroyPortal(lContextDisplayPortal, handle);
 if (err != 1)
 {
 if (strcmp(strObject, GETMSG(MSGID_ROOT)) == 0)
 {
 // clear the old object name so it will not get appended 
 strcpy(strOldObject, "");
 }
 else if (strcmp(strObject, "..") == 0)
 {
 char *strPtr;
 // check if [Root] was selected
 if (strcmp(strOldObject, GETMSG(MSGID_ROOT)) != 0)
 {
 // [Root] was not selected
 // copy old path, the new path (strObject) is ".."
 strcpy(strTemp, strOldObject); 
 strPtr = strchr(strTemp, '.');
 if (strPtr == NULL)
 {
 // we are at the root because no . was found
 strcpy(strObject, GETMSG(MSGID_ROOT));
 }
 else
 {
 strPtr++; // pass the .
 strcpy(strObject, strPtr);
 }
 }
 else
 {
 // if already at the root, set strObject to "[Root]"
 strcpy(strObject, GETMSG(MSGID_ROOT));
 }
 }
 else if (strObject[0] == '+')
 {
 int nLen = strlen(strObject);
 int nIndex = 0;
 // copy object (minus +) into temp
 for (nIndex = 0; nIndex < nLen-1; nIndex++)
 strTemp[nIndex] = strObject[nIndex+1];
 strTemp[nIndex] = '\0';
 // copy into strObject
 strcpy(strObject, strTemp);
 // add old object if not at root already

 if (strcmp(strOldObject, GETMSG(MSGID_ROOT)) != 0)
 {
 // add .
 strcat(strObject, ".");
 strcat(strObject, strOldObject);
 }
 }
 else
 // just in case a noncontainer is selected (emulate .)
 strcpy(strObject, strOldObject); 
 }
 } while (err != 1); // do until escape is pressed in browse list
 return err;
 }
// ************************************************************************
// ****************************************************************** main
// ************************************************************************
int main() 
 { 
 LONG messageCount, languageID;
 LONG l1=LoadLanguageMessageTable(&messageTable, &messageCount, &languageID);
 char *charUserName = "Admin";
 int err;
 int nMainMenuSelection;
 LONG NLMHandle;
 LONG allocTag;
 try
 {
 NLMHandle = GetNLMHandle();
 // create a screen for displaying our information 
 CLIBScreenID = CreateScreen(GETMSG(MSGID_DSBROWSE_CPP), 
 AUTO_DESTROY_SCREEN);
 if (!CLIBScreenID)
 return -1;
 DisplayScreen(CLIBScreenID);
 allocTag = AllocateResourceTag(NLMHandle, 
 (unsigned char *) "DSBrowse Alloc Tag", 
 AllocSignature);
 nwCode = NWSInitializeNut(MSGID_DSBROWSE_CPP, 
 MSGID_PROGRAM_VERSION, NORMAL_HEADER, NUT_REVISION_LEVEL, 
 0, 0, CLIBScreenID, allocTag, &handle);
 do
 {
 nMainMenuSelection = MainMenu();
 switch(nMainMenuSelection)
 {
 case 1:
 ServerInfo();
 break;
 case 2:
 BrowseTree();
 break;
 default:
 break;
 }
 // exit if ESCAPE is pressed or Exit option is selected from main menu
 } while ((nMainMenuSelection != 3) && (nMainMenuSelection != -1));
 } // end try

 catch (int nValue)
 {
 ConsolePrintf("%s ==> ", GETMSG(MSGID_DSBROWSE_CPP));
 ConsolePrintf(GETMSG(MSGID_ERR_EXCEPT_INTEGER));
 ConsolePrintf("%d\n", nValue);
 }
 catch (NI_Context exContext)
 {
 ConsolePrintf("%s ==> ", GETMSG(MSGID_DSBROWSE_CPP));
 ConsolePrintf(GETMSG(MSGID_ERR_EXCEPT_CONTEXT));
 ConsolePrintf("\n");
 }
 catch (char * strMessage)
 {
 ConsolePrintf("%s ==> ", GETMSG(MSGID_DSBROWSE_CPP));
 ConsolePrintf(GETMSG(MSGID_ERR_EXCEPT_STRING));
 ConsolePrintf("%s\n", strMessage);
 }
 catch (...)
 {
 ConsolePrintf("%s ==> ", GETMSG(MSGID_DSBROWSE_CPP));
 ConsolePrintf(GETMSG(MSGID_ERR_EXCEPT_OTHER));
 ConsolePrintf("\n");
 }
 NWSRestoreNut(handle);
 nwCode = DestroyScreen(CLIBScreenID);
 return 0; 
 } // end main
DDJ


































PROGRAMMING PARADIGMS


Apple Talks the Talk and Walks the Dog at WWDC




Michael Swaine


This year's "Quadruple Nickel Award for Fully Grasping the Paradigm" goes to
the agent who brought together Jerry Pournelle and Newt Gingrich to
collaborate on a science-fiction novel, a concept so trippy that if it weren't
true it would be a science-fiction novel. Runners-up: The advertising agency
that decided that Rush Limbaugh's TV viewers are just the sort of people who
would buy spearmint-flavored chewing tobacco; and G. Gordon Liddy, of
Watergate fame, who has been offering advice of late on dealing with burglary,
a paradigm regarding which his expertise is a thing of legend.
Meanwhile, out here on the paradigms beat, I'm mostly beating on Apple. This
month's column is largely Macintosh (I may pull a little Newton out of my
sleeve before I'm through). It touches on scripting, with updates on the
various Mac scripting tools I've mentioned here from time to time--HyperCard,
Frontier, SuperCard, FaceSpan, AppleScript--with a nod to how or whether these
things connect with CGI scripting for Web services. And there's a heavy
Internet emphasis. I'll even point you to a new organization for Web-page
developers, in case you've decided to make your fortune writing
<blink>HTML</blink>.
But first, these bulletins from the rubber-chicken circuit.


Talking Dog


They were talking dog, Cyberdog to be precise, at this year's Worldwide
Developers Conference (WWDC) in San Jose. Cyberdog is the code name of Apple's
Internet strategy.
One strategy for Internet access, Apple execs pointed out as they paraded the
pup before the press in a Cyberdog sneak at the conference, is the integrated
app, exemplified by TCP Connect. Another strategy is the suite of Internet
tools, exemplified by Internet-in-a-Box. Cyberdog, scheduled to be released to
developers by the time you read this and to the great unwashed early in '96,
is a third approach: a collection of Internet-supporting OpenDoc parts.
It's an approach designed to strike a balance between Apple's own need to
provide a complete Internet-user solution and third-party developers' need to
sell Internet products without having to compete head-on with Apple. Although
Cyberdog will provide a "complete" collection of basic parts for Internet
access, third-party developers will be able to sell better individual parts or
collections. The OpenDoc approach will also make it possible to embed these
communication facilities in applications, and will support a scheme in which
any user can create an interface to the Internet.
Initially, Apple will supply a generic container, plus some obvious parts,
like a World Wide Web part, a Gopher part, a text part, a picture part, and
context parts (a personal notebook and a log). These container parts will
contain live links, and dragging something from the log to the desktop will
create a double-clickable stand-alone link to the site.
Cyberdog is the most encouraging sign I've seen from Apple in a long time,
because of what it could mean for OpenDoc. For Apple, the move to OpenDoc is
as big as the move from the 680x0 to the PowerPC chip, but it's more than
that, because it's not just Apple's move. The shift from monolithic apps to
part-based software enabled by OpenDoc and OLE will be a true revolution in
software development, in software marketing, and even in the way people use
software. Whether it will all play out the way Apple hopes is an open
question. I hope it does, not for Apple's sake, but for the users. Cyberdog, a
compelling deployment of OpenDoc into the hottest market of the day, could be
the wedge Apple needs to make OpenDoc as successful as Apple needs it to be,
as quickly as Apple needs it to happen.


Bobbing for Apple


Apple's response to Microsoft's Bob user interface is the many faces of
Copland.
Apple gave out mucho info about its next OS release at the WWDC, despite the
fact that Copland won't actually be released till mid-1996. Possibly the most
dramatic aspect of Copland is the customizability of its user interface.
Although the Mac UI has always been customizable (which explains the drop in
worker productivity that has accompanied the widespread computerization of the
American workplace), Copland takes it further, largely abandoning the idea of
a single Macintosh look and feel. Demos showed a child's UI with single-click
buttons rather than icons and sticky menus and colorful window treatments, a
more conventional Mac UI, and something that looked like a video game or an
application designed by Kai Krause.
Other new interface features: Views, live windows whose contents are based on
user-specified criteria and which are automatically updated; Drawers, windows
that close not to icons but to tabs across the bottom of the screen; and
spring-loaded windows, an elegant feature better seen than described.
The real story in Copland is that this is the microkernel-based, nearly
all-PowerPC-native version of the OS. The file system, I/O, and net services
are all memory protected and preemptively multitasked. Disk, network, and
Finder actions can all happen at the same time. There's a new native file
system with a concurrent, reentrant design.
Apple was uncharacteristically subdued at the conference, focusing more on
substance and less on hype than in past years. They even admitted openly that
QuickDraw GX needs improvement. PowerTalk and GX (presumably after
improvements) will be in the default installation of Copland (and will use
less RAM), so your app can assume their presence.
As mentioned last month, Apple Guide will transmogrify into Apple Assistant, a
tool for creating time- or event-based agents.


CGI Rules, AppleScript Tools


Web pages don't have to be passive; clicking on a link on some pages fires up
an application on the host machine that can do almost anything. This ability
to drive an app over the World Wide Web opens up the possibility of all kinds
of tricks.
Common Gateway Interface (CGI) scripts are the preferred method for driving
apps from a Web page for anyone using WebSTAR (formerly MacHTTP). Using CGI is
a three-step process:
1. Append the information that needs to be sent to the application to the URL
in your HTML document. A question mark (?) flags the end of the file reference
and the beginning of the data (for example,
http://www.college.edu/thePathToTheApp?text_to_find).
2. Create an Apple event message containing the URL and the data.
3. Send this Apple event to the application indicated by the URL.
In other words, CGI is the key to Web pages that do interesting things, and
AppleScript is the key to CGI.
The best place to learn about CGI scripting is
http://www.uwtc.washington.edu/Computing/WWW/Lessons/START_HERE.html.


Frontier is Free, SuperCard isn't


Userland's Frontier was the first system-level scripting system for the Mac,
before AppleScript. As AppleScript lurched toward release, Userland kept
Frontier consistent with it, in the process throwing away some tools and
approaches that were arguably better than what Apple settled for. (I know the
point is arguable because Userland founder Dave Winer argues just that.) This
spring, Userland bent to the inevitable and gave up trying to make money
selling a product that competes directly with technology that Apple's giving
away. Frontier is now free and can be downloaded from
http://www.hotwired.com/Staff/userland/aretha/project1writingcgiscri_227.html.
Userland is not down for the count, though, and Dave expects to make money
from derived products, especially Internet-related products. For example,
Userland had a lot of experience in developing Frontier solutions to manage
workflow for editorial organizations: picking up copy from editorial, flowing
it through QuarkXPress, laying out the publication, and passing the result off
to the designers. In a product called "AutoWeb," Userland took what they had
learned about automating print publishing and applied it to the development,
deployment, and maintenance of web-page structures.
The aforementioned URL is also the place to learn about AutoWeb and to find
out how to create CGI scripts using Frontier. Since Frontier can edit and run
AppleScript scripts but is a richer language, you can use Frontier to build on
existing AppleScript CGI scripts.
SuperCard, a HyperCard-like program created by Silicon Beach Software shortly
after the release of HyperCard and much loved by multimedia developers, was
bought and later set free by Aldus. Last year it was reacquired by Allegiant,
a company formed by Silicon Beach alumni. This year at WWDC, Allegiant
announced NetTalk, a tool that lets Mac scripters build custom interfaces to
the Internet.
NetTalk is an intermediary that sits between SuperCard (or HyperCard or
Frontier or Prograph or FaceSpan) and the Internet. Among other capabilities,
it includes direct back-end CGI support for WebSTAR. FaceSpan? That's the
AppleScript user-interface builder I've mentioned here before. Since it's OSA
compliant, it'll work with scripts written using Frontier, too, or with
SuperCard, HyperCard, and so forth.



HyperCard is Hyped


HyperCard 2.3 was announced at WWDC with a subdued fanfare volume that
befitted its right-of-the-decimal-point newsworthiness. It's a true 0.1
increment revision, the big improvement being that it's now PowerPC native,
something that the more naive among us sort of expected to happen about the
time the PowerPC machines were released.
Apple also threw another bone to the dogs yapping for color, but HyperCard
remains, still in 1995, in its heart of hearts, a black-and-white application.
The strategy seems to be to wait out the complaints until wired-in (as opposed
to taped-on) color ceases to be an issue. Exactly how it will cease to be an
issue I couldn't say, since I don't know where HyperCard is really headed. I
don't mean to suggest that anyone, least of all Apple, does. But Apple hints
at some scenarios.
Does HyperCard become an OpenDoc container, a Swiss Army knife with a Gillette
philosophy? ("Give away the razor and sell them the blades." Or was that
Gillette?) Then the lack of real color becomes a business opportunity.
Does HyperCard become a front end for AppleScript? (More so than it already
is, that is.) Does Apple send FaceSpan off to Frontierland? Then color
painting becomes a needless frivolity, and the color already bundled in may be
adequate.
Does HyperCard figure in Apple's Internet strategy? Does it become an HTML
editor? (Nobody's suggested this, but read on.) Or does it become a highly
scriptable database back end for Web pages managed with Apple's Web servers?
(It's bundled with the servers now, as one of the database options.) Either
way, it can get along without full color support.
Real color isn't an issue in any of these scenarios for an imagined future
HyperCard. No, it's only the real HyperCard, the in-house programmer's secret
weapon, the quick-and-dirty prototyping tool, the unappreciated multimedia
development environment, the educator's salvation, that will suffer from this
neglect. But then, they're used to it by now.


HTML Gets Emotional


Yes, I recently poked fun at the concept of schooling people in how to create
vanity Home pages, but Web-page creation is a legitimate job and a really hot
area. If you're interested in watching or becoming a part of the creation of a
new job classification, check out the HTML Authors Guild mailing list. Just
send the message "subscribe html-authors-guild <your name>" to
Majordomo@lists.stanford.edu. Another list you should be checking out if
you're using WebSTAR is the MacHTTP-talk list (send "subscribe machttp-talk
<your name>" to Majordomo@academ.com). Warning I wish somebody had given me:
Subscribing to both of these lists can fill your mailbox rapidly.
I keep reading those "How did Joe create this really nifty Web page?" pages,
and they always say the same thing: Joe has looked at various HTML editors,
but they all lacked some feature he wanted, so he just wrote the page with his
word processor.
I'm developing Web pages, too, and while there are many editors, even many
approaches to editors (filters for BBEdit, ClarisWorks, Adobe PageMaker,
QuarkXPress, and Microsoft Word; WebWorks from Quadralay, a tweenware app for
FrameMaker; stand-alone HTML editors like SoftQuad's HoTMeTaL Pro, Rick
Giles's Prograph-based HTML Editor, Robert Best's HTML Web Weaver, and Eric
Lease Morgan's HyperCard-based Simple HTML Editor; and more-or-less WYSIWYG
editors like Open Door Networks's HyperCard-based WebDoor and Navisoft's
NaviPress), I discover, not to my surprise, that none of them work the way I
want. So I wrote my own.
I wanted to be able to more or less drop in a new vocabulary when HTML 3.0 or
VRML or something else got formalized; I wanted a facility for stockpiling
boilerplate text, boilerplate HTML, and all my favorite URLs, and dropping
them into my documents easily; I wanted references to all HTML documents I
created to be immediately available for dropping into other HTML documents in
proper HTML local-link format; I wanted easy customization for features I
didn't think of, like automatic documentation of the files referenced in a
document and automatic date/time stamping of HTML documents; and I wanted hot
links.
I used--you guessed it--HyperCard (Version 2.2, in fact). The color tools are
a pain, but they're adequate for colorizing buttons, which was all I wanted.
The result meets my needs, but I don't know whether anybody else would use it.
Drop me a note if you're a Mac type and are curious; I'll tell you where to
find it.
:-) If you have zero interest in HTML, you may be depressed to learn that some
people are now using it to create new emoticons in e-mail and online chat. :-)


MessagePad Messages Macs


When Apple introduced the Newton MessagePad PDA, it only had a few boneheaded
faults. It cost too much, it didn't have a built-in cellular modem, and the
handwriting recognition was inadequate. Apple hasn't fixed any of those
things, but forces outside the company have helped some. You can now buy an
early model MessagePad for an acceptable $149, although the new ones remain
around $600. There are good PCMCIA cellular modems. Palm Computing has more or
less fixed the handwriting problem.
ScriptLink, from Momentum (Greenwood, Australia), doesn't solve a Newton
problem; it adds a capability you might not have expected in the original
device. Or maybe it solves a problem created by the fact that the Newton
devices use an entirely new operating system. ScriptLink brings AppleScript to
Newton.
ScriptLink lets Newton apps control Mac apps. It consists of two components:
the ScriptLink Macintosh Server, an application that needs to run in the
background on a Mac, and ScriptLink Newton Client, a Newton package that must
be installed on the Newton device. It works like this: You develop a Newton
application that contains embedded AppleScript. When the Newton app accesses
the AppleScript, the app triggers ScriptLink Newton Client to send the
AppleScript to ScriptLink Macintosh Server somewhere out on the net, and
ScriptLink Macintosh Server dispatches the AppleScript to the target Mac
application.
It's a cute little system, and it works. What can you do with it? Use the
Newton to query databases, upload Newton-captured data automatically to a Mac
database, or drive AppleScript-savvy Mac apps. Not difficult to master,
ScriptLink offers the Newton developer five new calls: slConnect,
slDisconnect, slSendScript, slConnectionStatus, and slTransOutstanding, plus a
few callbacks. One of the supplied examples shows how to create an entire
word-processing document on the Mac from the Newton. I like it.
Too bad that both MessagePad sales and Apple's share of the market it opened
are declining.































C PROGRAMMING


Grumpy Old Programmers




Al Stevens


This issue marks the seventh anniversary of my first "C Programming" column. I
went back and read that first column of August 1988 and saw that I promised
then to report an occasional pet programming peeve. I did for a while, but
eventually that part of the column faded away. I thought I'd use this
anniversary issue to get a few things off my chest, so here are some rambling
crotchets for you to peruse.
Have you noticed lately that the non-programming-industry press and the
marketers have appropriated another of our buzz phrases? According to PC
Magazine, the user interfaces of Windows 95 and OS/2 Warp are object oriented.
Wow. That must make them better than all those other user interfaces that
aren't object oriented. I've been sitting here at my Windows 95 Final Beta
site trying to encapsulate something. I haven't figured out how, yet. The
polymorphism command button must be in here somewhere, too, but I can't find
it.
Here's how it goes. A paradigm such as object-oriented programming gains
widespread acceptance. The market follows with tools that support that
paradigm. It is good. Other products, unrelated to the paradigm, innocently
use some of the same terms. Things on the GUI desktop, for example, are called
"objects." Everybody knows that objects are good and that object-oriented
anything is the wave of the future. Nontechnical journalists make the
association, draw an invalid conclusion, saturate the print media with
misinformation, and convert what was a perfectly good technical term into
meaningless media hype. They have even said that Visual Basic is object
oriented. When a programmer writes anything, an editor gets to have at it. An
editor who writes something about programming should likewise be required to
show it to a programmer, who should be given total veto power over the
technobabble.
Remember some 20+ years ago, when the 12 rules of a relational database were
developed? Every programmer knew what a relational database was until the
subject was totally confused by the trade press reporting and marketing all
those database products that claimed to be relational.
How about structured programming? The original definition specified three
programming constructs: sequence, selection, and iteration. Simple and
elegant. Yet every programming practice that someone has since disapproved of
is criticized as not being structured enough, and tomes have been written on
the subject of structured programming, addressing everything except those
three simple statement flow constructs.


A Saturday Date


I am going to retire into obscurity on December 24, 1999. I advise you to do
the same. One week later, all programmers will have the same respect paid to
them that is now reserved for lawyers, politicians, and TV evangelists. On
January 1, 2000, computer programs everywhere will cease to work properly
because of the dreaded 6-digit date format. It isn't going to be a pretty
sight. I don't want to be associated with the profession when the year 00
comes around. Some of it is my fault. I was one of those Cobol programmers in
the 1960s who was sure that none of those programs and databases would last
for 30-odd years. Hah! A lot of programs will need to be fixed, and a lot of
databases will need to be repaired, come the new century. The largest impact
will be on programs that support the government and business communities.
That's lucky. January 1, 2000 is a Saturday. The New Year's Day holiday will
be observed on Monday. They can fix the problem over the long weekend.


Memory Lane


I have a 486/66 with 24 MB of RAM. Every driver is loaded high. DOS is in the
High Memory Area. XMS is enabled. How come programs such as Turbo Debugger
still tell me I don't have enough memory?


Née IMail


It's hard to come up with a program name that no one has already used. Several
years ago I picked D-Flat because I was sure that if I used C-Sharp,
somebody's lawyer would come calling. D-Flat seems to have endured. My Quincy
interpreter was originally named "QC," a name that lasted for only about two
weeks. There already was a QC compiler, and its vendor hollered.
Now it has happened again. A while back, I launched a new column project, an
Internet mail-reader program that I carelessly named "IMail." A call came in
from Ipswitch Inc. (81 Hartwell Ave., Lexington, MA 02173), which produces a
Windows-based Internet mail-reader program with the trademarked name--you
guessed it--IMail. So now I have to come up with a new name.
I have to think of something so unlikely that nobody else would want it.
That's why D-Flat worked. Butthead-mail has a certain ring. Nobody is likely
to have used that one. Feminists might like Mail-Chauvinist. The logo could be
a razorback in a mailperson's uniform.
Once I come up with a new name, I'll have to make some minor code changes and
release another version. That's convenient, because a reader sent me a mail
message about a bug. My scripting scheme of sending "cat $MAIL" followed by
"rm $MAIL" has an insidious problem: If mail comes in while the cat command is
running, the rm command deletes the new mail along with the old. That didn't
happen during testing, but his message unearthed another silly bug. There was
a dollar sign in the first column of one of his lines of text, and the script
program interpreted it as a UNIX prompt and stopped the reception. I'll be fixing
those problems at the same time I find a new name.


GNU C++


Enough of this grousing. Late last year I wrote a fourth edition of a tutorial
book on C++. It is called Teach Yourself C++, and, as I must remind you each
time I plug this book, mine has a yellow cover and does not have Herb
Schildt's name emblazoned on it. The book has many exercise programs and comes
with a diskette with all the source. It seemed fitting that the book would
round out a trilogy. I have a tutorial book on programming named Welcome to
Programming, which uses QBasic as the teaching language, and one on C named Al
Stevens Teaches C, which includes the Quincy C interpreter. If the C++ book
could also include a compiler on its diskette, the trilogy would be
complete--three books that take the reader from QBasic through C to C++
complete with source code and a language translator. (QBasic is included with
every copy of MS-DOS.)
When you are looking for a contemporary C++ compiler that you may freely
distribute without royalties and that implements most of the language, there
is only one choice: GNU C++.
GNU C++, called "GPP," is the work of the Free Software Foundation (FSF). It
is one of a large suite of free programming tools and utilities downloadable
from many online locations and available on several commercial CD-ROMs. Anyone
can distribute GNU software and charge reasonable copy costs as long as they
include the source code or make it available to the user. (One such source is
the Dr. Dobb's Alternative Programming Languages CD-ROM.)
I needed a version of GPP that runs under MS-DOS. Like other C++ compilers,
GPP is a big program and needs a DOS extender. A version named "DJGPP" is
distributed on CD-ROM by the FSF, and includes a DOS extender and C and C++
compilers that compile 32-bit programs. There is an assembler, linker,
debugger, and other tools. To fit what I needed on a diskette, I had to strip
down to the bare-minimum configuration--the C++ compiler, the linker, the
assembler, the DOS extender, and only the basic run-time libraries and header
files. Using LHARC compression, I was able to fit everything I needed to
compile and execute all the exercise programs onto one high-density diskette.
We can discuss three things about DJGPP: the C++ compiler, the DJGPP port to
MS-DOS, and the CD-ROM products that FSF sells.
GPP is a contemporary C++ compiler that implements the version of C++
described in the Annotated C++ Reference Manual, by Margaret A. Ellis and
Bjarne Stroustrup (Addison-Wesley, 1990). The only feature not yet implemented
is exception handling. GPP does not implement run-time type information,
new-style casts, or the other new language inventions of the ANSI committee. Of
the 150 exercises in the book, only a dozen of them involve language features
not implemented by GPP, so the book is a minor torture test for the compiler.
I found a few bugs related to the translation of invalid code but only one bug
where the compiler failed to work at all. Any program that includes iomanip.h
and uses the setw manipulator does not compile. That's a serious problem. I
reported it to the GPP developers, and it should be fixed in a future version.
Example 1(a) is a program that won't compile under GPP.
To circumvent the problem, I inserted the code in Example 1(b) into the smanip
class-template declaration in iomanip.h. This workaround generates several
warning messages, but the code compiles now, and the program runs okay.
GPP is not the fastest C++ compiler available, nor does it generate the most
optimized executables, but you can't beat the price. The GNU suite of software
puts quality tools well within the reach of educators and, more importantly,
students of limited means. Having the source code to the compiler lets you use
GPP both as a study in compiler design and to implement and experiment with
language features. It would be fun to implement Visual GPP, for example. The
compiler source code is not easy to read, though; it is built from a grammar
processed by a yacc clone, and you have to get into that mindset if you want
to make changes to the language. You should anyway, if you are going to
experiment with language translation.


DJGPP


The DJGPP port of GNU C++ includes extended DOS executables for the compiler
programs as well as the header files and libraries to support development of
extended DOS programs of your own. To run DJGPP or a program that you compile
with DJGPP, you must have a program named GO32.EXE in the path. That program
is the DOS extender. If you are running programs that use floating-point math
on a PC without a math coprocessor, you need a file named EMU387 in the path
in order to emulate the coprocessor. If you distribute programs compiled with
DJGPP, you must also distribute these files.



The FSF Compiler Tools Binaries CD-ROM


The Free Software Foundation distributes DJGPP on their Compiler Tools
Binaries CD-ROM. It costs $240.00 for companies and $60.00 for individuals. It
includes GNU C and C++ compilers and many utility programs to run under MS-DOS
and other platforms that do not come with compilers. The CD-ROM includes the
source code for all programs on the CD-ROM.
The CD-ROM includes installation instructions to run the compiler from the
CD-ROM. You make sure that the path is set up correctly, set some environment
variables, and that's it. Nothing gets copied to the hard disk. You do not
want to do this, however, unless you are really short on disk space and have
at least a quad-speed CD-ROM with caching software installed. When running
from the CD-ROM, the program in Example 2 takes four minutes to compile and
link on a 486/33 with a single-speed CD-ROM drive. By installing the compiler
on the hard disk, I reduced that time to 15 seconds.
Depending on how much of the compiler system you want to install on your hard
disk, DJGPP can require 7-160 MB.
Installation on a hard disk is not as easy as installing to use the compiler
from the CD-ROM. You have to poke around in the CD-ROM and find all the
different README files to get the complete picture. Installation consists
mainly of unzipping several compressed files in a specific sequence and then
doing the same path and environment-variable manipulations that you would with
the CD-ROM.
Documentation comes in text files formatted for a reader program that comes
with the package. Most MS-DOS programmers will not like this program. Its user
interface is less than intuitive for those used to CUA programs. The reader
program also does not work with all keyboards. The program does not use
standard BIOS calls to read paging and scrolling keystroke values. Two PCs in
my shop can't get past the first screen.
There is a decided UNIX text-mode feel to the FSF Compiler Tools CD-ROM. Don't
expect slick, GUI-hosted editor/debuggers and utility programs or greased
installations. You have to be a programmer to use this product, and you have
to be a programmer to find the stuff that you need to use this product. You
don't simply run a command-line compiler and get an executable. The
command-line compiler does not automatically find standard libraries; you have
to specify them on the command line. Output from the compile/link procedure
is always to a file named a.out, which is not ready for execution. You must
postprocess this file through two more programs to get an executable file,
which you must then rename to whatever .EXE file you want. The compiler does
not produce a menu of command-line options or even a version number when you
enter only the compiler's name on the command line.
Those few criticisms notwithstanding, GNU C++ and the DJGPP port are powerful
products and well worth their nominal cost. In today's market, sixty bucks is
not a lot for a text-mode-only, command-line-driven C++ compiler, even when
you buy it from an organization dedicated to the promotion of free software.


ANSI C++ Public Review


If you act fast, you can get your two cents' worth into the C++ standardization
effort. In late May, I received an announcement stating that the X3J16
committee will conduct a two-month public review of the draft standard for
C++. The review lasts from May 26, 1995 to July 25, 1995, but the committee
must receive your written comments by July 6 to consider them in their current
deliberations. They'd like e-mail in advance (x3sec@itec.nw.dc.us) but require
signed hardcopy before they can register your comments.
I am able to report this event in the August issue because many of you get the
issue at the beginning of July. With luck you have about a week to get a copy
of the draft, read it cover-to-cover, and send in your comments. Send $65.00
to X3 Secretariat, Attn: Lynn Barra, 1250 Eye Street, Suite 200, Washington,
DC 20005, 202-626-5738. I wish I could give you more time, but this is all the
time they allow.
Example 1: (a) A program that won't compile under GPP; (b) inserting this code
into the smanip class-template declaration in iomanip.h circumvents the
problem. 
(a)
#include <iostream.h>
#include <iomanip.h>
int main()
{
 cout << setw(6) << "hello";
}
(b)
friend
 ostream& operator<<(ostream& o,
 smanip<TP>& m)
 { (*m._f)(o, m._a); return o;}
Example 2: This program takes four minutes to compile and link on a 486/33
with a single-speed CD-ROM drive. Installing the compiler on the hard disk
reduces the time to 15 seconds.
#include <iostream.h>
int main()
{
 cout << "hello";
}


























ALGORITHM ALLEY


Biochemical Techniques Take On Combinatorial Problems




Peter Pearson


Peter is a cryptologist at Uptronics Inc., a cryptography and data-security
company in San Jose, California. He can be reached at pkp@uptronics.com.


Readers of Dr. Dobb's Journal are accustomed to solving mathematical problems
using "computers"--that is, boxes full of semiconductors, buses, RAM, and
related gizmos. Consequently, it's hard to believe that a large class of
difficult and intensely mathematical problems might be best solved not by
pushing electrons through wires in a computer laboratory, but by mixing
solutions in test tubes in a molecular-biology laboratory. Yet that is exactly
the prospect suggested by Leonard Adleman, who applied the laboratory tools of
modern molecular biology to the bogeyman problems of computer science (see
"Molecular Computation of Solutions to Combinatorial Problems," by Leonard M.
Adleman, Science, November 11, 1994).
There is a class of computationally intractable problems known as "NP."
Problems in this set include the well-known Traveling Salesman problem and
problems such as the Hamiltonian Circuit, Bin Packing, Graph-3-colorability,
Knapsack, and Generalized Instant Insanity (remember the Parker Brothers
puzzle?) problems. Computer scientists and mathematicians have discovered many
such instances of problems whose solution requires taking a (possibly large)
number of objects and finding an arrangement that has a particular property or
satisfies a requirement. Given the solution, you can quickly verify that it
solves the problem, but there is no known "fast" way to find the solution.
A simple example is the Knapsack Problem: From a given finite set of integers,
find a subset whose sum is a given x. Exhaustive search is a practical
solution for small sets, but as the size of the set increases, the time
required to find a solution increases faster than any power of the size.
However, testing a candidate subset is easy: It requires no more additions
than there are integers in the starting set.
To be admitted to the NP club (strictly speaking, the NP-complete club), a
problem must be proven equivalent to a problem already admitted. "Equivalent"
means, casually speaking, that any instance of the new problem can be easily
transformed into an instance of some already admitted problem and vice versa.
This admission criterion guarantees that all
problems in NP are about equally hard: If a quick way were found to solve
Traveling Salesman problems, for example, then someone with a tough instance
of the Knapsack problem could transform it into an instance of the Traveling
Salesman Problem, solve it, and transform the solution back into the answer to
the Knapsack problem. Thus, all members of NP stand or fall together.
Traditional estimates of these problems' difficulty assume that the problem is
attacked on a conventional computer. The analysis of, say, a Knapsack problem
might go something like this: 
There are 100 integers in the whole set, so there are 2^100 (about 10^30)
possible subsets. If I have to examine 10 percent of these subsets, using a
million computers, each of which can examine a million subsets per second,
then it will take 10^17 seconds, or three billion years.
In a dramatic departure from conventional thinking, Adleman attacked a problem
in NP using techniques found in molecular biology laboratories. (See the
accompanying text box entitled "DNA Basics" for more background information.)
He synthesized DNA molecules that represent randomly guessed answers, then
searched through a huge number of them to pick out any correct answer. The
number of guesses that can be tested with this approach is limited not by time
and computing power, but by the number of DNA molecules you can handle. Since
a gram of DNA might contain 10^18 smallish molecules, the millions of computers
testing millions of subsets start looking puny in comparison.


Examining the Details


Adleman solved the Directed Hamiltonian Path Problem: Given a map showing
many cities and many one-way roads connecting cities, find an itinerary that
starts at City A, ends at City Z, and passes through every other city exactly
once.
In Adleman's approach, cities are represented by random "20-mers." That is,
each city is assigned a sequence of 20 bases selected at random from the set
{A,T,G,C}; see Figure 1. Roads are represented by 20-mers derived from the
sequences of the cities they connect. For example, a road from City J to City
D would be represented by a 20-mer whose first 10 bases are complementary to
the first 10 bases of City J and whose last 10 bases are complementary to the
last 10 bases of City D; see Figure 2. An exception is made for roads starting
at the starting city or ending at the ending city (cities A and Z, in this
example). These roads are extended by an extra 10 bases, so as to contain the
full 20-base sequence complementary to the starting or ending city.
Using DNA-manipulation techniques developed by biologists, Adleman
manufactured a bunch of "city" 20-mers and a bunch of "road" 20-mers, and
mixed them all together in a pot. Because complementary DNA strands tend to
stick together, a typical Road JD 20-mer will have its beginning half stuck to
the beginning half of a City J 20-mer, and its ending half stuck to the ending
half of a City D 20-mer; see Figure 3. The other half of the City D 20-mer
will probably be stuck to some road that begins at City D, and so forth.
Next, Adleman added to this soup an enzyme that repairs "nicks" in DNA. This
enzyme finds the places where the ends of two road 20-mers touch (in the
middle of a city 20-mer) and welds the two ends together. (It also welds
together the touching ends of city 20-mers.) The resulting DNA strands
represent lists of roads that you can legally travel, called "itineraries." 
Most itineraries look nothing like the answer to the problem: Some contain
only a couple of roads, and some traverse one part of the map many times over
without ever visiting some other part. Still, there was a chance that one of
these DNA strands might represent the solution to the Directed Hamiltonian
Path problem, and Adleman had to find it.
He knew the length of the desired molecule: The number of roads taken must be
one less than the number of cities on the map. Molecular biologists routinely
use electrophoresis to separate DNA molecules by length: When an electric
field pushes DNA molecules through a gel, longer molecules move more slowly.
Adleman cut out the part of the gel containing strands of the desired length,
extracted the DNA from the gel, and threw away everything else.
It was also obvious that all the molecules that don't pass through a given
city could be discarded. Adleman did this using "City J" 20-mers attached to
magnetic beads. These beads were mixed with the DNA from the gel, and time was
allowed for itineraries that pass through City J to stick to the complementary
sequence on the beads. He fished them out with a magnet and discarded
everything else. By warming and changing the salinity of the solution, he
unstuck the itinerary strands from the beads, giving a solution of right-sized
itineraries that pass through City J.
Repeating this process for every city on the map leaves you with (if anything)
an itinerary with exactly the required properties: It passes through every
city; it can't pass through any city twice, because it's not long enough to
hold the extra road; and the special handling of roads starting and ending at
cities A and Z guarantees that City A must be first and City Z, last in the
itinerary. Thus, if there's a molecule there, it's the answer.
A technique called the "Polymerase Chain Reaction" (PCR) can be used to
duplicate many million-fold a single, special DNA sequence hidden in a soup of
other DNA. PCR requires only that you know the sequence of the first several
and last several bases of the sequence to be duplicated. Since you know the
first 20 bases of the desired itinerary (because it starts at City A) and the
last 20 bases (complementary to City Z), you can use PCR to make an abundance
of the desired strand. To find where City J appears in the itinerary, PCR
duplicates just the sequence from City A to City J, and the size of the
resulting strands is measured by electrophoresis.


Implications


Computer science is full of NP problems; usually, they relate to optimization.
Typically, these problems need to be solved in seconds, and approximate
solutions are usually acceptable. It's hard to imagine that very many of these
problems would warrant interfacing a computer to an "NP coprocessor" with
pumps, reagents, glassware, heaters, and gels. The small problem (a map with
seven cities) on which Adleman demonstrated this technique took a week of lab
work, and even though a substantially larger problem probably wouldn't take
appreciably longer and the procedure could be automated to speed it up by a
couple orders of magnitude, this technique seems destined for use on large,
nonurgent problems with very valuable answers.
Where are such problems found? Cryptology is one place. (It's no coincidence
that Adleman is the "A" in the RSA public-key encryption algorithm.) For
decades, cryptologists have been mining NP for problems around which ciphers
might be built. The best pedigree for a cryptographic protocol is proof that
breaking it is equivalent to solving some general problem in NP. But Adleman
has knocked the traditional calculus of security for these systems off the
rails, and arguments that once postulated processors and microseconds may soon
revolve around gallons of vat capacity.


References


Adleman, Leonard M. "Molecular Computation of Solutions to Combinatorial
Problems." Science (November 11, 1994).
Garey, Michael R. and David S. Johnson. Computers and Intractability: A Guide
to the Theory of NP-Completeness. San Francisco, CA: W.H. Freeman, 1979.
Schneier, Bruce. "NP-Completeness," Dr. Dobb's Journal (September 1994).
DNA Basics
Deoxyribonucleic acid (DNA) is a long, thin, chain-like molecule made by
connecting smaller molecules called "bases" (see Figure 4). Four different
bases--typically called A, T, C, and G--occur. The number of bases in a DNA
molecule can range from a mere handful to tens of millions. A DNA molecule can
be specified by giving its sequence of bases in the order in which they
appear, such as "ATCCATTAG...." Any sequence of bases is possible. There is
directionality in a DNA strand, so the molecule AAAAATTTTT is not just
TTTTTAAAAA viewed upside-down.
The A bases have a gentle attraction for the T bases, and the C bases for the
G bases, such that two DNA molecules whose sequences "fit together" (see
Figure 5) will tend to stick together. (This is the famous "double helix"
configuration, though I ignore the helicity in my diagrams.) Two DNA sequences
are called "complementary" if each equals the other in reverse order with As,
Ts, Cs, and Gs replaced by Ts, As, Gs, and Cs, respectively.
Over the past few decades, molecular biologists have discovered enzymes (large
protein molecules that occur in cells) that perform such functions as cutting
DNA molecules where certain sequences occur, assembling DNA molecules
complementary to existing DNA molecules, linking separate DNA molecules into
longer molecules, and more. These enzymes are now routinely harvested from
bacteria and used in molecular-biology laboratories.
The typical cell in your body contains around 7 billion base pairs of DNA,
with a total length of about 2 meters--one meter from each parent. This DNA
occurs in 46 pieces called "chromosomes," and constitutes a vast library of
recipes used in conducting the business of a cell. The DNA you got from each
of your parents is thought to contain recipes for around 100,000 different
proteins. A protein is a string of amino acids selected from a suite of 20,
and the recipe for a protein is simply an ordered list of amino acids. Three
consecutive DNA bases specify one amino acid in the protein, and the
translation from DNA triplets to amino acids is done very tidily using a
lookup table that also contains "end-of-protein" triplets. The replication of
DNA (for dividing cells) and its translation into proteins are performed by
complex proteins built (naturally) from recipes encoded in the DNA. A large
part of the business of your body is carried out by proteins, and much of the
rest is carried out by molecules built by proteins.
Multiplying the estimated 100,000 protein recipes (genes) by the typical
length of a gene (a few thousand base pairs) gives a few hundred million--a
small fraction of the total amount of DNA in the cell. It is presently unknown
what function, if any, is served by all that extra DNA, but much of it
consists of sequences that are stylistically different from protein recipes.
--P.P.

Figure 1: Adleman's DNA representation of a particular city.
Figure 2: Roads are represented by DNA sequences complementary to the cities
they connect (sequences in the top row are read from right to left).
Figure 3: A mixture of roads and cities tends to self-assemble into possible
itineraries.
Figure 4: (a) A schematic representation of the four bases from which DNA
molecules are assembled. (b) Short DNA molecules are sometimes classified by
the number of bases; for example, the illustrated 6-mer.
Figure 5: A double-stranded DNA molecule made from complementary 6-mers. Note
that the backbones run in opposite directions.


























































PROGRAMMER'S BOOKSHELF


Perspectives on Computer Security




Lynne Greer Jolitz


Lynne, who is coauthor of 386BSD, can be contacted at ljolitz@cardio.ucsf.edu.


For most people, security is as simple as locking the front door or putting a
Club on a car's steering wheel. For networked computer users, security is a
devilish issue, because a computer system can be compromised by any one of
millions of other computers around the globe. Fortunately, a good number of
books on network-security techniques are available, and while none will
protect a computer from the latest attack (you'll just have to keep up on
journals and conferences for that), many offer valuable insights.
Network Security: Private Communications in a Public World, by Charlie
Kaufman, Radia Perlman, and Michael Speciner, discusses the practical issues
of secure communications, including cryptographic techniques, applied-number
theory, authentication, and integrity. It also covers existing Internet
mechanisms used to increase network security (Kerberos, PEM, PGP, and the
like) as well as extensions to X.400 and NetWare. Finally, the book provides a
good overview of encrypted communications and authentication as currently used
on the Internet. It avoids matters such as the formal government-security
framework and concentrates on the actual "moving pieces" used in security
mechanisms.
I enjoyed this book primarily because it was loaded with insider jokes and
minutiae, such as "UNIX, an unusually user-hostile and otherwise mediocre
operating system" or (my personal favorite) 
...plausible deniability, a situation in which events are structured so that
someone can claim not to have known or done something, and no proof exists to
the contrary. Whenever this term comes up, the person in question is almost
certainly guilty.
The authors are not afraid to voice opinions on popularly perceived solutions
to insecure networks. For example, the current trend of developing, selling,
and purchasing commercial firewall packages is concisely characterized by
Charlie Kaufman: 
Firewalls are the wrong approach. They don't solve the general problem, and
they make it very difficult or impossible to do many things. On the other
hand, if I were in charge of a corporate network, I'd never consider hooking
into the Internet without one. And if I were looking for a likely financially
successful security product to invest in, I'd pick firewalls. 
The meat of Network Security: Private Communications in a Public World is its
practical introduction to communications-oriented security in the form of
encryption and authentication; specific implementation details are described
only casually. Of particular interest in this post-Mitnick era is the brief
discussion of sabotage-resistant routing protocols. Since routing is the next
logical target of attack, it is an area worthy of critical study. In fact,
secure routing and network integrity alone could fill another book.
Network Security: Private Communications in a Public World provides a balanced
treatment of controversial topics (such as cryptography), but it isn't a
"war-stories" book. The level of discussion is technical enough to get the
point across, yet not so detailed as to become dull. Still, the book lacks
descriptions of attacks against TCP and DNS. Even though they've been covered
in other security books, these topics still have a place in a discussion of
attack pathologies. The book also omits discussion of the "Green Book," the
follow-up work to the "Orange Book" (which maps the Trusted Computing metaphor
into a networking paradigm). While of admittedly limited use, the Green Book
does offer sanguine observations about network security that fall into the
scope of this book. Finally, the text jumps right into specific algorithms
without bothering to develop the subject of cryptography. The result is an
incomplete picture: It's unclear why a certain technique is employed in a
given algorithm or why an algorithm is considered flawed. 


E-Mail Security for the Layman


While insider stories and algorithmic examinations are interesting, they are
less than useful to the individual trying to protect e-mail from prying eyes.
To complicate matters, while regular surface mail is protected by a host of
laws regarding privacy and is processed by a quasi-governmental agency which
must follow certain regulations, most e-mail correspondence is not (yet) as
carefully protected or regulated. The law is still murky regarding privacy
from coworkers, system administrators, managers, and the like. Thus,
protection of sensitive correspondence and the limits of such protection are
topical subjects.
E-Mail Security: How to Keep your Electronic Messages Private, by DDJ
contributing editor Bruce Schneier, is an in-depth treatment of
electronic-mail security intended for immediate application by the reader.
Schneier begins with an overview of electronic-mail security and goes on to
discuss and contrast the two preeminent security encapsulations used in
network electronic mail--Pretty Good Privacy (PGP) and Privacy Enhanced Mail
(PEM). Finally, the book addresses the restrictions that government places on
the use of cryptography, as well as intellectual-property rights. Schneier's
discussion of finite
mathematics alone is worth the price of the book.
The one downside is Schneier's view that it is absolutely good to secure all
communications in this manner. While this approach probably appeals to his
target audience, it is ironic that the same tools that can prevent
misappropriation of information can also be used to shield a scoundrel who
misappropriates others' work. Yes, I've heard the argument that anyone who
doesn't secure their work deserves to be punished, but that's just the old
blame-the-victim routine, which doesn't deal with reality.
In addition, shielding posters or remailers on the net, making them
effectively anonymous, is not a defensive security approach intended to keep
personal e-mail private, but instead an ideologically motivated offensive
tactic. Net users should be aware that this approach is rarely used for
purposes of, say, revealing a governmental plot to suppress information:
Instead, it's used for character assassination, personal vendettas, theft of
work, disinformation, petty criminal behavior, and worse. In fact, the current
chaos is eerily similar to John Brunner's prediction in his classic book The
Shockwave Rider over 20 years ago, where anonymous denunciation lines allowed
antagonists to destroy a protagonist's credit, job status, and even marriage
without fear of retribution. Ignoring or aiding this practice without regard
for the consequences is ethically questionable at best.
Overall, Schneier's writing has a concise, readable, appealing style. E-Mail
Security: How to Keep your Electronic Messages Private is ideal for the
computer user who feels insecure about sending Internet mail and has an active
interest in the powerful tools available for securing it.


Network Security as a Professional Practice


Network Security, by Steven L. Shaffer and Alan R. Simon, provides a
comprehensive, top-down approach to computer and networking security as a
professional practice. It focuses primarily on the formal nomenclature and
structure used as the framework for government- and commercial-security
environments. This formalism is critical for serious computer-security work.
Network Security is ideal as a top-down introduction to any intensive study of
formal security mechanisms and policies of the last 20 years.
Not included are the "tools of the trade" that a network-security officer uses
in practice, the methodology that programmers use to implement secure
operating systems, or the cryptographic mechanisms that secure communications
across a data network. However, bibliographic references provide pointers for
the serious student.
One nice feature of this book is its description of representative
government-security programs that show the formal information-security
structure in practice. Among the programs discussed are the Department of
Defense's BLACKER, DNSIX, and CCEP; profiles of security-product vendors are
given as well. (This latter group is incomplete: Sun's Secure Solaris,
Oracle's MLS products, and HP's HP-UX BLS are missing.)
A downside of Network Security is the insularity and relative blindness that
stem from its proximity to traditional security perspectives. For example,
while PEM and Kerberos are discussed briefly, unofficial security mechanisms,
such as PGP and COPS, are not. There is no critical analysis of the inherent
weaknesses of the "official" architectures for information security. Despite
these omissions, however, Network Security's coverage of the appropriate
formalisms makes it essential to the serious security professional's library.


Enterprise Network Security


Network Security: How to Plan for It and Achieve It, by Richard Baker, is the
most ambitious of the books discussed here. It develops and implements an
enterprise network's security envelope from the bottom up, but avoids
discussion of the underlying mechanisms. Baker speaks to MIS managers or
network administrators who must develop and implement an official, organized
security policy, comprising physical security, business-management structures,
backups, training, viruses, and security audits. 
Each chapter begins with an overview of a problem (such as securing the
desktop), then develops a top-down plan to deal with it. While fleshing out
these details, Baker discusses the elements and management of a careful,
secure environment (occasionally citing industry examples). The book does not
cover operating-system and software architectures; it concentrates on
operational aspects pertinent to a business. 
Network Security: How to Plan for It and Achieve It reminds us that
information security often fails because it is not integrated into the
information system from the start.
The breadth of the book is exemplified in its discussion of the legal
requirements of a network-information processing service, including the legal
doctrines of due care and due diligence. Few administrators are aware of the
potential liabilities of insecure or improperly maintained information
systems, which are magnified when the system retains information covered by
privacy or intellectual-property rights. The Infobahn of the future will
likely involve many suits over negligent operation of information services,
resulting in substantial liability awards against unsuspecting companies.
Baker approaches enterprise network security from a situational perspective.
This is bound to appeal to the administrator who can directly apply Baker's
solutions to rectify a situation or avoid an incident; enterprise network
administrators should keep this book handy.
Network Security: Private Communications in a Public World
Charlie Kaufman, Radia Perlman, and Michael Speciner
Prentice-Hall, 1995, 504 pp., $46.00, ISBN 0-13-061466-1

Network Security
Steven L. Shaffer and Alan R. Simon
Academic Press, 1994, 318 pp., $25.95, ISBN 0-12638-01-04

E-Mail Security: How to Keep your Electronic Messages Private
Bruce Schneier
John Wiley & Sons, 1995, 362 pp., $24.95, ISBN 0-471-05318-X

Network Security: How to Plan for It and Achieve It
Richard H. Baker
McGraw-Hill, 1995, 456 pp., $34.95, ISBN 0-07005-14-10


SWAINE'S FLAMES


Did the Cuckoo Lay an Egg?


I was elated when I caught Cliff Stoll on the radio talking about his book
Silicon Snake Oil (Doubleday, 1995). The book is an antidote to the hype--the
snake oil--surrounding the Internet, the as-yet-unbuilt information
superhighway, the fabled realm of cyberspace. On this radio program, Stoll was
presenting some of the arguments he makes in the book.
As I listened to his arguments, I found myself mentally demolishing each one.
This guy's all wet, I thought. I should buy the book and write a
take-no-prisoners critique of its arguments. Rip it apart the way Mark Twain
demolished James Fenimore Cooper's writing in "Fenimore Cooper's Literary
Offenses."
I should have known better. Cliff Stoll is the author of The Cuckoo's Egg, a
wonderfully engaging, true-life detective story about his success in
uncovering a ring of net-cracking spies. Obviously he knows how to write, and
how to think straight. When I read Silicon Snake Oil, I saw that it was not at
all the book I had expected. Like many of us, Stoll is better on the page than
off the cuff.
Still, I found myself raising a few quibbles.
Seeking the culture of the net, Stoll finds "libertarian political leaning:
Stay off my back and let me do whatever I please...not much informed
dialog...name-calling" and way more male than female voices. He decries the
stridency of debate, the self-centeredness of political action on the net.
Yes, but is it really useful to try to characterize the culture of the net
today, when its population is doubling in size annually, when all of the new
users are people not steeped in the current net culture, when the 1970s
ARPANET culture that once characterized the net is visibly evolving into one
subculture among many? Isn't it likely that the gender imbalance and political
narcissism and insensitive macho stridency of the net merely reflect its past
demographics, rather than some technological determinism? In other words, is
the culture of the net really a function of the technology, or is it a
function of who's on it?
Stoll: "...unlike a chisel, drill, or shovel, the computer demands rote
memorization of nonobvious rules. You subjugate your own thinking patterns to
those of the computer. Using this tool alters our thinking processes."
Yes, tools condition how we think about problems. When all you have is a
hammer, every problem looks like a nail. But software is a new kind of tool,
distinguished by its malleability. (A computer is just a toolbox; it's surely
software that Cliff means to be talking about, rather than the box.) No
program conditions our thinking as rigidly as a hammer. Although it's
important to consider how our (software) tools affect our thinking, surely
this is less of a problem with software tools than with hardware tools like
chisels, drills, shovels, hammers, or printing presses. The nonobviousness of
the rules that programs impose is not a given; some people write good user
interfaces. And what exactly does he mean by the thinking patterns of the
computer anyway? 
"How sad," Stoll says, speaking of the metaphorical world of cyberspace, "to
dwell in a metaphor without living the experience." He underscores the
nonphysicality of this world with a powerful phrase: "A hug without touching."
Yes, I'd hate to give up the tactile pleasures of life. We are all physical
beings, but we're not just physical beings. We also have a mental existence.
Things that happen online are not mere metaphors; we can meet people, learn,
have our lives changed, our hearts broken, all without physical contact. We
are, to some extent, creatures of thought as well as of flesh; you can argue
about the balance, but you shouldn't deny the reality of mental life.
Okay, these are quibbles. It's a good book. My only serious gripe is that
every time Stoll takes a position, he backs down from it. From someone who
goes around catching spies, I was hoping for a hard-hitting exposé. Silicon
Snake Oil is not a hard-hitting exposé. It does raise a lot of questions, and
for those who haven't considered them before, that's worthwhile.
Michael Swaine, editor-at-large
MikeSwaine@eworld.com


OF INTEREST
VisualFlow, from Momentum Software, is an object-oriented, visual
application-development tool for designing and managing information flows between
heterogeneous, distributed systems. Using point-and-click techniques,
developers can integrate systems by selecting objects from an object catalog,
then configuring and linking them. Additionally, VisualFlow objects can
perform mapping functions (one-to-one, one-to-many, many-to-one) to bridge
data from one application or database to another. Data can then be translated,
filtered, rearranged, and merged. VisualFlow supports objects that interface
to APIs, including SQL, RPC, IPCs, files, screens, and message-oriented
middleware (MOM). VisualFlow, which sells for $4500.00, runs on OSF/Motif and
X Window System platforms.
Momentum Software
401 South Van Brunt
Englewood, NJ 07631
201-871-0077
FMG has released OpenDialog, a dialog-box management tool for Macintosh
developers. In addition to enabling code reuse between dialog boxes, the tool
automates the management of buttons, checkboxes, and filters, and the editing
of fields and fonts. OpenDialog supports development in both 68K and PowerPC
native modes. All routines are C and Pascal callable, and compatible with
Think C, Metrowerks, and MPW. The tool sells for $259.00.
FMG
131 Elden Street, Suite 308
Herndon, VA 22070
703-478-9881
Watcom has announced multiplatform versions of its SQL toolkits on a single
CD-ROM. The Watcom SQL package includes support for Windows 3.1, Windows NT,
OS/2, and NetWare NLMs; Windows 95 support will be available when the user
interface begins shipping. Watcom SQL 4.0 is a stand-alone system for
developing single-user or mobile applications, while Watcom SQL Server 4.0 is
designed for 6, 16, 32, or an unlimited number of users. Watcom SQL 4.0 sells
for $295.00, while the server edition retails for $795.00 to $4995.00,
depending on configuration.
Watcom International
415 Phillip Street
Waterloo, ON 
Canada N2L 3X2
519-886-3700
Borland has announced its RAD Pack for Delphi, a rapid-application development
toolkit that includes: Visual Component Library source code for over 70
components in Delphi; Resource Workshop, which allows developers to extract
and modify standard Windows resources such as icons, cursors, bitmaps, and
dialogs; Resource Expert, to convert standard resource scripts into Delphi
form; Visual Solutions Pack, a collection of VBX custom controls, including
a spreadsheet control, WYSIWYG word processors, asynchronous communications,
image editors, and the like; the Delphi Language Reference Guide, a printed
reference of the Object Pascal language; and Turbo Debugger for Windows. The
RAD Pack retails for $249.95. 
Borland International
100 Borland Way
Scotts Valley, CA 95067
800-453-3375 ext. 1309
AT&T has announced a program called "AT&T Resources for New Business" to help
start-up businesses get off the ground. The program package includes
interactive access to a library of business management and marketing
information, a dedicated resource center, and special discounts on
business-office equipment, software, and payroll services, along with savings
of up to 80 percent on select business publications. Coincidentally, enrollees
in the program must also sign up for AT&T long-distance telephone service.
Annual subscriptions to the program cost $99.00. 
AT&T Resource Center
800-782-7837
Green Book International has released a suite of Windows-based hypertext
publishing tools. All toolkits are built upon a real-time publishing engine
that reformats documents on the fly based on window size and zoom level.
Because of the format's small overhead, the company claims that 1000 pages of
text can be stored on a single high-density diskette. The more-powerful toolkits
in the suite support data encryption, password access, and allow you to
incorporate images, sound, and video.
GBook Personal ($199.00) lets you import HTML and ASCII files, but does not
support video, sound, or security. It does include viewer licenses for five
books, with 20 users per book.
GBook Professional ($449.00) allows you to import HTML, ASCII, RTF, Word 6,
WordPerfect for Windows 6, Ami Pro, and Corel Ventura files. It exports to
HTML, and supports sound and video, but not security. Includes viewer licenses
for 10 books, with 100 users per book.
GBook Enterprise ($3499.00) allows you to import SGML, HTML, ASCII, RTF, Word
6, WordPerfect for Windows 6, Ami Pro, and Corel Ventura files. It also
exports to HTML and supports sound, video, and security. This toolkit includes
a viewer license for 50 books, 300 users per book.
Green Book International
15 Emery Court
Nepean, ON 
Canada K2H 7W2
613-726-6565
SunSoft has introduced the SunSoft Performance WorkShop for Fortran 90, an
integrated application-development tool suite that includes a compiler for the
latest revision of the Fortran standard, as well as mathematics libraries
optimized for multiprocessor (MP) environments.
SunSoft Performance Workshop for Fortran 90, based on CF90 from Cray Research,
offers several language features over Fortran 77, including freeform source
input, new control structures, derived data types, and direct support for
array manipulation.
Additionally, the suite includes the SPARCompiler Fortran 90 compiler and
SunSoft Performance Library, scientific and mathematics libraries optimized
for SPARC MP environments. The SunSoft Performance WorkShop for Fortran 90
sells for $4495.00, with multiuser discounts available. Single-user, unbundled
versions of the SPARCompiler Fortran 90 and SunSoft Performance Library start
at $1295.00 and $995.00, respectively.
Sun Microsystems
800-821-4643.
http://www.sun.com 
Ex Machina has announced a set of SDKs for Windows-based paging and wireless
messaging applications. The SDKs allow you to embed paging and wireless
messaging into PC programs. The Notification SDK ($195.00 plus commercial
distribution fee) provides short message notification and paging functions for
sending messages to pagers or PCs, sending group pages, and the like. The
Messaging SDK ($1495.00 plus royalties) enables apps to read, log, and manage
messages from PCs; send unlimited-length messages; split messages; and more.
Finally, the Data SDK ($3995.00 plus royalties) adds the ability to send files
such as spreadsheet and word-processing documents, along with message
encryption and compression and support for TCP/IP.
All three SDKs are Windows C++ DLLs with C wrappers, allowing them to be used
in C programs and with tools such as Visual Basic and Visual FoxPro.
Ex Machina
11 East 26th Street, 16th Floor
New York, NY 10010-1402
800-238-4738
Power Computing Corp. and Metrowerks have announced the Power Computing
CodeStation, a workstation configured to provide programmers with an
integrated hardware/software solution for developing software. CodeStation
will be based on Power Computing's PowerPC/MacOS PCs, and Metrowerks'
CodeWarrior software-development tools. CodeWarrior running on CodeStation
will target applications for the Macintosh OS running on 680x0 and
PowerPC-based computers, Windows 95/NT running on 80x86-based computers, and
Magic Cap running on personal communicators.
Metrowerks
512-346-1935
info@metrowerks.com 
Micro Focus has announced the BridgeWare family of development and run-time
tools to provide seamless access from GUI development environments such as
PowerBuilder and Visual Basic to Micro Focus Cobol and CICS applications. The
BridgeWare family includes Micro Focus DeskTop BridgeWare (PowerBridge and
VisualBridge) and the Micro Focus BridgeWare Server Enabling Kit. BridgeWare
generates external function calls, including all 4GL presentation logic, from
existing legacy and newly developed Cobol source without requiring you to
reengineer or rewrite the program interface. 
DeskTop BridgeWare is a Windows-based, client/server code-construction
middleware toolset that generates API requests in the form of 4GL
script-language external function calls. BridgeWare generates all 4GL script
necessary to provide access to Cobol from GUI 4GL applications in a single,
nonprocedural (point-and-click) process. Desktop BridgeWare apps can be
deployed as single user or as Windows clients on any 4GL-Cobol supported
platform across the enterprise without requiring additional run-time license
fees. You can also create Cobol-Windows DLLs through a GUI Administration
System that allows you to customize Cobol-Windows executables and organize the
Cobol and CICS components. A GUI compile and link facility generates Windows
DLLs in a simple point-and-click process.
The BridgeWare Server Enabling Kit lets you connect PowerBuilder and Visual
Basic client applications to networked Cobol application servers and offload
CPU-intensive processing in a truly scalable, distributed-processing
environment. The BridgeWare Server Enabling Kit requires DeskTop BridgeWare
and the Micro Focus Transaction System, a multiuser, multitasking,
multiplatform online transaction-processing system for designing distributed
client/server systems. 
Desktop BridgeWare sells for $399.00 per programmer, with the Micro Focus
Server Enabling Kit for $500.00 per server.
Micro Focus 
2465 East Bayshore Road
Palo Alto, CA 94303

415-856-4161
Cimetrix has announced the Cimetrix Open Development Environment (CODE), an
open architecture, standards-based family of software for developing and
deploying workcell control applications.
Using CODE, manufacturers can significantly reduce their equipment costs and
time to retool, while increasing flexibility and responsiveness. CODE runs on
standard operating systems (UNIX, X Windows, and Windows NT). CODE includes an
integrated suite of software tools that allows manufacturing engineers to
conceptualize, design, simulate, test, and debug a workcell application in an
off-line environment, using a point-and-click programming interface or
programming in standard C/C++. 
CIMBuilder and CIMulation make up the CODE family of products. CIMBuilder is
an object-oriented, standards-based rapid-application development (RAD)
environment that also supports C/C++ programming. CIMulation is an off-line,
graphical, workcell-simulation environment that, when used in conjunction
with CIMBuilder, provides you with immediate feedback on how each task will be
implemented by the workcell mechanisms. CIMBuilder and CIMulation sell for
$3000.00 each.
Cimetrix
222 South 950 E.
Provo, UT 84606
801-344-7000
Analog Devices has released a low-cost DSP design kit called the "ADSP-2100
EZ-Kit Lite Development System." The kit includes everything you need to
evaluate, develop, debug, and prototype DSP applications. Specifically, the
system includes a development board with 16-bit stereo audio I/O; assembler,
linker, and simulation software; PC-host software; and DSP algorithm source
code and accessories. Sample programs include those for MPEG
audio decode and echo cancellation. The code is compatible with the entire
family of Analog Devices' ADSP-2100 processors. The kit sells for $89.00. 
Analog Devices
Three Technology Way
Norwood, MA 02062
617-461-3881
PKWare has introduced its PKWare Data Compression Library for UNIX. The
toolkit, which can be used with most C/C++ UNIX-based compilers, lets you
compress or extract to and from memory, disk, I/O ports, or any other device
your program can address. It requires 35K of memory for compression, and 12K
for extraction. The library, which is available in individual versions that
support SCO Open Desktop UNIX/XENIX 386, Novell UnixWare, Sun Solaris 1.x
SPARC, Sun Solaris 2.x SPARC, and Sun Solaris 2.x X86, is compatible with
PKWare libraries for DOS, Windows, Win32, and OS/2. Versions of the library
sell for $450.00 each.
PKWare
9025 N. Deerwood Drive
Brown Deer, WI 53223
414-354-8699
Software Garden has begun shipping Dan Bricklin's OverAll DLL, a library for
dynamically displaying magnified details of spatial data (map, photo,
architectural design, and so on) too large to fit on the computer screen
without losing sight of the overall image. This is akin to passing a
magnifying glass over a map. OverAll DLL technology can be integrated into a
wide variety of applications via languages and environments such as C/C++,
Visual Basic, and PowerBuilder. The royalty-free library and the OverAll
Viewer authoring system sell for $495.00.
Software Garden
P.O. Box 373
Newton Highlands, MA 02161
800-745-6101
The AccuSoft Redlining Toolkit lets you add redlining and annotation to just
about any Windows-based imaging application. Delivered as a DLL, VBX, or OCX,
the toolkit is compatible with Visual Basic, PowerBuilder, C/C++, and other
development environments. The package includes features such as lines, arrows,
sticky notes, text, highlighter, zooming, object linking, and file I/O. The
Redlining Toolkit sells for $995.00.
AccuSoft 
Two Westborough Business Park
Westborough, MA 01581
508-898-2770


EDITORIAL


The Beat Goes On


In a time when USA Today and "CNN Headlines" news-bite journalism ladles out
daily events in forgettable, teaspoon-sized tidbits, we forget that most
stories don't end when they see print. In fact, some of the more interesting
twists and turns of current events occur after headlines have turned to fish
wrap. With that in mind, it seems like a good time to catch up on past events
covered in this space, as well as introduce a few new ones.
Back in February, for instance, I took a gander at the problem of conflicting
Internet domain site names. Making the news at the time were domain names like
"ronald@macdonalds.com," and "mci.net" that had been registered by individuals
having scant connections with the obvious namesakes. More recently, the
Council of Better Business Bureaus was beaten to the punch by Mark Sloo, who
registered "bbb.com" and "bbb.org." Miffed at having to settle for "cbbb.org,"
the Council claimed trademark violation and launched a lawsuit against Sloo
and Tyrell Corp., Sloo's Internet provider. In a press release, the Council
crowed that "the BBB, which operates on behalf of an ethical marketplace,
issued a warning to unethical entrepreneurs attempting to transact business in
the new 'cyberspace' marketplace." With the weight of BBB lawyers leaning
heavily on his pocketbook, Sloo transferred registration of bbb.com and
bbb.org to the Better Business Bureau. 
Speaking of trademarks, in July, I railed against Microsoft's heavy-handed
(and selective) enforcement of its trademark for the term "bookshelf."
Exhibiting rare good sense, the U.S. Patent and Trademark Office recently said
"no" to three gold diggers who tried to trademark "Air McNair," the nickname
of football player Steve McNair. Although trademarks are traditionally granted
on a first-come/first-served basis, the PTO ruled that commercial use of
McNair's nickname would be prohibited without his written consent. The trio of
disappointed entrepreneurs (none named "McNair") admitted their intent was to
line their pockets with McNair's signing bonus.
Unfortunately, the PTO wasn't seeing as clearly when it granted a patent to
Dr. Samuel Pallin, an Arizona ophthalmologist, for a unique incision he
developed for cataract surgery. After receiving the patent, Pallin began suing
and demanding royalties from physicians who used the procedure. The American
Medical Association, for its part, has issued a statement declaring patenting
of medical procedures to be unethical.
Patents were also in the eye of the storm in April, when CompuServe started
asking for royalties for use of its LZW-based GIF file-format specification.
After developers cried foul, CompuServe kicked into high gear its plans to
dump GIF in favor of PNG (described in "PNG: The Portable Network Graphic
Format," by Lee Daniel Crocker, DDJ, July 1995). PNG is free and open, and
available for use without fear of patent infringement. The online provider
also announced it will provide a freely distributable PNG-based toolkit that
will include a GIF-to-PNG conversion utility.
It was graphic language, not graphic images, that landed Jake Baker in hot
water, as discussed in May. Baker--a University of Michigan student who posted
an Internet story describing abduction, torture, and mutilation--ill-advisedly
used the name of a classmate in his story, and the university and FBI took him
seriously. He was kicked out of school, charged with interstate transmission
of a threat to injure, and held without bond. Federal Judge Avern Cohn, who
proved to be more interested in meting out justice than grabbing headlines,
threw out the case, saying it should have been handled as a disciplinary
matter by the university. Baker has since enrolled in an Ohio school.
Dan Farmer, on the other hand, was canned from his job at Silicon Graphics not
for what he said, but for what he did. Farmer, as discussed in the June issue,
released onto the Internet a program called "Satan" that analyzes networks for
security holes. Satan and its creator were in the headlines for the requisite
15 minutes, then quietly drifted out of the limelight. 
But before Farmer had to start standing in breadlines, he was snapped up by
Sun Microsystems. Along with Tsutomu Shimomura and Whitfield Diffie, Farmer is
testing the vulnerability of Sun's Java language and SunScreen security
technology. (SunScreen is designed to enable credit-card numbers and other
sensitive data to be safely transmitted over the Internet. For information on
Java, see "Java and Internet Programming," by Arthur van Hoff, DDJ, August
1995, and "Net Gets a Java Buzz," by Ray Valdés, Dr. Dobb's Developer Update,
August 1995.) Good. Farmer is off the street, and Shimomura is busy at
something productive instead of signing book and movie deals about his part in
tracking down infamous computer criminal Kevin Mitnick.
And speaking of Mitnick, America's most wanted digital rapscallion signed a
deal of his own--a plea bargain which will net him eight months in the
slammer, instead of a possible 20 years. Since Mitnick will probably be back
on the networks and looking for a job next year, maybe Sun ought to put him on
the payroll, too. 
Jonathan Erickson, editor-in-chief


LETTERS


C++ Event-Driven Threads 


Dear DDJ,
In his excellent article "Event-Driven Threads in C++" (DDJ, June 1995), Dan
Ford mentions that since "several of the methods [of the QThread class] are
pure virtual...we must provide an implementation for them in the derived
classes." While this is true within the context of the article, I've always
found it a bit misleading to put it this way without further qualifications.
Many people I've run into think that if a method is pure virtual, not only
must derived classes provide an implementation (which is false), but also that
the base class cannot provide a default implementation (which is also false).
In their book C++ FAQs, Marshall P. Cline and Greg A. Lomow get the latter
wrong in one spot (though [they] later unselfconsciously correct it), where
they assert "there is no way to implement [a pure virtual] member function in
[an abstract] base class."
A derived class must provide an implementation of a pure virtual function only
if one wishes to instantiate the derived class. For example, suppose class A
declares a pure virtual function void foo()=0;, class B inherits from A
providing no implementation for foo(), and class C inherits from B and does
provide an implementation for foo(). All this means is that B is still an
abstract class which cannot be instantiated, though C can. Class A could
provide a default implementation for foo(), which could be used in C, or in
any other derived classes, for that matter.
Though, as Scott Meyers notes in Effective C++, this feature of the language
is "generally of limited utility," it can be useful, for example with pure
virtual destructors providing default cleanup behavior, which is inconvenient
to offload to nonvirtual functions.
Bill Lear
Lear Software
rael@world.std.com
Dan responds: Thanks for your comments, Bill. You make a good point regarding
the implication of my statement about pure virtual methods (even though, as
you point out, it is true in the context of the article). In an article of
that length, it is usually not possible to explore all the possible
consequences of various design and implementation decisions. Perhaps the
paragraph would have been clearer if I had pointed out that since the QThread
class did not provide an implementation for the methods, it would be necessary
for concrete classes derived from QThread to provide implementations.


Park It 


Dear DDJ,
After reading Jonathan Erickson's "Editorial" (DDJ, June 1995), I'm going to
have to add the special infrared flashlight that lets me stay in the parking
place without feeding coins. We already use radar and laser detectors and jammers.
Hmmm. A high-intensity infrared beam.... "Terrorists attacked the parking
meters at city hall this morning, causing general havoc, and permitting the
public to park freely for over two hours. Loss of revenue was devastating...."
Now, let's go a step further. That infrared sensor might just be able to read
a digital signature on the car. And while you're at it, instead of plugging in
more cash, maybe it could just charge my account. Betcha I'll have a fourth
box on my dashboard that transmits the digital signature of the Governor's
car. Of course, people will get suspicious when the Governor is parked at ten
different parking spaces at the same time.
This really isn't too far-fetched. But the privacy implications could be a
problem. You can already get traced via some pagers, as well as cell phones,
credit cards, and follow-me phone numbers. Tracing your car would just add one
more tool to Big Brother's arsenal. Kinda scary as a person, but kinda cool as
a gweepoid unit. We developer types like to have explicit control over the
things we can control.
...click...
Nelson Crowle 
ncrowle@tyrell.net


Will the U.S. Have an Internet Future?


Dear DDJ,
Jonathan Erickson's "Editorial" about the Internet (DDJ, May 1995) raises many
concerns about the future of Internet in the United States. Let's suppose that
Senator Exon's bill about making node operators liable for "indecent contents
of messages" passes. Operators will demand release contracts from everybody
that talks with their machines, users and other nodes alike. Then, they will
only pass on messages from "trusted" people, that is, people whom they have
contracts with. It will be the end of Internet for all practical purposes in
the United States. It will be as complicated and expensive (or more) as
CompuServe and the future Microsoft Network. Well, those are two big
beneficiaries of such a law besides the lawyers....
But of course, Internet is now large enough to survive the loss of its
backbone, the United States. Certainly someone else will pick it up. Maybe
Japan, maybe Europe. Maybe the U.S. Supreme Court will end up deciding that
the law does not apply to Indian Reservations and we will have "IndianNet"
instead of Internet.
Mauro Sant'Anna
Sao Paulo, Brazil
mauro.santanna%mandic@ibase.org.br 


Julian Dates


Dear DDJ,
In his letter on Julian dates (DDJ, June 1995), Homer Tilton states that
"...when the astronomer says that his 'Julian Day'...relates back to January
1, 4713 bc, he doesn't tell us what calendar he is talking about!" This just
isn't true. From Explanatory Supplement to the Astronomical Ephemeris and the
American Ephemeris and Nautical Almanac, 1974 reprint (p. 71):
To facilitate chronological reckoning, astronomical days, beginning at
Greenwich noon, are numbered consecutively from an epoch sufficiently far in
the past to precede the historical period. The number assigned to a day in
this continuous count is the Julian Day Number, which is defined to be 0 for
the day starting at Greenwich noon on b.c. 4713 January 1, Julian proleptic
calendar.
Murray Lesser
Yorktown Heights, NY 
Murray.Lesser@f347.n109.z1.fidonet.org


Image Authentication 


Dear DDJ,
While Steve Walton's techniques for embedding a "seal" of authenticity in an
image are quite clever ("Image Authentication for a Slippery New Age," DDJ,
April 1995), one question remains unanswered: How does the author of an image
enable third parties to verify the seal, without giving away any other
abilities?
Walton's seal is based on a traditional authentication code, where both the
creation and the verification of the code require knowledge of a secret key.
Thus, giving a third party the ability to verify a seal also gives the ability
to create new seals.

One way out of this problem is for the author to seal each image with a
different secret key, but this can be complex to administer. Another solution
is digital signatures: Seals are created with a private key known only to the
author, and verified with a corresponding public key that can be given to
anyone without revealing the private key. The RSA public-key cryptosystem and
NIST's Digital Signature Algorithm both provide this kind of functionality.
Embedding seals is one issue; enabling other parties to verify them is
another. Digital signatures combined with Walton's techniques can provide good
solutions to both requirements.
Burton S. Kaliski, Jr., Chief Scientist
RSA Data Security
Redwood Shores, California
Steve responds: Dr. Kaliski has stated very clearly the entire solution to
image authentication, including good methods for third-party verification. The
algorithms I proposed are intended as a way of embedding any sort of data
into an image in an undetectable way. Illustrating this with a checksum
scheme was a simple and, judging from the response, clearly understandable
environment in which to tinker with the methods. By embedding digital
signatures created with RSA or whatever succeeds it in the next generation,
the problem of determining the veracity of images is completely and
conveniently solved. A final comment for those of you interested in other
forms of data: By adding a slight amount of controllable noise (if it's not
already present), any stream of data can be protected in precisely the same
manner. For example, music, voice recordings, telemetry, executable software
binaries... fill in your own blank!


Fluid Thoughts


Dear DDJ,
In his "Programming Paradigms" column "Fluid Concepts and Creative Analogies"
(DDJ, June 1995), Michael Swaine mentions a program called "Seek-Whence,"
which is supposed to find the next number of a given sequence, say,
1,2,2,3,3,3,4,4,4,4,.... The "definite" answer is 5, as given implicitly at
the end of the article. However, I claim it is 4: one 1, two 2s, three 3s,
five 4s. Why? Because 1, 2, 3, 5 is part of the Fibonacci sequence. In fact,
I can claim whatever number I like as the answer by using a Lagrange
interpolation polynomial (take the integer part if you insist that it produce
a sequence of integers) as the rule. The point is, there is no definite rule
whence a sequence came, and there is no definite answer to "puzzles" like
that. So what exactly does "Seek-Whence" do? I cannot tell until I read
Hofstadter's book. Let me ask this question: Given the sequence 1,2,3,4,5,...,
what makes 6 a more convincing follow-up than, say, 7? I feel that it is more
an illusion than anything else. I hope this is addressed in detail in
Hofstadter's book, although it is unfortunately left out of Michael's column.
Huayong Yang 
yang@math.umass.edu
Dear DDJ,
Michael Swaine's column on "Fluid Concepts and Creative Analogies" was very
interesting. However, I have a question concerning one of the next terms under
the section titled "Seeking Whence." The string of terms in question was:
3,5,11,17,31,41,47,59,....
The given solution was p(p(n)); the (nth prime)-th. The second prime, the
third prime, the fifth prime, the seventh prime.... However the just mentioned
string does not fit the solution:
The prime numbers:
2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83...
2nd prime = 3
3rd prime = 5
5th prime = 11
7th prime = 17
11th prime = 31
13th prime = 41
17th prime = 59, not 47
19th prime = 67, not 59
Therefore, the new series should be 3,5,11,17,31,41,59,67,... with the next
term of 83, the 23rd prime. Is this not correct?
Dwight Keeve,
Columbia, South Carolina
DwightK848@aol.com


TrademarkTM


Dear DDJ,
I particularly enjoyed Jonathan Erickson's "Editorial" in the July 1995 issue
of DDJ. In fact, my wife commented that I had a pronounced smirk on my face as
I read it. I guess I just could not help myself as I became amused while
reading about how Microsoft was going after an independent programmer just to
set a precedent in a trademark dispute. I suppose next they will launch a
campaign to sue everyone in the U.S. named "Bob" for trademark infringement.
This could be so widespread that Microsoft could possibly lessen the burden by
assessing a flat fee for all "Bob-offenders." Hey, maybe they could even make
it part of their '95 tax returns. After all, it seems that the present
administration is pretty sweet on Mr. Gates, and neither Bill nor Bill will
have to pay--their names aren't "Bob." I question why Microsoft didn't name
their new interface "Bill" instead.
I would like to make a prediction. I have noticed that Microsoft has begun
offering a developer's software package that includes Office 4.2, Visual C++/
Basic, FoxPro, and a few other goodies. I also noticed that Microsoft is
making their move toward subscription plans with the developer's platform
subscription (you know, the one that Microsoft Tech Support answers every
developer question with: "What you need is the developer's platform level 2
subscription for $499 a year, the Microsoft sales number is 1-800....") and
the Visual C++ subscription. If you put these all together, in the next year
Microsoft may begin offering a Home/Office PC Software Suite. This suite will
be a one-time purchase where you get Windows 95, the 32-bit office suite
(5.0?), Quicken, Microsoft Network, Microsoft Mail, and Microsoft Publisher.
Microsoft will market this as a great "one-time investment" just like buying a
package-deal computer from companies such as Dell or Gateway. I suppose the
price will be around $1000-$1200, but since you only have to buy it once....
The company could also offer a subscription for a nominal fee of, say, $400 a
year to get four quarterly releases of the latest software on CD as well as a
number of free hours per month on the Microsoft Network. That way, the average
user will always be using the latest and greatest software, and the biggest
advantage is that all the headaches of continually purchasing upgrades are
gone. It's like one-stop software shopping for home and business users. An
added advantage is that file-format incompatibilities would no longer be an
issue. If everybody uses Word and Excel, then nobody has to worry about
converting files to another word processor or spreadsheet format when they
want to share their files.
This would pose a great threat to other companies like Lotus and Novell, as
Microsoft could potentially take over the remaining market share (not that
they don't already dominate). In fact, Microsoft could become so big that they
would be much like AT&T/Bell before the Federal deregulation. Hopefully it
won't go that far, but I know Microsoft's marketing team (I think they are
about the best around) has big dreams--they just haven't revealed them all to
the rest of the world yet.
Of course, everyone I've mentioned this idea to so far has responded with
extreme skepticism. But I'm just putting two and two together. One of my
coworkers even suggested that Microsoft would get slapped with antitrust
lawsuits. I don't think that would happen, since the up-front sale constitutes
a single purchase, and the subscription would be viewed as a software-support
agreement. Some of the big UNIX software suppliers do it every day (HP,
SunSoft, SAS, Frame software, and others).
Troy M. Noble
Colorado Springs, Colorado
71744.3311@compuserve.com


Clickable Images in HTML


The WWW can be more than a pretty face




Andrew Davison


Andrew is a lecturer in the department of computer science at the University
of Melbourne, Australia. He can be reached at ad@cs.mu.oz.au.


One of the nice things about the World Wide Web (WWW) is the ease with which
you can add images to WWW pages. However, much more can be done with pictures
than simply using them for decoration. For instance, you might want to use
"clickable" graphics, where something happens when users click on an image.
The HTML code in Example 1, for instance, specifies a page that uses a picture
(the Marx Brothers in marx.gif) as a link to the document marx-info.html. On
your screen, this looks something like Figure 1. When users click anywhere on
the picture, the hypertext (or "hyper-picture") link is followed to
marx-info.html (see Listing One), which looks like Figure 2 on your screen.
The drawback to this example is that clicking only causes one thing to happen,
no matter where on the picture the cursor is located. A more interesting
capability would be a clickable image with "hot spots" linking different
regions in a picture to different actions. For instance, in the Marx Brothers
picture, you could make the faces of Harpo, Groucho, and Chico hot spots, so
that when someone clicks on them, only information about that individual will
appear. Other common uses of hot spots are interactive maps, where clicking on
a particular building brings up relevant information, or role-playing games,
where clicking on an item (a whiskey bottle, a corpse, and so on) returns a
clue.
The two basic approaches to implementing clickable images with hot spots in
HTML involve either forms or environment variables. 


Clickable Images Using Forms


When implementing clickable images using forms, the key is to use a form INPUT
tag with an IMAGE attribute. Example 2 shows a small WWW page that uses these
features. On screen, this looks like Figure 3. The code between FORM and
</FORM> defines the form, with the METHOD attribute determining the means by
which the form information is sent to the WWW server. The ACTION attribute
names the program that will be invoked. The NAME attribute of the INPUT tag
specifies an arbitrary string used to identify the picture. The TYPE attribute
states that the input of the form is from an image. SRC gives the location of
the marx.gif picture.
The form in Figure 3 looks somewhat odd because the usual Send or Transmit
button is missing. Instead, the form details are sent when the cursor is over
the image and the mouse is clicked. As described in my article "Coding with
HTML Forms" (DDJ, June 1995), the details are output as a string with the
format name=value&name=value&..., where name is the name of the form's
data-input field, and value is its associated data.
In this example, the only data-entry "field" is the picture named "Marx."
However, since the input type is image, the two fields Marx.x=<X> and
Marx.y=<Y> are transmitted, where X and Y are the coordinates of the cursor
over the image when the mouse was clicked. The axes for an image start
at (0,0) in the top-left corner, with X increasing across and Y increasing
downwards. The coordinates of various parts of an image can be obtained using
most graphics packages; xv is the easiest to use under UNIX.
When the form string arrives at the application, it can be manipulated using
the techniques described in my previous article. In this example, the qgp
program simply echoes the field values; see Figure 4. A more useful
application would return different WWW pages depending on the (X,Y)
coordinates, perhaps using the NAME attribute to access a file holding
relevant hot-spot information for marx.gif.



Clickable Images Using Environment Variables


In the early days of HTML, many browsers and servers did not support forms.
This is rapidly changing, making the entry of information through WWW pages
much easier. However, the full power of forms is not necessary for clickable
images since the data passed to the application is relatively simple. 
In non-forms-based HTML programming, two environment variables are commonly
utilized: PATH_INFO and QUERY_STRING. Many other environment variables are
also available; see http://hoohoo.ncsa.uiuc.edu/cgi/primer.html and
http://hoohoo.ncsa.uiuc.edu/cgi/env.html for a complete list.
The application that services clickable images with hot spots uses the
variables PATH_INFO, QUERY_STRING, and PATH_TRANSLATED, although it is
actually possible to manage without the (explicit) use of QUERY_STRING.
Listing Two
presents the WWW page that acts as the interface to this application. The file
is also at http://www.cs.mu.oz.au/~ad/code/visuals/mxi.html. In a browser, it
looks like Figure 5.
The target location (the string assigned to href) is composed of two parts:
http://www.cs.mu.oz.au/cgi-bin/mapper, the application location, and
/~ad/code/visuals/marx.map, a string that will be assigned to PATH_INFO by the
WWW server. The server can divide the string because it "knows" that cgi-bin
is the WWW application directory. This knowledge is stored in the
configuration file for the http daemon (called http.conf) as the line: Exec
/cgi-bin/* /local/dept/wwwd/scripts/*. This acts as a kind of rewrite rule
that determines the actual location of the application
/local/dept/wwwd/scripts/mapper. The PATH_INFO value should be a string
representing a partial path to a file, although it could be any kind of string
without spaces. In this example, the partial path is ~ad/code/visuals, the
path from the home directory of the author. The filename is marx.map.
The partial path in PATH_INFO is automatically translated into a full path and
assigned to the PATH_TRANSLATED environment variable. The translation strategy
is again determined by rewrite rules in http.conf. For partial paths starting
at a home directory, the relevant rule is UserDir www-public, which states the
location of WWW directories beneath each individual's home directory. Thus,
PATH_TRANSLATED is assigned /home/staff/ad/www-public/code/visuals/marx.map.
This is the file holding the hot-spot information for the marx.gif picture.
When the image in Figure 5 is clicked upon, information is sent to the server
in the form of a long URL. The URL consists of the string assigned to href,
with a string of the form ?X,Y appended to the end. 
X and Y will be the characters representing the (X,Y) coordinate where the
image was clicked, using the coordinate system just described. The characters
are appended because of the ismap attribute in the IMG tag.
To summarize, the string delivered to the WWW server has the form
application-name/filename-with-partial-path?X,Y. The server processes the X,Y
tail of the string in two ways: It is assigned to the QUERY_STRING environment
variable, and the string is space separated into parts and assigned to the C
command-line arguments argv[1], argv[2],.... argv[0] always contains the
application name. In this example, X,Y contains no spaces, so it is completely
assigned to argv[1]. 
Once the command line is built, the server uses it to invoke the application,
and also makes the various environment variables available to the executing
program. Thus, for the HTML code in Listing Two (displayed in Figure 5), the
mapper application will be called with argv[1] assigned some X,Y string (for
example, "152,149"). Mapper will also have PATH_INFO and PATH_TRANSLATED
available, and will therefore be able to access the hot-spot information file
marx.map. It could also read the X,Y string through QUERY_STRING, but this is
unnecessary. 


The Mapper Application


Mapper.c is a C program which implements clickable images with hot spots. It
has three main duties:
Read the hot-spot information for a picture from a file.
Find the hot spot that contains (X,Y).
Deliver the WWW page associated with the hot spot to the user.
Listing Three presents the code for mapper.c. (This code is also available at
http://www.cs.mu.oz.au/~ad/code/visuals/mapper.c and electronically from DDJ;
see "Availability," page 3.) This program is based closely on a program called
imagemap.c, available at
http://hoohoo.ncsa.uiuc.edu/docs/setup/admin/NewImagemap.html. Also at that
site is a small example with a clickable image of the Taj Mahal. A longer
imagemap.c tutorial can be found at
http://www.ora.com/gnn/bus/ora/features/miis/index.html.
Mapper.c has similar functionality to imagemap.c except that mapper.c assumes
that the hot-spot information filename can be located solely by examining
PATH_TRANSLATED. Imagemap.c also looks in a predefined configuration file that
holds other hot-spot information filenames. The extra functionality is useful
but inessential for this discussion, so the code for it has been excluded from
mapper.c.
The other changes in mapper.c are cosmetic: Prototype declarations have been
included, extra comments have been added, and the code has been restructured
to use more functions.


The Hot-Spot Information-File Format



Mapper.c uses the hot-spot information-file format designed for imagemap.c.
Using the terminology of that program, hot-spot information files are called
image-map files. An image map consists of lines of the form
<hot-spot-shape-type> <associated-URL> <shape outline co-ordinates>. There are
five hot-spot shape types: circle, poly (polygons), rect (rectangles), point,
and default. For instance, marx.map contains the information in Example 3.
Note that the # lines are comments. The default line specifies what URL should
be returned when a clicked (X,Y) point is in none of the hot spots. The three
circles correspond to the faces of Harpo, Groucho, and Chico. A circle is
specified by a center and another coordinate that indirectly gives the radius.
The two poly hot spots match the suitcases labeled "Blondes" and "Gags" in
marx.gif.
A complication is the interaction between the default and point hot-spot
types. A point hot spot specifies an (X,Y) coordinate, which is compared with
the clicked point. If there are several point hot spots, then the one nearest
to the clicked point is deemed active. You cannot have a point hot spot and
use default in the same image, since they each achieve the same thing. 
For more details on the image-map file format, see
http://hoohoo.ncsa.uiuc.edu/docs/setup/admin/NewImagemap.html.


Mapper.c Details


The mapper.c program first reads the PATH_TRANSLATED environment variable that
contains the location of the image-map file for the clickable picture.
get_map() does this and, if successful, assigns the string to map. Errors are
dealt with by calling servererr(), which prints a WWW error page such as the
one in Figure 6 and then causes the application to terminate.
If the image-map file can be opened, then the X and Y values in the argv[1]
string are extracted by get_clickpt(), which stores them in the clickpt array.
process_map() has three jobs: It reads in and parses the image-map file,
locates the (X,Y) coordinate in one of the hot spots, and transmits the
associated URL to the user. 
It reads the image-map file a line at a time inside a while loop, skipping #
comment lines and blank lines. The first call to get_word() stores the
hot-spot shape type in the type array, the second call stores its URL in url.
The default case has its associated URL stored in deflt unless any point hot
spots have already been processed. 
get_coords() reads in the series of coordinates that specify a hot-spot
outline and stores them in the coords array.
A series of strcmp() tests follow to determine the hot-spot shape type. Then a
function is called to test if the clicked point is inside the shape area
specified by the coordinates in the coords array.
The test for a point hot spot is different because the final choice of point
hot spot must wait until all have been tested against the clicked point. In
the meantime, the URL of the hot spot currently closest to the clicked point
is stored in deflt.
The URL associated with the chosen hot spot is sent to the user by calling
sendmesg(). This function prints the URL of a WWW page instead of its text.
This is intercepted by the server, which sends the actual document to the
user. URLs are printed using the format Location: <URL> followed by a blank
line. 
sendmesg() is complicated by its ability to deal with full URLs (for example,
http://www.cs.mu.oz.au/~ad/index.html) and partial URLs
(/~ad/code/visuals/mgm.html). The former is printed immediately; the latter is
expanded into a full URL by prefixing it with the server's host name (for
instance, www.cs.mu.oz.au) to make
http://www.cs.mu.oz.au/~ad/code/visuals/mgm.html. Details about the printing
of URLs can be found at http://hoohoo.ncsa.uiuc.edu/cgi/primer.html and
http://hoohoo.ncsa.uiuc.edu/cgi/env.html.


Clickable Images in Action


To illustrate the implementation and use of clickable images, let's start with
the example in Figure 5 (the corresponding code is shown in Listing Two and at
http://www.cs.mu.oz.au/~ad/code/visuals/mxi.html). The hot spots for the
image are specified in marx.map as the faces of Harpo, Groucho, and Chico, and
the "Blondes" and "Gags" suitcases. Marx.map also includes URLs for each hot
spot and for the default case.
When Groucho's face is clicked upon, Figure 7 appears on the screen, which is
defined at http://www.cs.mu.oz.au/~ad/code/visuals/groucho.html. Similarly,
clicking on the "Gags" suitcase produces Figure 8, which is at
http://www.cs.mu.oz.au/~ad/code/visuals/gags.html. The default case is linked
to http://www.cs.mu.oz.au/~ad/code/visuals/mgm.html, displayed in Figure 9.
The layout of the various pages is meant to give the impression that the
picture stays on the screen continuously while the text around it changes.


Conclusion


Clickable images with hot spots are a helpful WWW technique, especially for
guide books (see, for example, the University of California Museum of
Paleontology pages starting at http://ucmp1.berkeley.edu/), role-playing
games, and maps (for instance, the Internet resources "map" at
http://www.ncsa.uiuc.edu/SDG/Software/Mosaic/Demo/metamap.html). They can
also enliven contents pages (such as the HotWired WWW pages at
http://www.hotwired.com).
Figure 1: WWW page with image.
Figure 2: WWW page linked to Figure 1.
Figure 3: Page generated by HTML code in Example 2.
Figure 4: Echoing the field values.
Figure 5: WWW page which acts as the interface to an application.
Figure 6: Typical WWW error page.
Figure 7: Clicking on Groucho's face generates this screen.
Figure 8: Image produced by clicking on the "Gags" suitcase.
Figure 9: Clicking on the default link generates this screen.
Example 1: HTML code specifying a page that uses a picture as a link label.
<html>
<head>
<TITLE>The Marx Brothers (image as anchor label)</TITLE>
</head>

<body>
<H1>The Marx Brothers (image as anchor label)</H1>

<br>
<img src="gball.gif" alt=""> Click on the picture for more
on the Marx Brothers. <p>

<a href="marx-info.html"><img src="marx.gif" alt=""></a> <p>

<hr>
<address><a href="http://www.cs.mu.oz.au/~ad">Andrew Davison</a></address>
</body>

</html>
Example 2: Typical WWW page.
<html>
<head>
<TITLE>The Marx Brothers (using forms)</TITLE>
</head>

<body>
<H1>The Marx Brothers (using forms)</H1>

<br>
<img src="gball.gif" alt=""> Click on the picture to find
out more: <p>

<FORM METHOD="POST"
 ACTION="http://www.cs.mu.oz.au/cgi-bin/qgp">

<INPUT NAME="Marx" TYPE="image"
 SRC="http://www.cs.mu.oz.au/~ad/code/visuals/marx.gif"> <p>

</FORM>

<hr>
<address><a href="http://www.cs.mu.oz.au/~ad"
 >Andrew Davison</a></address>
</body>
</html>
Example 3: Using hot-spot shape types.
# image map for Marx Brothers image

default /~ad/code/visuals/mgm.html

# left head
circle /~ad/code/visuals/harpo.html 52,33 52,10

# middle head
circle /~ad/code/visuals/groucho.html 142,33 142,10

#right head
circle /~ad/code/visuals/chico.html 230,46 230,26

# blondes suitcase
poly /~ad/code/visuals/blondes.html 19,74 92,62 101,101 31,109

# gags suitcase
poly /~ad/code/visuals/gags.html 219,86 309,44 310,87 251,110

Listing One
<html>
<head>
<TITLE>The Marx Brothers</TITLE>
</head>
<body>
<H1>The Marx Brothers</H1>
A family of Jewish-American comics whose zany
humour convulsed minority audiences in its time and influenced later
comedy writing to an enormous extent. <p>
<h2>Groucho (1890-1977)</h2>
(Julius Marx) had a painted moustache, a cigar, a

loping walk and the lion's share of the wisecracks. <p>
<h2>Harpo (1888-1964)</h2>
(Adolph Marx) was a child-like mute who also played a harp. <p>
<h2>Chico (1886-1961)</h2>
(Leonard Marx) played the piano eccentrically and
spoke with an impossible Italian accent. <p>
<h2>The Other Brothers</h2>
Aside from Chico, Groucho and Harpo (shown above),
there were two other brothers: <i>Gummo (1893-1977)</i>
(Milton Marx) and <i>Zeppo (1901-1979)</i> (Herbert Marx)
who left the team early on. <p>
<hr>
<address><a href="mx0.html">To Start</a></address>
</body>
</html>

Listing Two
<html>
<head>
<TITLE>The Marx Brothers</TITLE>
</head>
<body>
<H1>The Marx Brothers</H1>
<br>
<img src="gball.gif" alt=""> Click on the picture to find out more: <p>
<a href="http://www.cs.mu.oz.au/cgi-bin/mapper/~ad/code/visuals/marx.map">
<img src="marx.gif" alt="" ismap></a> <p>
<hr>
<h2>The Marx Brothers</h2>
A family of Jewish-American comics whose zany
humour convulsed minority audiences in its time and influenced later
comedy writing to an enormous extent. <p>
Aside from Chico, Groucho and Harpo (shown above),
there were two other brothers: <i>Gummo (1893-1977)</i>
(Milton Marx) and <i>Zeppo (1901-1979)</i> (Herbert Marx)
who left the team early on. <p>
<hr>
<address><a href="http://www.cs.mu.oz.au/~ad">Andrew Davison</a></address>
</body>
</html>

Listing Three
/* Simplified version of mapper 1.2
** Based on work by: Kevin Hughes,
** Eric Haines, Rob McCool, Chris Hyams, Rick Troth,
** Craig Milo Rogers, Carlos Varela
** Original version at 
** http://hoohoo.ncsa.uiuc.edu/docs/setup/admin/NewImagemap.html
** Andrew Davison (ad@cs.mu.oz.au) January 1995
** Available at http://www.cs.mu.oz.au/~ad/code/visuals/mapper.c
*/
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <ctype.h>
#define MAXLINE 500 /* max length of line in image map file */
#define MAXVERTS 100 /* max num of coords in a shape */
#define NUMLEN 10 /* max num of digits in a X or Y number */
#define LF 10

#define X 0
#define Y 1
void get_map(char map[]);
void get_clickpt(char *arg, double clickpt[]);
void process_map(FILE *fp, double clickpt[]);
void get_word(char *input, int *pi, char *word);
void get_num(char *input, int *pi, char *num);
void get_coords(char *input, int *pi, double coords[][2], int size, FILE *fp);
void servererr(char *msg);
void sendmesg(char *url);
double sdist_apart(double clickpt[], double coords[][2]);
int clickpt_in_rect(double clickpt[], double coords[][2]);
int clickpt_in_circle(double clickpt[], double coords[][2]);
int clickpt_in_poly(double clickpt[], double pgon[][2]);
int main(int argc, char *argv[])
{
 char map[MAXLINE]; /* name of image map file with full path */
 double clickpt[2]; /* for the (X,Y) coord clicked upon */
 FILE *fp;
 if (argc != 2)
 servererr("Wrong number of arguments, client may not support ISMAP.");
 get_map(map);
 if((fp = fopen(map,"r")) == NULL) {
 char errmsg[MAXLINE];
 sprintf(errmsg, "Couldn't open image map file: %s", map);
 servererr(errmsg);
 }
 else {
 get_clickpt(argv[1], clickpt);
 process_map(fp, clickpt);
 fclose(fp);
 }
 return 0;
}
void get_map(char map[])
/* obtain the image map file name with a full UNIX path */
{
 char *name; /* name of image map file with partial path */
 name = getenv("PATH_INFO");
 if((!name) || (!name[0]))
 servererr("No image map name given. Please read the "
 "<A HREF=\"http://hoohoo.ncsa.uiuc.edu/docs/setup/admin/NewImagemap.html\">"
 "instructions</A>.<P>");
 name++; /* ignore first '/' */
 /* if the name contains a '/' then it represents a partial UNIX path */
 if (strchr(name,'/'))
 strcpy(map, getenv("PATH_TRANSLATED"));
 else
 servererr("Map name must include a partial UNIX path.");
}
void get_clickpt(char *arg, double clickpt[])
/* extract the (X,Y) coord clicked upon */
{
 char *t;
 if((t = strchr(arg,',')) == NULL)
 servererr("Your client doesn't support image mapping properly.");
 *t++ = '\0';
 clickpt[X] = (double) atoi(arg);
 clickpt[Y] = (double) atoi(t);
}
void process_map(FILE *fp, double clickpt[])
/* parse the image map file, locate clickpt inside a hot shape,

 invoke the associated URL */
{
 char input[MAXLINE]; /* a line from the image map file */
 char type[MAXLINE]; /* type of hot spot shape */
 char url[MAXLINE]; /* URL associated with shape type */
 char deflt[MAXLINE] = ""; /* the URL for the default case */
 double coords[MAXVERTS][2]; /* the coordinates of a shape */
 int num_ptshapes = 0; /* number of point hot spots used */
 double dist, min_dist; /* for nearest point hot spot calc */
 int i;
 while((fgets(input, MAXLINE, fp)) != NULL) {
 i = 0;
 if((input[i] == '#') || (!input[i]))
 continue;
 get_word(input, &i, type);
 while(isspace(input[i])) 
 i++;
 get_word(input, &i, url);
 if((strcmp(type,"default") == 0) && (num_ptshapes == 0)) {
 strcpy(deflt,url);
 continue;
 }
 get_coords(input, &i, coords, MAXVERTS, fp);
 if(strcmp(type,"poly") == 0) /* poly type */
 if(clickpt_in_poly(clickpt,coords))
 sendmesg(url);
 if(strcmp(type,"circle") == 0) /* circle type */
 if(clickpt_in_circle(clickpt,coords))
 sendmesg(url);
 if(strcmp(type,"rect") == 0) /* rect type */
 if(clickpt_in_rect(clickpt,coords))
 sendmesg(url);
 if(strcmp(type,"point") == 0) { /* point type */
 dist = sdist_apart(clickpt, coords);
 /* If first point hot spot, or the nearest, set the default. */
 if ((num_ptshapes == 0) || (dist < min_dist)) {
 min_dist = dist;
 strcpy(deflt,url);
 }
 num_ptshapes++;
 }
 }
 if(deflt[0])
 sendmesg(deflt);
 else 
 servererr("No default specified.");
}
void get_word(char *input, int *pi, char *word)
/* extract a word from an input line */
{
 int i;
 for(i=0; ((!isspace(input[*pi])) && (input[*pi])); i++) {
 word[i] = input[*pi];
 (*pi)++;
 }
 word[i] = '\0';
}
void get_num(char *input, int *pi, char *num)
/* extract a number (as characters) from an input line */

{
 int i = 0;
 while ((isspace(input[*pi])) || (input[*pi] == ',')) /* find char */
 (*pi)++;
 while (isdigit(input[*pi]))
 num[i++] = input[(*pi)++];
 num[i] = '\0';
}
void get_coords(char *input, int *pi, double coords[][2], int size, FILE *fp)
/* extract the (X,Y) coords from an input line,
 and store them in the coords array */
{
 char num[MAXLINE]; /* a X or Y part of a coord */
 int k=0;
 while ((input[*pi]) && (k < size)) {
 get_num(input, pi, num);
 if (num[0] != '\0')
 coords[k][X] = (double) atoi(num); /* X part of a coord */
 else
 break;
 get_num(input, pi, num);
 if (num[0] != '\0') {
 coords[k][Y] = (double) atoi(num); /* Y part of a coord */
 k++;
 }
 else {
 fclose(fp);
 servererr("Missing Y value in a co-ordinate.");
 }
 }
 if (k < size)
 coords[k][X] = -1; /* sentinel */
 else
 coords[k-1][X] = -1; /* overwrite last coord */
}
/* HTML print utilities */
void servererr(char *msg)
{
  printf("Content-type: text/html%c%c", LF, LF);
  printf("<title>Mapping Server Error</title>");
  printf("<h1>Mapping Server Error</h1>");
  printf("This server encountered an error:<p>");
  printf("%s", msg);
  exit(-1);
}

void sendmesg(char *url)
{
  if (strchr(url, ':'))   /* It is a full URL */
    printf("Location: ");
  else                    /* It is a virtual URL */
    printf("Location: http://%s", getenv("SERVER_NAME"));
  printf("%s%c%c", url, LF, LF);
  exit(0);
}
/* clickpt locating functions */
double sdist_apart(double clickpt[2], double coords[MAXVERTS][2])
/* Find the square of the distance between the click point
   and the coords of a point hot spot.
   No need to take the square root. */
{
  return ((clickpt[X] - coords[0][X]) * (clickpt[X] - coords[0][X])) +
         ((clickpt[Y] - coords[0][Y]) * (clickpt[Y] - coords[0][Y]));
}

int clickpt_in_rect(double clickpt[2], double coords[MAXVERTS][2])
/* is the clickpt inside a rectangle? */
{
  return ((clickpt[X] >= coords[0][X] && clickpt[X] <= coords[1][X]) &&
          (clickpt[Y] >= coords[0][Y] && clickpt[Y] <= coords[1][Y]));
}

int clickpt_in_circle(double clickpt[2], double coords[MAXVERTS][2])
/* is the clickpt inside a circle? */
{
  double radius1, radius2;   /* squared radii; double avoids truncation */
  radius1 = ((coords[0][Y] - coords[1][Y]) * (coords[0][Y] - coords[1][Y])) +
            ((coords[0][X] - coords[1][X]) * (coords[0][X] - coords[1][X]));
  radius2 = ((coords[0][Y] - clickpt[Y]) * (coords[0][Y] - clickpt[Y])) +
            ((coords[0][X] - clickpt[X]) * (coords[0][X] - clickpt[X]));
  return (radius2 <= radius1);
}
int clickpt_in_poly(double clickpt[2], double pgon[MAXVERTS][2])
/* is the clickpt inside a polygon? */
{
  int i, numverts, inside_flag, xflag0;
  int crossings;
  double *p, *stop;
  double tx, ty, y;

  for (i = 0; i < MAXVERTS && pgon[i][X] != -1; i++)
    ;
  numverts = i;
  crossings = 0;
  tx = clickpt[X];
  ty = clickpt[Y];
  y = pgon[numverts - 1][Y];
  p = (double *) pgon + 1;
  if ((y >= ty) != (*p >= ty)) {
    if ((xflag0 = (pgon[numverts - 1][X] >= tx)) ==
        (*(double *) pgon >= tx)) {
      if (xflag0)
        crossings++;
    }
    else {
      crossings += (pgon[numverts - 1][X] - (y - ty) *
                    (*(double *) pgon - pgon[numverts - 1][X]) /
                    (*p - y)) >= tx;
    }
  }
  stop = pgon[numverts];
  for (y = *p, p += 2; p < stop; y = *p, p += 2) {
    if (y >= ty) {
      while ((p < stop) && (*p >= ty))
        p += 2;
      if (p >= stop)
        break;
      if ((xflag0 = (*(p - 3) >= tx)) == (*(p - 1) >= tx)) {
        if (xflag0)
          crossings++;
      }
      else {
        crossings += (*(p - 3) - (*(p - 2) - ty) *
                      (*(p - 1) - *(p - 3)) / (*p - *(p - 2))) >= tx;
      }
    }
    else {
      while ((p < stop) && (*p < ty))
        p += 2;
      if (p >= stop)
        break;
      if ((xflag0 = (*(p - 3) >= tx)) == (*(p - 1) >= tx)) {
        if (xflag0)
          crossings++;
      }
      else {
        crossings += (*(p - 3) - (*(p - 2) - ty) *
                      (*(p - 1) - *(p - 3)) / (*p - *(p - 2))) >= tx;
      }
    }
  }
  inside_flag = crossings & 0x01;
  return (inside_flag);
}










































Installing Windows 95 Programs


Your own installation program for Windows 95, Windows 3.1, and Windows NT




Al Williams


Al is the author of several books, including OLE 2.0 and DDE Distilled and
Commando Windows Programming (both from Addison-Wesley). Look for his latest
book, Steal This Code! in bookstores soon. You can contact Al on CompuServe at
72010,3574.


Gone are the days when applications could get by with simple batch scripts for
installing programs. Today, you must have a full-blown installation program to
keep Windows users happy. While there are a number of commercially available
installation packages--Stirling's InstallShield, Jetstream's InstallWizard,
Knowledge Dynamics' Winstall, Sax's Setup Wizard, and Microsoft's SetWizard
(included with Visual C++ and Visual Basic) come to mind--sometimes you'll
need to write your own installation programs. When it comes to this, however,
there's good and bad news. 
The good news is that Windows provides reasonable support for installation,
including version checking and decompression. The bad news is these functions
are quirky and poorly documented. In this article, I'll present a Windows 95
toolkit you can use to write high-quality installation programs in C, C++, and
other languages. You won't need to learn a new scripting language--this
installer uses C or C++. In the process, I'll examine tabbed dialogs,
Microsoft's new property sheets. 


Windows Support 


The VER.DLL library (a standard part of Windows) has three ways to directly
support installation programs:
The VERSIONINFO resource statement, which records information about EXE-format
files (including DLLs).
VerFindFile(), which locates appropriate places to install files.
VerInstallFile(), which decompresses and copies files to their destinations.
The VERSIONINFO resource is a special entry in an RC file that specifies
information about an executable module (EXEs, DLLs, and so on). Among the
items you can store are the version number, language the module uses, file
type, original filename, and copyright notice.
Listing One shows a typical VERSIONINFO entry. Notice that the information is
in a simple, text-based format. There is a fixed portion of data and a varying
portion enclosed in a BLOCK keyword. Each block has its own keywords and
syntax; see Table 1.
VerFindFile() recommends a location for a file you want to install. You supply
several parameters, and the function returns the recommended path and a status
word; see Table 2. The status word is 0 on success; otherwise, one or more
bits may be set:
VFF_CURNEDEST--a previous version of the file resides in a nonrecommended
directory.
VFF_FILEINUSE--a previous version of the file is in use; you won't be able to
remove it.
VFF_BUFFTOOSMALL--an input buffer was too small.
VerInstallFile() takes the information that VerFindFile() returns and uses it
to copy the file to its new location. You can compress the source file with
Microsoft's Compress utility (a DOS program) or leave it uncompressed. Table 3
shows more details about VerInstallFile().
The Ver... functions examine the version resource in your files, if it exists.
Any file that has resources can contain a version resource (DLL, EXE, FON, and
so on). Listing One shows a typical version resource containing copyright
information, version numbers, language identifiers, and other information. To
add a version resource to your files, simply place a version-resource block in
your RC file and compile as usual. The version resource contains version
information, other fixed values, and a variable part that begins with the
BLOCK keyword. These blocks contain string information that varies from
language to language. 


Install-Program Characteristics


An install program must perform these basic steps:
1. Determine the directory where you will install and the location of the
installation disk, and collect any options. Typically, you'll present a dialog
(with default values) where the user can enter this information. However, some
programs might install to a fixed location (the Windows directory, for
example).
2. Verify that there is enough disk space available.
3. Create any necessary directories and subdirectories (this may vary
depending on the options the user selects in step 1).
4. Make the first file to install the current file.
5. Call VerFindFile() for the current file. If many applications share the
file (for example, a common DLL), use the VFFF_ISSHAREDFILE flag. Set the
source filename to the complete path of the source file and the
application-directory parameter to your program's install directory. This call
will return the directory that currently holds the file (if applicable) and
the directory that VerFindFile() recommends for installation.
6. Call VerInstallFile() for the current file. Set the source to the complete
path of the file on the install disk and the destination path to the one
recommended in step 5. You also pass the current file's location from step 5
to VerInstallFile() so it can remove the old file. This function will
decompress the file (if required) and copy it to the proper location.
7. Check the return value from VerInstallFile(). If the VIF_TEMPFILE bit is
set, you must delete the temporary file that VerInstallFile() uses. You may
report other errors to the user or force the installation by repeating step 6
with VIFF_FORCEINSTALL. 
8. If there are more files, make the next one current and go to step 5.
9. Make any entries required in INI files or the system registry (optional).
10. Create icons for your program in the user's shell (optional).
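Steps 4 through 8 above boil down to a simple per-file loop. The sketch below shows that control flow in C; since the real VerFindFile() and VerInstallFile() live in VER.DLL, they are replaced here with clearly marked stubs so the flow can be read (and compiled) standalone. The file names, paths, and stub return values are invented for illustration; only the flag names come from the text.

```c
/* Sketch of the install loop (steps 4-8). The Ver... calls are stubbed. */
#include <stdio.h>
#include <string.h>

#define VIF_TEMPFILE      0x0001   /* temp file left behind; must delete */
#define VIFF_FORCEINSTALL 0x0001

/* --- stubs standing in for the real VER.DLL calls --- */
static unsigned VerFindFileStub(const char *src, char *curdir, char *destdir)
{
    (void)src;
    strcpy(curdir, "");             /* no previous copy found */
    strcpy(destdir, "C:\\MYAPP");   /* recommended destination */
    return 0;                       /* 0 == success */
}
static unsigned VerInstallFileStub(unsigned flags, const char *src,
                                   const char *destdir)
{
    (void)flags; (void)src; (void)destdir;
    return 0;                       /* 0 == installed cleanly */
}

/* Walk the file list; returns how many files installed successfully. */
int install_all(const char **files, int nfiles)
{
    char curdir[260], destdir[260];
    int i, installed = 0;
    for (i = 0; i < nfiles; i++) {                        /* steps 4 and 8 */
        if (VerFindFileStub(files[i], curdir, destdir) != 0)  /* step 5 */
            continue;
        unsigned rc = VerInstallFileStub(0, files[i], destdir); /* step 6 */
        if (rc & VIF_TEMPFILE) {
            /* step 7: delete the temp file; optionally retry with
               VIFF_FORCEINSTALL after asking the user */
            rc = VerInstallFileStub(VIFF_FORCEINSTALL, files[i], destdir);
        }
        if (rc == 0)
            installed++;
    }
    return installed;
}
```

In a real installer, each nonzero rc maps to one of the VIF_ error bits and should be reported to the user before any forced retry.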
While these steps appear straightforward, you can have problems. First, much
of the documentation for the Ver... functions is just wrong. Secondly, there
are problems when installing from multiple diskettes. Finally, compressed hard
drives can be difficult to manage.
The main problem with the VerFindFile() and VerInstallFile() documentation is
the description of the lpszFileName field. The documentation states that you
should not include the path for the file in this parameter--only the filename
and extension--but this is wrong. The culprit is LzOpenFile(), the underlying
function that opens the file (which may be compressed).
COMPRESS.EXE (the Windows SDK compression utility) changes a file's extension
by appending an underscore to it (or replacing the last extension character
with an underscore). For example, READ.ME becomes READ.ME_ and README.TXT
becomes README.TX_. When the Ver... functions use LzOpenFile() to open the
file README.TXT, it looks for the filename in the current directory. Since the
file is now README.TX_, it doesn't find it. It then looks for the file in the
Windows directory, Windows system directory, all the directories on the path,
and any mapped network drives. If it doesn't find the file, it then looks for
README.TX_, which it will find. 
Unfortunately, it is a good bet that the file README.TXT does exist in one of
the myriad places LzOpenFile() searches. Then, the Ver... functions will
cheerfully copy the alien file to your install directory and remove it from
its current location. This is an endless source of confusion for users when
they open your README.TXT file and it talks about a Borland product (or
whatever LzOpenFile() found). The solution to this problem is simple: Ignore
the documentation. Always specify a complete path to the install file.
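In practice, the fix amounts to building one fully qualified source name and passing it everywhere, so LzOpenFile() never falls back to its search path. A minimal helper (the directory and filename in the usage example are illustrative):

```c
/* Join an install-source directory and a filename into a full path,
   so LzOpenFile() never searches the PATH for a similarly named file.
   Purely illustrative; real code should also validate buffer sizes. */
#include <stdio.h>
#include <string.h>

void make_src_path(char *out, size_t outlen,
                   const char *srcdir, const char *fname)
{
    size_t n = strlen(srcdir);
    /* append a backslash only if the directory doesn't already end in one */
    if (n > 0 && srcdir[n - 1] == '\\')
        snprintf(out, outlen, "%s%s", srcdir, fname);
    else
        snprintf(out, outlen, "%s\\%s", srcdir, fname);
}
```

For example, make_src_path(buf, sizeof buf, "A:\\SETUP", "README.TX_") yields a name LzOpenFile() can open directly.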
Another documentation bug is the behavior of VerFindFile() when
VFFF_ISSHAREDFILE is set. This flag allegedly causes the function to select
the Windows (or Windows system) directory as the file's destination. This
appears to not work under any current version of the VER.DLL library. It is
safer to manually select the Windows directory when necessary. You can use the
GetWindowsDirectory() and GetSystemDirectory() calls to find the names. (Win16
programs must use the GetWindowsDir() and GetSystemDir() calls instead.)


Using Multiple Disks



If your entire product fits on one disk, the install program can load from the
disk and run. If you need more than one disk, however, you may have problems
running the install program from the floppy. When Windows runs any program, it
may not keep all of it in memory at one time. It can go back to the disk any
time to load code segments, resources, or whatever. If your install program is
on disk 1 and the user has disk 2 in the drive, havoc will ensue. 
You can manipulate your resources and your DEF file to cause Windows to load
the entire program and lock it in memory. However, this doesn't work with
certain networking software (including software from Novell). The network
detects that you are running a program from the floppy and keeps the file
locked. When you change disks, DOS will report an invalid disk change error
and not allow you to continue. This happens even if Windows no longer needs
the file.
The only realistic alternative is to copy the install program to the hard disk
as the first step of installation. Then the installer can run from the hard
disk. If the install program can copy itself, the performance penalty is
minimal since Windows will use the same in-memory copy of the program.
However, the network will know that the floppy need not remain in the drive.


Problems with Compressed Hard Disks


Calculating the available disk space is always difficult. You can't simply
compute the free bytes on the disk and compare that number to the amount of
space your software requires because DOS stores files in clusters. If a hard
disk's cluster size is 2K, for example, each file stores in multiples of 2K. A
1-byte file takes 2K; so does a 1K file. A 3900-byte file requires 4K.
Therefore, you should compute space requirements for each file based on the
target hard disk's cluster size and compare the number of clusters.
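The rounding described above is a one-liner. In this sketch the cluster size is a parameter; on Windows you would derive it from GetDiskFreeSpace() (sectors-per-cluster times bytes-per-sector):

```c
/* Round a file's byte size up to whole clusters. */
unsigned long bytes_on_disk(unsigned long filesize, unsigned long cluster)
{
    if (filesize == 0)
        return 0;   /* a zero-length file occupies no data clusters */
    return ((filesize + cluster - 1) / cluster) * cluster;
}
```

With a 2K cluster this reproduces the figures in the text: a 1-byte file costs 2K, a 1K file costs 2K, and a 3900-byte file costs 4K.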
However, this usually isn't worth the trouble. If the user employs a
disk-compression program (Stacker, for example), the numbers are meaningless.
Compression programs report an estimated free space that may be way off,
depending on the nature of the files you store. There is no way to know how
much actual space you have until you use it.
As a compromise, I usually compute an estimated free space and compare it to
the space available on the disk. If the free space is less than the estimate,
I'll warn users, but allow them to continue. Although not an optimal solution,
this technique works in practice.
If the user reinstalls your software, calculating free space is even more
difficult. You should take into account files that you will overwrite when
computing free space. With shared files, this can be difficult. Again, simply
warning the user if disk space appears low is safe and easy.


Encapsulating Installation


It isn't difficult to write an installation library that does most of the hard
work. You don't want a DLL for an installation program since you could have
problems loading a DLL at install time. A static link library works well and
allows you to produce a single install program with no external parts.
The installation library can easily incorporate all the usual Windows
trappings, including a WinMain() function and the main window class. You have
to supply a list of files to install, directories to create, and the
information required in step 1 of the installation program. The installation
library will provide support for adding program-manager groups, but you'll
have to make the calls yourself. You also can make calls to set up any INI
files (or the registry).


Using the Install Library


Figure 1 is an install program written with my install toolkit. The toolkit
provides a window (filled with your logo) that serves as a workspace. You can
provide multiple logos, and the installer will select the one 
that best fits the current display. Select a logo slightly smaller than the
screen size so you can leave room for the window title. For example, a logo
for a 640x480 screen might be 600x400.
Your program supplies a global string (TITLE) that the installer uses as the
main-window title. You can also provide an icon by using the APPICON ID in
your RC file. If you use multiple disks and want the installer to copy itself
to the user's hard disk, you should declare the hdcopy variable and set it to
True. If you omit this declaration, the installer will run from the floppy.
The only other required item is the install() function; see Figure 2. From
here you can bring up a dialog, read configuration information, and so on.
When you are ready, call cw_Install() to initiate the installation; see Figure
3. The cw_Install() function returns either SUCCESS, CANCEL, or RETRY. If the
return value is SUCCESS, you are free to continue with the remainder of the
installation. If it is RETRY, you should get new installation values (for
example, show your dialog again) and call cw_Install() with the new
parameters. When cw_Install() returns CANCEL, you should display any error or
help messages you want and return. 
You can also provide a cw_InstallInit() function. The toolkit executes this
function before creating a window. To cancel the installation, return False
from this routine. Usually you don't need this function, but it is available
for special initialization.
The parameters to cw_Install() are straightforward. First, you pass the
parent-window handle (usually the same one the toolkit sends you). Next, you
supply the application directory (which need not exist) and the option bit
mask. Each file and directory can have an option bit mask. If the mask is 0,
the toolkit always installs the file. If the mask is not 0, the toolkit
installs the file only when the option bit mask you pass to cw_Install() has
the same bit set. For example, if you use a bit mask of 3, the installer will
process files marked with 0, 1, 2, or 3.
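The option-bit rule reduces to a single test (the function name is mine, not part of the toolkit's API):

```c
/* A file with mask 0 always installs; otherwise it installs only if it
   shares at least one set bit with the options passed to cw_Install(). */
int file_selected(unsigned filemask, unsigned options)
{
    return filemask == 0 || (filemask & options) != 0;
}
```

Passing options of 3 thus selects files marked 0, 1, 2, or 3, but skips a file marked 4.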
Following the option bits, you supply cw_Install() with a pointer to an array
of subdirectories (using the install_dirs structure; see Table 4) and the
length of the array. If you don't need subdirectories, you can use NULL and 0
for these parameters. Next is a pointer to an array of _inst_files structures
(and the length of the array). This structure contains six fields: an option
bit mask, a source filename, a destination filename, a destination directory,
and two flag fields--one passed to VerFindFile() and one to VerInstallFile().
The last two parameters are the estimated size (in bytes) and a Boolean that
controls what happens when cw_Install() successfully completes. If you set
this variable to True, cw_Install() will display a message box when the
installation is successful. However, if you have more work to do (for example,
setting up INI files) you may want to display your own message box at the end.
To eliminate the message, set this parameter to False.
Some fields in the _inst_files structure can take special values. If the
destination filename is NULL, the filename remains unchanged. The destination
directory is usually "." to signify the current directory. You can also
specify a subdirectory name, WINDIR for the Windows directory, or SYSDIR for
the Windows system directory.
If the first flag field is -1, the installer copies the file without using the
Ver... functions. This is useful for storing compressed files on the user's
hard disk. Also, if the source filename is NULL, the installer uses the flag
field as a disk number. It searches the install disk for a file named DISKn.ID
(where n is the number in the flag field). If it can't find the file, it
prompts the user to insert the disk. You can use this feature to prompt the
user for multiple disks. If the disk is already present, no prompt occurs.
Therefore, you will often start the list with a check for DISK1.ID.
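The marker-file name is mechanical to build; a small sketch (the drive prefix in the usage example is illustrative, and a real installer would then test the name with fopen() or _access() before prompting):

```c
/* Build the DISKn.ID marker-file name used for disk-change prompts. */
#include <stdio.h>
#include <string.h>

void disk_id_name(char *out, size_t outlen, const char *srcdrive, int n)
{
    snprintf(out, outlen, "%sDISK%d.ID", srcdrive, n);
}
```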
The install toolkit offers several helper functions for use in your main
routine. The cw_VersionCheck() call checks which versions of DOS and Windows
are present. You can also send Program Manager DDE commands using
cw_ProgManCmd(). You use these commands to set up icons in Program Manager for
your application. (For more details about Program Manager's DDE interface, see
the accompanying text box entitled, "Adding Icons.") To modify INI files or
the registry, call the standard Windows API functions.


An Example Install Program


Listing Two is an example install program that copies a program called
"CoolWorx32" and two 32-bit example editors. (CoolWorx is a C/C++ toolkit I
wrote that uses the object-oriented nature of Windows programming to simplify
application development; see my article "Simplifying Windows Development," Dr.
Dobb's Sourcebook, March/April 1995.) The program uses two option bits: Option
1 installs the single document interface (SDI) editor, while option 2 installs
the multiple document interface (MDI) editor. The installer always installs
the CoolWorx support files that the editors use.
The _inst_files structure contains a list of all required files. The installer
always copies files marked with option 0. Otherwise, the installer only copies
files that have at least one option bit set that is also set in the selected
option word.
To select options and set the install directory, this installer uses a tabbed
dialog (Microsoft calls these "property sheets"), but an ordinary dialog box
would serve just as well. A single call to PropertySheet() works like the
ordinary DialogBox() call except that it manages multiple dialog templates.
Since the tabbed dialog has two pages, the installer has two dialog templates,
PG1 and PG2. Once the user enters the options and the install directory
information, the installer calls cw_Install(), which does all the work
required to assign file locations, decompress files, and copy them to the hard
disk.
If the cw_Install() function returns SUCCESS, and the user selected one or
both of the editor examples, the installer creates a program-manager group and
adds icons for the editors. The cw_ProgManCmd() function makes this easy.
Finally, the installer opens the README.TXT file using WORDPAD.EXE, the
Windows 95 replacement for NOTEPAD.EXE. Once it launches WORDPAD (using
WinExec()), the installer exits. You could easily use calls like
SetPrivateProfileString() or RegSetValue() to install INI files or registry
entries.


Using Tabbed Dialogs


Tabbed dialogs are simple to create under Windows 95. In Figure 1 (a typical
tabbed dialog), each tab represents a different dialog page. Clicking on the
tab makes the specified page active.
Each page in a tabbed dialog is an individual dialog. It has its own template
and can have a separate callback. Of course, you can route the callbacks for
each page to the same routine, if you prefer.
The three property-sheet calls are in PRSHT.H (although Microsoft may move
these to COMMCTRL.H later), but you will usually only use
one--PropertySheet(). This call is analogous to the standard DialogBox() call.
You supply a pointer to a PROPSHEETHEADER structure (see Table 5), which has a
pointer to an array of PROPSHEETPAGE structures (Table 6). These two
structures specify how the tabbed dialog behaves.
The PROPSHEETHEADER structure defines properties for the entire tabbed dialog.
You need to set the dwFlags field to indicate which fields you will use; see
Table 7. For example, if you want each tab to use an icon, you set the
PSH_USEHICON or PSH_USEICONID flag and fill in the hIcon or pszIcon field to
select an icon.
If you don't set the PSH_PROPSHEETPAGE bit in the dwFlags field, you must
separately create each page using CreatePropertySheetPage(). This call returns
a handle that you can store in an array to use in the phpage field of the
PROPSHEETHEADER structure. Usually, you simply set the PSH_PROPSHEETPAGE flag and
use an array of PROPSHEETPAGE structures (in the ppsp field) instead of
handles.
Each PROPSHEETPAGE structure also has a dwFlags field; see Table 8 for a list
of values. You can provide a resource template name in the pszTemplate field
or a dynamic-dialog template in the pResource field. If you use a
dynamic-dialog template, you must set the PSP_DLGINDIRECT flag in the dwFlags
field.
If you set the PSP_USETITLE flag, you can also set a title for the tab in the
pszTitle field. If you don't set the flag, the dialog's caption becomes the
tab title. The dialog callback is exactly like an ordinary dialog function and
is set in the pfnDlgProc field. The other fields allow you to set a function
to run before Windows destroys the page.



Messages


The property sheet accepts several messages (by way of macros) and can send
you several WM_NOTIFY messages; see Table 9 and Table 10. You usually won't
use most of these. The PSM_SETCURSEL message allows you to make a page active,
and PSM_PRESSBUTTON lets you programmatically push any of the tabbed dialog
buttons (not the buttons in your dialog template).
If you haven't used any of Microsoft's common controls (see "Windows 95 Common
Controls," by Vinod Anantharaman, DDJ, May 1995) you may not be familiar with
WM_NOTIFY. The new controls send WM_NOTIFY to alert you of noninput events. In
the past, notifications came with WM_COMMAND messages (for example, the
EN_CHANGED notification). WM_NOTIFY messages pass a pointer to a structure in
lParam. The first part of this structure corresponds to an NMHDR structure (see
Table 11). By examining the code field in this structure, you can determine
the type of notification. For property sheets, these codes begin with PSN_
(see PRSHT.H). Then you cast the structure pointer to a more-specific
structure. For property sheets, you don't need a special structure; just cast
lParam to an LPNMHDR and examine the code field.
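The cast-and-switch pattern looks like this in C. To keep the sketch compilable without <windows.h>, NMHDR is mirrored by a local struct with the same layout, and the PSN_ values are numeric stand-ins rather than the real PRSHT.H constants; the result codes are mine, added so the dispatch is observable:

```c
/* Dispatching property-sheet notifications carried in WM_NOTIFY's lParam. */

/* Portable mirror of NMHDR (hwndFrom, idFrom, code), for illustration only. */
typedef struct {
    void     *hwndFrom;   /* window sending the notification */
    unsigned  idFrom;     /* its control ID */
    unsigned  code;       /* PSN_ notification code */
} NmHdrMirror;

#define PSN_SETACTIVE_X  1   /* stand-in values, not the PRSHT.H ones */
#define PSN_KILLACTIVE_X 2
#define PSN_APPLY_X      3

/* In a dialog proc you would receive lParam and cast it, exactly as here. */
int on_notify(const void *lParam)
{
    const NmHdrMirror *hdr = (const NmHdrMirror *)lParam;
    switch (hdr->code) {
    case PSN_SETACTIVE_X:  return 1;  /* page gaining focus: load controls */
    case PSN_KILLACTIVE_X: return 2;  /* page losing focus: validate input */
    case PSN_APPLY_X:      return 3;  /* OK/Apply pressed: commit settings */
    default:               return 0;  /* ignore other notifications */
    }
}
```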
You can also control the state of the buttons with the PSM_CHANGED,
PSM_UNCHANGED, and PSM_CANCELTOCLOSE messages. You should use these to inform
the user about the state of the dialog.
You can catch notifications that tell you when Windows activates or
deactivates a page (PSN_SETACTIVE and PSN_KILLACTIVE). Other notifications
inform you when the user presses certain buttons; see Table 9.
There are a few tricks to using tabbed dialogs:
Use the WS_CHILD style for each dialog template.
Don't place OK and Cancel buttons in the template--the property sheet will add
these.
Make certain each dialog template is the same size.
If you have a common item on multiple pages, be sure it lines up exactly in
each dialog template.


Inside the Install Library


Most of the install library is straightforward. It is detailed in Listing
Three; listings are available electronically, see "Availability," page 3. The
WinMain() function creates a window that contains the logo bitmap and calls
your install() routine. Later, you call cw_Install() to do all the messy work.
When your install() routine returns, the toolkit cleans up and exits.
If you set the hdcopy flag, the installer checks its command-line arguments.
If there are none, it copies itself to the temporary directory and runs the
new copy with two arguments: the source directory and the name of the
temporary executable. When it detects these arguments, the installer continues
with normal processing. When processing completes, the installer removes
itself from the temporary directory. This prevents problems with network
locking.
The only special part of the install toolkit is the cw_Install() function. The
other portions are straightforward Windows programming. The cw_Install() call
walks through the directory array creating directories, then walks through the
file list calling VerFindFile() and VerInstallFile() repeatedly. Of course,
the calls only occur if the correct option bits are set.
Adding Icons
Program Manager (and Program Manager replacements) provides a DDE interface
that allows install programs to manage groups and icons. The Windows 95
default shell supports this interface by adding pseudogroups and icons to the
Start-button menu.
Commands are passed to Program Manager via DDE and are enclosed in square
brackets. The most common commands are as follows:
CreateGroup(name,[path]) creates a new group with the specified name and
optional group filename.
ShowGroup(name,cmd) displays the specified group. cmd can range from 1 to 8: 1
activates and displays the group window, 2 activates the group as an icon, 3
activates and maximizes the group window, 4 restores the window, 5 activates
the window in place, 6 minimizes the window, 7 displays the group as an icon
without activation, and 8 restores the group without activation.
DeleteGroup(name) deletes a group and its contents.
Reload([group]) reloads a group from its group file. If you specify no group,
PROGMAN reloads all groups.
AddItem(cmd,[name],[icon_file],[icon_index],[x],
[y],[start_dir],[hotkey],[minimize]) adds a new item to the current group. The
parameters are: cmd, the command line; name, the item name; icon_file, the
file that contains the item's icon; icon_index, the icon to use from the
icon_file; x, the new item's x-coordinate (if this parameter is present, the y
parameter is not optional); y, the new item's y-coordinate; start_dir, the
working directory; hotkey, the item's shortcut key; and minimize, the item's
run state.
DeleteItem(item) removes the item from the current group.
ReplaceItem(item) removes the item from the current group and marks its
position for use by the next AddItem command.
ExitProgMan(save) exits the program manager. The save parameter specifies
whether PROGMAN should save its current state. If PROGMAN is your default
shell, this command won't work.
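The command strings above are plain text enclosed in brackets, so composing them is simple string formatting. The bracket syntax is the real PROGMAN DDE form; the group and item names below are invented, and the article's cw_ProgManCmd() helper would be what actually transmits the finished strings:

```c
/* Compose Program Manager DDE command strings. */
#include <stdio.h>
#include <string.h>

void make_creategroup(char *out, size_t n, const char *group)
{
    snprintf(out, n, "[CreateGroup(%s)]", group);
}

/* Minimal AddItem with just the two most common parameters. */
void make_additem(char *out, size_t n, const char *cmdline, const char *name)
{
    snprintf(out, n, "[AddItem(%s,%s)]", cmdline, name);
}
```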
You can send these commands to Program Manager using the cw_ProgManCmd()
function, which saves you from worrying about the details behind the DDE
transmission. You'll see an example near the end of Listing Two.
--A.W.
Table 1: Fixed VERSIONINFO resource.
Field Description
FILEVERSION Version of this file.
PRODUCTVERSION Version of entire product.
FILEFLAGSMASK Contains a 1 for valid bits in the FILEFLAGS field.
FILEFLAGS File attributes (for example, VS_FF_DEBUG).
FILEOS Operating system (for example VOS_WINDOWS32).
FILETYPE File type (for example, VFT_APP).
FILESUBTYPE Type of driver, font, or VxD (if applicable).
Table 2: VerFindFile() parameters.
 Parameter Description
dwFlags 0 for normal file; VFFF_ISSHAREDFILE for shared files.
szFileName Filename.
szWinDir Windows directory.
szAppDir Application's directory (destination).
szCurDir VerFindFile places the file's current location
 in this variable.
lpuCurDirLen Length of szCurDir array.
szDestDir VerFindFile places recommended install directory
 in this variable.
lpuDestDirLen Length of szDestDir array.
Table 3: VerInstallFile() parameters.
Parameter Description
dwFlags Control flags (0, VIFF_FORCEINSTALL, or VIFF_DONTDELETEOLD).
szSrcFileName Source filename.
szDestFileName Destination name.
szSrcDir Source directory.
szDestDir Destination directory.
szCurDir Directory where file currently resides.
szTmpFile Temporary filename possibly returned by VerInstallFile().
lpuTmpFileLen Length of above array.
Table 4: The _inst_files structure.
Field Description
bitmask Option bits that apply to this file.
srcfile Source filename.*
dstfile Destination filename.*
dstdir Destination directory.*
flags Flags passed to VerFindFile().*
cflags Flags passed to VerInstallFile().
* May take special values.
Table 5: PROPSHEETHEADER structure.
 Field Description
dwSize Size of structure.
dwFlags PSH flags (see Table 7).
hwndParent Parent window.
hInstance hInstance that contains resources.
hIcon Icon handle (if PSH_USEHICON is set).
pszIcon Icon name (if PSH_USEICONID is set).
pszCaption Title to use when PSH_PROPTITLE is set.
nPages Number of tabs.
nStartPage Beginning tab number (if PSH_USEPSTARTPAGE is not set).
pStartPage Name of beginning tab (if PSH_USEPSTARTPAGE is set).
ppsp Pointer to array of property-sheet structures (if
 PSH_PROPSHEETPAGE is set).
phpage Pointer to array of property-sheet handles (if
 PSH_PROPSHEETPAGE is not set).
pfnCallback Global property-sheet callback.
Table 6: PROPSHEETPAGE structure.
Field Description 
dwSize Size of structure.
dwFlags PSP_ flags (see Table 8).
hInstance Instance handle for resources.
pszTemplate Name of dialog template (if PSP_DLGINDIRECT is not set).
pResource Pointer to resource (if PSP_DLGINDIRECT is set).
hIcon Icon handle (if PSP_USEICON is set).
pszIcon Icon name (if PSP_USEICONID is set).
pszTitle Name to override template's title.
pfnDlgProc Dialog callback.
lParam 32 bits of user-defined data.
pfnCallback Called before destruction if PSP_USECALLBACK is set.
pcRefParent Pointer to reference count (used with PSP_USEREFPARENT flag).
Table 7: Flag bits for PROPSHEETHEADER.
Bit Description
PSH_DEFAULT 0-no bits set.
PSH_PROPTITLE Prepend "Properties for" ahead of title.
PSH_USEHICON Use icon handle.
PSH_USEICONID Use icon name or ID.
PSH_PROPSHEETPAGE Use ppsp field instead of phpage field.
PSH_MULTILINETABS Use multiline tabs.
PSH_WIZARD Suppress tabs and treat dialog as a wizard.
PSH_USEPSTARTPAGE Use pStartPage field.
PSH_NOAPPLYNOW Suppress the Apply Now button.
PSH_USECALLBACK Enable global callback.
PSH_HASHELP Support help.
Table 8: Flag bits for PROPSHEETPAGE.
Bit Description
PSP_DEFAULT 0-no bits set.
PSP_DLGINDIRECT Use indirect dialog resources.
PSP_USEHICON Use icon handle.
PSP_USEICONID Use resource ID for icon.
PSP_USETITLE Use override title.
PSP_USEREFPARENT Use reference-count variable.
PSP_USECALLBACK Use release callback.
PSP_HASHELP Support help.
Table 9: Property-sheet notifications.
 Code Description
PSN_SETACTIVE Page receiving focus.
PSN_KILLACTIVE Current page is losing focus.
PSN_APPLY OK or Apply button pressed.
PSN_RESET Cancel button pressed (too late to stop).
PSN_HASHELP Query page to see if it supports help.
PSN_QUERYCANCEL Cancel button pressed (possible to abort).
PSN_WIZBACK Back button pressed (wizard only).
PSN_WIZNEXT Next button pressed (wizard only).
PSN_WIZFINISH Finish button pressed (wizard only).
Table 10: Commonly used property-sheet messages.
Message Pseudocall Description
PSM_SETCURSEL PropSheet_SetCurSel Sets active page by
 handle.
PSM_SETCURSELID PropSheet_SetCurSelByID Sets active page by
 ID.
PSM_CHANGED PropSheet_Changed Enable Apply Now
 button.
PSM_RESTARTWINDOWS PropSheet_RestartWindows Ask Windows to
 restart when property
 sheet closes.
PSM_REBOOTSYSTEM PropSheet_RebootSystem Ask Windows to reboot
 when property sheet
 closes.
PSM_CANCELTOCLOSE PropSheet_CancelToClose Change "Cancel" button
 to "Close."
PSM_QUERYSIBLINGS PropSheet_QuerySiblings Forward message to all
 initialized pages
 until one returns
 nonzero; return the
 value.
PSM_UNCHANGED PropSheet_UnChanged Disable Apply Now
 button.
PSM_APPLY PropSheet_Apply Do the same processing
 as if the Apply Now
 button were depressed.
PSM_SETTITLE PropSheet_SetTitle Sets dialog title.
PSM_SETWIZBUTTONS PropSheet_SetWizButtons or Enable specific wizard
 PropSheet_SetWizButtonsNow buttons (wizards only);
 PropSheet_SetWizButtonsNow()
 uses SendMessage()
 instead of
 PostMessage().
PSM_PRESSBUTTON PropSheet_PressButton Programmatically press
 a button.
PSM_SETFINISHTEXT PropSheet_SetFinishText Set text on "Finish"
 button (wizards only).
PSM_GETTABCONTROL PropSheet_GetTabControl Get handle to tab
 control.
Table 11: Notification header structure.
Field Description
hwndFrom Window handle of originating control.
idFrom Window ID of originating control.
code Specific notification.
Figure 1: The installer in action.
Figure 2: The install() function.
int install(HWND mainwin, HANDLE hInst, LPSTR src);
where: mainwin=window handle for main window
 hInst=install program's instance handle
 src=string containing the install source directory
The install program ignores the return value from install().
Figure 3: The cw_Install() function.
int cw_Install(HWND w, LPSTR appdir, DWORD bitmask,
 struct install_dirs *subdir, int nrdirs,
 struct _inst_files *inst_files, int nrfiles,
 unsigned long space, BOOL mbflag);
where: w=install window
 appdir=destination directory
 bitmask=option bits (see text)
 subdir=list of subdirectories to create
 nrdirs=number of elements in subdir
 inst_files=list of files
 nrfiles=number of elements in inst_files
 space=projected size of installed components
 mbflag=FALSE to disable success message box
return values: SUCCESS=installation complete
 CANCEL=installation canceled
 RETRY=installation failed

Listing One
VS_VERSION_INFO VERSIONINFO
FILEVERSION 1, 0, 0, 0
PRODUCTVERSION 1, 0, 0, 0
FILEOS VOS_DOS_WINDOWS32
FILETYPE VFT_DLL
{
 BLOCK "StringFileInfo"
 {
 BLOCK "040904E4"
 {
 VALUE "CompanyName", 
 "Al Williams Computing\000\000"
 VALUE "FileDescription", 
 "CoolWorx Application Framework\000"
 VALUE "FileVersion", "1.00\000\000"
 VALUE "InternalName", 
 "CoolWorx\000"
 VALUE "LegalCopyright", 
 "Copyright (c) 1994 by Al Williams\ 
Computing\000\000"
 VALUE "OriginalFilename", 
 "COOLWORX.DLL\000"
 }
 }
}

Listing Two
/* CoolWorx Install program */
#include <windows.h>
#include <prsht.h> /* could change */
#include "cwinstal.h"
#include "install.h"
#include <stdlib.h>
/* You must declare the title string */
char TITLE[]="CoolWorx Install"; // title
/* The code in this source file uses this string as
 a default */

#define DEFDIR "C:\\COOLWORX"
/* Subdirectories */
struct install_dirs subdirs[]=
 {
 {0,"BIN"},
 {1,"SDI"},
 {2,"MDI"}
 };
/* Number of subdirectories */
#define NRDIRS sizeof(subdirs)/sizeof(subdirs[0])
/* File list. If srcfile is NULL install will prompt
 for disk change using flags as disk number --
 verify that diskN.id file exists on it.
 If dstdir is WINDIR, then use Windows directory
 if dstdir is SYSDIR, then use System directory
 If dstdir is ".", then use default install directory
 If flags are -1 then force straight copy
 (no decompress, etc.) */
struct _inst_files inst_files[]=
 {
 { 0,NULL,NULL,NULL,1,0 },
 { 0,"README.TXT",NULL,".",0,0 },
 { 0,"COOLWORX.DLL",NULL,WINDIR,VFFF_ISSHAREDFILE,0},
 { 0,"CBUTTON.DLL",NULL,WINDIR,VFFF_ISSHAREDFILE,0},
 { 0,"CWMISC.DLL",NULL,WINDIR,VFFF_ISSHAREDFILE,0},
 { 2,"COOLEDIT.EXE",NULL,"MDI",0,0},
 { 1,"SDIEDIT.EXE",NULL,"SDI",0,0}
 };
/* Number of files */
#define NRFILES (sizeof(inst_files)/sizeof(inst_files[0]))
char appdir[_MAX_PATH];
char opts[2];
/* 1st tab dialog proc */
BOOL pg1proc(HWND dlg,UINT cmd,WPARAM wParam,LPARAM lParam)
 {
 switch (cmd)
 {
 case WM_INITDIALOG:
 SetDlgItemText(dlg,DIR,DEFDIR);
 break;
 case WM_NOTIFY:
 {
 LPNMHDR nh=(LPNMHDR)lParam;
 switch (nh->code)
 {
/* Cancel */
 case PSN_RESET:
 EndDialog(dlg,FALSE);
 return TRUE;
/* OK */
 case PSN_APPLY:
/* Return TRUE */
 SetWindowLong(dlg,DWL_MSGRESULT,TRUE);
 GetDlgItemText(dlg,DIR,appdir,sizeof(appdir));
 EndDialog(dlg,TRUE);
 return TRUE;
 }
 break;
 }

 }
 return FALSE;
 }
/* 2nd tab dialog proc */
BOOL pg2proc(HWND dlg,UINT cmd,WPARAM wParam,LPARAM lParam)
 {
 switch (cmd)
 {
 case WM_INITDIALOG:
/* Init state */
 SendDlgItemMessage(dlg,SDI,BM_SETCHECK,
 opts[0]=='X',0);
 SendDlgItemMessage(dlg,MDI,BM_SETCHECK,
 opts[1]=='X',0);
 break;
 case WM_NOTIFY:
 {
 LPNMHDR nh=(LPNMHDR)lParam;
 switch (nh->code)
 {
 case PSN_RESET:
 EndDialog(dlg,FALSE);
 return TRUE;
/* OK */
 case PSN_APPLY:
 SetWindowLong(dlg,DWL_MSGRESULT,TRUE);
 opts[0]=SendDlgItemMessage(dlg,SDI,
 BM_GETCHECK,0,0)?'X':' ';
 opts[1]=SendDlgItemMessage(dlg,MDI,
 BM_GETCHECK,0,0)?'X':' ';
 EndDialog(dlg,TRUE);
 return TRUE;
 }
 break;
 }
 }
 return FALSE;
 }
/* Your main install routine must be named install!
 w=main window handle
 hInst=instance handle of install program
 srcdir=source directory for install (e.g., a:\) */
int install(HWND w,HANDLE hInst,LPSTR srcdir)
 {
 int i;
 DWORD bitmask; /* option bitmask */
 char tmpfile[_MAX_PATH];
/* Tabbed dialog body */
 PROPSHEETPAGE pages[2]=
 {
 {sizeof(PROPSHEETPAGE),0,0,"PG1",
 NULL,NULL,(DLGPROC)pg1proc,0,NULL,NULL},
 {sizeof(PROPSHEETPAGE),0,0,"PG2",
 NULL,NULL,(DLGPROC)pg2proc,0,NULL,NULL}
 };
/* Tabbed dialog header */
 PROPSHEETHEADER psh={sizeof(PROPSHEETHEADER),
 PSH_PROPSHEETPAGE,NULL,NULL,NULL,
 "CoolWorx32 Install",

 2,0,pages };
/* Set defaults */
 lstrcpy(appdir,DEFDIR);
 opts[0]='X';
 opts[1]='X';
 psh.hInstance=pages[0].hInstance=
 pages[1].hInstance=hInst;
 psh.hwndParent=w;
/* Come here if install returns RETRY */
retry:
 if (!PropertySheet(&psh))
 {
/* Come here if install is cancelled */
cancelinst:
 MessageBox(w,"Installation Cancelled",
 NULL,MB_OK|MB_ICONSTOP);
 return 1;
 }
 UpdateWindow(w); /* make sure window updates */
 bitmask=0;
/* decode options to bitmask */
 for (i=0;i<sizeof(opts);i++)
 if (opts[i]=='X') bitmask|=1<<i;
/* Call installer -- pass main window, app directory and options */
 switch (cw_Install(w,appdir,bitmask,subdirs,
 NRDIRS,inst_files,NRFILES,0,FALSE))
 {
 case RETRY:
 goto retry; // show options again
 case CANCEL:
 goto cancelinst; // forget it
 }
/* success */
/* Set up INI file, groups, etc.*/
 if (opts[0]=='X'||opts[1]=='X')
 {
 cw_ProgManCmd("[CreateGroup(CoolWorx32 Alpha,)]");
 cw_ProgManCmd("[ShowGroup(CoolWorx32 Alpha,1)]");
 if (opts[0]=='X')
 {
 cw_ProgManCmd("[ReplaceItem(SDI Editor)]");
 wsprintf(tmpfile,
 "[AddItem(%s\\SDI\\SDIEDIT,SDI Editor,,,,,)]",
 appdir);
 cw_ProgManCmd(tmpfile);
 }
 if (opts[1]=='X')
 {
 cw_ProgManCmd("[ReplaceItem(CoolEdit)]");
 wsprintf(tmpfile,
 "[AddItem(%s\\MDI\\COOLEDIT,CoolEdit,,,,,)]",
 appdir);
 cw_ProgManCmd(tmpfile);
 }
 WinExec("WORDPAD README.TXT",SW_SHOW);
 MessageBox(w,"Installation Complete","Notice",
 MB_OK|MB_ICONEXCLAMATION);
 }
 return 0;
 }


Listing Three
int WINAPI cw_Install(HWND w,LPSTR appdir,DWORD bitmask,
 struct install_dirs *subdirs,int NRDIRS,
 struct _inst_files *inst_files,int NRFILES,unsigned long space, BOOL mbflag)
 {
 int i;
 unsigned cdlen;
 unsigned inslen;
 char tmpfile[_MAX_PATH];
 char curdir[_MAX_PATH];
 char instdir[_MAX_PATH];
 char srcf[_MAX_PATH];
 unsigned tmplen;
 if (i=cw_ChdirEx(appdir))
 {
 .
 .
 .
 }
if (space)
 {
 unsigned long freesp;
#ifdef _WIN32
 DWORD secper,bps,freec,tclust;
#else
 struct diskfree_t df;
#endif
/* Compute free space */
#ifdef _WIN32
 GetDiskFreeSpace(NULL,&secper,&bps,&freec,&tclust);
 freesp=(unsigned long)secper*bps*freec;
#else
 _dos_getdiskfree(0,&df);
 freesp=
 (unsigned long)df.avail_clusters*
 df.sectors_per_cluster*df.bytes_per_sector;
#endif
 if (freesp/1024<space)
 {
 int id=
 MessageBox(w,"You may not have enough free disk space.\n"
 .
 .
 .
 }
 }
/* Set up progress bar */
 if (!prog)
 prog=cw_ProgressDlg(w,"Installing...","",
 NRDIRS+NRFILES,TRUE);
 if (prog) // position progress bar
 {
 RECT r,dr;
 int x,y,h,xw;
 GetClientRect(w,&r);
 ClientToScreen(w,(LPPOINT)&r);
 ClientToScreen(w,((LPPOINT)&r)+1);
 GetWindowRect(prog,&dr);

 x=((r.right-r.left)-(xw=dr.right-dr.left))/2;
 y=((r.bottom-r.top)-(h=dr.bottom-dr.top))/2;
 MoveWindow(prog,x+r.left,y+r.top,xw,h,TRUE);
 }
/* Need to make directory tree here */
 for (i=0;i<NRDIRS;i++)
 {
 if (subdirs[i].bitmask)
 if (!(subdirs[i].bitmask&bitmask)) continue;
 if (prog)
 {
 char ptitle[_MAX_PATH+33];
 wsprintf(ptitle,"Creating subdirectory %s",
 (LPSTR)subdirs[i].dir);
 if (cw_ProgressSet(prog,i,ptitle)) return -1;
 UpdateWindow(w);
 }
 if (access(subdirs[i].dir,0)&&mkdir(subdirs[i].dir))
 {
 MessageBox(w,"Can't create subdirectory.",
 subdirs[i].dir,MB_OK|MB_ICONSTOP);
 return -2;
 }
 }
/* Install files */
 for (i=0;i<NRFILES;i++)
 {
 UINT vrv;
 DWORD vrvi;
 char *dst;
/* Skip file if install bits don't match */
 if (inst_files[i].bitmask)
 if (!(inst_files[i].bitmask&bitmask)) continue;
 if (!inst_files[i].srcfile)
 {
 /* Special... prompt for new disk */
 static char msg[66],idfile[_MAX_PATH];
 if (srcdir[lstrlen(srcdir)-1]!='\\') lstrcat(srcdir,"\\");
 wsprintf(msg,"Please insert disk #%d",inst_files[i].flags);
 if (srcdir[lstrlen(srcdir)-1]=='\\')
 srcdir[lstrlen(srcdir)-1]='\0';
 wsprintf(idfile,"%s\\DISK%d.ID",(LPSTR)srcdir,
 inst_files[i].flags);
 while (access(idfile,0))
 {
 struct dlgboxparam pblk;
 pblk.msg=msg;
 pblk.srcdir=srcdir;
 pblk.sizsrc=sizeof(srcdir);
 if (DialogBoxParam(hInst,MAKEINTRESOURCE(DISKDLG),
 w,diskdlg,(DWORD)&pblk))
 return -1;
 UpdateWindow(w);
 if (srcdir[lstrlen(srcdir)-1]=='\\')
 srcdir[lstrlen(srcdir)-1]='\0';
 wsprintf(idfile,"%s\\DISK%d.ID",
 (LPSTR)srcdir,inst_files[i].flags);
 }
 continue;

 }
/* Get on with it */
 if (inst_files[i].dstfile)
 dst=inst_files[i].dstfile;
 else
 {
 dst=strrchr(inst_files[i].srcfile,'\\');
 if (dst) dst++; else dst=inst_files[i].srcfile;
 }
 if (prog)
 {
 char ptitle[_MAX_PATH+33];
 wsprintf(ptitle,"Installing %s",(LPSTR)dst);
 if (cw_ProgressSet(prog,i+NRDIRS,ptitle)) return -1;
 UpdateWindow(w);
 }
fretry:
 cdlen=sizeof(curdir);
 inslen=sizeof(instdir);
 if (inst_files[i].flags==0xFFFF)
 {
 /* copy unconditionally w/o decompress or checking */
 if (!copyfile(inst_files[i].dstdir,dst,
 srcdir,inst_files[i].srcfile))
 {
 int id=
 MessageBox(w,"Could not copy this file.\n"
 "You may be able to close other applications\n"
 "and then successfully install.\n"
 "Retry?",inst_files[i].srcfile,
 MB_RETRYCANCEL|MB_ICONSTOP);
 if (id==IDRETRY) goto fretry;
 return -1;
 }
 }
 else
 {
 vrv=VerFindFile(inst_files[i].flags,dst,
 NULL,inst_files[i].dstdir?inst_files[i].dstdir:
 appdir,curdir,&cdlen,instdir,&inslen);
 if (vrv&VFF_FILEINUSE)
 {
 int id=
 MessageBox(w,"This file is in use and can't be"
 " installed.\n"
 "You may be able to close other applications\n"
 "and then successfully install.\n"
 "Retry?",inst_files[i].srcfile,
 MB_RETRYCANCEL|MB_ICONSTOP);
 if (id==IDRETRY) goto fretry;
 return -1;
 }
 tmplen=sizeof(tmpfile);
 if (!lstrcmpi(curdir,srcdir)) *curdir='\0';
 if ((vrv&VFF_CURNEDEST)&&*curdir
 &&!(inst_files[i].flags&VFFF_ISSHAREDFILE))
 *curdir='\0';
 if (inst_files[i].dstdir)
 {

 if (inst_files[i].dstdir==(char *)1)
 getboot(instdir); /* not supported for WIN32 */
 else if (inst_files[i].dstdir==(char *)2)
 GetWindowsDirectory(instdir,sizeof(instdir));
 else
 lstrcpy(instdir,inst_files[i].dstdir);
 }
 *tmpfile='\0';
 lstrcpy(srcf,srcdir);
 if (srcf[lstrlen(srcf)-1]!='\\') lstrcat(srcf,"\\");
 lstrcat(srcf,inst_files[i].srcfile);
 vrvi=VerInstallFile(inst_files[i].cflags,srcf,
 dst,"",
 instdir,
 curdir,tmpfile,&tmplen);
 if (vrvi&VIF_TEMPFILE)
 {
 char dfile[_MAX_PATH];
 lstrcpy(dfile,instdir);
 if (dfile[lstrlen(dfile)-1]!='\\')
 lstrcat(dfile,"\\");
 lstrcat(dfile,tmpfile);
 unlink(dfile);
 }
 if (vrvi&(VIF_WRITEPROT|VIF_FILEINUSE|
 VIF_OUTOFSPACE|VIF_ACCESSVIOLATION|
 VIF_SHARINGVIOLATION|VIF_CANNOTCREATE|
 VIF_CANNOTDELETE|VIF_CANNOTRENAME|
 VIF_OUTOFMEMORY|VIF_CANNOTREADSRC|
 VIF_CANNOTREADDST))
 {
 int m=0;
 int id;
 if (vrvi&VIF_WRITEPROT) m=1;
 if (vrvi&VIF_FILEINUSE) m=2;
 if (vrvi&VIF_OUTOFSPACE) m=3;
 if (vrvi&VIF_ACCESSVIOLATION) m=4;
 if (vrvi&VIF_SHARINGVIOLATION) m=5;
 if (vrvi&VIF_CANNOTCREATE) m=6;
 if (vrvi&VIF_CANNOTDELETE) m=6;
 if (vrvi&VIF_CANNOTRENAME) m=6;
 if (vrvi&VIF_OUTOFMEMORY) m=7;
 if (vrvi&VIF_CANNOTREADSRC) m=8;
 if (vrvi&VIF_CANNOTREADDST) m=6;
 id=MessageBox(w,vermessage[m]
 ,inst_files[i].srcfile,MB_RETRYCANCEL|MB_ICONSTOP);
 if (id==IDRETRY) goto fretry;
 return -1;
 }
 } /* end of else */
 }
 cw_ProgressSet(prog,0xFFFF,"Copying Complete");
 if (mbflag)
 MessageBox(w,"Installation complete",
 "Success",MB_OK|MB_ICONEXCLAMATION);
 return 0;
 }


































































Simplifying C++ GUI Development


Packaged UI widgets, C++ templates, and standard UNIX utilities




Perry Scherer


Perry is a senior systems analyst for Arco Alaska Inc. and can be contacted at
laspws@aai.arco.com.


To a large extent, application development and GUI development have become one
and the same. The result is the advent of GUI builders that simplify the
process of building bit-mapped interfaces for systems such as Windows and
Motif. But even though GUI builders provide a framework for putting an
application together, you still need to deal with the details, particularly
those of the user interface--and that's where UI widgets come into play.
Widgets are prebuilt display objects and user-interface controls that can be
pulled intact into applications, saving months of tedious coding work. In our
case, for instance, widgets for x/y graphing and 3-D visualization made it
possible to construct a complex application in a very short time period. The
application, Unimovie, is a C++ application for postprocessing
reservoir-simulation results; see Figure 1. The application presents 2-D and
3-D views of oil-field values such as oil saturation, water saturation, and
pressure. Time-lapsed animations give engineers a sense of frontal advances,
viscous gravity fingers, and flow patterns, as if they were watching them in
real time--somewhat like a TV weather map.
In the past, Arco engineers used a custom-built Macintosh application to
perform the same basic tasks as Unimovie, but that program's execution time
for large simulations was unacceptable. Also, parts of the Mac application
were written in assembly language, which made maintenance difficult.
Advancements in RISC technology provided us with the incentive to write a
similar application that would run under UNIX. However, the time frame was
tight--we needed to complete Unimovie in six months or not bother at all. We
quickly realized that the key to completing the project on schedule was to
minimize C++ coding by using high-level tools for subclassing and GUI
development. Since I was familiar with the Tools.h++ environment from Rogue
Wave and the KL Group's off-the-shelf widget components for graphing,
plotting, and 3-D displays, these toolsets became the foundation upon which
all our application classes were written.
Unimovie, which was originally written for an RS/6000 running AIX, consists of
about 14,000 lines of C++ code and compiles into an approximately 2-MB
executable. We've subsequently ported Unimovie to Hewlett-Packard and Sun
workstations. On all three platforms, we used a vendor-provided C++ compiler.


How Unimovie Works


One type of Unimovie display is similar to a contour map, with values such as
oil saturation projected on the top of the contoured picture. In this
instance, however, the colors represent a value, not a depth. In other words,
it is really a 4-D plot: Three dimensions are spatial, and the fourth is the
value projected on the surface. Other types of Unimovie displays are
custom-developed, 2-D projections of these 3-D surfaces.
The illusion of animation is achieved with carefully synchronized time-out
calls within the X Window System (XtAppAddTimeOut), which let Unimovie
coordinate several planes so that they all display the same time step at
once. Time-out calls let the program delay for a short interval, then draw
the next image in the time sequence. The interval (in milliseconds) passed to
each call tells the program how long to wait, and careful synchronization of
these intervals puts the entire display in motion. In this way, the user can see an x-y plane
and cross section at the same time, for example. Unimovie can display an
animated sequence of time steps for any plane--depth or cross section--in a
reservoir-simulation study.
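The timer-driven scheme above can be sketched without an X server. The class and function names below are hypothetical stand-ins: each "timeout" draws one frame and re-registers itself, and one tick of a simulated event loop fires every plane's timeout so all planes stay on the same time step.

```cpp
#include <vector>

// Hypothetical stand-in for an XtAppAddTimeOut-driven animation: when the
// timeout fires, the next frame is drawn and another timeout is requested
// until the sequence is exhausted.
struct Animation {
    int interval_ms;        // delay between frames (unused in the simulation)
    int total_frames;       // length of the time sequence
    int frame;              // next frame to draw
    std::vector<int> drawn; // record of frames drawn (stands in for rendering)

    Animation(int ms, int n) : interval_ms(ms), total_frames(n), frame(0) {}

    // Fires once per timeout; returns true if it re-registered itself.
    bool on_timeout() {
        drawn.push_back(frame++);
        return frame < total_frames;
    }
};

// One loop iteration is one tick of the simulated event loop: every active
// plane's timeout fires, so all planes advance to the same time step.
inline void run_synchronized(std::vector<Animation>& planes) {
    bool active = true;
    while (active) {
        active = false;
        for (auto& p : planes)
            if (p.frame < p.total_frames && p.on_timeout())
                active = true;
    }
}
```

In the real program the interval value is what keeps the planes in step; here the shared loop plays that role.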
Imagine a rectangular grid composed of cells--like a spreadsheet, but with a
much greater resolution. Each cell within the grid can independently display a
spectrum of n different colors. These colors represent n uniform intervals in
the data range (from MIN to MAX), with each successive interval assigned the
next color in the spectrum.
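The interval-to-color mapping just described reduces to a small calculation; this is a minimal sketch (the function name is invented), clamping out-of-range values to the first or last color:

```cpp
#include <algorithm>

// Map a value in [min, max] to one of n uniform color intervals.
// The value equal to max lands in the last interval (index n-1).
inline int color_index(double value, double min, double max, int n) {
    if (max <= min || n <= 0) return 0;          // degenerate range
    double t = (value - min) / (max - min);      // normalize to [0, 1]
    int idx = static_cast<int>(t * n);           // uniform interval number
    return std::max(0, std::min(n - 1, idx));    // clamp to valid indices
}
```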
As the reservoir-simulation data is read, the values in the range are
displayed with their corresponding colors. The colors change with time to
represent the dynamic properties of the simulation data. These changing color
patterns can be grasped much more intuitively than mere numbers or still
graphs.
Unimovie's Motif encapsulation is similar to that outlined by Douglas Young in
Object-Oriented Programming with C++ and OSF/Motif (Prentice-Hall, 1992). In
general, we created a parent class to handle just the GUI components. The
parent class was an abstract base class with several no-op virtual functions.
Children of this parent class were generated to fill in the details of the
virtual functionality. This approach may allow us to port to PowerPC or Intel
machines while replacing only the parent/GUI classes.
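The parent/child split described above can be sketched as follows (class and method names are hypothetical, not Unimovie's actual classes): the parent owns the GUI concerns and declares no-op virtuals, and a platform-specific child overrides only what it needs.

```cpp
#include <string>

// Abstract GUI parent with no-op virtual functions, per the article's design.
class GuiParent {
public:
    virtual ~GuiParent() {}
    virtual void createWindow() {}                       // no-op by default
    virtual std::string platform() const { return "none"; }
};

// A platform-specific child fills in the virtual functionality; porting
// means replacing this class, not the application classes built on it.
class MotifChild : public GuiParent {
public:
    void createWindow() override { created = true; }
    std::string platform() const override { return "Motif"; }
    bool created = false;
};
```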


Using XRT Widgets


User-interface development can be complex and time consuming, and many
developers spend much of their time wrestling with low-level graphics
primitives and X Window idiosyncrasies. Well-designed widget sets such as XRT
allow X developers to concentrate on the content of their applications and not
the details of Xlib.
XRT widgets are object-oriented extensions to Motif. KL Group's XRT/3d for
Motif is the basis for Unimovie's 3-D surface plots and contour graphing. With
Unimovie, users display not only a 2-D view of an x-y plane or cross-section,
but a 3-D, depth-projected picture of a surface. The widget handles rotation,
scaling, annotation, and perspective calculations using built-in X
translations.
We used the XRT/graph widget to display the simulation data in various 2-D
graphs and bar charts. The 2-D graphs were used for simulation-history
matching with historical data retrieved from an Oracle database. XRT/graph
handles all of the typical business charting types, including x-y plots, pie
charts, and area graphs. The widget isolates the developer from tedious tasks
such as axis scaling and precision. We have used XRT/graph for
high-performance, real-time displays without any noticeable degradation in
speed.
XRT includes many convenience functions to simplify development. For example,
XrtMap handles all user-to-pixel coordinate conversions. Example 1 shows how
XrtMap is used to mark a selected point with crosshairs. Hardcopy output is
also encapsulated with convenience calls. Output can be produced in EPS, XWD,
or CGM formats. For our custom 2-D C++ classes, we developed our own
PostScript methods. C++ function overloading permits us to use the same method
name for both screen painting and 2-D PostScript output. The function-argument
signature determines whether data is drawn on the screen or to a
PostScript-output stream.
The XRT widgets are easier to learn than any other widget set I've used for X
Window development. XRT widgets are programmed using the same Xt-based
application-programming interface as OSF/Motif. In addition, XRT provides a
tool called "Builder" that allows the developer to interactively witness the
effects of various resource changes upon graph appearance. This is a valuable
tool during the application-prototyping phase or for producing
presentation-quality plots.
Additional procedures and methods handle special tasks, such as printing. The
toolkit includes sample code for creating print dialog boxes, making it
relatively easy to produce high-quality printed output in EPS, XWD, or CGM
formats.


Using Tools.h++


The other primary tool we used was Rogue Wave's Tools.h++, a data-structure
toolkit that shields the developer from a lot of nitty-gritty C++ coding.
Tools.h++, which is available for DOS, Windows, OS/2, and UNIX, encapsulates
common C++ constructs such as strings, dates, linked lists, binary trees,
stacks, queues, and file I/O. All these constructs are presented in template
form.
Unimovie is based entirely upon templates. Tools.h++ requires only a few
class-member functions to be present in order to use their templates. With
Tools.h++, issues such as dynamic memory allocation and vector resizing can be
eliminated from the system-development equation. This is a significant step
toward code simplicity and bug reduction. Tools.h++ also provides classes that
mimic the Smalltalk paradigm.
The late binding of C++ was an important feature to us because Unimovie uses
different C++ classes to display different kinds of grids: rectangular, corner
point, and polar coordinate. The type of simulator data determines which grid
type is created. At run time, the same method names are used for all three
grid types, but the correct function signature is chosen using late binding.
Example 2 illustrates how Tools.h++'s templates and late-binding properties
allowed us to put all types of overlays (text, boxes, lines, and the like)
into one type of container. This example demonstrates the concept of resizing
all graphic overlays when the window container is resized.
Another example of late binding is method drawGrid, a virtual function defined
for all three grid types. At run time, the correct function will be picked.
Tools.h++ templates organize collections of these grids, with the virtual base
class as the parameterized type. Actual members of the collection are, of
course, children of the base class that have a definite grid type.
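The grid collection described above can be sketched as follows (the class names are hypothetical): one virtual base, three concrete grid types, and a container of base-class pointers so that drawGrid() dispatches through late binding.

```cpp
#include <memory>
#include <string>
#include <vector>

// Virtual base class used as the parameterized type of the collection.
struct Grid {
    virtual ~Grid() {}
    virtual std::string drawGrid() const = 0;
};

// Concrete grid types; the correct drawGrid() is picked at run time.
struct RectGrid   : Grid { std::string drawGrid() const override { return "rect"; } };
struct CornerGrid : Grid { std::string drawGrid() const override { return "corner"; } };
struct PolarGrid  : Grid { std::string drawGrid() const override { return "polar"; } };

// Iterate the collection through base-class pointers; each element's
// dynamic type chooses its own drawGrid().
inline std::vector<std::string>
drawAll(const std::vector<std::unique_ptr<Grid>>& grids) {
    std::vector<std::string> out;
    for (const auto& g : grids) out.push_back(g->drawGrid());
    return out;
}
```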


Database Access and File Management


One of the problems we encountered was obtaining adequate database-access
times with a relational data structure. Some of the simulations take up as
much as half a gigabyte of information. Unimovie parses through these giant
files to determine distinct file pointers to indicate the relevant time steps.
Based on these pointers, it builds a direct-access dictionary to read the
file.
This means there is an initial delay while Unimovie reads the file for the
first time. From then on, whenever you want to pull a particular time
sequence, you don't have to parse it from scratch. The information dictionary
of "hot spots" is stored in memory, which allows for very fast access times
for all subsequent movie runs.
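The "hot spot" dictionary amounts to a one-pass index of file offsets. This sketch assumes an invented record format (a `TIME n` header line per time step); the article does not document the actual simulator format:

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Scan the stream once, remembering the offset at which each time step's
// record begins. Later requests seek directly instead of re-parsing.
inline std::map<int, std::streampos> buildTimeIndex(std::istream& in) {
    std::map<int, std::streampos> index;
    std::string line;
    std::streampos pos = in.tellg();      // offset of the line about to be read
    while (std::getline(in, line)) {
        if (line.rfind("TIME ", 0) == 0)  // line starts a time step
            index[std::stoi(line.substr(5))] = pos;
        pos = in.tellg();
    }
    return index;
}
```

A subsequent lookup clears the stream state, seeks to `index[step]`, and reads only that time step.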

We initially used Oracle's Oracle7 to store data, but we soon realized our
unique set of problems didn't lend themselves to the typical
relational-database approach. Consequently, we created our own object database
using RWBTreeOnDisk, Tools.h++'s binary disk tree. This is a persistent B-Tree
dictionary, in which both a dictionary and associated objects can be stored on
a disk. For example, if users are interested in several oil-production
quantities and want to graph just those quantities within the time series,
they can extract those quantities and label them for reference in this
object-oriented database. We found this custom database to be both powerful
and extensible. Additionally, retrieval is about two times faster than
similarly organized data in an Oracle7 database.


Standard UNIX Utilities


Standard UNIX utilities such as Lex and Yacc also simplified our development
efforts. We used these utilities to create time-valued overlay images, which
are analyzed, stored in ASCII files, and added to a Unimovie display. This
allows users to overlay lines, images, and 2-D pictures on top of the
oil-simulation images. For example, a user might want to depict a building (an
oil-production facility, for instance) on top of a reservoir map.
Lex breaks up the overlay ASCII files into keywords and parameters. Yacc
analyzes the sequence and patterns of keywords and parameters and processes
them into graphic primitives that can be placed appropriately within the C++
grids. (More recently, we've begun using Bison instead of a vendor-specific
Yacc because of Bison's consistency across multiple UNIX platforms and its
support of reentrant grammars. We're about to turn to Flex instead of a
vendor-specific Lex for similar reasons.)
Lex and Yacc are also used by Unimovie to read and process a general x/y plot
in an ASCII format. This allows us to reformat data from various sources
(including relational database and flat files) and compare it with the
simulation results.
In a related way, Lex and Yacc are used as a curve-formula processor. The
formula processor accepts normal syntax for curve addition, scalar
multiplication, vector multiplication, division, and so forth. Formulas are
stored in object-oriented databases (RWBTreeOnDisk) as text strings and
retrieved and parsed by Lex/Yacc as needed. Formula recursion (formulas within
formulas) is currently handled by providing multiple lexers with different
global name spaces, although the reentrant grammar of Bison may make this
approach obsolete.
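A hand-written recursive-descent parser gives a feel for what the generated Lex/Yacc formula processor does. This sketch covers only numeric literals, +, -, *, /, and parenthesized subformulas, with no error handling; it is not the article's grammar:

```cpp
#include <cctype>
#include <string>

// Grammar: expr := term (('+'|'-') term)*
//          term := factor (('*'|'/') factor)*
//          factor := number | '(' expr ')'
struct FormulaParser {
    std::string s;
    size_t i = 0;
    explicit FormulaParser(const std::string& src) : s(src) {}

    void skip() { while (i < s.size() && isspace((unsigned char)s[i])) ++i; }

    double factor() {
        skip();
        if (i < s.size() && s[i] == '(') {       // parenthesized subformula
            ++i;
            double v = expr();
            skip(); ++i;                          // consume ')'
            return v;
        }
        size_t start = i;                         // numeric literal
        while (i < s.size() && (isdigit((unsigned char)s[i]) || s[i] == '.')) ++i;
        return std::stod(s.substr(start, i - start));
    }
    double term() {
        double v = factor();
        for (skip(); i < s.size() && (s[i] == '*' || s[i] == '/'); skip()) {
            char op = s[i++];
            double r = factor();
            v = (op == '*') ? v * r : v / r;
        }
        return v;
    }
    double expr() {
        double v = term();
        for (skip(); i < s.size() && (s[i] == '+' || s[i] == '-'); skip()) {
            char op = s[i++];
            double r = term();
            v = (op == '+') ? v + r : v - r;
        }
        return v;
    }
};
```

The generated parsers add what this sketch omits: tokenized curve names, vector operations, and error recovery.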


Conclusion


After initial testing in Anchorage, word of our application spread throughout
Arco. Unimovie has since been installed at Arco offices around the world.
Engineers love the 3-D effects, screen presentations, quality of hardcopy
output, and the ability to display their simulation data more quickly than in
the past.
There are no run-time fees or royalties associated with the packaged widgets
or any other standard components used, which has made it economically feasible
to widely distribute the application.
There are many potential uses for this type of animation application. For
example, environmental-studies applications could be constructed that show
sequences, as in a groundwater-contamination study.
Figure 1: Sample Unimovie display.


For More Information


KL Group
260 King Street East, Third Floor
Toronto, ON
Canada M5A 1K3
800-663-4723
Rogue Wave Software
P.O. Box 2328
Corvallis, OR 97339
800-487-3217
Example 1: Use of the XrtMap call from XRT/graph. XrtMap allows users to
highlight the selected point with crosshairs. 
 ...
// Draws a crosshair about the selected point.
void xrtWindow::position_cross ( XButtonEvent* event )
{
 XrtMapResult m;
 if ( XrtMap ( graph, 1, event->x, event->y, &m ) == XRT_RGN_IN_GRAPH) {
 XtVaSetValues ( graph,
 XtNxrtXMarkerShow, TRUE,
 XtNxrtYMarkerShow, TRUE,
 XtNxrtXMarker, XrtFloatToArgVal(m.x),
 XtNxrtYMarker, XrtFloatToArgVal(m.y),
 NULL, NULL );
 }
 else {
 XtVaSetValues ( graph,
 XtNxrtXMarkerShow, FALSE,
 XtNxrtYMarkerShow, FALSE,
 NULL, NULL );
 }
}
 ...
Example 2: Code simplification resulting from Tools.h++ templates and the
late-binding properties that allow us to put all types of overlays (text,
boxes, lines, and so on) into one type of container. This sample demonstrates
the resizing of all graphic overlays when the window container is resized.
 ...
// Scale all the overlays.
 // Hash table iterator for RWTPtrHashTable template.

 RWTPtrHashTableIterator<anyOverlay> it(overlays);
 // 'anyOverlay' virtual base class for all overlays.
 anyOverlay* coll;
 // Simply loop through all overlays and resize them!
 // All overlays have method 'handleResize' and use
 // late binding to determine which method signature to use.
 while ( (coll = it()) != 0 )
 coll->handleResize (scaleHor, scaleVer, _minX, _minY, wellSize);
 ...






















































Visual Basic by Remote Control


Letting custom controls access foreign control properties




Robert Sardis


Bob is an independent software developer who has implemented many Windows
applications. He can be reached at 30 E. Huron, #1310, Chicago, IL 60611.


Visual Basic continues to establish itself as a popular development tool for
Microsoft Windows, and, predictably, developers continue to push the envelope
of what Visual Basic and its associated Control Development Kit (CDK) can do.
Recently, I needed to create a Visual Basic custom control that would mimic
the standard Data control, but for a non-SQL, non-ODBC database. The control
would have to communicate with the database via an API set, and communicate
with other Visual Basic controls in order to build queries and display query
results. I wanted all communication with the other controls to be handled
automatically at the C level, so that the people adding my control to their
Visual Basic forms would not have to do any Basic programming.
Communication with the database API was straightforward, but communication
with the other controls turned out to be unexpectedly challenging. I finally
got my control to work the way I wanted, but only after learning a lot about
some unusual and incompletely documented Visual Basic features. I ended up
writing a collection of functions to simplify access to control properties; I
used these functions to write a diagnostic VBX that acts like a Spy program
for Visual Basic controls, displaying a list of properties and their data
types, flags, and values.
In this article, I'll describe how to access control properties at run time,
and present my collection of access functions. I'll also describe my
diagnostic control, CTRLINFO.VBX, which is available electronically; see
"Availability," page 3.


Foreign-Control Properties


To make my custom control perform queries by example, I wanted it to poll all
TextBox fields on the form, using their DataField property for column name and
their Text property for column value. After the query result returned, I
wanted my control to replace the Text property with the value for that column
in the query result. This process required calling VBGetControlProperty() to
get the Text and DataField properties to build the query, and then calling
VBSetControlProperty() for the Text property to display the result.
The problem is that VBGetControlProperty() and VBSetControlProperty() require
that you know the index of the property being accessed. Property indexes are
created as #defines in a control's header file, and generally are not
documented or otherwise made available to the outside world. (In particular,
the index for a standard property is not the corresponding "IPROP_STD_"
constant defined in VBAPI.H.) And even if you do happen to know a property
index for a foreign control, the control's designers are free to change the
index from one version of the control to the next.
The only reliable way to determine a property index for a foreign control is
through direct access to the control's property array.


The Property Array


At design time, you define a control's properties by creating an array of
2-byte integers. The array entry for a custom property will be a "true" (near)
address of a PROPINFO structure; the entry for a standard property will be a
"fake" address defined in VBAPI.H. The array is part of the control's MODEL
structure, which is passed to VBRegisterModel() when the control is created.
Example 1 is a code fragment from a control's header file defining an array
containing one standard and one custom property.
A property's index, as passed to VBGetControlProperty(), is actually its index
in this property array. To get to the property array at run time, you first
call VBGetControlModel() to obtain the address of the control's MODEL structure. The
property array is one of the fields of this structure. Example 2 shows how to
walk through the array.
Example 2 also reveals some subtleties in dealing with near and far pointers.
All pointers obtained from the MODEL structure are near pointers, but to data
residing in the MODEL structure's segment. An application needs to turn these
near pointers into far pointers by using the MAKELONG() macro.


Getting Property Information


The last section showed how to obtain entries in a control's property array.
The entry for a custom property is the address of a PROPINFO structure, which
contains all the information you need about the property, including its name,
data type, and flags. However, the entry for a standard property is just a
constant defined in VBAPI.H, which your application has to interpret somehow.
We need a simple method for obtaining a true PROPINFO structure for any
property, either custom or standard; this would allow an application to deal
uniformly with all properties.
Listing One contains my solution to this problem: I provide a lookup array
containing the "PPROPINFO_STD_" constant and a corresponding PROPINFO
structure for each of the 42 standard properties defined in VBAPI.H; the
PROPINFO structure contains the property's name and data type and has the rest
of its fields initialized to 0.
The data types of the standard properties are not specified in the Visual
Basic documentation. Some are listed in a technical fax available from
Microsoft; I had to discover the rest by trial and error. Some of the data
types are surprising: For example, the x- and y-dimensional properties, which
are always integers, store their values as floats rather than shorts or longs.
Three data types in my array are left as 0, for "unknown." The None property
really doesn't have a data type, because it only acts as a place holder for a
property that was dropped from the array since a prior version of the control.
The Name and DataSource properties, which really ought to have useful values,
have data types of zero because their values as returned by
VBGetControlProperty() don't seem to be meaningful. (A call to Microsoft's
technical-support line confirmed that these property values are not available
to a VBX at run time.)
Listing One contains three functions to access property information.
GetPropertyArray() is called once and returns the address of the control's
property array. GetProperty() is called for each desired property, and always
returns a far pointer to a PROPINFO structure. In the case of a custom
property, this pointer is the address of the "real" structure; in the case of
a standard property, the pointer is the address of a structure in my lookup
array. IsStandardProperty() can be called to determine whether a property is
standard or custom.


Finding a Property by Name


My original interest in control properties was to get or set their values
using VBGetControlProperty() and VBSetControlProperty(). As discussed
previously, these functions require a property's index, which is generally
unknown for a foreign control. What is known, however, is the property's name.
So as a practical application of Listing One, Listing Two contains a function
called GetPropertyIndex(), which returns the index of a named property.
GetPropertyIndex() makes one call to GetPropertyArray() and then uses
GetProperty() to examine the name of every property in the array. It returns
either the index of the property with the desired name, or -1 if no such
property is found.
Note that the near/far pointer problem is still here, but that the segments
have changed. The property name is a near pointer to a string living in the
same segment as the (far) property address returned by GetProperty().
MAKELONG() is used again, to make a far pointer with the correct segment.


The CTRLINFO Diagnostic Control



Getting and setting foreign-control properties really isn't practical without
a diagnostic tool for determining the properties available, their data types,
the values they take, and how and when these values are set. As a more
extensive application of the functions in Listing One, I wrote a diagnostic
custom control called "CTRLINFO.VBX." 
This control is added to a Visual Basic form at design time. At run time, it
creates a dialog box displaying model information and a list of property names
for any control under the cursor; double-clicking on one of the listed
property names brings up a second dialog box showing in-depth information
about the property: whether it is custom or standard, its data type (and
flags, for a custom property), and its value. The display is similar to that
of Microsoft's Spy program. The code for this VBX is available electronically
(see "Availability," page 3).


Conclusion


Accessing foreign-control properties is an underused but extremely effective
technique. The utilities in Listings One and Two and the CTRLINFO.VBX
diagnostic tool make this technique simple and practical for anyone writing a
custom control. The results can be rewarding: with less effort than you might
expect, you can give your existing API-based resources a healthy new life as
Visual Basic controls.
Example 1: Defining a property array containing one standard and one custom
property.
// PROPINFO structure for custom property
PROPINFO Property_Session =
{
 "Session",
 DT_HSZ | PF_fGetData | PF_fSetData,
 OFFSETIN(MYCONTROL, Session), 0, 0, NULL, 0
};
// property indices
#define IPROP_MYCONTROL_NAME 0
#define IPROP_MYCONTROL_SESSION 1
// property array
PPROPINFO MyControl_Properties[] =
{
 PPROPINFO_STD_NAME, // fake address
 &Property_Session, // real address
 NULL
};
Example 2: Processing a control's property array.
void ProcessPropertyArray (HCTL hctl)
{
 MODEL FAR *lpModel;
 PPROPINFO FAR *lppPropInfo; // far ptr to array of near pointers
 LPPROPINFO lpPropInfo;
 lpModel = VBGetControlModel(hctl);
 lppPropInfo = (PPROPINFO FAR*) MAKELONG(
 lpModel->npproplist, (_segment)lpModel);
 for (; *lppPropInfo; lppPropInfo++)
 { 
 if (*lppPropInfo >= PPROPINFO_STD_LAST)
 {
 // process *lppPropInfo as a fake address
 // for a standard property defined in VBAPI.H
 }
 else
 {
 lpPropInfo = (LPPROPINFO) MAKELONG(
 *lppPropInfo, (_segment)lpModel);
 // process lpPropInfo as a real address of a
 // PROPINFO structure
 }
 }
}

Listing One
/************************************************************************
 * CtrlProp.cpp -- Utilities for getting information about Visual Basic
 * controls' properties by Bob Sardis, 1995
 ************************************************************************/
#include <windows.h>
#include <vbapi.h>

#include "CtrlProp.h"
// StandardProperty -- holds information for a 'PPROPINFO_STD_' property
typedef struct
{
 PPROPINFO FakeAddress;
 PROPINFO PropInfo;
} StandardProperty;
StandardProperty StandardProperties[] =
{
 PPROPINFO_STD_NAME, {"Name", 0},
 PPROPINFO_STD_INDEX, {"Index", DT_SHORT},
 PPROPINFO_STD_HWND, {"Hwnd", DT_SHORT},
 PPROPINFO_STD_BACKCOLOR, {"BackColor", DT_COLOR},
 PPROPINFO_STD_FORECOLOR, {"ForeColor", DT_COLOR},
 PPROPINFO_STD_LEFT, {"Left", DT_XPOS},
 PPROPINFO_STD_TOP, {"Top", DT_YPOS},
 PPROPINFO_STD_WIDTH, {"Width", DT_XSIZE},
 PPROPINFO_STD_HEIGHT, {"Height", DT_YSIZE},
 PPROPINFO_STD_ENABLED, {"Enabled", DT_BOOL},
 PPROPINFO_STD_VISIBLE, {"Visible", DT_BOOL},
 PPROPINFO_STD_MOUSEPOINTER, {"MousePointer", DT_ENUM},
 PPROPINFO_STD_CAPTION, {"Caption", DT_HSZ},
 PPROPINFO_STD_FONTNAME, {"FontName", DT_HSZ},
 PPROPINFO_STD_FONTBOLD, {"FontBold", DT_BOOL},
 PPROPINFO_STD_FONTITALIC, {"FontItalic", DT_BOOL},
 PPROPINFO_STD_FONTSTRIKE, {"FontStrikeThru",DT_BOOL},
 PPROPINFO_STD_FONTUNDER, {"FontUnderline", DT_BOOL},
 PPROPINFO_STD_FONTSIZE, {"FontSize", DT_REAL},
 PPROPINFO_STD_TABINDEX, {"TabIndex", DT_SHORT},
 PPROPINFO_STD_PARENT, {"Parent", DT_LONG},
 PPROPINFO_STD_DRAGMODE, {"DragMode", DT_ENUM},
 PPROPINFO_STD_DRAGICON, {"DragIcon", DT_SHORT},
 PPROPINFO_STD_BORDERSTYLEOFF, {"BorderStyleOff",DT_ENUM},
 PPROPINFO_STD_TABSTOP, {"TabStop", DT_BOOL},
 PPROPINFO_STD_TAG, {"Tag", DT_HSZ},
 PPROPINFO_STD_TEXT, {"Text", DT_HSZ},
 PPROPINFO_STD_BORDERSTYLEON, {"BorderStyleOn", DT_ENUM},
 PPROPINFO_STD_CLIPCONTROLS, {"ClipControls", DT_BOOL},
 PPROPINFO_STD_NONE, {"None", 0},
 PPROPINFO_STD_HELPCONTEXTID, {"HelpContextID", DT_SHORT},
 PPROPINFO_STD_LINKMODE, {"LinkMode", DT_ENUM},
 PPROPINFO_STD_LINKITEM, {"LinkItem", DT_HSZ},
 PPROPINFO_STD_LINKTOPIC, {"LinkTopic", DT_HSZ},
 PPROPINFO_STD_LINKTIMEOUT, {"LinkTimeout", DT_SHORT},
 PPROPINFO_STD_LEFTNORUN, {"LeftNoRun", DT_XPOS},
 PPROPINFO_STD_TOPNORUN, {"TopNoRun", DT_YPOS},
 PPROPINFO_STD_ALIGN, {"Align", DT_ENUM},
 PPROPINFO_STD_IMEMODE, {"ImeMode", DT_BOOL},
 PPROPINFO_STD_DATASOURCE, {"DataSource", 0},
 PPROPINFO_STD_DATAFIELD, {"DataField", DT_HSZ},
 PPROPINFO_STD_DATACHANGED, {"DataChanged", DT_BOOL},
 NULL, {"", 0},
};
PROPINFO UnknownStdProp = {"UNKNOWN_STD", 0};
/*****************************************************************
 * Function: GetPropertyArray()
 * Description: Gets property array for a Visual Basic control
 * Parameters: hctl -- handle to control
 * Returns: far pointer to property array
 *****************************************************************/
PPROPINFO FAR * GetPropertyArray (HCTL hctl)
{
 PPROPINFO FAR *lppPropInfo = NULL;
 MODEL FAR *lpModel = VBGetControlModel(hctl);
 if (lpModel)
 {
 lppPropInfo = (PPROPINFO FAR*)
 MAKELONG(lpModel->npproplist, (_segment)lpModel);
 }
 return lppPropInfo;
}
/*****************************************************************
 * Function: GetProperty()
 * Description: gets specified property of a Visual Basic control
 * Parameters: PropertyArray -- array returned by GetPropertyArray()
 * index -- index of property, from 0
 * Returns: far pointer to a PROPINFO structure; for a standard
 * property, this will be a pointer into StandardProperties[]
 *****************************************************************/
LPPROPINFO GetProperty (PPROPINFO FAR * PropertyArray, short index)
{
 PPROPINFO FAR *lppPropInfo = PropertyArray + index; // offset addr
 if (*lppPropInfo == NULL)
 return NULL;
 if (IsStandardProperty(*lppPropInfo))
 {
 // 'standard' property pointers are not real addresses;
 // need to search StandardProperties[] list
 StandardProperty *pStdProp;
 for ( pStdProp = StandardProperties; 
 pStdProp->FakeAddress; 
 pStdProp++)
 {
 if (*lppPropInfo == pStdProp->FakeAddress)
 return &(pStdProp->PropInfo);
 }
 return &UnknownStdProp; // standard property not found
 }
 else
 {
 return (LPPROPINFO) MAKELONG(
 *lppPropInfo, (_segment)PropertyArray);
 }
}
/*****************************************************************
 * Function: IsStandardProperty()
 * Description: determines whether a property is a standard property,
 * as defined in VBAPI.H
 * Parameters: lpPropInfo -- pointer to PROPINFO structure
 * Returns: TRUE if property is standard, FALSE otherwise
 *****************************************************************/
BOOL IsStandardProperty (LPPROPINFO lpPropInfo)
{
 StandardProperty *pStdProp;
 NPPROPINFO pPropInfo = (NPPROPINFO)LOWORD(lpPropInfo); // near ptr
 
 if (pPropInfo >= PPROPINFO_STD_LAST)
 return TRUE; // pPropInfo is a 'fake' address in VBAPI.H

 
 for (pStdProp = StandardProperties; pStdProp->FakeAddress; pStdProp++)
 {
 if (pPropInfo == &(pStdProp->PropInfo))
 return TRUE; //pPropInfo is contained in StandardProperties[]
 } 
 if (pPropInfo == &UnknownStdProp)
 return TRUE; // pPropInfo is an 'unknown' standard property
 return FALSE; // property is not standard
} 

Listing Two
/*****************************************************************
 * Function: GetPropertyIndex()
 * Description: Gets index of named Visual Basic control property
 * Parameters: hctl -- handle to control
 * lpszPropName -- property name
 * Returns: index of property; returns -1 if property not found
 *****************************************************************/
short GetPropertyIndex (HCTL hctl, LPSTR lpszPropName)
{
 PPROPINFO FAR * PropertyArray;
 LPPROPINFO lpPropInfo;
 short i;
 LPSTR lpsz;
 PropertyArray = GetPropertyArray(hctl);
 if (!PropertyArray)
 return -1;
 for (i = 0; ; i++)
 {
 lpPropInfo = GetProperty(PropertyArray, i);
 if (!lpPropInfo)
 break; // have reached end of PropertyArray[]
 lpsz = (LPSTR) MAKELONG(
 lpPropInfo->npszName, (_segment)lpPropInfo);
 if (!lstrcmp(lpsz, lpszPropName))
 return i;
 }
 return -1;
}























MIME and Internet Mail


Making e-mail work




Tim Kientzle


Tim is the author of The Working Programmer's Guide To Serial Protocols
(Coriolis Group, 1995) and can be contacted at kientzle@netcom.com.


While Internet mail is a wonderful tool, it currently has a major shortcoming.
Because it was developed to handle 7-bit text messages, Internet mail is
unsuitable for transferring binary data such as word-processor files, audio,
graphics, and other useful data. In particular, the 8-bit character sets now
in use in most of the world aren't supported by the standard mail transport
protocol (SMTP).
Clearly, the handling of Internet mail needs to change, but there are many
obstacles. Any major change to the Internet mail transport would take years.
During that interval, "islands" of enhanced mail capability would be unable to
exchange mail over the sea of existing 7-bit mail transport.
In the absence of such enhancements, savvy users have for years been
encoding their binary data into a form compatible with existing 7-bit mail
systems. The Multipurpose Internet Mail Extensions standard (MIME) formalizes
and automates this process, augmenting existing mail programs to automatically
encode and decode data with a minimum of user intervention. Since only the
programs used to read and compose messages require modification, MIME provides
enhanced mail facilities without rewiring the Internet.
The basic definition of Internet mail is contained in RFC822, which states
that a mail message consists of header lines followed by a message body. While
RFC822 describes the syntax of header lines in considerable detail, it is less
precise about the body: "The body is simply a sequence of lines containing
ASCII characters." MIME augments this by adding five new headers which, among
other things, specify the precise format of the message body; see Table 1.


MIME Content Types


MIME specifies the format of the message body in three layers. The first is a
broad type, which identifies the general kind of data. By itself, the type
doesn't provide enough information for the reader to do anything useful, but
it does help the reader select a default handling for certain classes of
messages (for example, text formats might be simply listed to the screen,
while unrecognized image formats would not).
The second layer is the subtype. The type and subtype together specify the
exact kind of data in the message; for example, image/gif. The third layer
specifies how the data is encoded into 7-bit ASCII for transfer.
The Content-Type header contains a type and subtype separated by a "/"
character, followed by a list of keyword=value pairs. For example, the
Content-Type text/plain; charset=iso-8859-8 might be used for a plaintext file
containing characters in the ISO Roman/Hebrew character set. If the display
supported Hebrew characters, the mail reader could (after decoding) display
the text as it was intended by the sender.
As Table 2 illustrates, there are currently seven defined types. The first
four types in Table 2 indicate a single data file in a single format; their
subtypes are listed in Table 3. These basic content types are an improvement
over text-only mail, allowing messages to contain graphics, sound, or other
data types. They are also easy to support; mail readers only need to parse the
Content-Type and Content-Transfer-Encoding headers, decode one of two simple
data formats, and pass the result to a separate viewer program.


Complex Messages


The message and multipart types provide features that can reduce mail-delivery
costs and allow single messages to combine different kinds of data.
The message content type provides three important capabilities.
Message/rfc822, for instance, allows an RFC822-compliant message (which may
also be a MIME message) to be embedded within a MIME message. This provides
improved support for returning or forwarding messages. The
message/external-body type saves on transfer costs by specifying that the
actual message body is contained elsewhere. Keywords define exactly how the
message body can be retrieved (for example, via anonymous ftp or as a local
file). Figure 1 gives some examples. The message/partial type allows a single
large message to be split and sent as several smaller messages. This is
important when dealing with mail systems that limit the size of messages. The
message/partial type takes three keywords:
id, which specifies a unique identifier used to match different pieces of the
same message.
number, which specifies which part this is (parts are numbered starting with
1).
total, which gives the total number of parts. 
The id and number keywords are required on all parts; total is required only
on the last part.
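For illustration only (the identifier below is made up), the pieces of a three-part split message might carry Content-Type headers such as:

```
Content-Type: message/partial; id="x123@somehost.com"; number=1
Content-Type: message/partial; id="x123@somehost.com"; number=2
Content-Type: message/partial; id="x123@somehost.com"; number=3; total=3
```

The receiving mail reader matches the parts by id, orders them by number, and knows from the total on the last part when the whole message has arrived.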
The multipart content type allows a single message to contain several pieces,
each in a different format. The most common multipart message is
multipart/mixed, which indicates that the message consists of multiple pieces,
each with its own separate Content-Type header. There are also
multipart/alternative, in which the parts are alternative formats of the same
information (such as a plaintext and a word-processor file with the same
content); multipart/parallel, in which the parts are intended to be displayed
simultaneously (such as an audio recording and a photograph of the speaker);
and multipart/digest, which is the same as multipart/mixed except that the
default content type for each part is message/rfc822 rather than text/plain.
All message and multipart types allow (indeed, often require) the embedded
data to have its own headers. Technically, the embedded data is not an RFC822
message (for example, it may lack a From header), even though it has the same
general format. For example, if a message has type message/external-body, the
body contains a series of lines that look like RFC822 headers, including
Content-Type, Content-Transfer-Encoding, and Content-ID (required for
message/external-body). Like RFC822, a blank line indicates the end of the
headers.
Multipart messages must have some way to separate the different parts. The
boundary keyword specifies a string that does not occur anywhere else in the
message. The actual separator lines consist of the boundary string preceded by
"--". The end of the multipart message is marked by the boundary string both
preceded and followed by "--". Figure 2 shows this mechanism in action: the
message displays text while retrieving and playing audio data from a local
file. A minimal MIME-compliant mail reader would show the text part and inform
the user of the type and location of the external-file data.


Encoding


Transparent handling of binary data is one of the primary goals of MIME. It
does this by specifying the encoding in the Content-Transfer-Encoding header
field. Table 4 lists the five currently defined encodings. The first three
indicate that the data is unencoded. The 8bit and binary types are used
primarily for parts contained in message/external-body and occasionally with
mail systems that do support 8-bit messages.
The Quoted-Printable encoding is intended for data that is primarily 7 bit,
with occasional 8-bit values. For example, text messages in ISO character sets
are often predominantly 7 bit. Quoted-Printable allows most 7-bit text
characters to represent themselves. The remaining characters are encoded as
three-character sequences consisting of "=" followed by a hexadecimal number.
In particular, "=" is encoded as "=3D".
The advantage of the Quoted-Printable encoding is that it allows any part of
the data that is in 7-bit US-ASCII to be read without decoding. However, for
raw binary data, it can introduce excessive overhead. The preferred encoding
for raw binary data is Base64, which encodes each three bytes of binary data
as four characters. The 24-bit value is treated as four 6-bit numbers, which
are then encoded using the characters A--Z, a--z, 0--9, +, and /. Thus, "the"
becomes "dGhl". The result is padded with "=" to a multiple of four characters
and broken into 72-character lines. Listing One presents a simple
encoder/decoder program for this encoding method. This encoding is similar to
the one used by the popular uuencode utility, but avoids using punctuation
characters that are lost or altered by certain mail gateways.
In some cases, no encoding is necessary. In particular, the multipart type
always uses 7bit, as does message/partial and message/external-body. Under
certain circumstances, other message types can use binary or 8bit. The
remaining content types can use any available encoding. The point of these
restrictions on message and multipart is to avoid nested encodings, which can
unnecessarily bloat the message. Remember that a Content-Transfer-Encoding of
7bit for a multipart message means that the individual parts have all been
encoded for 7-bit transport.


Security


Many projects have used mail to transfer scripts that are automatically
executed on the receiving machine. MIME's application/postscript is one
example, and other such content types are being proposed. Any system that
allows a received program to be executed automatically is a potential security
risk. PostScript includes the ability to modify files, and even without that,
it is possible to crash many systems by consuming excessive memory or disk
space. Security-conscious systems may need to restrict the handling of these
content types. For example, it is usually more secure to send PostScript files
to a printer than to interpret and display the data on the host machine.



More Information


The current MIME specification, RFC1521, is available from the mail server at
RFC-INFO@isi.edu. When requesting the spec, be sure to include the lines 
retrieve: RFC
doc-id: RFC1521 
in the body of the message. Other RFC documents can be retrieved in a similar
fashion.
MIME does not extend RFC822 to allow the use of non-ASCII characters in mail
headers, but a related proposal, documented in RFC1522, does. An extended text
subtype text/enriched is described in RFC1563. This replaces the text/richtext
type proposed in an earlier MIME draft (the name change was to reduce
confusion with Microsoft's RTF).
You can obtain the specification--in both text and PostScript form--and the
free MetaMail implementation of MIME at ftp://thumper.bellcore.com/pub/nsb.
Table 1: MIME headers.
Header Description
Content-Type Specifies the type of data contained in the
 message. For example, a Content-Type of
 audio/basic indicates a
 particular audio format that the mail reader
 should decode and play.
Content-Transfer-Encoding Specifies how the (binary) data is
 encoded into 7-bit text.
MIME-Version Indicates MIME compliance. Was omitted from
 early drafts of MIME, so isn't yet used by
 all encoders.
Content-ID Uniquely identifies the body of the message.
Content-Description Provides an additional human-readable
 description.
Table 2: MIME content types.
Type Description
text Human-readable text, possibly with
 textual markup. Any file with type text
 should be intelligible if simply listed to the
 screen. In particular, binary word-processor
 formats are not text.
audio Sound data.
image Still image.
video Movie or animated image.
application Application-specific data file.
 Includes script files in certain text languages.
message Wrapper for an embedded message.
multipart Multipart message. Each part may be
 in a different format. Subtypes indicate
 relationships between different parts.
Table 3: Simple MIME data types.
Type/Subtype Description
text/plain Plaintext with no special formatting.
 The key charset is used to
 specify US-ASCII or one of the ISO-8859
 character sets.
text/enriched An alternate format specified in
 RFC1563.
application/octet-stream Binary data of an unspecified format.
 The type key can be used to give
 additional, human-readable information.
 The padding key can be used to specify
 0--7 bits of padding added to round a
 bit-oriented file to a whole number of 8-bit
 bytes.
application/postscript A PostScript file.
image/gif A still image in GIF format.
image/jpeg A still image in JPEG format.
audio/basic A single-channel 8000-Hz audio file in
 8-bit ISDN µ-law (PCM) format.
video/mpeg A video image in MPEG format. Video
 images may or may not contain an associated
 soundtrack.
Table 4: MIME encoding types.
Encoding Description
7bit Unencoded 7-bit text.
8bit Unencoded 8-bit text.
binary Unencoded binary data.
Quoted- Most 7-bit characters are unencoded;
 Printable other characters are represented as
 "=" followed by a hexadecimal number.
Base64 Encoded in Base64 using digits A-Z,
 a-z, 0-9, +, and /.
Figure 1: Examples of message/external-body Content-Type headers. As with all
RFC822-compliant headers, these are single lines.
Content-Type: message/external-body; access-type=local-file;
name="/pub/LargeFile"
Content-Type: message/external-body; access-type=anon-ftp; size=12345678;
 site=somehost.com; name=LargeFile; directory=pub/other; mode=image
Figure 2: Sample multipart message.
From: tim@humperdinck (Tim Kientzle)
To: tim@humperdinck
Subject: A Sample Multipart message
MIME-Version: 1.0
Content-Type: multipart/parallel; boundary=SoMeBoUnDaRyStRiNg
Any text preceding the first boundary string is ignored
by MIME-compliant mail readers. This area usually holds
a short message informing a person using a non-compliant
reader that this is a MIME message that they may not be
able to read.
--SoMeBoUnDaRyStRiNg
The preceding blank line ends the headers for this part.
Since there were none, this is assumed to be plain text
in US-ASCII. The boundary cannot occur in the actual
text, so that mailers can quickly scan the text to
locate the boundaries.
--SoMeBoUnDaRyStRiNg
Content-Type: message/external-body; access-type=local-file;
name=/pub/file.audio
Content-Transfer-Encoding: 7bit
Content-Type: audio/basic
Content-Transfer-Encoding: binary
This text is ignored, the actual audio comes from the
file /pub/file.audio. Both blank lines above are
important. Also note the different encodings.
The 7bit encoding means that this embedded message is
in 7bit (which is mandatory for message/external-body),
while the actual audio data is stored in binary in the
local file.
--SoMeBoUnDaRyStRiNg--
This text follows the closing boundary marker above,
and is therefore ignored by compliant mail readers.

Listing One
/****************************************************************************
 MIMECODE - encode/decode binary data using MIME's base64 method
 Definition: ``radix encoding'' is the process of encoding data
 by treating the input data as a number or sequence of numbers
 in a particular base. The most common example is base-16 (hexadecimal)
 encoding, although other bases are possible.
 This program encodes and decodes data using the base 64 encoding
 used by MIME. Output is broken into lines every 72 characters.
 Decoding ignores control characters. Base 64 encoding adds 33% to
 the size of the input file.
 Usage: mimecode <options>
 Description: reads from stdin and writes encoded/decoded data to stdout.
 Options: -e Encode
 -d Decode
****************************************************************************/
#include <stdio.h>
#include <stdlib.h> /* for exit() */
/* This digit string is used by MIME's base-64 encoding */
/* MIME also deliberately ignores `=' characters */
#define BASE64DIGITS \
 "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
static unsigned long bitStorage = 0;
static int numBits = 0;
/* Masks for 0-8 bits */
static int mask[] = { 0, 1, 3, 7, 15, 31, 63, 127, 255 };
/****************************************************************************
 ReadBits: reads a fixed number of bits from stdin
 If insufficient bits are available, the remaining bits are
 returned left-justified in the desired width.
*/
unsigned int ReadBits(int n, int *pBitsRead)
{
 static int eof = 0;
 unsigned long scratch;
 while ((numBits < n) && (!eof)) {
 int c = getchar();
 if (c == EOF) eof = 1;
 else {
 bitStorage <<= 8;
 bitStorage |= (c & 0xff);
 numBits += 8;
 }
 }
 if (numBits < n) {
 scratch = bitStorage << (n - numBits);
 *pBitsRead = numBits;
 numBits = 0;
 } else {
 scratch = bitStorage >> (numBits - n);
 *pBitsRead = n;
 numBits -= n;
 }
 return scratch & mask[n];
}
/****************************************************************************
 WriteChar: output character to stdout, breaking lines at 72 characters.
*/
static int count = 0;
void WriteChar(char c)
{
 putchar(c);
 count++;
 if (count >= 72) { /* Chop after 72 chars */
 putchar('\n');

 count = 0;
 }
}
/****************************************************************************
 PadMimeOutput: pad output for MIME base64 encoding
*/
void PadMimeOutput(void)
{
 while ((count % 4) != 0) {
 putchar('=');
 count++;
 }
}
/****************************************************************************
 ReadChar: Get next non-control character from stdin, return EOF at
 end-of-file
*/
int ReadChar(void)
{
 int c;
 do {
 c = getchar();
 if (c==EOF) return c;
 } while ( (((c+1) & 0x7f) < 33) ); /* Skip any control character */
 return c;
}
/****************************************************************************
 WriteBits: Write bits to stdout
 Note: assumes `bits' is already properly masked.
*/
void WriteBits(unsigned bits, int n)
{
 bitStorage = (bitStorage << n) | bits;
 numBits += n;
 while (numBits > 7) {
 unsigned scratch = bitStorage >> (numBits - 8);
 putchar(scratch & 0xff);
 numBits -= 8;
 }
}
/****************************************************************************
 Base64Encode: encode stdin to stdout in base64
 The encoding vector used here is the one used by MIME.
*/
void Base64Encode(void)
{
 int numBits = 6; /* bits per base-64 digit; shadows the file-scope numBits */
 int digit;
 const char *digits = BASE64DIGITS;
 digit = ReadBits(numBits,&numBits);
 while (numBits > 0) { /* Encode extra bits at the end */
 WriteChar(digits[digit]);
 digit = ReadBits(numBits,&numBits);
 } 
 PadMimeOutput(); /* Pad to multiple of four characters */
 putchar('\n');
}
/****************************************************************************
 Base64Decode: decode stdin to stdout in base64

 The `decode' array specifies the value of each digit character.
 -2 indicates an illegal value, -1 for a value that should be
 ignored. ReadChar() already ignores control characters.
 Ignores parity.
*/
void Base64Decode(void)
{
 int c, digit;
 int decode[256];
 { /* Build decode table */
 int i;
 const char *digits = BASE64DIGITS;
 for (i=0;i<256;i++) decode[i] = -2; /* Illegal digit */
 for (i=0;i<64;i++) {
 decode[digits[i]] = i;
 decode[digits[i] | 0x80] = i; /* Ignore parity when decoding */
 }
 decode['='] = -1; decode['=' | 0x80] = -1; /* Ignore '=' for MIME */
 } 
 c = ReadChar();
 while (c != EOF) {
 digit = decode[c & 0x7f];
 if (digit < -1) {
 fprintf(stderr,"Illegal base 64 digit: %c\n",c);
 exit(1);
 } else if (digit >= 0) 
 WriteBits(digit & 0x3f,6);
 c = ReadChar();
 }
}
/****************************************************************************
 Usage: print usage message to stderr
*/
void Usage(char * progname)
{
 fprintf(stderr,"Usage: %s <options>\n",progname);
 fprintf(stderr,"Options: -e Encode\n");
 fprintf(stderr," -d Decode\n");
}
/****************************************************************************
 main: parse arguments, call appropriate encode/decode function
*/
int main(int argc, char **argv)
{
 int encode = 1;
 if (argc < 2) { Usage(argv[0]); exit(1);}
 while (argc > 1) {
 char *p=argv[--argc];
 switch(*p) {
 case '-':
 {
 switch(*++p) {
 case 'e': case 'E': encode = 1; break;
 case 'd': case 'D': encode = 0; break;
 default:
 fprintf(stderr,"Unrecognized option: %s\n",p);
 Usage(argv[0]);
 exit(1);
 }

 }
 break;
 default:
 fprintf(stderr,"Unrecognized option: %s\n",p);
 Usage(argv[0]);
 exit(1);
 }
 }
 if (encode) Base64Encode();
 else Base64Decode();
 exit(0);
}



















































Plug and Play Run-Time Services


Software-controlled system configuration for Windows 95




Thomas A. Roden and Glenn E. Jystad


Tom and Glenn are principal engineers at Phoenix Technologies, where Tom
specializes in Plug and Play and Glenn specializes in advanced technologies.
They can be contacted at tom_roden@ptltd.com and glenn_jystad@ptltd.com,
respectively.


One of the first things you read in the Microsoft Hardware Design Guide for
Windows 95 is that the guide "emphasizes the Plug and Play concept." This
raises a number of questions, not the least of which is, What exactly is Plug
and Play? The easy answer is that Plug and Play (PnP) is an architecture
implemented in Windows 95 that's designed to eliminate some of the
hardware-configuration hassles of installing peripherals such as soundcards,
CD-ROM drives, graphics subsystems, and the like. Of course, to make this
automatic hardware detection and configuration possible, PnP has a few
requirements: For instance, your system motherboard, system BIOS, and
peripheral devices all must be PnP compliant at the outset. In short, PnP is a
collection of methods for determining and controlling system-resource usage.
The problem with short answers, however, is that they tend to generate more
questions.
For instance, what are system resources? To PnP, system resources are memory
address space, I/O address space, IRQs, and DMAs. Memory address space can be
further divided into what's below and above 1M. I/O can be divided into that
below 400h (and all its aliases we haven't managed to escape) and that above.
DMA consists of the 8- and 16-bit varieties, and IRQs can sometimes be shared.
Resources are allocated to or claimed by motherboard devices and plug-in bus
devices of one of several families that may or may not include ISA, MCA, EISA,
PCMCIA, VESA-VL, or PCI. Often, a motherboard device is actually a bus device
soldered directly to the motherboard. And don't forget that configuration
mechanisms (jumpers, switches, ROM setup, standard configuration utilities,
and custom configuration utilities) are varied and sometimes inconsistent with
one another.
In an effort to make the PC universe more pleasant, a number of hardware and
software companies have developed the PnP standard for the determination and
control of all system resources. Where practical, PnP also shows which device
is consuming what resource, and where the device is located. 
PnP has been shipping on many motherboard BIOSs in one form or another for
over a year. There are two major types:
Conflict detection and resolution (CDR).
Run-time services (RTS). 
With CDR, PnP ISA cards in your system are configured before your operating
system loads. RTS, on the other hand, is detectable by a PnP header in the
F000 segment.


The PnP Header and RTS Spec


The PnP header, which checksums to 0, contains a signature, length, data
pointers, and code-entry points; see Table 1. There should be no more than one
such header in the entire F000 segment. These entry points are far called from
real or 16-bit protected mode. In this article, we'll focus on these
code-entry points. 
The code-entry points take stack-based parameters. The function number is
always the first parameter; additional parameters are function specific. Such
an interface can be complex to declare properly in C; the source code
accompanying this article shows one way to declare it so that it's easy to
call.
The PnP RTS specification allows all functions to be optional. Both the RTS
and CDR specs are available in source-code form; see "Availability," page 3.
To save BIOS space, these functions may be implemented with little or no error
checking. It is important that the calling application not pass unexpected
input. 
The most widely implemented functions (as well as those required for Windows
95 PnP compatibility) are the device-node services--GetNumberOfDeviceNodes,
GetDeviceNode, and SetDeviceNode. These services allow a program to determine
the resources used by the motherboard. Configuration utilities can use this
information to preclude configuration conflicts. System-information utilities
can use this information to augment their automatic hardware-detection
algorithms.
Device nodes contain the Descriptor Block of Used Resources and the Descriptor
Block of Possible Resources. The first states the resources the device is
currently configured to use, while the second states the possible
configurations the device can accept. Both are in a tagged, byte-oriented
format that encodes length in a standard way. Thus, new descriptors can be
added such that existing code can skip over them without getting lost. The
supplied code contains routines to parse this format.
Reading a device node tells you if it can be set programmatically. If so, the
SetDeviceNode call can be used. Nodes must be checked before a set call is
issued since the function itself is not required to perform qualification of
the input block. Neither is it required by the spec to return an error on any
form of invalid input. The preferred method of reconfiguring a device node is
as follows:
1. Get the device node using the Descriptor Block of Used Resources as a
template. 
2. Select the desired configuration from the Descriptor Block of Possible
Resources. This configuration should not conflict with any other device-node
resource usage.
3. Set the device node (checking for error return).
4. Get the device node again, verifying that the requested configuration was
set.
Step 4 may be overkill, since any fully qualified configuration should set
successfully. Still, this defense is justifiable, since applications have far
more room for error checking than the space-constrained BIOS functions do.


Storing the Configuration


PnP operates on the principle that everything in the PC universe is
generically detectable. PnP ISA and PCI have configuration registers that can
be read, and RTS supplies the device nodes to determine what's on the
motherboard. Any other devices are classified as ISA Legacy Devices or
Statically Allocated Resources. ("Legacy" devices are expansion cards that do
not support the PnP spec. Windows 95 includes an "Add New Hardware" wizard for
configuring legacy ISA cards in machines with a PnP BIOS. The system assigns
resources to legacy cards before anything else.)
PnP also needs to determine a resource's usage. There are two accepted methods
for this, and only one is implemented in any given system, so applications
need to handle both. The first is to use the Get and Set Statically Allocated
Resource functions, which store only those resources that are not generically
detectable. Consequently, there is no information about how the user wants
things configured--only about the areas unavailable to configurable devices.
The second way of determining resource usage is to use the Extended System
Configuration Data (ESCD) functions: GetESCDInformation, ReadESCD, and
WriteESCD. ESCD is an EISA-compatible format that includes structure
definitions to store information about device nodes, PnP ISA, and PCI
hardware. ESCD stores statically allocated resource information and the Last
Working Configuration (LWC) of configurable devices. This allows the system to
abbreviate the CDR process and ensure consistent device placement from boot to
boot. This is important when device drivers for configurable systems get their
placement information from a static source such as a command-line parameter in
CONFIG.SYS. Device drivers for PnP or PCI devices should never do this, but
they sometimes do.
ESCD allows run-time utilities (such as the Device Manager in Windows 95) to
request specific device placements. This allows indirect control of
boot-device priority and lets difficult-to-place configurations be worked out
by operating-system or application-level tools.


PNPDEMO.EXE 


PNPDEMO.EXE demonstrates PnP RTS by showing how to find the $PnP header. The
program uses the Real Mode Entry Point and calls several PnP services. 
The existence of a PnP BIOS is determined by the presence of a valid $PnP
header contained within the F000 segment of the BIOS. This header contains
several pieces of information, but PNPDEMO focuses only on the Real Mode Entry
Point. By calling this real-mode location with the appropriate parameters
placed on the stack, the caller can derive all of the devices on the
motherboard, the state of the docking station, the number of PnP ISA cards,
and resource information about the entire system. Moreover, through this
interface, the caller can configure or disable motherboard devices directly in
an industry-standardized, platform-independent fashion. Note that PNPDEMO is
not an alternative to the PnP BIOS specification; it merely demonstrates many
of a PnP BIOS's capabilities.
After locating the PnP header and determining the entry point, a program makes
calls by placing the appropriate parameters on the stack and calling the entry
point's designated location. The first parameter to all functions is the
function number. Table 2 lists the functions and their numerical designation.
PNPDEMO lets the user call all but the Set or Write type functions. These
functions (which you can find in the appropriate source files) should be
tightly controlled by more-sophisticated software. Note that although the
files have the .CPP extension to indicate that they contain C++, most of the
code is actually straight C and only utilizes a few subtle C++ features.
PNPDEMO.CPP. Within each of the following functions, the parameters for one
PnP run-time service are defined and used in a sample call; see Example 1(a).
PnpSummary displays the current allocation of all of the device nodes by
enumerating the list and parsing the allocated resources, as in Example 1(b).

PNPBIOS.CPP. FindPnpBIOS scans the F000 segment to locate a valid PnP BIOS
Header. PnpGetHeader displays the contents of this header; see Example 2(a).
The routines in Example 2(b) illustrate how to parse the contents of a device
node, its header, allocated resources, and possible resources.
PNPISA.CPP. The routines in the PNPISA.CPP module provide the detailed
resource parsing in complete and summarized forms; see Example 3.
UTILS.CPP. EisaId converts the DWORD ID fields to their uncompressed string
value; see Example 4. Each ID contains both a vendor designation and a model
number. In the case of many motherboard device nodes, this ID indicates a
compatible standard device. For example, PNP0501 indicates a 16550-compatible
serial port.


Conclusion


It's been said that anything making the user's interaction with PCs easier
ends up making the programmer's job harder. There's no better example of this
than the Windows 95 PnP expansion architecture. With promises of PnP features
such as support for dynamic reconfiguration events, our jobs won't get any
easier.
Table 1: PnP header contents.
Field                                           Offset  Length   Value
Signature                                       00h     4 BYTES  $PnP (ASCII)
Version                                         04h     BYTE     10h
Length                                          05h     BYTE     21h
Control field                                   06h     WORD     Varies
Checksum                                        08h     BYTE     Varies
Event-notification flag address                 09h     DWORD    Varies
Real-mode 16-bit offset to entry point          0Dh     WORD     Varies
Real-mode 16-bit code-segment address           0Fh     WORD     Varies
16-bit protected-mode offset to entry point     11h     WORD     Varies
16-bit protected-mode code-segment base address 13h     DWORD    Varies
OEM device identifier                           17h     DWORD    Varies
Real-mode 16-bit data-segment address           1Bh     WORD     Varies
16-bit protected-mode data-segment base address 1Dh     DWORD    Varies
Table 2: PnP RTS functions.
 Function Description
 0 Get number of device nodes.
 1 Get device node.
 2 Set device node.
 3 Get event.
 4 Send message.
 5 Get docking-station information.
 9 Set statically allocated resources.
 A Get statically allocated resources.
 B Get APM 1.1 table.
 40 Get PnP ISA information.
 41 Get ESCD information.
 42 Read ESCD.
 43 Write ESCD.
Example 1: (a) Defining and calling typical PnP run-time service functions;
(b) using the PnpSummary function.
(a)
int PnPFnGetNodeInfo(int argc, char * argv[])
int PnPFnGetNode(int argc, char * argv[])
int PnPFnSetNode(int argc, char * argv[])
int PnPFnGetEvent(int argc, char * argv[])
int PnPFnSendMessage(int argc, char * argv[])
int PnPFnGetDockInfo(int argc, char * argv[])
int PnPFnSetStaticResources(int argc, char * argv[])
int PnPFnGetStaticResources(int argc, char * argv[])
int PnPFnGetApmTable(int argc, char * argv[])
int PnPFnGetIsaConfig(int argc, char * argv[])
int PnPFnGetEscdInfo(int argc, char * argv[])
int PnPFnGetEscd(int argc, char * argv[])
int PnPFnSetEscd(int argc, char * argv[])

(b)
int PnpSummary (int argc, char* argv[])
Example 2: (a) Locating a valid PnP BIOS header and displaying its contents;
(b) parsing the contents of a device node.
(a)
WORD FindPnpBios ()
int PnpGetHeader ()

(b)
void DumpDevNodeHeader (BYTE ** ppbDevNode)
void DumpDevNodeSumHeader (BYTE ** ppbDevNode)
void DumpDevNodeSumAllocated (BYTE ** ppbDevNode)
void DumpDevNodeAllocated (BYTE ** ppbDevNode)
void DumpDevNodePossibles (BYTE ** ppbDevNode)
void DumpDevNodeCompatIds (BYTE ** ppbDevNode)
void DumpDevNode (BYTE * pbDevNode)
void DumpDevNodeSummaryLine (BYTE * pbDevNode)
Example 3: Detailing resource parsing in two different forms: complete and
summarized.

void DumpIsaResources (BYTE **ppbIsa)
void DumpIsaSumResources (BYTE **ppbIsa, int iLongSum)
Example 4: EisaId converts the DWORD ID fields to their uncompressed string
value. 
char * EisaId (DWORD dwCompressedID, char * pszBuffer)



























































Zero-Copy Interfacing to TCP/IP


Improving real-time performance




Dana Burd


Dana is a member of the networking group at Wind River Systems. He can be
contacted at dana@wrs.com.


A network connection to a remote computer can cause a performance bottleneck
for many real-time, embedded applications, especially those that move large
amounts of data. To avoid such bottlenecks, every ounce of network throughput
must be squeezed out of the system. Most factors that affect network
throughput are beyond the application developer's control, but the overhead
incurred by making multiple copies of data can be avoided using a "zero-copy
interface." A zero-copy interface allows a real-time application to send to
and receive from the network stack without copying data. The data buffer sent
or received at the application layer is also used by the device driver to send
or receive data.
In this article, I will examine the concepts, requirements, advantages, and
disadvantages associated with the use of a zero-copy interface. Currently, two
popular real-time operating systems support zero-copy TCP--Wind River's
VxWorks and Integrated Systems' pSOS. I'll use the VxWorks zbuf facility as an
example implementation. The zbuf facility provides calls to perform zero-copy
operations on VxWorks' 4.3 BSD-derived TCP/IP protocol stack. I will also
present sample benchmarking code and explain the potential performance
benefits of using a zero-copy interface.


Zero-Copy-Interface Concept


To introduce the zero-copy-interface concept, I'll trace the flow of received
data through VxWorks' TCP/IP protocol stack. Figure 1 shows a typical
data-reception scenario. First, the network chip automatically fills a single
device-driver buffer with received data. The driver code then copies the data
from the device-driver buffer into a chain of network buffers and hands the
chain off to the network stack. The stack processes the buffer chain and
passes it up to the socket layer, which holds onto the data. When the
application issues a receive call, a user-specified amount of data is copied
from the socket's network-buffer chain into an application-provided buffer.
Although no data copies are performed within the TCP/IP stack itself, every
byte of data is still copied twice: once at the device-driver-layer interface
and once at the socket-layer interface. This approach stems from UNIX's
process model, where applications running in "process space" cannot access
network buffers, which reside in "kernel space." VxWorks' flat memory scheme
does not impose such a division of memory, but rather allows network and
application buffers to coexist in the same memory space, and the zero-copy
interface capitalizes on this.
Figure 2 shows the same data flow, with the addition of a zero-copy interface.
As before, the network chip fills a single device-driver buffer with received
data. Instead of copying the data, however, the driver code makes a system
call to transform the device-driver buffer into one large network buffer. The
driver then supplies a pre-allocated, empty device-driver buffer to the chip
and passes the filled network buffer to the protocol stack. In effect, the
driver is "loaning" its filled receive buffer to the stack for processing, and
expects to be notified through a callback routine when the buffer is free to
be filled again. Once the data is passed to the protocol stack, the network
code performs the exact same functions as before. When the data arrives at the
socket layer, the application may issue a zero-copy-interface receive call to
access the network buffer. In much the same way that the driver loans its
device-driver buffer to the stack, the zero-copy receive call allows the
network to loan its network buffer to the application, thereby avoiding the
socket-layer copy. When finished with the data, the application must make a
zero-copy-interface call to return the network buffer to the stack, at which
time the stack returns the buffer to the device driver via the driver callback
routine.
Note that no specific mention is made of transport-layer functionality. This
layer is not affected by zero-copy issues; therefore, UDP can be used just as
well as TCP. The aforementioned scenarios follow data reception, but data
transmission is just as easily traced. In this case, the application loans a
data buffer to the socket layer via a zero-copy interface send call. The
socket layer processes the application's buffer and passes it down the stack,
performing the usual stack manipulations along the way. At the device driver,
the data is sent to the physical media directly from the network (application)
buffer. Once the data has been sent, the driver returns the data buffer to the
network stack, which in turn gives it back to the application through an
application-specified callback routine.
You have now seen how data may move through VxWorks' TCP/IP protocol stack
without being copied. At the socket layer, zero-copy-interface calls allow
buffers to be shared between the application and the stack. The driver may
also share buffers with the stack, but because of network-chip-architecture
constraints, many device drivers still copy data. Some devices use on-chip
buffer space, which necessitates data copies, and not all chips can send
directly from outgoing buffer chains. Although VxWorks provides system calls
to help avoid data copies, the device-driver developer must determine whether
data copies are necessary and write the driver accordingly. Since
device-driver internals are usually hidden from real-time application
developers, I'll concentrate now on the socket-layer portion of the zero-copy
interface.


zbuf Facility


VxWorks offers a socket-layer zero-copy interface with the zbuf facility, a
buffer-abstraction library that allows you to manipulate and share buffers by
copying pointers to the data instead of the data itself. The facility's basic
unit, the zbuf, has three essential properties, as follows:
It holds a sequence of bytes.
Its data is organized into one or more segments of contiguous data; successive
zbuf segments are not usually contiguous.
Its segments refer to data buffers through pointers. The underlying data
buffers may be shared by more than one zbuf segment.
The zbuf routines shown in Table 1 allow you to create, delete, build, and
manipulate zbufs; Figure 3 presents both simple and complex zbufs. To create
the simple zbuf, you must first create a zbuf ID by calling zbufCreate(),
which returns zbufID1. This empty ID may then be built by calling
zbufInsertBuf(), providing as parameters a data buffer, the length of the
buffer, and an application-specific callback routine (to notify the
application when the data buffer is released). Once the data buffer is
enrolled into zbufID1, you can get the address of the data with zbufSegData()
and the length of the buffer with zbufSegLength(). The more complex zbufs in
Figure 3(b) are built similarly, except that more of the zbuf routines are
used. Most notably, zbufDup() is used to duplicate the last two segments of
zbufID2. Copying zbuf segment pointers to zbufID3 instead of the data buffer
itself allows the two zbufs to share the data with each other.
By itself, the zbuf facility provides a mechanism for sharing data between
separate modules. zbufs may be used as the buffering scheme for new protocols,
middleware, messaging or communication applications, as well as for sharing
data between different programs or software layers. 
The socket layer of VxWorks' TCP/IP stack was modified to provide an interface
to zbufs. The zbuf routines in Table 2 let the application send zbufs to and
receive zbufs from the network stack without copying data. The routines in
Table 1 may be used to create and build zbufs before they are sent, and to
access and delete zbufs after they are received; the routines in Table 2
perform the actual sending and receiving.


Using the zbuf Socket Interface


While zero-copy interfaces bypass data copying, they are not always faster
than the original method of copying. The time to create, build, maintain, and
delete a zbuf must be weighed against the time it takes to copy the data;
unfortunately, these are different on every system. They depend heavily upon
both hardware (CPU architecture, clocking speed, caching issues, memory-access
times) and software (buffer size, task interactions, throughput of network
peer, and so on). No simple formula can unequivocally dictate the use of zbuf
socket calls instead of the regular BSD socket calls, but the following
guidelines will likely indicate the appropriate interface.
The zbuf socket interface is most useful when your real-time application is
sending or receiving large amounts of data over the network. By this, I mean
not only that a large total quantity of data is sent, but also that each call
is sending or receiving a large data buffer (at least a few hundred bytes). If
only a few bytes are transferred in each call, the zbuf overhead will outweigh
any savings, so applications that have small or infrequently transferred
buffers should probably continue to use the standard BSD socket calls.
zbuf socket calls are useful when network throughput, or the time spent in the
network code, is limiting your application. If the CPU is idle much of the
time, zbuf savings will not be significant.
To use zbuf socket calls, your application must adhere to several
requirements: 
When sending a zbuf, the application must not modify the zbuf data until the
network stack releases the data buffer via the application-specified callback
routine. This protects the integrity of the zbuf data while the network code
is sending or resending the buffer. 
The application-specified callback routine must not block or delay, since the
callback is run in the context of the network task. 
When receiving a zbuf, the returned zbuf will typically contain data in
multiple zbuf segments; in other words, the application cannot assume that the
received zbuf data will be contiguous. 
The application must always delete the received zbuf by calling zbufDelete()
when finished processing it.
Zero-copy interfaces have not yet been standardized by the real-time industry.
Although VxWorks' zbuf socket calls are modeled after the BSD socket API--and
therefore require minimal changes to your application--they are not portable
to other operating systems.


Example Code


As previously stated, the zbuf socket interface is most useful for
transferring a lot of data over the network. Figure 4(a) presents a few lines
that appear in many applications that perform such transmissions. Figure 4(b)
is the same, except that it was recoded to use zbufs. The appBufGet() and
appBufRetn() references are fictitious, application-specific routines. In most
applications, they will just manipulate a list of free, fixed-length
application buffers.

Figure 4 illustrates two important points: the strong correlation between the
BSD and zbuf socket calls, which makes for an easy migration to zbufs; and the
application's need to maintain a pool of buffers with zbufs, as opposed to the
single allocated buffer in the BSD fragment. The pool of buffers allows the
application to send multiple data buffers to the network stack--a feat that
BSD accomplishes by copying data.
Listings One and Two can be compiled and used to benchmark the zbuf socket
interface. Listing One (tcpBlaster.c) repeatedly sends TCP packets across a
network connection to a remote computer. Listing Two (tcpBlastee.c) is used to
receive the TCP packets sent by tcpBlaster. Neither program looks at the data
being sent; they simply track the network throughput and periodically print
out performance statistics for the user. The code can be run between a VxWorks
target and a UNIX host, or between two VxWorks targets. To utilize VxWorks'
zbuf socket interface, simply define the macro INCLUDE_ZBUF_SOCK when
compiling the code.


Benchmarking


The benchmark code in Listings One and Two was run in a controlled environment
to measure VxWorks' TCP/IP protocol stack throughput, with and without the
zbuf socket interface. Four test cases were selected:
Sending to BSD sockets.
Sending to zbuf sockets. 
Receiving from BSD sockets.
Receiving from zbuf sockets.
Each test case was run with 1K, 5K, and 15K transfer buffers, to correlate
buffer size with performance.
The test environment consisted of an isolated two-node LAN comprised of a
20-MHz, 68030-based Motorola MVME147 and a 33-MHz, 68040-based Motorola
MVME167 networked together via an Ethernet link. Both single-board computers
were running VxWorks Version 5.2 and had the benchmark code loaded. The
MVME167 was faster and did not limit network throughput, so saving data copies
on this board was of little interest. The speed of the MVME147, on the other
hand, created a performance bottleneck. Using the zbuf interface to improve
the protocol-stack performance on this slower board directly affected overall
system throughput. 
Table 3 shows that using the zbuf socket interface increased network
throughput in all four test cases. Performance improvements ranged from a 3.9
percent speedup when receiving 1K buffers, to a 20.0 percent gain when sending
15K buffers. As the transfer buffer grew, percentages increased, and the fixed
zbuf overhead costs paled in comparison.
Figure 1: Data flow through a TCP/IP stack.
Figure 2: Data flow through a TCP/IP stack equipped with a zero-copy
interface.
Figure 3: (a) A simple zbuf; (b) complex zbufs.
Figure 4: (a) BSD code fragment; (b) zbuf code fragment.
(a)
pBuffer = malloc (BUFLEN);
while ((readLen = read (fdDevice, pBuffer, BUFLEN)) > 0)
    write (fdSock, pBuffer, readLen);
(b)
pBuffer = malloc (BUFLEN * BUFNUM);            /* allocate memory */
for (ix = 0; ix < (BUFNUM - 1); ix++, pBuffer += BUFLEN)
    appBufRetn (pBuffer);                      /* fill list of free bufs */
while ((readLen = read (fdDevice, pBuffer, BUFLEN)) > 0)
    {
    zId = zbufCreate ();                       /* insert into new zbuf */
    zbufInsertBuf (zId, NULL, 0, pBuffer, readLen, appBufRetn, 0);
    zbufSockSend (fdSock, zId, readLen, 0);    /* send zbuf */
    pBuffer = appBufGet (WAIT_FOREVER);        /* get a fresh buffer */
    }
Table 1: The zbuf API. (a) Creation and deletion routines; (b) data-copying
routines; (c) operations; (d) segment routines.
Routine Description
(a)
zbufCreate()       Create an empty zbuf.
zbufDelete()       Delete a zbuf and free any associated segments.

(b)
zbufInsertBuf()    Create a zbuf segment from a buffer and insert into a zbuf.
zbufInsertCopy()   Copy buffer data into a zbuf.
zbufExtractCopy()  Copy data from a zbuf to a buffer.

(c)
zbufLength()       Determine the length in bytes of a zbuf.
zbufDup()          Duplicate a zbuf.
zbufInsert()       Insert a zbuf into another zbuf.
zbufSplit()        Split a zbuf into two separate zbufs.
zbufCut()          Delete bytes from a zbuf.

(d)
zbufSegFind()      Find the zbuf segment containing a specified byte location.
zbufSegNext()      Get the next segment in a zbuf.
zbufSegPrev()      Get the previous segment in a zbuf.
zbufSegData()      Determine the location of data in a zbuf segment.
zbufSegLength()    Determine the length of a zbuf segment.
Table 2: zbuf socket API.
Routine Description
zbufSockSend() Send zbuf data to a TCP socket.
zbufSockSendto() Send a zbuf message to a UDP socket.
zbufSockBufSend() Create a zbuf and send it as TCP socket data.
zbufSockBufSendto() Create a zbuf and send it as a UDP socket message.
zbufSockRecv() Receive data in a zbuf from a TCP socket.
zbufSockRecvfrom() Receive a message in a zbuf from a UDP socket.
Table 3: zbuf socket interface benchmark results. Tests run from tcpBlaster to
tcpBlastee. zbuf socket interface used on MVME147, not MVME167. (a) Receive:
MVME167-->MVME147; (b) Transmit: MVME147-->MVME167. 
bufSize,  Throughput         Throughput         zbuf Speedup
sockBuf   (BSD Interface)    (zbuf Interface)   Over BSD
(a)
1K,5K     494 KB/sec         513 KB/sec          3.9%
5K,15K    644 KB/sec         717 KB/sec         11.0%
15K,45K   682 KB/sec         770 KB/sec         13.0%
(b)
1K,5K     491 KB/sec         518 KB/sec          5.5%
5K,15K    684 KB/sec         798 KB/sec         17.0%
15K,45K   704 KB/sec         844 KB/sec         20.0%

Listing One
/* tcpBlaster.c - test code to send TCP data to a remote target */
/* Copyright 1995 Wind River Systems, Inc. */
/* modification history: 01a,26may95,dzb cleaned-up from 
 original blaster src. Added zbuf support.
*/
/* includes */
#ifdef UNIX

#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>
#include <stdio.h>
#include <errno.h>
#else /* UNIX */
#include "vxWorks.h"
#include "sockLib.h"
#include "errnoLib.h"
#include "zbufSockLib.h"
#include "ioLib.h"
#include "inetLib.h"
#include "in.h"
#include "stdlib.h"
#include "stdio.h"
#include "string.h"
#endif /* UNIX */
/*******************************************************************************
* tcpBlaster - continuously send TCP data to a socket
* This routine opens a TCP socket connection to a remote target, which
* should be running tcpBlastee. This routine then enters a continuous loop
* where TCP data is sent to the connected socket. The remote target,
* remote port number, send buffer size, and send and receive socket
* buffer sizes must be set through parameters to this routine.
* Compile with UNIX if running on a UNIX system, else VxWorks is assumed.
* Compile with INCLUDE_ZBUF_SOCK for VxWorks zbuf socket interface support.
* RETURNS: N/A
*/
#ifdef UNIX
main (argc, argv)
 int argc;
 char * argv [];
 {
 struct hostent * targetHost; /* remote target hostname */
 int targetPort; /* remote target port number */
 int sendSize; /* size of buffer to send */
 int sockBuf; /* size of socket buffers */
#else /* UNIX */
void tcpBlaster /* VxWorks entry point */
 (
 char * targetAddr, /* remote target IP address */
 int targetPort, /* remote target port number */
 int sendSize, /* size of buffer to send */
 int sockBuf /* size of socket buffers */
 )
 {
#ifdef INCLUDE_ZBUF_SOCK
 ZBUF_ID zbufSend; /* zbuf for sending */
 ZBUF_ID zbufSave; /* zbuf for duplication */
#endif /* INCLUDE_ZBUF_SOCK */
#endif /* UNIX */
 struct sockaddr_in sin; /* socket address struct */
 int sFd; /* socket fd */
 char * pBuffer; /* buffer to send */
 bzero ((char *) &sin, sizeof (sin));
#ifdef UNIX
 if (argc < 5) /* check args */
 {

 printf ("usage: %s remoteName remotePort sendSize sockBuf\n",argv [0]);
 exit (1);
 }
 targetHost = gethostbyname (argv[1]); /* process args */
 targetPort = atoi (argv [2]);
 sendSize = atoi (argv [3]);
 sockBuf = atoi (argv [4]);
 /* set remote target address values */
 if (targetHost == 0 && (sin.sin_addr.s_addr = inet_addr (argv [1])) == -1)
 {
 fprintf (stderr, "%s: unknown host\n", argv [1]);
 exit (2);
 }
 if (targetHost != 0)
 bcopy (targetHost->h_addr, &sin.sin_addr, targetHost->h_length);
#else /* UNIX */
 sin.sin_addr.s_addr = inet_addr (targetAddr);
#endif /* UNIX */
 sin.sin_port = htons (targetPort);
 sin.sin_family = AF_INET;
 /* allocate buffer to be sent to remote target */
 if ((pBuffer = (char *) malloc (sendSize)) == NULL)
 {
 printf ("cannot allocate buffer of size %d\n", sendSize);
 exit (1);
 }
 if ((sFd = socket (AF_INET, SOCK_STREAM, 0)) < 0) /* open socket */
 {
 printf ("cannot open socket\n");
 free (pBuffer);
 exit (1);
 }
 /* set socket buffer sizes to user-specified value */
 if (setsockopt (sFd, SOL_SOCKET, SO_RCVBUF, (char *) &sockBuf,
 sizeof (sockBuf)) < 0)
 {
 printf ("setsockopt SO_RCVBUF failed\n");
 free (pBuffer);
 exit (1);
 }
 if (setsockopt (sFd, SOL_SOCKET, SO_SNDBUF, (char *) &sockBuf,
 sizeof (sockBuf)) < 0)
 {
 printf ("setsockopt SO_SNDBUF failed\n");
 free (pBuffer);
 exit (1);
 }
#ifdef INCLUDE_ZBUF_SOCK
 /* create master zbuf and enroll send buffer into created zbuf */
 if ((zbufSave = zbufCreate ()) == NULL)
 {
 printf ("zbufCreate failed\n");
 free (pBuffer);
 exit (1);
 }
 if (zbufInsertBuf (zbufSave, NULL, 0, pBuffer, sendSize, NULL, NULL) ==
 NULL)
 {
 printf ("zbufInsertBuf failed\n");

 zbufDelete (zbufSave);
 free (pBuffer);
 exit (1);
 }
#endif /* INCLUDE_ZBUF_SOCK */
 /* connect to remote target */
 if (connect (sFd, (struct sockaddr *) &sin, sizeof (sin)) < 0)
 {
 printf ("connect failed: host %s port %d\n", inet_ntoa (sin.sin_addr),
 ntohs (sin.sin_port));
 free (pBuffer);
 exit (1);
 }
 for (;;)
 {
#ifdef INCLUDE_ZBUF_SOCK
 /* duplicate master zbuf - duplicate will be sent */
 if ((zbufSend = zbufDup (zbufSave, NULL, 0, sendSize)) == NULL)
 {
 printf ("zbufDup failed\n");
 break;
 }
 /* send data to remote target */
 if (zbufSockSend (sFd, zbufSend, sendSize, 0) < 0)
#else /* INCLUDE_ZBUF_SOCK */
 if (write (sFd, pBuffer, sendSize) < 0)
#endif /* INCLUDE_ZBUF_SOCK */
 {
 printf ("tcpBlaster write error: %d\n", errno);
 break;
 }
 }
 close (sFd); /* cleanup */
 free (pBuffer);
#ifdef INCLUDE_ZBUF_SOCK
 zbufDelete (zbufSave);
#endif /* INCLUDE_ZBUF_SOCK */
 printf ("tcpBlaster exit\n");
 }

Listing Two
/* tcpBlastee.c - test code to receive TCP data from a remote target */
/* Copyright 1995 Wind River Systems, Inc. */
/* modification history: 01a,26may95,dzb cleaned-up from 
 original blastee src. Added zbuf support.
*/
/* includes */
#ifdef UNIX
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>
#include <stdio.h>
#include <sys/time.h>
#include <signal.h>
#include <errno.h>
#else /* UNIX */
#include "vxWorks.h"

#include "logLib.h"
#include "sockLib.h"
#include "zbufSockLib.h"
#include "sysLib.h"
#include "ioLib.h"
#include "stdlib.h"
#include "stdio.h"
#include "string.h"
#include "wdLib.h"
#include "socket.h"
#include "in.h"
#endif /* UNIX */
/* globals */
#ifndef UNIX
WDOG_ID tcpBlasteeWd = NULL;
#endif /* UNIX */
int tcpBlasteeRecv = 0;
int tcpWdIntvl = 10;
/* function declarations */
void tcpBlasteeRate ();
/*******************************************************************************
* tcpBlastee - continuously receive TCP data from a socket
* This routine opens a TCP socket and waits for a remote target to connect.
* The remote target should be running tcpBlaster. This routine then enters
* a continuous loop where TCP data is read from the connected socket. The
* local port number, receive buffer size, and the send and receive socket
* buffer sizes must be set through parameters to this routine.
* Compile with UNIX if running on a UNIX system, else VxWorks is assumed.
* Compile with INCLUDE_ZBUF_SOCK for VxWorks zbuf socket interface support.
* RETURNS: N/A
*/
#ifdef UNIX
main (argc, argv)
 int argc;
 char * argv [];
 {
 int localPort; /* local port number */
 int recvSize; /* size of buffer to receive */
 int sockBuf; /* size of socket buffers */
#else /* UNIX */
void tcpBlastee
 (
 int localPort, /* local port number */
 int recvSize, /* size of buffer to receive */
 int sockBuf /* size of socket buffers */
 )
 {
#ifdef INCLUDE_ZBUF_SOCK
 ZBUF_ID zbufRecv; /* received zbuf */
#endif /* INCLUDE_ZBUF_SOCK */
#endif /* UNIX */
 struct sockaddr_in sin; /* local socket address */
 struct sockaddr_in from; /* remote socket address */
 char * pBuffer; /* buffer to receive into */
 int fFd; /* master socket fd */
 int sFd; /* slave socket fd */
 int numRead; /* number of bytes read */
 int len = sizeof (from); /* length of remote sock addr */
 bzero ((char *) &sin, sizeof (sin));

 bzero ((char *) &from, sizeof (from));
#ifdef UNIX
 if (argc < 4) /* check args */
 {
 printf ("usage: %s localPort recvSize sockBuf\n", argv [0]);
 exit (1);
 }
 localPort = atoi (argv [1]); /* process args */
 recvSize = atoi (argv [2]);
 sockBuf = atoi (argv [3]);
 signal (SIGALRM, tcpBlasteeRate); /* set up stats timer */
 alarm (tcpWdIntvl);
#else /* UNIX */
 if (tcpBlasteeWd == NULL && (tcpBlasteeWd = wdCreate ()) == NULL)
 {
 printf ("cannot create tcpBlastee watchdog\n");
 exit (1);
 }
 wdStart (tcpBlasteeWd, sysClkRateGet () * tcpWdIntvl,
 (FUNCPTR) tcpBlasteeRate, 0);
#endif /* UNIX */
 sin.sin_port = htons (localPort);
 sin.sin_family = AF_INET;
 /* allocate buffer into which data will be received (non-zbuf) */
 if ((pBuffer = (char *) malloc (recvSize)) == NULL)
 {
 printf ("cannot allocate buffer of size %d\n", recvSize);
 exit (1);
 }
 if ((fFd = socket (AF_INET, SOCK_STREAM, 0)) < 0) /* open socket */
 {
 printf ("cannot open socket\n");
 free (pBuffer);
 exit (1);
 }
 if (bind (fFd, (struct sockaddr *) &sin, sizeof (sin)) < 0) /* set local */
 {
 printf ("bind error\n");
 free (pBuffer);
 exit (1);
 }
 if (listen (fFd, 5) < 0) /* set listen queue */
 {
 printf ("listen failed\n");
 free (pBuffer);
 exit (1);
 }
 /* wait for incoming socket connections */
 while ((sFd = accept (fFd, (struct sockaddr *) &from, &len)) == -1)
 ;
 /* set socket buffer sizes to user-specified value */
 if (setsockopt (sFd, SOL_SOCKET, SO_RCVBUF, (char *) &sockBuf,
 sizeof (sockBuf)) < 0)
 {
 printf ("setsockopt SO_RCVBUF failed\n");
 free (pBuffer);
 exit (1);
 }
 if (setsockopt (sFd, SOL_SOCKET, SO_SNDBUF, (char *) &sockBuf,
 sizeof (sockBuf)) < 0)
 {
 printf ("setsockopt SO_SNDBUF failed\n");
 free (pBuffer);
 exit (1);
 }
 tcpBlasteeRecv = 0; /* reset bytes received */
 for (;;)
 {
#ifdef INCLUDE_ZBUF_SOCK
 numRead = recvSize; /* set desired receive size */
 /* read data from socket */
 if (((zbufRecv = zbufSockRecv (sFd, 0, &numRead)) == NULL) ||
 (numRead == 0))
#else /* INCLUDE_ZBUF_SOCK */
 if ((numRead = read (sFd, pBuffer, recvSize)) <= 0)
#endif /* INCLUDE_ZBUF_SOCK */
 {
 printf ("tcpBlastee read error: %d\n", errno);
 break;
 }
#ifdef INCLUDE_ZBUF_SOCK
 zbufDelete (zbufRecv); /* delete zbuf - return buf */
#endif /* INCLUDE_ZBUF_SOCK */
 tcpBlasteeRecv += numRead; /* keep track of total bytes */
 }
 close (fFd); /* cleanup */
 close (sFd);
 free (pBuffer);
#ifndef UNIX
 wdCancel (tcpBlasteeWd);
#endif /* UNIX */
 printf ("tcpBlastee exit.\n");
 }
/*******************************************************************************
*
* tcpBlasteeRate - print periodic receive-rate statistics
* RETURNS: N/A
*/
void tcpBlasteeRate ()
 {
 /* print stats for user's benefit */
 if (tcpBlasteeRecv > 0) /* incoming data ? */
 {
#ifdef UNIX
 printf ("%d bytes/sec tot %d\n", tcpBlasteeRecv / tcpWdIntvl,
 tcpBlasteeRecv);
#else /* UNIX */
 logMsg ("%d bytes/sec\n", tcpBlasteeRecv / tcpWdIntvl, 0, 0, 0, 0, 0);
#endif /* UNIX */
 tcpBlasteeRecv = 0;
 }
 else /* no data in last interval */
 {
#ifdef UNIX
 printf ("No bytes read in the last 10 seconds.\n");
#else /* UNIX */
 logMsg ("No bytes read in the last 10 seconds.\n", 0, 0, 0, 0, 0, 0);
#endif /* UNIX */

 }
 /* re-schedule stats timer for another interval */
#ifdef UNIX
 signal (SIGALRM, tcpBlasteeRate);
 alarm (tcpWdIntvl);
#else /* UNIX */
 wdStart (tcpBlasteeWd, sysClkRateGet () * tcpWdIntvl,
 (FUNCPTR) tcpBlasteeRate, 0);
#endif /* UNIX */
 }




A VBX for UDP


A custom control for network development




Frank E. Redmond III


Frank is the director of software development for a
distributed-information-systems company in Michigan. He can be contacted via
CompuServe at 76352,343.


A variety of communication protocols are available for building network
applications: IPX/SPX, NetBIOS, and Named Pipes, to name a few. On a
network-related project I was recently involved in, however, we decided to
implement TCP/IP, primarily because it is independent of both network topology
and platform. 
Our job was to facilitate communication over a LAN between several
applications; therefore, nearly 70 percent of the code I wrote was
communications related. Initially, our applications were to be deployed on
Windows-based machines, but since we eventually intend to support platforms
such as Macintosh and UNIX, our network communications protocol had to be as
platform independent as possible; hence, our decision to support TCP/IP.
To create the user interface on the front end, I turned to Visual Basic (VB),
which sped up the development process and allowed me to encapsulate the
network-related code into a VB Custom Control (VBX). In this article, I'll
present that VBX, along with an overview of TCP/IP protocols. Additionally,
I'll briefly describe how to port VBXs to OLE Custom Controls (OCXs) to ensure
compatibility with future versions of Visual Basic and other development
environments. 


TCP/IP Overview


Transmission Control Protocol (TCP) and Internet Protocol (IP) are the two
most important protocols of the Internet Protocol Suite. TCP/IP is topology
independent: It works with bus, ring, and star network topologies, spanning
LANs and WANs. Since TCP/IP is freely available for independent
implementation, it is also network-operating-system and platform independent.
In addition, TCP/IP is independent of the physical network hardware: It can be
used with Ethernet, token ring, and others.
Of the two, IP is more important because it's used by all other TCP/IP
protocols. IP is responsible for moving data from computer to computer using
an "IP address," a unique 32-bit number assigned to each computer. Other
higher-level protocols are used to move data between programs on different
computers by way of a "port," a unique 16-bit number assigned to each program.
Combined, an IP address and a program port number constitute a TCP/IP socket,
which uniquely identifies every program on every computer. The two most
popular protocols that rely on TCP/IP sockets are TCP and User Datagram
Protocol (UDP).
TCP is considered "connection oriented," because the two communicating
machines exchange a handshaking dialogue before data transmission begins. TCP
uses a checksum to validate received data. If the checksums don't match, the
sender automatically resends the data, without programmer intervention; thus
TCP guarantees that data is delivered, and delivered in order. TCP is best
used for stream-oriented data.
UDP is considered "connectionless." It requires less overhead since there's no
handshaking. However, UDP provides unreliable data delivery: Data may be
duplicated, arrive out of order, or not arrive at all. Though unreliable, UDP
is very efficient and allows you to utilize your own ACK/NAK. UDP is best
suited for record-oriented data where all of the information can be sent in
one packet.
On Windows-based machines, TCP/IP is accessed through the Windows Sockets
(WinSock) API: You write applications to the WinSock specification, and they
will then run on any WinSock-compatible TCP/IP protocol stack. The WinSock API
is in the form of a DLL--WINSOCK.DLL for 16-bit applications and WSOCK32.DLL
for 32-bit apps. You simply write to the WinSock API and link your
applications with the appropriate library. 
WinSock API functions fall into one of three categories: Windows Sockets
functions, Windows Sockets database functions, and Windows-specific functions.
Windows Sockets functions are a subset of the Berkeley sockets routines; see
Table 1. Windows Sockets database functions convert the human-readable host
and client names into a computer-usable format; see Table 2. Windows-specific
functions are extensions to support the event-driven architecture of Windows;
see Table 3. (For more information, see Network Interrupts, by Ralf Brown and
Jim Kyle, Addison-Wesley, 1994.) 


Control Details


As previously mentioned, our LAN-based control project called for extensive
network communication. To help with code reuse, I implemented the network
routines as a VBX. To use the TCP protocol, one socket must be established for
every client that the host will communicate with. If a host is connected to
100 clients using TCP, then the host will need to establish 100 sockets, as
opposed to establishing one UDP socket on the host and communicating with all
100 clients. In light of this (and because a limited number of TCP sockets can
be opened simultaneously for communication), I wrote a UDP-based VBX that is
invisible at run time. This VBX establishes and terminates the socket, and
asynchronously sends and receives data.
In relation to the WinSock API, you first initialize the underlying WinSock
DLL and confirm that the version of the DLL is compatible with the VBX's
requirements; see Example 1. This is done in the VBINITCC routine, which is
called each time an application loads the VBX. 
Next, the control's Connected property is set to True, and a socket is
established in response. The Connected property can only be set at run time,
and before it is set, no read/write activity can take place. 
You then call WSAAsyncSelect to set up an asynchronous event notifier that
generates a user-defined message (WM_USER_ASYNC_SELECT) whenever data arrives
or a request to send data is made; see Example 2.
For the VBX, a request to send data is made by setting the control's Send
property to True; this is possible only at run time. The control responds by
placing a WM_USER_ASYNC_SELECT message in its own queue via PostMessage.
Finally, WSACleanup is called to release all resources allocated by
WSAStartup. There must be one call to WSACleanup for every call to WSAStartup.
In the case of the VBX, WSACleanup is called during the VBTERMCC routine,
which in turn is called each time an application unloads the VBX; see Example
3. The rest of the code is pretty straightforward C; see the file datagram.c
in Listing One. The complete VBX (source and binary) is available
electronically; see "Availability," page 3. For more information on WinSock
programming, see Programming WinSock, by Arthur Dumas (Sams, 1995).


VBX-to-OCX Porting 


To be compatible with future versions of Visual Basic and other development
environments, I ported the UDP VBX to an OLE Custom Control (OCX). The OLE
Control Development Kit (CDK) made porting the UDP control a two-step process:
1. Use the OLE CDK to create a working skeleton. 
2. Appropriately place the code that is specific to the UDP VBX in the newly
formed OCX skeleton. 
One problem I encountered stemmed from the invisible-at-run-time option. If
this is checked in the ControlWizard, the resulting control does not have a
window at run time. This is problematic because the second parameter required
to set up the asynchronous event notifier is the hwnd of the control window
that will receive the user-defined event. To achieve the invisible-at-run-time
effect, the control is hidden at run time with a call to ShowWindow; see
Example 4.


Conclusion


To illustrate the use of the UDP control, I've written a simple chat program
that's available electronically. As Figure 1 illustrates, the program allows
two users to communicate over a TCP/IP connection. To send a message, the user
types a message in the Messages Out box and presses the Send button. Messages
received will automatically appear in the Messages In box. The program is
simple, thanks to the UDP control. Table 4 is a list of the properties
supported by the UDP control, while Example 5 is a typical UDP event. These,
as well as the sample code for the chat program, can be used as a reference
for programming with the UDP control. 
Figure 1: Sample chat program.
Table 1: Windows Sockets functions.
 Function Description
accept Accept an incoming connection.

bind Bind a name to a socket.
closesocket Close a socket.
connect Initiate a connection.
getsockname Get the name bound to a socket.
getsockopt Get the settings for a socket.
htonl Convert u_long to network byte-order.
htons Convert u_short to network byte-order.
inet_addr Convert dotted-decimal IP address into 32-bit number.
inet_ntoa Convert 32-bit number into dotted-decimal IP address.
ioctlsocket I/O control of a socket.
listen Listen for connections to this socket.
ntohl Convert u_long to host byte-order.
ntohs Convert u_short to host byte-order.
recv Receive data on a connected socket.
recvfrom Receive data on a socket, along with IP address
 and port number.
select Synchronous I/O multiplexing.
send Send data over a connected socket.
sendto Send data to a specific socket.
setsockopt Set socket options.
shutdown Shut down part of a full-duplex connection.
socket Create an endpoint for communication.
Table 2: Windows Sockets database functions.
Function Description
gethostbyaddr Return host name, given IP address.
gethostbyname Return IP address, given host name.
gethostname Return host name.
getprotobyname Return protocol name and number, given a protocol name.
getprotobynumber Return protocol name and number, given a protocol number.
getservbyname Return service name and port, given name and protocol.
getservbyport Return service name and port, given port and protocol.
Table 3: Windows Sockets Windows-specific functions.
Function Description
WSAAsyncGetHostByAddr Asynchronous version of gethostbyaddr.
WSAAsyncGetHostByName Asynchronous version of gethostbyname.
WSAAsyncGetProtoByName Asynchronous version of getprotobyname.
WSAAsyncGetProtoByNumber Asynchronous version of getprotobynumber.
WSAAsyncGetServByName Asynchronous version of getservbyname.
WSAAsyncGetServByPort Asynchronous version of getservbyport.
WSAAsyncSelect Asynchronous version of select.
WSACancelAsyncRequest Cancel outstanding WSAAsyncGet call.
WSACancelBlockingCall Cancel outstanding blocking call.
WSACleanup Release resources allocated by WSAStartup.
WSAGetLastError Return details of last API error.
WSAIsBlocking Determine if there is an outstanding blocking call.
WSASetBlockingHook "Hook" underlying blocking mechanism.
WSASetLastError Set error code to be returned by WSAGetLastError.
WSAStartup Initialize underlying DLL.
WSAUnhookBlockingHook Restore original blocking mechanism.
Table 4: UDP control properties and usage chart.
Property Description
Connected Establishes control's send/receive services;
 undefinable at design time.
Disconnected Terminates control's send/receive services;
 undefinable at design time.
ErrorCode Displays code representing error status of last UDP operation.
MaxBufferSize Size in bytes of the largest packet allowed; read-only.
MyAddress TCP/IP address (in dotted decimal notation) of the
 hosting computer; read-only.

MyPort Port number that the control will respond to.
Send Toggled to True to send data; undefinable at design time.
ToAddress Destination TCP/IP address (in dotted decimal notation)
 identifying the computer that data will be sent to.
ToData Data to be sent.
ToPort Destination port identifying the application that data
 will be sent to.
Example 1: Confirming DLL version.
//initialize underlying WinSock DLL
if (WSAStartup(VersionRequested,&wsaData)!=0) return FALSE;
 //make sure that the VBX supports this version of WinSock
if ((LOBYTE(wsaData.wVersion)!=LOBYTE(VersionRequested)) ||
 (HIBYTE(wsaData.wVersion)!=HIBYTE(VersionRequested)))
 {
 WSACleanup();
 return FALSE;
 }
Example 2: Setting up an asynchronous event notifier.
if (WSAAsyncSelect(DATAGRAMDEREF(hctl)->mappsocket, hwnd,
 WM_USER_ASYNC_SELECT, FD_READ|FD_WRITE)==SOCKET_ERROR)
DATAGRAMDEREF(hctl)->mErrorCode=WSAGetLastError();
Example 3: Releasing resources.
void FAR PASCAL _export VBTERMCC(void)
{ 
 //terminate the usage of the WinSock DLL
 WSACleanup();
}//VBTERMCC
Example 4: Hiding control at run time.
void CUdpCtrl::OnDraw(CDC* pdc, const CRect& rcBounds, const
CRect& rcInvalid)
{
 CBitmap bitmap;
 BITMAP bmp;
 CPictureHolder picHolder;
 CRect rcSrcBounds;
 if (AmbientUserMode()==MODE_DESIGN) 
 {
 //load bitmap
 bitmap.LoadBitmap(IDB_UDP);
 bitmap.GetObject(sizeof(BITMAP),&bmp);
 rcSrcBounds.right=bmp.bmWidth;
 rcSrcBounds.bottom=bmp.bmHeight;
 //create picture and render
 picHolder.CreateFromBitmap((HBITMAP)bitmap.m_hObject,NULL,FALSE);
 picHolder.Render(pdc,rcBounds,rcSrcBounds);
 }
 else ShowWindow(SW_HIDE);
}
Example 5: Definition of UDP control DataIn event.
Sub UDP1_DataIn(FromAddress As String,
 FromPort As Integer, FromData As String)

Listing One
#include "datagram.h"
#include <math.h>
#include <stdlib.h>
#include <string.h>
#include <limits.h>
#define ERR_None 0L
#define ERR_MethodNotSupported 421
#define ERR_ReadOnlyProperty 20000 //should be 383 but I created
 //my own text for the message
#define ERR_NOTCONNECTED 20001
#define HOST_NAME_LEN 50
#define WM_USER_ASYNC_SELECT (WM_USER+201)
#define MAKEWORD(a, b) ((WORD)(((BYTE)(a)) | ((WORD)((BYTE)(b))) << 8))
HANDLE hmodDLL;
WSADATA wsaData;
BOOL initialized=FALSE;
WORD VersionRequested=MAKEWORD(1,1); 
 //compatible with WinSock 1.1
unsigned int MaxUDPBufferSize;
char hostAddress[16];
int FAR PASCAL WEP(int);
int FAR PASCAL LibMain(hModule, wDataSeg, cbHeapSize, lpszCmdLine)
HANDLE hModule;
WORD wDataSeg;
WORD cbHeapSize;
LPSTR lpszCmdLine;
{
 hmodDLL = hModule;
 return 1;
}//LibMain
BOOL FAR PASCAL _export VBINITCC(USHORT usVersion,BOOL fRuntime)
{
 // this function is automatically called whenever an instance of the
 // control is loaded into memory
 char hostName[HOST_NAME_LEN];
 PHOSTENT phostent;
 IN_ADDR in;
 //Run-Time-Only
 //uncomment the following lines to make a run-time only control
/*
 if (!fRuntime)
 return FALSE;
*/
 if (WSAStartup(VersionRequested,&wsaData)!=0) return FALSE; 
 //make sure that an appropriate version of WinSock is supported
 if ((LOBYTE(wsaData.wVersion)!=LOBYTE(VersionRequested)) ||
 (HIBYTE(wsaData.wVersion)!=HIBYTE(VersionRequested)))
 {
 WSACleanup();
 return FALSE;
 }
 if (!initialized)
 {
 //only need to get the host name one time
 gethostname(hostName,HOST_NAME_LEN);
 phostent=gethostbyname(hostName);
 memcpy(&in,phostent->h_addr,4);
 memcpy(hostAddress,inet_ntoa(in),16);
 //only need to get the max buffer size one time
 MaxUDPBufferSize=(unsigned int)wsaData.iMaxUdpDg;
 //the following is done because the calls to recvfrom and sendto
 //take a signed int as their length parameter, not an unsigned int
 if (MaxUDPBufferSize>INT_MAX)
 MaxUDPBufferSize=INT_MAX;
 initialized=TRUE;
 }
 // Register control(s)

 return VBRegisterModel(hmodDLL, &modelDATAGRAM);
}//VBINITCC
int FAR PASCAL WEP(int nShutdownFlag)
{
 return 1;
}//End WEP
void FAR PASCAL _export VBTERMCC(void)
{ 
 // this function is automatically called whenever an instance of the 
 // control is unloaded from memory
 WSACleanup();
 return;
}//VBTERMCC
LONG FAR PASCAL _export DataGramCtlProc(HCTL hctl,HWND hwnd,USHORT msg,
 USHORT wp,LONG lp)
{
 HSZ tempVBString; 
 LPSTR lpstr=NULL;
 LPSTR destAddr=NULL;
 char fromAddr[16];
 char *tempIn;
 SOCKADDR_IN addr;
 int addrLen=sizeof(addr);
 int fromPort;
 unsigned int nBytesSent,nBytesRecv;
 unsigned int stringLen;
 EVENT_PARAMS params;
 IN_ADDR inFrom;
 switch (msg)
 {
 case WM_NCCREATE: 
 //default the properties
 DATAGRAMDEREF(hctl)->mToPort=0;
 tempVBString=VBCreateHsz((_segment)hctl,(LPSTR)hostAddress);
 DATAGRAMDEREF(hctl)->mMyAddress=tempVBString;
 DATAGRAMDEREF(hctl)->mMyPort=0;
 DATAGRAMDEREF(hctl)->mMaxBufferSize=(long int)MaxUDPBufferSize;
 DATAGRAMDEREF(hctl)->mErrorCode=0; 
 DATAGRAMDEREF(hctl)->mSend=FALSE;
 DATAGRAMDEREF(hctl)->mConnected=FALSE;
 DATAGRAMDEREF(hctl)->mDisconnected=TRUE;
 DATAGRAMDEREF(hctl)->mappsocket=INVALID_SOCKET;
 break;
 case WM_NCDESTROY:
 //free string memory allocated with VBCreateHsz
 if (DATAGRAMDEREF(hctl)->mMyAddress)
 {
 VBDestroyHsz(DATAGRAMDEREF(hctl)->mMyAddress);
 DATAGRAMDEREF(hctl)->mMyAddress=NULL;
 } 
 //close any opened socket
 if (DATAGRAMDEREF(hctl)->mappsocket!=INVALID_SOCKET)
 {
 if (closesocket(DATAGRAMDEREF(hctl)->mappsocket)==SOCKET_ERROR)
 DATAGRAMDEREF(hctl)->mErrorCode=WSAGetLastError();
 DATAGRAMDEREF(hctl)->mappsocket=INVALID_SOCKET;
 } 
 break; 
 case VBM_METHOD:

 //there are no methods supported
 return VBSetErrorMessage(ERR_MethodNotSupported,
 "Method not applicable for this object.");
 case WM_USER_ASYNC_SELECT:
 if (WSAGETSELECTERROR(lp)!=0)
 return ERR_None;
 switch(WSAGETSELECTEVENT(lp))
 {
 case FD_READ:
 if (!DATAGRAMDEREF(hctl)->mSend)
 {
 tempIn=(char *)calloc(MaxUDPBufferSize,sizeof(char));
 //read the data
 nBytesRecv=recvfrom(DATAGRAMDEREF(hctl)->mappsocket,
 tempIn,MaxUDPBufferSize,0,(LPSOCKADDR)&addr,&addrLen);
 if (nBytesRecv==SOCKET_ERROR)
 DATAGRAMDEREF(hctl)->mErrorCode=WSAGetLastError();
 else
 {
 //get the from address
 memcpy(&inFrom,&addr.sin_addr.s_addr,4);
 memcpy(fromAddr,inet_ntoa(inFrom),16);
 //get the from port
 fromPort=ntohs(addr.sin_port);
 //set the event parameters
 params.FromPort=&fromPort;
 params.FromAddr = VBCreateHlstr(fromAddr, 
 lstrlen(fromAddr));
 params.FromData = VBCreateHlstr(tempIn, 
 lstrlen(tempIn));
 //fire the event
 VBFireEvent(hctl,IEVENT_DATAGRAM_DATAIN,&params);
 //free string memory allocated with VBCreateHlstr
 VBDestroyHlstr(params.FromData);
 VBDestroyHlstr(params.FromAddr);
 } 
 if (tempIn)
 free(tempIn);
 }
 break;
 case FD_WRITE:
 //has there been a notification to send data
 if (DATAGRAMDEREF(hctl)->mSend)
 {
 //get the destination port
 addr.sin_family=AF_INET;
 addr.sin_port=htons(DATAGRAMDEREF(hctl)->mToPort);
 //get the destination address
 destAddr=VBLockHsz(DATAGRAMDEREF(hctl)->mToAddress);
 addr.sin_addr.s_addr=inet_addr(destAddr);
 VBUnlockHsz(DATAGRAMDEREF(hctl)->mToAddress);
 if (DATAGRAMDEREF(hctl)->mToData)
 { 
 lpstr=VBLockHsz(DATAGRAMDEREF(hctl)->mToData);
 stringLen=lstrlen(lpstr);
 //send the data
 nBytesSent=sendto(DATAGRAMDEREF(hctl)->mappsocket,
 lpstr,stringLen,0,(LPSOCKADDR)&addr,sizeof(addr));
 if (nBytesSent==SOCKET_ERROR)
 DATAGRAMDEREF(hctl)->mErrorCode=WSAGetLastError();
 //unlock it 
 VBUnlockHsz(DATAGRAMDEREF(hctl)->mToData);
 } 
 //reset the send flag
 DATAGRAMDEREF(hctl)->mSend=FALSE;
 }
 break;
 default:
 break;
 }
 return ERR_None;
 case VBM_SETPROPERTY:
 //called whenever a property is set
 switch(wp)
 {
 case IPROP_DATAGRAM_TOADDRESS:
 return ERR_None;
 case IPROP_DATAGRAM_TOPORT:
 return ERR_None;
 case IPROP_DATAGRAM_TODATA:
 return ERR_None;
 case IPROP_DATAGRAM_MYADDRESS:
 //read-only
 //clear it out
 if (DATAGRAMDEREF(hctl)->mMyAddress)
 VBDestroyHsz(DATAGRAMDEREF(hctl)->mMyAddress);
 //then set it to what the default is
 tempVBString=VBCreateHsz((_segment)hctl,(LPSTR)hostAddress);
 DATAGRAMDEREF(hctl)->mMyAddress=tempVBString;
 return VBSetErrorMessage(ERR_ReadOnlyProperty,
 "Property is read-only.");
 case IPROP_DATAGRAM_MYPORT:
 return ERR_None;
 case IPROP_DATAGRAM_MAXBUFFERSIZE:
 //read-only
 DATAGRAMDEREF(hctl)->mMaxBufferSize=(long int)MaxUDPBufferSize;
 return VBSetErrorMessage(ERR_ReadOnlyProperty,
 "Property is read-only.");
 case IPROP_DATAGRAM_ERRORCODE:
 return ERR_None;
 case IPROP_DATAGRAM_SEND:
 if (VBGetMode()==MODE_DESIGN) 
 //cannot set this property at design-time
 DATAGRAMDEREF(hctl)->mSend=FALSE;
 else
 //place an FD_WRITE message in the application message queue
 PostMessage(hwnd,WM_USER_ASYNC_SELECT,
 DATAGRAMDEREF(hctl)->mappsocket, 
 WSAMAKESELECTREPLY(FD_WRITE,0));
 return ERR_None;
 case IPROP_DATAGRAM_CONNECTED:
 if (VBGetMode()==MODE_DESIGN) 
 //cannot set this property at design time
 DATAGRAMDEREF(hctl)->mConnected=FALSE;
 else
 {
 if (DATAGRAMDEREF(hctl)->mConnected)
 {

 addr.sin_family=AF_INET;
 addr.sin_port=htons(DATAGRAMDEREF(hctl)->mMyPort);
 addr.sin_addr.s_addr=htonl(INADDR_ANY); 
 DATAGRAMDEREF(hctl)->mappsocket=socket(AF_INET,SOCK_DGRAM,0);
 if (DATAGRAMDEREF(hctl)->mappsocket==INVALID_SOCKET)
 DATAGRAMDEREF(hctl)->mErrorCode=WSAGetLastError();
 else
 {
 if (bind(DATAGRAMDEREF(hctl)->mappsocket,
 (LPSOCKADDR)&addr,sizeof(addr))==SOCKET_ERROR)
 DATAGRAMDEREF(hctl)->mErrorCode=WSAGetLastError();
 else
 {
 //find out the port number that may have been
 //automatically assigned if myPort was 0
 getsockname(DATAGRAMDEREF(hctl)->mappsocket,
 (LPSOCKADDR)&addr,&addrLen);
 DATAGRAMDEREF(hctl)->mMyPort=ntohs(addr.sin_port);
 //setup the asynchronous read/write handler
 if (WSAAsyncSelect(DATAGRAMDEREF(hctl)->mappsocket,
 hwnd,WM_USER_ASYNC_SELECT,
 FD_READ|FD_WRITE)==SOCKET_ERROR)
 DATAGRAMDEREF(hctl)->mErrorCode=WSAGetLastError();
 }
 //set disconnected to false
 DATAGRAMDEREF(hctl)->mDisconnected=FALSE;
 }
 }
 else
 //cannot toggle to false
 DATAGRAMDEREF(hctl)->mConnected=TRUE;
 }
 return ERR_None;
 case IPROP_DATAGRAM_DISCONNECTED:
 if (VBGetMode()==MODE_DESIGN) 
 //cannot set this property at design time
 DATAGRAMDEREF(hctl)->mDisconnected=TRUE;
 else
 {
 if (DATAGRAMDEREF(hctl)->mDisconnected)
 {
 //close any opened socket
 if (DATAGRAMDEREF(hctl)->mappsocket!=INVALID_SOCKET)
 {
 if (closesocket(DATAGRAMDEREF(hctl)->mappsocket)==
 SOCKET_ERROR)
 DATAGRAMDEREF(hctl)->mErrorCode=WSAGetLastError();
 DATAGRAMDEREF(hctl)->mappsocket=INVALID_SOCKET;
 } 
 //set connected to false
 DATAGRAMDEREF(hctl)->mConnected=FALSE;
 } 
 else
 //cannot toggle to false
 DATAGRAMDEREF(hctl)->mDisconnected=TRUE;
 }
 return ERR_None;
 default:
 break; 

 }
 break;
 }
 return VBDefControlProc(hctl, hwnd, msg, wp, lp);
}//DataGramCtlProc








Implementing Flicker-Free Motion


Moving objects smoothly




Paul J. Martino


Paul, a computer-science student at Lehigh University, has consulted for
Gimpel Software on the PC-Lint for C/C++ project. He is currently developing a
multimedia graphics adventure game in a new programming language called
"AGPL." Paul can be contacted on CompuServe at 73340,3456.


Moving an object across a screen convincingly requires smooth motion without
flicker. To move a large object in a high-resolution mode, a great deal of
computation and a large amount of memory are necessary. In this article, I'll
present a program that implements flicker-free mouse motion that operates
efficiently in resolutions of up to 800x600x24-bit color. Given any bitmap in
a standard file format, the program will turn that bitmap into a mouse cursor
that moves fluidly across the screen. 
To implement my algorithm, you need a high-powered development environment.
I've chosen the combination of Genus Microprogramming's graphics libraries and
the Phar Lap 286DOS-Extender under Borland C++. This combination of tools
allows large bitmaps to be stored in upper memory and displayed with blazing
speed on almost any video card. 


The Algorithm


It is tempting to implement motion in the following manner:
1. Save the portion of screen background where the mouse cursor will be
placed.
2. Paint the mouse cursor. 
3. On motion, redraw the screen background saved in Step 1, then go back to
Step 1.
This method works reasonably well when the mouse is moved a great distance,
because then the current and previous cursor locations will not intersect.
Since the two regions are disjoint, there is only one screen-write operation
to any one pixel of the screen. This results in a motion without flicker.
Unfortunately, large motions are not the general case with the mouse pointing
device. Typically, motions are in very small increments, and the new cursor
intersects the old. In the intersection region, there are two screen-write
operations: The first redraws the background, and the second draws the portion
of the cursor in that window. This causes a great deal of flicker when the
mouse is moved slowly across the screen. The flicker-free algorithm I present
here addresses this problem. The algorithm performs the following:
1. Saves a portion of screen background (where the mouse cursor will be
placed) into a small buffer.
2. Paints the mouse cursor if it is not already at the proper location. 
3. On motion, calculates the minimum window that contains both previous and
current cursor positions and allocates a buffer of that size.
4. Repaints the saved screen background at the proper location within the
large buffer.
5. Saves the portion of the large buffer where the cursor will be placed into
the small buffer allocated in Step 1. 
6. Transparently places the cursor into the large buffer. 
7. Paints the large buffer to the screen.
8. Deallocates the large buffer and returns to Step 3. 


Implementation


The graphics routines that implement this algorithm are provided by a suite of
tools from Genus Microprogramming--GX Kernel, GX Graphics, GX Effects, and GX
Images 286 (protected-mode version).
GX Kernel provides low-level graphics routines that include hardware support
for more than 50 video cards. The GX Kernel transparently supports video modes
up to 1024x768x16.7 million colors. You simply select the VESA equivalent of a
resolution, and the GX Kernel tests whether a compatible hardware mode is
available. If not, the VESA mode itself is used as a last resort. 
The chief feature of the GX Kernel is its support for virtual off-screen
buffers. These buffers are key in implementing flicker-free motion because the
basic premise is to paint the screen only once per motion. This requires that
off-screen buffers of sufficient size be allocated. Table 1 lists the
functions from GX Kernel that I use in my program. The include file (Listing
One) and the source file (Listing Two) provide implementation details of these
functions. (The complete source code, makefile script, linker response file,
and configuration options are available electronically; see "Availability,"
page 3.) To handle events, I use the GX Graphics toolkit, which provides its
own interrupt-service routine that captures events such as mouse movements,
button presses, button releases, and keyboard presses. GX Graphics also
provides you with several crude internal mouse pointers--a simple arrow, a
plus sign, and an hourglass. Important functions from the GX Graphics package
are listed in Table 2.
Bitmaps are loaded from standard file format to virtual buffer using the GX
Images package, which supports GIF, PCX, BMP, JPEG, TIFF, and additional file
formats; see Table 3 for a list of functions used.
The final Genus package needed is GX Effects, which provides excellent
routines for special display effects such as slide and weave, animation via
sprites, and support for FLIC files. This package allows for transparently
placing a bitmap into a virtual buffer; the necessary routines are listed in
Table 4. The GX Effects package also provides some very high-level functions
for animation. However, I don't use these functions because they would not
properly show how the flicker-free algorithm is implemented. 
All of the Genus packages operate in 286 protected mode, which requires a 286
DOS extender. My program supports Borland's Pascal DOS extender, Blinkinc's
Blinker, Tenberry Software's (formerly Rational Systems) DOS/16M, Borland Power
Pack, and the Phar Lap 286DOS-Extender, which I selected because of its
transparency to the programmer and its robust API. 
The Phar Lap 286DOS-Extender is remarkably simple to use in an application.
Just linking in a new library module allows for programs to allocate up to 16
MB of dynamic memory through familiar malloc() calls and new expressions. The DOS
extender also includes a full-featured API, called "PHAPI," which lets you
write protected-mode interrupts and capture protected-mode exceptions. Table 5
lists some of the DOS-extender functions I use.


Conclusion


The Genus packages are excellent tools for incorporating graphics into almost
any application. They afford seamless integration of animation, event
handling, drawing, and image incorporation. Genus also has a collection of
graphics packages for Windows. Unfortunately, these Windows toolkits do not
provide the same functionality as the DOS toolkits. This makes porting a Genus
DOS application to Windows somewhat difficult.


For More Information



GX Kernel, GX Images, GX Effects, GX Graphics
Genus Microprogramming 
115 Dairy Ashford, Suite 200
Houston, TX 77079-3012
800-227-0918

286DOS-Extender SDK
Phar Lap Software
60 Aberdeen Avenue
Cambridge, MA 02138
617-661-1510
Table 1: GX Kernel functions.
Function          Description
gxVirtualFree     Checks if enough memory is available for allocation.
gxCreateVirtual   Allocates a virtual buffer.
gxDestroyVirtual  Deallocates a virtual buffer.
gxVirtualDisplay  Displays a virtual buffer on the screen.
gxVirtualVirtual  Copies one section of a virtual buffer to another.

Table 2: GX Graphics functions.
Function          Description
grSetEventMask    Turns on the event handler and sets which events to detect.
grGetEvent        Gets the top event on the queue.
grSetMouseStyle   Sets an internal mouse-pointer style.
grDisplayMouse    Displays or hides the mouse.

Table 3: GX Images functions.
Function          Description
imgFileGetHeader  Returns information about the file type and size.
imgFileConvert    Loads a file and converts it to the format of the virtual buffer.

Table 4: GX Effects routines.
Function          Description
fxCreateImage     Creates an image that can be made transparent.
fxSetKeyColor     Selects the key color to make transparent (default is black).
fxImageVirtual    Transparently places an image into a virtual buffer.

Table 5: Functions in the Phar Lap 286DOS-Extender.
Function                Description
DosSetExceptionHandler  Sets up a protected-mode exception handler.
DosSetRealProtVec       Installs both a real- and protected-mode interrupt.
DosSetPassToProtVec     Installs a protected-mode interrupt.

Listing One
// GRAPH.H (MOUSE.EXE)
// (C) 1994 Ahpah Software Inc.
// Description: Interface to GENUS toolkits.
#include <c:\genus\include\gxlib.h>
#include <c:\genus\include\fxlib.h>
#include <c:\genus\beta\include\imglib.h>
#include <c:\genus\include\grlib.h>
/* maximums macros */
#define GR_PALSIZE 768 // 256 color load palette
#define GR_XMAX 640 // screen x resolution
#define GR_YMAX 480 // screen y resolution
#define GR_CURSXMAX 150 // maximum x length for a cursor
#define GR_CURSYMAX 150 // maximum y length for a cursor
#define GR_VECT 4 // number of elements in a vector
#define GR_INTERNAL 0 // cursor is internal
#define GR_BITMAP 1 // cursor is a bitmap
#define GR_REMOVE 0 // remove cursor

#define GR_PLACE 1 // place cursor
/* useful macros */
#define X1 0
#define Y1 1
#define X2 2
#define Y2 3
#define RECT(v) v[0], v[1], v[2], v[3]
#define UPLEFT(v) v[0], v[1]
#define LORIGHT(v) v[2], v[3]
#define XWID(v) (v[2] - v[0] + 1)
#define YWID(v) (v[3] - v[1] + 1)
/* image save structure */
STRUCT GR_IMAGE {
 GXHEADER gxheader;
 WORD xmax, ymax;
};
/* effects image structure */
STRUCT GR_FXIMAGE {
 FXIMAGE fxheader;
 WORD xmax, ymax;
};
/* cursor structure */
STRUCT GR_CURSOR {
 WORD type;
 SIGNINT style;
 LONG color;
 GR_FXIMAGE *fximage;
 WORD location[GR_VECT];
 GXHEADER savearea;
};
/* prototypes */
VOID gr_cursormove( WORD, WORD );
VOID gr_cursorset( SIGNINT, LONG );
VOID gr_cursorset( GR_FXIMAGE * );
VOID gr_cursorstatus( WORD );
VOID gr_ender( VOID );
VOID gr_error( STRING );
VOID gr_error( STRING, SIGNINT );
VOID gr_fxcreate( GR_IMAGE *, GR_FXIMAGE * );
VOID gr_fxfree( GR_FXIMAGE *, BOOLEAN = NO );
VOID gr_fxshow( GR_FXIMAGE *, WORD, WORD );
VOID gr_getevent( GREVENT * );
VOID gr_imagefree( GR_IMAGE *, BOOLEAN = NO );
VOID gr_imageload( GR_IMAGE *, STRING );
VOID gr_imageshow( GR_IMAGE *, WORD * );
VOID gr_mousebounds( WORD, WORD, WORD, WORD );
BOOLEAN gr_mousein( GREVENT *, WORD, WORD, WORD, WORD );
VOID gr_starter( VOID );
VOID gr_vectorcopy( WORD *, WORD *, INDEX );
/* function like macros */
#define GR_CURSOROFF() CALL grDisplayMouse( grHIDE ); \
 gr_cursorstatus( GR_REMOVE )
#define GR_CURSORON() gr_cursorstatus( GR_PLACE );\
 CALL grDisplayMouse( grSHOW )

Listing Two
// GRAPH.CPP (MOUSE.EXE)
// (C) 1994 Ahpah Software Inc.
// Description: Interface to GENUS toolkits.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "defs.h"
#include "graph.h"
#include "system.h"
/* variables */
SIGNINT gr_atype; // memory allocation type
WORD gr_mode; // initialized graphics mode
GR_CURSOR gr_cursor; // cursor information block
BOOLEAN gr_alloc; // allocations have occurred
TEXT gr_pal[GR_PALSIZE]; // palette for loading 256 color images
/* local functions */
STATIC BOOLEAN gr_trymode( WORD, BOOLEAN );
/* gr_dogenus() reports an error on a toolkit call. (inline) */
INLINE VOID gr_dogenus( SIGNINT retval, STRING str )
 {
 if( retval != gxSUCCESS )
 gr_error( str, retval );
 }
/* gr_max() returns the maximum of two WORDS */
INLINE WORD gr_max( WORD a, WORD b )
 {
 if( a < b )
 return b;
 return a;
 }
/* gr_min() returns the minimum of two WORDS */
INLINE WORD gr_min( WORD a, WORD b )
 {
 if( a > b )
 return b;
 return a;
 }
/* gr_cursormove() will move the cursor to its new location */
VOID gr_cursormove( WORD x, WORD y )
 {
 WORD temploc[GR_VECT], newloc[GR_VECT] = { 0, 0, 0, 0 };
 WORD *curloc = gr_cursor.location;
 WORD xlen, ylen;
 GXHEADER temp;
 if( gr_cursor.type != GR_BITMAP )
 return;
 xlen = gr_cursor.fximage->xmax;
 ylen = gr_cursor.fximage->ymax;
 if( xlen / 2 < x )
 newloc[X1] = x - xlen / 2;
 if( ylen / 2 < y )
 newloc[Y1] = y - ylen / 2;
 newloc[X2] = newloc[X1] + xlen;
 newloc[Y2] = newloc[Y1] + ylen;
 temploc[X1] = gr_min( curloc[X1], newloc[X1] );
 temploc[Y1] = gr_min( curloc[Y1], newloc[Y1] );
 temploc[X2] = gr_max( curloc[X2], newloc[X2] );
 temploc[Y2] = gr_max( curloc[Y2], newloc[Y2] );
 if( gxCreateVirtual( gr_atype, &temp, gr_mode, XWID( temploc ), 
 YWID( temploc ) ) != gxSUCCESS )
 gr_error( (STRING) "Could not create buffer." );
 CALL gxDisplayVirtual( RECT( temploc ), 0, &temp, 0, 0);

 CALL gxVirtualVirtual( &gr_cursor.savearea, 0, 0, xlen, ylen, &temp,
 curloc[X1] - temploc[X1], curloc[Y1] - temploc[Y1], gxSET );
 CALL gxVirtualVirtual( &temp, newloc[X1] - temploc[X1],
 newloc[Y1] - temploc[Y1], newloc[X1] - temploc[X1] + xlen,
 newloc[Y1] - temploc[Y1] + ylen, &gr_cursor.savearea, 0, 0, gxSET );
 CALL fxImageVirtual( &gr_cursor.fximage->fxheader, &temp,
 newloc[X1] - temploc[X1], newloc[Y1] - temploc[Y1] );
 CALL gxVirtualDisplay( &temp, 0, 0, RECT( temploc ), 0 );
 CALL gxDestroyVirtual( &temp );
 gr_vectorcopy( gr_cursor.location, newloc, GR_VECT );
 }
/* gr_cursorset() will set an internal cursor */
VOID gr_cursorset( SIGNINT style, LONG color )
 {
 gr_cursorstatus( GR_REMOVE );
 if( gr_cursor.type == GR_BITMAP )
 {
 gr_cursorstatus( GR_REMOVE );
 CALL grDisplayMouse( grSHOW );
 }
 gr_cursor.type = GR_INTERNAL;
 gr_cursor.style = style;
 gr_cursor.color = color;
 CALL grSetMouseStyle( style, color );
 gr_cursorstatus( GR_PLACE );
 }
/* gr_cursorset() will set a user bitmap cursor */
VOID gr_cursorset( GR_FXIMAGE *fxi )
 {
 if( gr_cursor.type == GR_BITMAP && fxi == gr_cursor.fximage )
 return;
 if( fxi->xmax > GR_CURSXMAX || fxi->ymax > GR_CURSYMAX )
 gr_error( (STRING) "Mouse cursor too large." );
 gr_cursorstatus( GR_REMOVE );
 if( gr_cursor.type == GR_INTERNAL )
 {
 CALL grDisplayMouse( grHIDE );
 gr_cursorstatus( GR_PLACE );
 }
 gr_cursor.type = GR_BITMAP;
 gr_cursor.fximage = fxi;
 gr_cursorstatus( GR_PLACE );
 }
/* gr_cursorstatus() will set the cursor on or off and update screen */
VOID gr_cursorstatus( WORD stat )
 {
 STATIC SIGNINT counter = 0;
 WORD *curloc = gr_cursor.location, xlen, ylen;
 SHORT x, y;
 switch( stat )
 {
 case GR_PLACE:
 counter++;
 if( gr_cursor.type != GR_BITMAP || counter != 1 )
 return;
 CALL grGetMousePos( &x, &y );
 xlen = gr_cursor.fximage->xmax;
 ylen = gr_cursor.fximage->ymax;
 curloc[X1] = 0;

 curloc[Y1] = 0;
 if( xlen / 2 < (WORD) x )
 curloc[X1] = (WORD) x - xlen / 2;
 if( ylen / 2 < (WORD) y )
 curloc[Y1] = (WORD) y - ylen / 2;
 curloc[X2] = curloc[X1] + xlen;
 curloc[Y2] = curloc[Y1] + ylen;
 CALL gxDisplayVirtual( RECT( curloc ), 0, 
 &gr_cursor.savearea, 0, 0);
 gr_fxshow( gr_cursor.fximage, UPLEFT( curloc ) );
 break;
 case GR_REMOVE:
 counter--;
 if( gr_cursor.type != GR_BITMAP || counter != 0 )
 return;
 CALL gxVirtualDisplay( &gr_cursor.savearea, 0, 0, 
 RECT( gr_cursor.location ), 0 );
 break;
 default:
 break;
 }
 }
/* gr_ender() will close out video related functions */
VOID gr_ender()
 {
 if( gr_alloc )
 CALL gxDestroyVirtual( &gr_cursor.savearea );
 CALL gxClearDisplay( 0L, 0 );
 CALL gxSetMode( gxTEXT );
 CALL grStopMouse();
 CALL grSetEventMask( grENOEVENTS );
 CALL gxDone();
 }
/* gr_error() will display a fatal error */
VOID gr_error( STRING s )
 {
 gr_ender();
 sys_ender();
 printf( "Error: %s\n\n", s );
 exit( 1 );
 }
/* gr_error() will display a fatal error with GENUS exit code */
VOID gr_error( STRING s, SIGNINT n )
 {
 gr_ender();
 sys_ender();
 printf( "Error: %s (code %d)\n\n", s, n );
 exit( 1 );
 }
/* gr_fxcreate() will create an effects image from a GR_IMAGE */
VOID gr_fxcreate( GR_IMAGE *grimage, GR_FXIMAGE *fxi )
 {
 CALL fxSetKeyColor( 0L );
 if( fxCreateImage( &fxi->fxheader, &grimage->gxheader, 
 gr_atype ) != gxSUCCESS )
 gr_error( (STRING) "Could not create image." );
 fxi->xmax = grimage->xmax;
 fxi->ymax = grimage->ymax;
 }

/* gr_fxfree() will free an effects image */
VOID gr_fxfree( GR_FXIMAGE *fxi, BOOLEAN freeit )
 {
 CALL fxDestroyImage( &fxi->fxheader );
 if( freeit )
 delete fxi;
 }
/* gr_fxshow() will show an effects image */
VOID gr_fxshow( GR_FXIMAGE *fxi, WORD x, WORD y )
 {
 CALL fxImageDisplay( &fxi->fxheader, x, y, 0 );
 }
/* gr_getevent() will wait for an event */
VOID gr_getevent( GREVENT *event )
 {
 CALL grSetEventMask( (SIGNINT) (grELPRESS | grERPRESS | grEMOUSEMOVE |
 grEKEYBOARD) );
 while( grGetEvent( event ) != gxSUCCESS )
 ;
 CALL grSetEventMask( grENOEVENTS );
 gr_cursormove( (WORD) event->curx, (WORD) event->cury );
 }
/* gr_imagefree() will free a loaded image */
VOID gr_imagefree( GR_IMAGE *image, BOOLEAN freeit )
 {
 CALL gxDestroyVirtual( &image->gxheader );
 if( freeit )
 delete image;
 }
/* gr_imageload() will load any supported image type */
VOID gr_imageload( GR_IMAGE *image, STRING fn )
 {
 IMGINHDR imghead;
 if( imgFileGetHeader( fn, 0, &imghead, gr_pal ) != gxSUCCESS )
 gr_error( (STRING) "Could not load image header." );
 image->xmax = imghead.Width;
 image->ymax = imghead.Height;
 if( gxCreateVirtual( gr_atype, &image->gxheader, gr_mode, image->xmax, 
 image->ymax ) != gxSUCCESS )
 gr_error( (STRING) "Could not create buffer." );
 if( imgFileConvert( fn, 0, &image->gxheader ) != gxSUCCESS )
 gr_error( (STRING) "Could not load image data." );
 }
#if 0
/* gr_imagemake() will make a virtual buffer in GR_IMAGE form */
GR_IMAGE *gr_imagemake( WORD xlen, WORD ylen )
 {
 GR_IMAGE *image = new GR_IMAGE;
 if( gxCreateVirtual( gr_atype, &image->gxheader, gr_mode, xlen, 
 ylen ) != gxSUCCESS )
 gr_error( (STRING) "Could not allocate buffer." );
 image->xmax = xlen;
 image->ymax = ylen;
 return image;
 }
#endif
/* gr_imageshow() will display an image */
VOID gr_imageshow( GR_IMAGE *image, WORD *vect )
 {

 GXHEADER newgx, *mhead;
 WORD xlen, ylen, x, y;
 xlen = vect[X2] - vect[X1] + 1;
 ylen = vect[Y2] - vect[Y1] + 1;
 x = vect[X1];
 y = vect[Y1];
 mhead = &image->gxheader;
 if( abs( xlen - image->xmax ) > 2 || abs( ylen - image->ymax ) > 2 )
 {
 if( gxCreateVirtual( gr_atype, &newgx, gr_mode, xlen, ylen )
 != gxSUCCESS )
 gr_error( (STRING) "Could not create buffer." );
 CALL gxVirtualScale( &image->gxheader, &newgx );
 mhead = &newgx;
 }
 else
 {
 xlen = image->xmax;
 ylen = image->ymax;
 }
 GR_CURSOROFF();
 CALL gxVirtualDisplay( mhead, 0, 0, x, y, x + xlen - 1, y + ylen - 1, 0 );
 GR_CURSORON();
 if( mhead == &newgx )
 CALL gxDestroyVirtual( &newgx );
 }
/* gr_mousebounds() will set the mouse boundaries */
VOID gr_mousebounds( WORD x1, WORD y1, WORD x2, WORD y2 )
 {
 CALL grSetMouseBounds( x1, y1, x2, y2 );
 }
/* gr_mousein() will check to see if the mouse is in a certain region */
BOOLEAN gr_mousein( GREVENT *event, WORD x1, WORD y1, WORD x2, WORD y2 )
 {
 if( x1 <= (WORD) event->curx && (WORD) event->curx <= x2 &&
 y1 <= (WORD) event->cury && (WORD) event->cury <= y2 )
 return YES;
 return NO;
 }
/* gr_starter() will initialize graphics routines. */
VOID gr_starter()
 {
 gr_dogenus( gxInit(), "Could not initialize GX Kernel." );
 gr_atype = gxCMM;
 if( !gr_trymode( gxVESA_112, NO ) )
 if( !gr_trymode( gxVESA_111, NO ) )
 if( !gr_trymode( gxVESA_110, NO ) )
 CALL gr_trymode( gxVESA_101, YES );
 if( gr_mode <= gxVESA_103 )
 {
 CALL gxGetConvertPalette( gxPAL5, gr_pal );
 CALL gxSetDisplayPalette( gr_pal );
 CALL gxSetDitherMatrix( gxNODITHER );
 }
 if( grInitMouse() != gxSUCCESS )
 {
 CALL gxSetMode( gxTEXT );
 gr_error( (STRING) "No mouse found." );
 }

 CALL grSetMouseStyle( grCARROW, gxRGBPacked( 255, 255, 255 ) );
 CALL grDisplayMouse( grSHOW );
 CALL grTrackMouse( grTRACK );
 gr_mousebounds( 0, 0, GR_XMAX - 1, GR_YMAX - 1 );
 if( gxCreateVirtual( gr_atype, &gr_cursor.savearea, gr_mode, GR_CURSXMAX, 
 GR_CURSYMAX ) != gxSUCCESS )
 gr_error( (STRING) "Could not create buffer." );
 gr_alloc = YES;
 }
/* gr_trymode() will attempt to initialize a video mode */
BOOLEAN gr_trymode( WORD mode, BOOLEAN doabort )
 {
 if( doabort )
 {
 gr_dogenus( gxSetDisplay( mode ), "Graphics mode not supported." );
 gr_dogenus( gxSetMode( gxGRAPHICS ),
 "Could not switch to graphics mode." );
 gr_mode = mode;
 return YES;
 }
 else
 {
 if( gxSetDisplay( mode ) == gxSUCCESS )
 if( gxSetMode( gxGRAPHICS ) == gxSUCCESS )
 {
 gr_mode = mode;
 return YES;
 }
 }
 return NO;
 }
/* gr_vectorcopy() will copy one vector onto another */
VOID gr_vectorcopy( WORD *dest, WORD *src, WORD len )
 {
 memcpy( dest, src, len * sizeof( WORD ) );
 }



























Examining C/C++ Compilers


Five compilers, head-to-head




Tim Parker


Tim, a technical editor for SCO World magazine, is the author of Linux
Unleashed (Sams, 1995). He can be contacted at tparker@tpci.com.


Choosing a C compiler is both simple and difficult: simple, in that most
compilers on the market are robust and mature enough to offer solid C
handling; difficult, in that certain minor differences between them may be
relevant to your needs. To ferret out these fine points, we recently examined
five C compilers for the SPARCstation 5.
Each compiler was supplied by the vendor for a SPARCstation 5 machine running
Solaris 2.4. We installed the compilers on newly formatted hard drives on
which only Solaris was loaded. We compiled five test programs with each
compiler, measuring time to compile, problems encountered, and any changes
necessary for a clean compilation. The test programs were not of the simple
"hello world" ilk, but real-world applications that we use in-house, for
tracking project time and inventory, and at client sites, for accounting and
time management. 
We also built the SPECint92 programs and ran them with each compiler. This is
not, however, a very fair test of a compiler, as it measures only integer
performance and programmer skill. We tried to optimize compilation by using
compiler flags and options where applicable, but we don't claim to have
tweaked each compiler to its optimum performance. Still, the numbers we
measured are representative.
Table 1 and Table 2 present our findings, which also include extra tools,
features, and notable add-ons. In this article, we'll describe these features
and our experiences with each compiler.


MetaWare High C/C++


The MetaWare High C/C++ compiler is a combination package, offering both C and
C++ capabilities. Much of the material included with the High C/C++ system
applies to C++ only; to keep the comparison fair, we concentrated only on the
C-language portion.
Our version of MetaWare C/C++ was supplied on eight 3.5-inch diskettes: six
for the compiler and two for patches. Installation proceeded smoothly, and
after the patch diskettes were installed, the system was ready. Installation
lasted about ten minutes. A compiler demo file is included to verify that all
the components are properly installed. Several environment variables must be
massaged to identify paths and libraries.
High C/C++ allows you to turn off extensions in the package with a compiler
option, forcing the core ANSI Standard C language only. High C/C++ also
complies with proposed C++ standards still under development. MetaWare
supplies both an internal ANSI-C preprocessor and the option to use a UNIX
external preprocessor (which may be K&R instead of ANSI C). High C/C++ is not
K&R compatible, but it can be forced to compile K&R code by relying heavily on
external libraries and the preprocessor.
By default, the system supports the extensions MetaWare has added to both C
and C++. In many ways, these extensions provide the best features of other
languages to C and C++ programmers, albeit at the cost of portability to other
compiler products that lack the extensions. For example, High C/C++ lets you
nest functions with up-level references and pass nested functions as
parameters to other functions (a Pascal strong point). Many in-line library
function calls are offered, as well as iterators that help handle and move
through data structures.
You can use either the default UNIX system libraries or those provided with
High C/C++. The latter is necessary if you use any High C/C++ extensions.
Slight problems can occur if you use the MetaWare libraries but need a
function in the UNIX library. Dependency problems may arise as one function
calls another, or there may be duplicate symbols. Since data structures of the
internal and external libraries are different but the names remain the same,
these errors are often not caught during compilation and cause core dumps when
executed.
High C/C++ uses configuration and profile files to specify values for the
compiler. This can save a lot of command-line typing, especially if you
maintain several configuration setups for different purposes and copy those
you need into the default filenames when necessary. Of course, makefiles can
be used to specify most parameters, too.
High C/C++ includes the GNU Debugger (gdb), which is automatically installed
with the compilers. gdb is quite good, and is widely used. It did not seem to
have been customized for the High C/C++ compiler.
As a C compiler, High C/C++ behaves very well, and is extremely difficult to
fluster. In fact, it's about as robust a C compiler as we've seen. It has a
few neat features for typical C programmers: Warnings come in a variety of
levels and can be set to verbose, brief, or somewhere in between. The warning
and error messages are very good, almost invariably pointing you to the
problem location. It would be nice if the offending source code were
displayed, but that's a minor quibble. (The C++ aspect of the High C/C++
package includes the Rogue Wave Tools.h++ and I/O streams library. As a C++
compiler, MetaWare scores strongly.)
Support for the product is extraordinary in today's market. MetaWare offers
lifetime tech support with the package (although the support is restricted to
standard business hours and e-mail). 
We found High C/C++ very easy to work with, and it gave us no problems at all
during testing. However, High C/C++ is relatively slow compared to the rest of
the compilers reviewed here, even when heavily optimized. Indeed, it scored
lowest on nearly every test. How much of a difference is there? Our large
compilation tests revealed several minutes' difference, although with smaller
programs, the difference was much less significant. 
If you can handle slower speeds and the lack of K&R C, MetaWare High C/C++ is
a solid, attractive compiler package. The extensions are useful although not
standard, and the combination of a C and C++ compiler in one package may
appeal to some. We liked High C/C++, but wonder if the performance would annoy
us in the long run.


Cygnus Developer's Kit (GNU C/C++)


Cygnus may not be as familiar a name as say, Microsoft, but the company
nonetheless has a loyal user base. The Cygnus Developer's Kit is a collection
of GNU tools that is bundled together and documented as a complete set, then
ported to many different platforms. Indeed, the ported versions have made
Cygnus quite a name among cross-platform developers. Cygnus's testing and
integration services add value to the GNU tools, and Cygnus is the
organization authorized to make general releases of available GNU C tools
(although GNU products can be obtained from many different sources). 
Our copy of the Cygnus Developer's Kit was supplied on QIC tapes for both
Solaris and HP-UX (many other distribution formats are available).
Installation is simple, using tar to extract an installation script that does
the rest. It's straightforward and applies to all target machines. Considering
Cygnus's wide platform support, customizing installation scripts for each
platform would probably be a headache.
The Cygnus Developer's Kit includes the GNU C compiler (gcc), the GNU C++
compiler (g++), the GNU debugger (gdb), the GNU assembler (as), GNU assembler
preprocessor (gasp), and GNU linker (ld). There is the usual host of GNU
utilities (byacc, flex, make, diff, and so on), as well as some text utilities
(TeX). Libraries in the package include ANSI C run time (for cross-development
platforms), C math subroutines (again for cross development), C++ class
library, C++ iostreams, and, for Solaris platforms, a performance analyzer
(gprof).
The GNU debugger is good, but not worth getting too excited about.
Performance-wise, the GNU C compiler is unremarkable. It was about average in
speed tests during our first trials. It does little optimizing by default, so we
spent a couple of days playing with flags and options. Eventually, we managed
to find several useful tweaks that improved performance a little, but the
compiler still didn't win any prizes. 
There is more to the Cygnus package than just the GNU material. The C
subroutine library supplied by the Free Software Foundation has some nasty
licensing restrictions, so Cygnus developed its own freely distributable,
royalty-free library. Also, Cygnus's tech support has a reputation for
excellence. Many of the Cygnus developers helped write the GNU material in the
first place.
For a solid distribution source for the GNU compiler, good documentation, and
excellent technical support, Cygnus is probably the place to go. You pay for
the add-ons, of course (especially the tech support), but then you don't get
much for free these days.


LanguagePak 1 (GNU C/C++)


Still closer to the "free" level than Cygnus is Ready to Run Software's
LanguagePak 1, which comprises gcc, g++, gdb, and a collection of
libraries to support the compiler. LanguagePaks are available for over a dozen
popular UNIX platforms. Ready to Run Software offers collections of utilities
on many subjects besides the C compiler, such as fax software, X, Internet
tools, and utilities.
There's no real documentation with the LanguagePak. It has a few pages of
installation notes and a catalog of Ready to Run Software's offerings in a
smallish, three-ring binder. A zip-top page insert holds the distribution
media. Manuals and documents are provided in soft-copy only. 
Installation, which uses tar and an installation script, ran smoothly for us,
installing the entire tape's contents.
Ready to Run Software offers no technical support, per se, as it is mainly a
bundling company. The GNU software is the same as that offered by Cygnus, but
the Cygnus package contains a lot more material (as well as printed
documentation and technical support). 
If you like warning messages, this compiler will be your favorite. It
generated more error and warning messages than any other compiler we've seen.
There were also a few compilation problems (mainly portability errors) that
required quite a bit of tweaking.
As a bare-bones, easy source of GNU compilers, Ready to Run Software looks
promising. You don't have to look for the latest version on the Internet or
worry about compatible tools. The GNU compiler is good but not the best. Its
price of $275.00, however, is definitely attractive.



SunSoft SPARCompiler C 3.0.1


The SunSoft SPARCompiler comes bundled with the SPARCworks GUI-driven
development environment. To maintain an even playing field, we limited our
examination to the C portion of the SPARCompiler package (which is physically
separate from the SPARCworks package).
The documentation for SPARCompiler C is very well written--the best we saw.
Some newcomers to C may be overwhelmed by the wealth of material, but good
organization and layout make it manageable.
Installation is from a CD-ROM and can be handled with the pkgadd or swmtool
utilities. You select the components to be installed from a list of all
SunSoft tools, then start the routine. We would have preferred a single
installation to worrying whether we got all the parts installed properly, but
the entire installation went smoothly and required only a few minutes. The
license manager was a pain, but we mastered that, too.
The SPARCompiler is an extension of the older C compiler included as part of
the SunOS distribution and should be familiar to Sun users. We compared
SPARCompiler to our older SPARCstation 1 installation running a pre-Solaris
version of SunOS and found little difference. The error and warning messages
are a little better and the performance is improved, but it's still the same
beastie in fancier wrapping. Messages are just as cryptic and annoying as they
always have been, although you now get a line number.
As far as we can tell, the SPARCompiler uses the same debugger as SPARCworks;
it's capable and worth sticking with. Technical support costs extra after the
first 30 days, but it's available 24 hours a day. The SPARCompiler's forte is
performance, whether optimized or not. Code optimization with SPARCompiler is
a bit hit-and-miss: Specific optimization methods tend to yield variable
results, and we usually defaulted to the system's basic optimization routines.
We would have considered the SPARCompiler a winner if it hadn't been for the
error messages, which were simply annoying. Still, the compiler is very fast
and has good documentation, a strong feature set, and a lot of loyal users who
still think the compiler should be included free with the operating system.


Edinburgh Portable Compilers EPC ANSI C 3.1.2


According to its literature, EPC was started by a group of University of
Edinburgh faculty members as a means of selling their compilers and
development tools. Although its U.S. presence is still small, the most recent
version of its compiler may make people start to pay more attention. This
compiler is the only one that struck us as noticeably different from the rest.
The documentation includes a copy of Kenneth Barclay's C Problem Solving and
Programming (Prentice-Hall, 1990), a very interesting and worthy addition to
the package. 
Installation of the EPC C compiler proceeded smoothly from a QIC tape and used
tar (some earlier versions used pkgadd). One annoying aspect of tar-based
distributions, common to many products, is that they don't let you choose the
destination directory. All you can do is let the archive extract to its
default destination, then manually copy the files to your preferred directory and
try to correct all the pointers to the files. We couldn't get this to work
with the EPC compiler and finally used the default /opt directory. A license
manager must be invoked after the installation to enable the compiler.
The Motif-based debugger is quite talented, and we could easily have been
happy with it as our default debugger. It has several advantages over the GNU
Debugger, and is somewhat integrated with the compiler. 
Technical support is provided for 30 days, with access to both the Scotland
and California offices (providing essentially a 16-hour support period). 
The compiler itself is very good. We looked at an earlier release in the
preliminary stages of this evaluation, and we worked with a much earlier
release a couple of years ago. The current version is noticeably faster and
more robust. One of the EPC C compiler's most important advantages is its ease
of use. There are no 100-character-long command-line options, no digging
through cabalistic lists for some magic optimization incantation. A single flag provides all
the optimization that EPC considers stable and useful. 
Unoptimized, the compiler produced good, tight code quite quickly. When the
optimizer was turned on, compilation took a little longer. The code was a
little faster, but not by much, which points to a solid compiler design right
from the start. The EPC C compiler produces both ANSI and K&R C code and
provides some extensions. Other useful features are very good multiprocessing
and thread support. The optimizers are quite clever, handling instruction
migration, register renaming, redundant-access elimination, and very clean
loop analysis.
We found the EPC C compiler the most friendly and easiest to work with, and it
is our top choice for a straightforward, command-line driven C compiler. Our
only complaint is the inflexible installation routine. 


Summary


All of the compilers discussed here are solid performers with relatively minor
differences. Choosing a C compiler on performance alone is usually folly, as
the fastest compiler may not have the library or compilation options that make
your task easier. The performance spread averages only a few percent. What
makes the difference is the ease of use, compiler options, and availability of
different libraries and support tools. 
The Cygnus GNU distribution tape contains every type of GNU utility and
library, free of charge, as well as some Cygnus-developed libraries; it is the
most cost-effective solution for a full-featured system. 
The LanguagePak is a winner from the cost viewpoint: If your only requirement
is a compiler with basic support tools (debugger and libraries), LanguagePak
is easier than surfing the net to find the GNU system.
MetaWare High C/C++ is definitely the package for those who want top-notch
support, libraries, and both C and C++ compilers in one box. Its only poor
rating was in performance. Its lack of K&R support may be a sore point with some
users, but most of the world is now using ANSI C. 
The latest version of the SunSoft SPARCompiler C is the fastest compiler we
tested. However, the poor error and warning messages may annoy some. On the
whole, this is an excellent offering, and it was difficult to rank it only second.
Our preference was EPC's compiler. The Motif-based debugger was a winner, and
the compiler's pure ease of use and fine handling made it one of the fastest
turn-around compilers. We could run through more code-compile-link-debug
cycles with EPC C than with any of the others in the same amount of time, and
with much less annoyance. The documentation is very good, the performance was
top-notch, and the optimizations excellent. As a stand-alone C compiler, we'll
stay with EPC C.


For More Information 


LanguagePak 1
Ready To Run Software
4 Pleasant Street
P.O. Box 2038
Forge Village, MA 01886
508-692-9922

EPC C Compiler for SPARC v. 3.1.2
Edinburgh Portable Compilers Ltd.
14531 Big Basin Way
Saratoga, CA 95070
408-867-1039

SPARCompiler C 3.0.1
SunSoft
2550 Garcia Avenue
Mountain View, CA 94043
415-336-6848

MetaWare High C/C++ 3.2
MetaWare
2161 Delaware Avenue
Santa Cruz, CA 95060
408-429-6382

Cygnus Developer's Kit
Cygnus
1937 Landings Drive
Mountain View, CA 94043
415-903-1400
Table 1: Overall compiler ratings. (a) General; (b) C compiler. All ratings
are the average of scores from 1 (poor) to 10 (excellent) given by each
reviewer. No attempt has been made to weight the individual categories or to
total the results based on the categories shown. The overall score is
determined by each reviewer based on each category score and personal biases. 
                    MetaWare   EPC   Ready to Run   Cygnus   SunSoft
(a)
Documentation           8       9          1           6        9
Installation            9       7          3           4        9
Features                7       8          3           6        8
Support                10       7          1           9        7
User Interface          7       7          3           5        6
Debugger                6       9          6           6        8
Libraries               9       8          6           7        7

(b)
Perf. Unoptimized*      6       8          7           7       10
Perf. Optimized*        8       9          8           8       10
Configurability         9       6          6           6        8
Ease of Use             8       9          6           6        6
Overall Rating:         8       9          6           7        9
*Performance ratings for optimized and unoptimized code determined by
averaging five different-sized test program compilation and execution times
and assigning a value from 1 to 10. 
Table 2: SPECint92 measurements.
                    MetaWare   EPC   Ready to Run   Cygnus   SunSoft
Compilation Times (SPECint92 tests)
Unoptimized            52.2    60.6      58.4*       58.4*     61.9
Optimized              56.8    62.7      60.8        60.8      67.4
*These are both GNU compilers, so they're rated the same.



PROGRAMMING PARADIGMS


Blunders of the Innumerate




Michael Swaine


The computer-related stuff is in the middle this month, starting at the fourth
subhead. The math is all up here at the beginning, and the Latin is at the
end. The controversial stuff, like affirmative action and presidential
politics, is on the first two pages, the Christmas-shopping advice appears
just past the halfway point, and the importance of anagrams is revealed in the
last paragraph.
Hope that helps you find what you want.


A Mathematician Reads the Newspaper


When a book has a title like A Mathematician Reads the Newspaper (Basic Books,
1995), you make certain assumptions.
John Allen Paulos has written two earlier books on the misuses of
mathematics--Innumeracy (Vintage, 1990) and Beyond Numeracy (Vintage,
1990)--so I expected his latest to be all about the mathematical blunders and
blinders of the popular press. Since there's so much material to draw on, I
expected a book full of outrageous howlers and wondered why it was only 212
pages long.
As it turns out, the book might be better titled A Mathematician Muses about
Newspapers and Other Topics Suggested to Him when so Musing.
Paulos is a newspaper junkie. He grew up loving newspapers, and today he
subscribes to The New York Times and the Philadelphia Inquirer; regularly
skims the Philadelphia Daily News and The Wall Street Journal; and often reads
The Washington Post, the suburban Ambler Gazette, the Bar Harbor Times, "the
local paper of any city I happen to be visiting," various scurrilous tabloids,
and even USA Today. Oh, and he is also a mathematician.
He has, he says, organized the book like a newspaper. I don't challenge that,
but I do note that when John Paulos organizes a book like a newspaper, it
looks a lot like when Marvin Minsky organizes a book like a mind (The Society
of Mind, Simon & Schuster, 1988). 
The chapter titles suggest topical relevance: "Clinton, Dole in Sparring
Roles," "Cult Members Accuse Government of Plot," "DNA Finger Murderer,"
"Cellular Phones Tied to Brain Cancer," and "A Cyberpunk Woody Allen."
Sometimes the chapters live up to the suggestion.
For example, in the chapter "761 Calories, 428 Mgs. Sodium, 22.6 Grams of Fat
per Serving," Paulos points out the meaningless precision in news stories and
recipes. A recipe, unlike other kinds of algorithms, can often get away with
specifying "a pinch of this, a dash of that." But it should not then claim--as
Paulos says they often do--to yield, say, 761 calories, 428 mgs. sodium, and
22.6 grams of fat per serving. In fact, the variations in the nutritive
content of growing things and in the size of "one medium potato," for example,
are so great that probably nothing you say about a recipe has more than one
digit of precision. Meaningless precision occurs elsewhere, of course: Paulos
tells of his neighbor, who brags of getting 32.15 miles per gallon, and of his
daughter's teacher, who gives her a grade of 93.5 on an essay.
In the chapters "Lani 'Quota Queen' Guinier" and "Tsongkerclintkinbro Wins,"
Paulos discusses the mathematics of voting. It's fascinating stuff.
In the latter chapter, he presents the inconclusive results of the voting for
five candidates in a mythical state's Democratic caucus, then presents five
plans for a runoff election. All five arguments are plausible, and each leads
to a different winner, demonstrating that "fairness" is a slippery concept.
In "Lani 'Quota Queen' Guinier," Paulos introduces other subtleties of voting.
Consider a corporation in which the three stockholders have 47, 44, and 9
percent of the stock, respectively. If you think about it, you'll see that all
three stockholders have equal power. Now consider four stockholders with 27,
26, 25, and 22 percent of the stock. If you consider all possible coalitions
that produce over 50 percent, it becomes clear that the fourth stockholder's
22 percent wields zero power. The same kinds of disenfranchisement can occur
in politics if voters vote as a block, due to racial antagonism, for example.
Various innovative voting schemes have been proposed to get around this
unfairness. Ms. Guinier was one of the proposers, and she fell afoul of a
press and public that didn't understand the mathematical justification for
considering such proposals.
One last voting example: Although the nine Justices of the Supreme Court
decide all issues democratically, by majority vote, Paulos presents a scenario
in which three Justices could control all decisions of the Court. Assume that
five Justices, say the five most conservative ones, agree to vote together to
ensure that their majority always rules. To accomplish this, they meet
secretly to decide how they will all vote. They may not be in complete
agreement, so they reach this consensus by the obvious method: They vote,
agreeing to be bound by this prevote vote when the real vote comes in the full
Court. Now assume that three of these five Justices hold an earlier meeting in
which they decide how they will all vote in the prevote vote. These three,
being a majority of the five, will carry the prevote vote, and their will will
prevail in the final decision. And if two of the three, say Scalia and Thomas,
hold an earlier meeting....


A Columnist Watches Television


Relevance? As I was reading the book, the television was murmuring in the
background, keeping my subconscious supplied with a steady stream of O.J.
Simpson references. At one point, a DNA expert testified that a particular DNA
pattern will occur in only one out of 57 billion African Americans; let's see,
that's 3000 times as many African Americans as presently exist.
I also saw Senator Phil Gramm on CSPAN-2 defending the supermajority voting
rule for getting cloture in the Senate. "That [sic] what we call democracy,"
said the Senator, showing that, while he may have the concept, he's a little
loose on the terminology.
Paulos has a job for life rooting out innumeracy. As I was writing this
column, population geneticist and prosecution witness in the O.J. Simpson
trial Bruce Weir testified that when you look at the number, you see that the
concept of race really has no meaning. Meanwhile, my inbox holds a
questionnaire from the county of Santa Cruz pertaining to jury duty, and one
of the questions asks for my race.
Or this one: Dr. Weir made an unwarranted assumption in one of his
calculations, so he recalculated his result without the assumption. The
defense correctly pointed out that the new expected value, having gone down,
was more favorable to the defendant. The prosecution then tried to use the
fact that the top end of the confidence interval had moved up to claim that
the new results could be seen as less favorable to the defendant. Weir, being
a good and honest mathematician, didn't let the prosecutor get away with that
one: Both ends of the confidence interval had moved out--the top end up and
the bottom end down--which meant that the expected value was less reliable
than he had formerly reported.
If Monica Berg hadn't called to ask where this column was, I could have come
up with a dozen more examples of innumeracy.


A Mathematician Counts to Ten


Some of Paulos's topics are, well, interesting, but not what you'd expect; for
example, the little essays on self-reference and information theory that have
only vague connections with newspapers. And the list of the top-ten reasons we
love top-ten lists. (Number 10: "People...like to see if it's going to run out
of good points before it gets to 10.") 
The quirky asides are entertaining, and there's plenty of the germane stuff.
He presents a rational look at touchy topics like racial balance in hiring,
the meaning of SAT scores, and risk assessment with respect to firearms,
abortion, and smoking.
The President of the United States, a former state governor, is big on finding
out what works in state government and proposing that the nation do likewise.
Hawaii's health plan is a great success, so why not use it as a model for the
nation? Paulos explains why not: Sometimes things don't scale up linearly.
Paulos also points out how some of the numbers reported in the press are there
only because we crave numbers. He's delightfully skeptical about economic
forecasts, for example, which he characterizes as generally less sophisticated
than football play-by-play broadcasting. And he takes on the "appropriately
named" Laffer curve, the linchpin of Reaganomics.
I guess what I most enjoyed (besides the discussion of "the Jeffersonian model
of many parallel processors" versus "the Stalinist model of one central
processor") was his nailing of one of my pet peeves: The news story that
begins, "This is not a scientific poll, but..." and then goes on to treat the
results of their 900-number telephone survey as if they meant something.


An Internet Bookshelf


Building an Internet library? Here are a few of the books on my Internet
shelf. I'll skip all the intro books, the "Free Stuff" series from Coriolis,
the personal accounts of lives wasted cruising the online equivalents of
singles bars, and almost anything more than two years old.
HTML for Fun and Profit, by Mary E.S. Morris (Sunsoft Press, 1995), covers CGI
(common gateway interface) scripting as well as the basics of HTML (hypertext
markup language, the format in which WWW documents are written). The World
Wide Web Unleashed, by John December and Neil Randall (Sams, 1994), and The
Mosaic Handbook (O'Reilly & Associates, 1994) are two more good books if you
write HTML. The O'Reilly book is an excellent introduction to Mosaic, the WWW,
and HTML; the fat Sams book covers that ground and gets into Web-page planning
and design concerns.
Connecting to the Internet, by Susan Estrada (O'Reilly & Associates, 1993), is
about running wires and running the numbers. It provides checklists for
getting on the net: projecting costs, performance needs, and so on. Internet
Mailing Lists, edited by Edward T.L. Hardie and Vivian Neou (Prentice Hall,
1994), is useful if you want to set up a mailing list. Many people considering
setting up Web pages might be better served by a mailing list. The Whole
Internet User's Guide and Catalog, by Ed Krol (O'Reilly & Associates, 1992) is
dated but authoritative, and a genuine classic. It's the book from which the
other eight zillion Internet books steal, so why not go to the source?

Consider Netiquette, by Virginia Shea (Albion Books, 1994). You already know
this stuff, but it's nice to refer to an authority when educating the clueless
newbie. And Shea understands the most important law of the net: There is no
law; these rules of netiquette are only convenient conventions.
I know I said I'd skip the intro books, but Christmas is coming up. You
probably can't do better for your net-starved friends and relatives than one
of the Internet for Dummies books (IDG Books, 1994), or for the Mac, Adam
Engst's Internet Starter Kit (Hayden, 1993).


Scripting the Web Server


I've been mucking about with AppleScript, Apple's system-level scripting
technology, since it was released. A new group of people is getting into
AppleScripting these days: folks who are finding that Apple, hard as it is to
believe, offers them the best platform for a Web server. Best security, best
price/performance. Apple isn't used to bragging about its security or prices
and hasn't done much of a job of getting the word out yet.
AppleScript is also very useful for managing a Mac-based server. That could be
good news for two companies that have tied their fortunes to that of
AppleScript--Userland and Software Designs Unlimited.
The introduction of AppleScript was fraught with promise, but Apple soon
figured out that all the promise hung on users actually writing scripts.
Trouble was, AppleScript suffered from invisibility: The script editor was,
shall we say, modest; much of its vocabulary lay hidden in third-party
applications, and the fine scripts that could serve as examples often ran
invisibly in the background. In order to put a face on this technology, Apple
wisely decided to bundle FaceSpan (from Software Designs Unlimited, Chapel
Hill, NC) with AppleScript. FaceSpan is: 1. a front end for AppleScript that
makes script writing easier, and 2. a tool for putting front ends on
AppleScript scripts. Any script can become a stand-alone application with all
the expected user-interface elements and all for about the same learning
investment required to learn HyperTalk. Now FaceSpan has been nativized for
PowerPC, given some powerful new features, and unbundled.
Is FaceSpan good enough to prosper from any AppleScript interest it helped
generate? I don't know, but FaceSpan, plus AppleScript, probably represents
the most rapid application-development system on the Mac. Faster, certainly,
than any Windows-based system.
Meanwhile, as reported here last month, Userland has chosen to go the other
direction: After struggling to keep a commercial AppleScript-related product
alive (more to the point, a commercial AppleScript-competitor product),
Userland has turned Frontier into freeware. Meanwhile, the company is moving
rapidly into script-based tools for Web publishing; check out
http://www.hotwired.com/Signal/DaveNet/.


Obscure Language of the Month


In case you were wondering, the best source on botanical Latin is Botanical
Latin, by William T. Stearn (Timber Press, Portland, OR).
In the Middle Ages, Latin was the universal language of intellectual
discourse. Traveling scholars could be understood by local scholars because
they all spoke classical Latin, a living language in which one could discuss
anything from dinner plans to botany. 
In 1690, philosopher John Locke laid down the requirements for a language that
could support scientific discourse. These included the radical notion that a
word ought to mean one specific thing. Linnaeus adhered to these rules when
constructing his nomenclature of nature in the following century. Before Linnaeus, a
botanist would describe plants in language suitable for a chatty letter to his
maiden aunt. After Linnaeus, plants were described in a highly formalized
language in which verbs are largely eliminated and even the typography is
formalized.
Botanical Latin, designed specifically for taxonomic purposes, was the result
of Linnaeus's effort. From this beginning, botanical Latin evolved into a
predictably extensible system of nomenclature that could serve as a language
of this natural science. Arguably, it has also become as artificial and as
formal a language as C or Pascal.
It even has a kind of ISO standard. In 1737, Linnaeus published what have come
to be known as the Linnaean Canons. Example: "Generic names ending -oides are
to be banished from the domain of botany." Or: "If we would not be considered
utter barbarians, let us not invent names which cannot be derived from some
root or other." The more recent International Code of Botanical Nomenclature
gives detailed rules for producing new words based on the names of persons,
such as, "When the name ends in a consonant, the letters ii are added, except
when the name ends in -er, when i is added."
"When no fitting and meaningful name for a new genus comes to mind," Stearn
advises, it's acceptable to rearrange the letters of a closely related genus.
Maingola from Magnolia, for example.
With rare exceptions such as XINU, programming-language and operating-system
nomenclature hasn't seized on this innovative technique of neologistics--yet.
We can but hope.



C PROGRAMMING


Windows 95 = Career Opportunities




Al Stevens


An old joke tells about a piano tuner named Oper Knockity. He did such a good
job that your piano never again went out of tune. His slogan was, "Oper
Knockity only tunes once." If you are reading this column, chances are you
don't have serious opportunity problems. Most C and C++ programmers find work.
But several new career paths are about to open, and Windows 95 is the knock of
opportunity.
This is the September column, and, if everything went according to schedule,
Windows 95 has been released. I'm taking a chance in saying that; other trade
columnists, more in-the-know than I, have reported rumors of a ship slip to
November. Those columnists claim to have reliable inside sources.
Just now--I'm writing this in mid-June--Microsoft is still insisting that
August is the date. Good. I can use that position to justify talking about the
product, even though in June I am still under a nondisclosure agreement. By
their schedule, it will have expired by the time you read this. 
If you wonder why Microsoft wouldn't 'fess up to the delay, here's what I
think: They've been leaking Windows 95 information for a year, making users'
mouths water in anticipation. At the same time, IBM has heavily promoted OS/2
Warp. Users think they'll get Windows 95 as soon as August, so they wait and
maybe don't bother with Warp. By the time August rolls around and Redmond
announces a slip to November, the users are primed, ready, and slavering.
They've been waiting for over a year. Three more months doesn't seem that bad.
But if they had learned in May about the slip, well, they might have thrown up
their hands in disgust and gone with something that's already available. Can't
let that happen.
I hope those other columnists are wrong, and Windows 95 is (was) released in
August. Some smart, diligent folks can capitalize. Here's my advice. Set up a
Windows 95 network and learn the operating system. Learn it wall-to-wall. Then
take out an ad in the Yellow Pages. Get a 900 number. Sell yourself as a
Windows 95 trainer and installation expert. A lot of unsuspecting users are
going to need help.
Microsoft and the trade press predict that most PCs will be running Windows 95
in a year or two. Could be. It's a really neat OS, and most applications
developers are targeting it. But guess what? Unless you are running Windows 95
on one PC with no network, no mail, no fax, and nothing shared, Windows 95 is
a knurly knot to install and, sometimes, to operate. Microsoft crows loudly
about Plug and Play and the ease of setting things up, but believe me, t'aint
so.
How do I know? For the past two months, I've been writing a Windows 95
self-help book and researching a Windows 95 games-programming book. Three of
the PCs in my four-PC Windows for Workgroups network are now running Windows
95. Everything works, and it is a good operating environment. With this setup,
I can scamper from PC to PC and test, learn, and write about all the features.
But, woefully, every day I find something that either does not work or needs
better documentation, usually the latter. Of course, my observations come from
a beta, but it's supposed to be the last one. I doubt that the user interface
will change much between now and August, or November, or, uh, well, who knows?
Windows 95 uses the familiar "wizard" paradigm that Visual C++ uses so
effectively. Everything that you set up is done through a wizard. When you
send a fax, a wizard leads you by the hand. These wizards are great, but
sometimes they ask you to know a lot. Other times they fail to tell you
something vital. Usually, when they omit an important detail, the detail is
not obvious, and, therefore, the procedure is not intuitive.
Windows 95 is a really, really big shoe. Wizards have buttons with
"Properties" or "Details" labels. The buttons open dialog boxes with tabbed
pages. The tabbed pages have more buttons that open more dialog boxes. One
wizard can launch another wizard. You dive deep into nested layers of wizards
and dialog boxes. The screen seems totally grayed out by the partially hidden
fragments of a dozen overlapping windows, some modal, some modeless. At the
outer level, when the wizard finally gets around to asking for something, it
wants you to decide on an option or enter the name of something. Sometimes
there are Browse buttons; sometimes the wizard has an explanatory paragraph;
sometimes--nada. Usually, a computer-literate person can deduce what the
wizard wants, but often the wizard says something like, "Enter this
information. If you don't know what to enter, ask your system administrator."
The other day I asked my administrator what to do next. She said, "Wash up,
dinner is almost ready."
Last year, when I made similar observations about installing OS/2 2.1, some
OS/2 users chastised me. I should not malign their beloved OS, they said. A
recent spate of letters to the editor of PC Magazine indicates that OS/2 Warp
is not much better than OS/2 2.1 when it comes to installation woes. The PC's
open architecture is the culprit. Operating-system builders don't know how to
cope with it. The Macintosh is closed, and it doesn't have these problems. Of
course, it doesn't have a majority market share, either.
Let's get together and form an elite cartel of Windows 95 supporters. We can
network our expertise, support large and small businesses and governments,
charge exorbitant fees, and support expensive hobbies like writing Windows 95
programs.


Writing Windows 95 Programs


Opportunities for writing Windows 95 programs abound. Virtually every major
developer is targeting the new operating environment. Little guys, too. The
platform of choice has defaulted to Microsoft Foundation Classes. Most Windows
C++ compilers have licensed MFC. The exception is Borland, which wants you to
use OWL. In time, it'll come around. MFC is not only the de facto standard
Windows framework, it's the best one available.
The paradigm of choice is visual programming. I've been experimenting with
Visual C++ 2.0 and have some things to report. Mostly, it is a lovely
development environment. But pay heed. Visual C++ does not completely isolate
you from the Windows API because it does not encapsulate everything. For
example, a search of the MFC docs fails to find classes that encapsulate modem
and serial-port communications, network-packet exchanges, mail, or multimedia
extensions. Maybe they're in the works; if not, they should be. Telephony and
mail are major parts of Windows 95. They are built into the operating system.
Multimedia is much bigger now than it was when MFC was first conceived.
A Windows 95 Game SDK that addresses sound and accelerated video is in beta
now. It has been discussed in other publications. I'll look at it and report
about its features and facilities when it becomes available.
Many moons ago, I briefly discussed MIDI in this column and bemoaned the lack
of MIDI software tools for DOS programmers. I wanted to write programs that
could read my keyboard and create accompaniments in real time, based on what I
was playing. To do so, I needed tools to read the keyboard and write to the
MIDI instrument channels. No such tools came into view, and I shelved the
project. I later learned that the Windows platform has exactly what I need.
Functions, messages, and data structures defined in the SDK support MIDI
through the Windows device-independent interface. Those functions are not
encapsulated into a class library, however. Which leads (or, in musical
parlance, segues) into the next topic.


The CyberRhythm Section


Many of you know that when I am not programming or writing, I am a jazz
pianist. Some time ago I fell into a bar where the lounge pianist had a MIDI
keyboard and a sequencer, a device that adds drums and accompaniment to what
he plays. That sequencer could do everything but fry an egg. He let me play
with it. One feature amazed me: In its bass-accompaniment mode it watched the
notes I played, deduced the chord, and played, in real time, a respectable
bass line under my improvisations.
I have seen other MIDI devices that could play accompaniments, but they always
required me to play simple three- and four-note chords in the root position
below a designated note on the keyboard. My left hand was, consequently,
unavailable for anything else. Those bass notes below the designated divider
note were likewise unusable for anything but defining chords for the
accompaniment. That's not how I play the piano.
This sequencer had a tiny LCD screen that displayed the name of the chord it
had deduced. I watched that screen while I played. The chord it chose was
usually correct as long as I stuck to the root position in my left hand. Even
when it chose the wrong chord, the bass line was acceptable with very few
dissonant notes.
I tried to find such a sequencer, but the company had upgraded the product to
a new version, which requires the player to play--guess what?--simple three-
and four-note chords in the root position below a designated note on the
keyboard. Fah!
All of which moved me to want to build the perfect personal electronic rhythm
section, which not only would let me use both hands and the entire keyboard,
but would apply a knowledge of harmony and jazz voicings when it deduced the
next chord to base its accompaniment on.
I decided to build that program in Visual C++ 2.0 under Windows 95. That
version runs only under Windows 95 and Windows NT. I used the visual
components of the development environment to build a dialog-based application.
First, I had to get past the problems of reading the MIDI keyboard and
interpreting the notes. Therefore, the first iteration of this program watches
for MIDI messages from the keyboard, ignores messages unrelated to key presses
and releases, and reports the note messages in a list box. Later, I'll start
building the musical heuristics that turn my computer into the reincarnation
of Mingus.
Listings One and Two are mididlg.h and mididlg.cpp, respectively. I did not
write most of these programs; Visual C++ did. I added code to support the
application. Visual C++ wrote some other source-code files that I have not had
to modify yet. If you download this code, you'll get everything.
This first version does not encapsulate the MIDI functions into a class. I'll
wait until I know exactly how I plan to use the entire MIDI API before
launching a class design.
Mididlg.h defines the CMidiDlg class derived from MFC's CDialog class. I added
exactly four lines of code to this file; #include <mmsystem.h> is the first.
That header contains the prototypes and other declarations for the MIDI API. I
added an instance of the HMIDIIN type, a handle of an open MIDI input device.
The FatalError member function reports an error and destroys the dialog
window, terminating the application. The OnMIDIMessage member function is
called when the MIDI input device sends a message. For example, if I press and
release a key or pedal on the keyboard device, the device sends a MIDI message
to the system.
Mididlg.cpp implements the CMidiDlg class member functions. I had to manually
add the ON_MESSAGE macro to the message-map definition. The Visual C++ Class
wizard does not include any MIDI messages among those that it automatically
adds to the class. ON_MESSAGE is supposed to be used for user-defined
messages, but it works for the messages that Microsoft left out of ClassWizard
as well.
The FatalError member function is mine. In several places, the program decides
that it cannot continue. It calls FatalError from those places to display an
error message and destroy the window.
The OnCreate member function is automatically built by ClassWizard. The system
calls OnCreate when the window is being created. That's a convenient place to
initialize the MIDI system, so I added that code.
The call to midiInGetNumDevs returns the number of MIDI input devices
installed into Windows. If that number is 0, there is no keyboard, and the
program calls FatalError.
This program assumes that there is only one MIDI input device and that the
device is a keyboard. The MIDI API includes functions to get the capabilities
of devices and make intelligent decisions about them. I did not use any of
these. The program works with the one device. If I added a lot of input
devices, I'd have to list them in a menu and use the chosen one.
The call to midiInOpen opens the input device. Inasmuch as I am assuming only
one, I pass a device code of 0 to this function and tell it to send messages
to the CMidiDlg windows. For multiple devices, I'd send 1, 2, 3, and so on,
based on the chosen device. The first parameter is the address of the handle
variable to be used to start and stop the input device.
If the device is unavailable, midiInOpen returns nonzero. One reason could be
that another program has the device open.
After the device has been successfully opened, the call to midiInStart enables
the device to send messages.
The OnDestroy function stops and closes the MIDI input device. This function
is provided by Visual C++, but I had to provide the code. OnDestroy is called
when the CMidiDlg window is destroyed.
The OnMIDIMessage member function is one that I added. It is called when the
MIDI device sends a data message, which is translated into the MM_MIM_DATA
Windows message. The ON_MESSAGE macro that I added to the message map connects
the MM_MIM_DATA message to the OnMIDIMessage function. The function interprets
the message. A low-order byte of 0x90 identifies a MIDI note-on message. MIDI
also has a note-off message, but my keyboard never sends it. The note-on
message includes a number that identifies the note and a value that identifies
its velocity. My keyboard has weighted keys. The velocity indicator specifies
how hard I pressed the key. When I release the key, the keyboard sends a
note-on message with a velocity of zero.
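The layout of that message can be sketched in plain C. This little helper is hypothetical (the listing decodes the fields inline), but it shows where the status, note, and velocity bytes sit and why a zero velocity counts as a release:

```c
/* A MIDI short message arrives as one 32-bit value:
   bits 0-7 = status (0x90 is note-on, channel 1),
   bits 8-15 = note number, bits 16-23 = velocity. */
typedef struct {
    unsigned char status;
    unsigned char note;
    unsigned char velocity;
    int is_note_on; /* velocity 0 means the key was released */
} MidiNote;

MidiNote parse_short_message(unsigned long msg)
{
    MidiNote m;
    m.status   = (unsigned char)(msg & 0xff);
    m.note     = (unsigned char)((msg >> 8) & 0xff);
    m.velocity = (unsigned char)((msg >> 16) & 0xff);
    m.is_note_on = (m.status == 0x90 && m.velocity != 0);
    return m;
}
```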
The OnClearButton function is called when the user clicks the Clear command
button on the dialog window. The program clears the contents of the list box
and continues.
The note values from my keyboard range from 21 through 108, representing the
88 keys. I use those values to compute the note name from A through A-flat and
the octave from 1 through 8. With that information and the use of velocity to
determine whether the note is being pressed or released, I can maintain an
array of notes being played at any given time. I can then start to build in
the intelligence that deduces a chord.
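That array of currently held notes is simple enough to sketch now. This is a hypothetical scheme, not part of the listings: one flag per MIDI note number, updated as note messages arrive, treating velocity 0 as a release the way my keyboard does.

```c
/* Hypothetical held-note table: one flag per MIDI note number.
   Note-on sets the flag; a note-on with velocity 0 clears it. */
static int held[128];

void note_event(int note, int velocity)
{
    if (note >= 0 && note < 128)
        held[note] = (velocity != 0);
}

/* The raw material for chord deduction: how many
   keys are down at this instant. */
int keys_down(void)
{
    int i, n = 0;
    for (i = 0; i < 128; ++i)
        n += held[i];
    return n;
}
```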

Just now, the program displays only the note, octave, and on/off status in a
list box. The other stuff comes later. Understanding some of what I am doing
requires a basic understanding of musical theory. I'll explain the numerical
construction of chords and the relationship they have to one another in only
the simplest of terms. The more interesting parts will be how the program uses
Visual C++ to add user-interface elements, and how it uses--and eventually
encapsulates--the MIDI portion of the Windows multimedia-extensions API.
By the way, this project needs a name. I got stung a couple of times when I
inadvertently used someone else's program name. I'm thinking about naming it
after a deceased bass player (who is not in a position to complain). Any
suggestions?


Source Code


The source-code files for this unnamed project are free. You can download them
from the DDJ Forum on CompuServe and on the Internet by anonymous ftp; see
"Availability," page 3.
If you cannot get to one of the online sources, send a 3.5-inch diskette and
an addressed, stamped mailer to me at Dr. Dobb's Journal, 411 Borel Avenue,
San Mateo, CA 94402, and I'll send you the source code. Make sure that you
indicate that you want mididlg.h and mididlg.cpp and the attendant files. The
code is free, but if you'd like to support my Careware charity, include a
dollar for the Brevard County Food Bank. 


Why Not Roberta?


I saw Bob the other day in a computer store. I looked over the shoulder of a
computer novice who was timidly exploring the rooms in Bob's house and
learning how to use a computer. Bob is a user interface for the epsilon minus.
How ironic. IDG can't do a book called Bob For Dummies. It would have to be
Bob Is For Dummies.
Good ol' Microsoft. They always make me think of new "C Programming" column
projects. Their unpublished character-oriented-windows (COW) library inspired
D-Flat. Now I'm considering an improved user interface, one that targets folks
who find Bob too urbane and intimidating. The absolute lowest common
denominator in user interfaces. The UI for the rest of us.
Let's see. How would it go? I turn on my computer. The screen flashes and the
disk whirs and rattles. An image appears and there I am, standing at the door
of a house trailer. A pickup truck on blocks sits nearby. Probably an
information-superhighway metaphor. The gun rack in the pickup's rear window
sports a fishing pole, maybe the tool that I use to locate things. A porcelain
pink flamingo stands in the lawn ready to help me with what--my travel plans?
Nearby, a rubber tire hangs on a rope from the limb of an oak tree. Games, no
doubt. A lawn statue of a porter stands mute holding a lantern. Wonder what he
does. A scrawny cartoon hound dog scratches his ear and says through my sound
card, "Hey, Goober." He'll be my guide through the user interface. He leads me
inside the trailer, where a velvet painting of Elvis hangs over a fake
fireplace with a light bulb and cellophane fire simulation. A sign on the wall
in the bathroom says, "We aim to please, you aim, too, please." Each of these
artifacts is a metaphor that lets me use the computer to do something
meaningful--like manage my deep-woods-survival club meeting schedule. There's
lots more, but I'll save the surprises for the first beta.
I'm calling it..."Jim Bob(TM)." To be truthful, I had intended to call this
interface "Bubba." But just between the time I wrote this column and the time
it went to press, someone else posted a similar interface called Bubba (shades
of my recent experience with the "IMail" moniker). Oh well, great minds and
all that jazz.

Listing One
// mididlg.h : header file
/////////////////////////////////////////////////////////////////////////////
// CMidiDlg dialog
#include <mmsystem.h>
class CMidiDlg : public CDialog
{
// private data
 HMIDIIN hMidiIn;
// private functions
 void FatalError(LPCTSTR msg);
 LONG OnMIDIMessage(WPARAM, LPARAM msg);
// Construction
public:
 CMidiDlg(CWnd* pParent = NULL); // standard constructor
// Dialog Data
 //{{AFX_DATA(CMidiDlg)
 enum { IDD = IDD_MIDI_DIALOG };
 // NOTE: the ClassWizard will add data members here
 //}}AFX_DATA
 // ClassWizard generated virtual function overrides
 //{{AFX_VIRTUAL(CMidiDlg)
 protected:
 virtual void DoDataExchange(CDataExchange* pDX); // DDX/DDV support
 //}}AFX_VIRTUAL
// Implementation
protected:
 HICON m_hIcon;
 // Generated message map functions
 //{{AFX_MSG(CMidiDlg)
 virtual BOOL OnInitDialog();
 afx_msg void OnPaint();
 afx_msg HCURSOR OnQueryDragIcon();
 afx_msg int OnCreate(LPCREATESTRUCT lpCreateStruct);
 afx_msg void OnDestroy();
 afx_msg void OnClearbutton();
 //}}AFX_MSG
 DECLARE_MESSAGE_MAP()
};


Listing Two
// mididlg.cpp : implementation file
#include "stdafx.h"
#include "midi.h"
#include "mididlg.h"
#ifdef _DEBUG
#undef THIS_FILE
static char BASED_CODE THIS_FILE[] = __FILE__;
#endif
/////////////////////////////////////////////////////////////////////////////
// CMidiDlg dialog
CMidiDlg::CMidiDlg(CWnd* pParent /*=NULL*/)
 : CDialog(CMidiDlg::IDD, pParent)
{
 //{{AFX_DATA_INIT(CMidiDlg)
 // NOTE: the ClassWizard will add member initialization here
 //}}AFX_DATA_INIT
 // Note that LoadIcon does not require a subsequent DestroyIcon in Win32
 m_hIcon = AfxGetApp()->LoadIcon(IDR_MAINFRAME);
 hMidiIn = 0;
}
void CMidiDlg::DoDataExchange(CDataExchange* pDX)
{
 CDialog::DoDataExchange(pDX);
 //{{AFX_DATA_MAP(CMidiDlg)
 // NOTE: the ClassWizard will add DDX and DDV calls here
 //}}AFX_DATA_MAP
}
BEGIN_MESSAGE_MAP(CMidiDlg, CDialog)
 //{{AFX_MSG_MAP(CMidiDlg)
 ON_WM_PAINT()
 ON_WM_QUERYDRAGICON()
 ON_WM_CREATE()
 ON_WM_DESTROY()
 ON_MESSAGE(MM_MIM_DATA, OnMIDIMessage)
 ON_BN_CLICKED(IDC_CLEARBUTTON, OnClearbutton)
 //}}AFX_MSG_MAP
END_MESSAGE_MAP()
/////////////////////////////////////////////////////////////////////////////
// CMidiDlg message handlers
BOOL CMidiDlg::OnInitDialog()
{
 CDialog::OnInitDialog();
 CenterWindow();
 
 // TODO: Add extra initialization here
 
 return TRUE; // return TRUE unless you set the focus to a control
}
// If you add a minimize button to your dialog, you will need the code below
// to draw the icon. For MFC applications using the document/view model,
// this is automatically done for you by the framework.
void CMidiDlg::OnPaint() 
{
 if (IsIconic())
 {
 CPaintDC dc(this); // device context for painting
 SendMessage(WM_ICONERASEBKGND, (WPARAM) dc.GetSafeHdc(), 0);
 // Center icon in client rectangle

 int cxIcon = GetSystemMetrics(SM_CXICON);
 int cyIcon = GetSystemMetrics(SM_CYICON);
 CRect rect;
 GetClientRect(&rect);
 int x = (rect.Width() - cxIcon + 1) / 2;
 int y = (rect.Height() - cyIcon + 1) / 2;
 // Draw the icon
 dc.DrawIcon(x, y, m_hIcon);
 }
 else
 {
 CDialog::OnPaint();
 }
}
// The system calls this to obtain the cursor to display while the user drags
// the minimized window.
HCURSOR CMidiDlg::OnQueryDragIcon()
{
 return (HCURSOR) m_hIcon;
}
void CMidiDlg::FatalError(LPCTSTR msg)
{
 MessageBeep(MB_ICONSTOP);
 MessageBox(msg, "MIDI KB", MB_OK | MB_ICONEXCLAMATION);
 DestroyWindow();
} 
int CMidiDlg::OnCreate(LPCREATESTRUCT lpCreateStruct) 
{
 if (CDialog::OnCreate(lpCreateStruct) == -1)
 return -1;
 
 // ---- test for MIDI input devices
 if (midiInGetNumDevs() == 0)
 FatalError("No MIDI input devices");
 // ---- assume one MIDI input device, open the keyboard
 else if (midiInOpen(&hMidiIn, 0, (unsigned long) m_hWnd, 0, 
 CALLBACK_WINDOW) != 0)
 FatalError("Cannot open MIDI input device");
 else
 midiInStart(hMidiIn);
 return 0;
}
void CMidiDlg::OnDestroy() 
{
 CDialog::OnDestroy();
 
 if (hMidiIn != 0) {
 midiInStop(hMidiIn);
 midiInClose(hMidiIn);
 }
}
LONG CMidiDlg::OnMIDIMessage(WPARAM, LPARAM msg)
{
 // --- extract MIDI message components
 if ((msg & 0xff) == 0x90) { // MIDI status = note on
 BOOL velocity = ((msg >> 16) & 0xff) != 0;
 short unsigned int nt = ((msg >> 8) & 0xff) - 21;
 short unsigned int note = nt % 12;
 short unsigned int octave = nt / 12 + 1;

 static char *notes[] = {
 "A ","Bb","B ","C ","Db","D ",
 "Eb","E ","F ","Gb","G ","Ab"
 };
 CString mstr(notes[note]);
 char oc[] = " ( ) ";
 oc[2] = octave+'0';
 mstr += oc;
 mstr += (velocity ? "On" : "Off");
 CListBox *pListBox = (CListBox*)GetDlgItem(IDC_NOTELIST);
 pListBox->AddString(mstr);
 }
 return 0;
}
void CMidiDlg::OnClearbutton() 
{
 CListBox *pListBox = (CListBox*)GetDlgItem(IDC_NOTELIST); 
 pListBox->ResetContent();
}












































ALGORITHM ALLEY


The Blowfish Encryption Algorithm: One Year Later




Bruce Schneier


DES is the workhorse of cryptography algorithms, but it's long been time to
replace the 19-year-old standard. The recent design of a $1M machine that
could recover a DES key in 3.5 hours only confirmed what everybody knew--DES's
key size is far too small.
From the outset, DES was trusted only because it had survived the scrutiny of
the NSA. Over time, experts came to trust DES because it was a published
standard, and because it withstood 20 years of intensive scrutiny by
cryptographers around the world.
Cryptography is like that: Confidence in an algorithm grows as group after
group fails to break it.
Serious candidates for replacing DES have recently begun to emerge, although
none has yet taken widespread hold. Triple-DES is the
conservative approach; IDEA (used in PGP) is the most promising new algorithm.
And there are a bevy of unpatented also-rans: RC4 (once a trade secret of RSA
Data Security, but now publicly available on the Internet), SAFER, and my own
Blowfish.
I first presented Blowfish at the Cambridge Algorithms Workshop; see
"Description of a New Variable-Length Key, 64-bit Block Cipher (Blowfish),"
Fast Software Encryption, R. Anderson, ed., Lecture Notes in Computer Science
#809 (Springer-Verlag, 1994) and in DDJ ("The Blowfish Encryption Algorithm,"
April 1994). From the start, Blowfish was intended to be a completely
free--unpatented, unlicensed, and uncopyrighted--alternative to DES. Since
then it has been analyzed, and we've begun to see its application in both
public and private systems. In this article, I'll present new Blowfish code,
as well as updates on the algorithm's security.


Description of Blowfish


Blowfish is a block cipher that encrypts data in 8-byte blocks. The algorithm
consists of two parts: a key-expansion part and a data-encryption part. Key
expansion converts a variable-length key of at most 56 bytes (448 bits) into
several subkey arrays totaling 4168 bytes. (Note that the description
presented here differs slightly from the one in the April 1994 DDJ,
specifically in steps 5 and 6 of the subkey-generation algorithm.)
Blowfish has 16 rounds. Each round consists of a key-dependent permutation and
a key- and data-dependent substitution. All operations are XORs and additions
on 32-bit words. The only additional operations are four indexed-array data
lookups per round.
Subkeys. Blowfish uses many subkeys, which must be precomputed before any data
encryption or decryption. The P-array consists of 18 32-bit subkeys: P1,
P2,..., P18. There are also four S-boxes, each with 256 32-bit entries:
S1[0], S1[1],..., S1[255]; S2[0],..., S2[255]; S3[0],..., S3[255]; and
S4[0],..., S4[255].
Encryption and decryption. Blowfish has 16 rounds. The input is a 64-bit data
element, x. Divide x into two 32-bit halves: xL, xR. Then, for i=1 to 16:
xL = xL XOR P[i]
xR = F(xL) XOR xR
Swap xL and xR
After the 16th round, swap xL and xR again to undo the last swap. Then, xR=xR
XOR P[17] and xL=xL XOR P[18]. Finally, recombine xL and xR to get the
ciphertext.
Function F. For Function F, divide xL into four 8-bit quarters: a, b, c, and
d. Then, F(xL) = ((S1[a] + S2[b] mod 2^32) XOR S3[c]) + S4[d] mod 2^32.
Decryption is exactly the same as encryption, except that P1, P2,..., P18 are
used in the reverse order.
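Function F translates directly into C. The sketch below uses uint32_t so the additions wrap mod 2^32 automatically; the S array merely stands in for the real key-dependent S-boxes, which come from the key expansion described next.

```c
#include <stdint.h>

/* Stand-in S-boxes; in real Blowfish these are filled by key
   expansion. Indexing: S[box][byte value]. */
static uint32_t S[4][256];

/* F splits xL into four bytes a, b, c, d (most significant
   first) and combines four table lookups with add and XOR. */
uint32_t F(uint32_t xL)
{
    uint8_t a = (uint8_t)(xL >> 24);
    uint8_t b = (uint8_t)(xL >> 16);
    uint8_t c = (uint8_t)(xL >> 8);
    uint8_t d = (uint8_t)xL;
    return ((S[0][a] + S[1][b]) ^ S[2][c]) + S[3][d];
}
```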
Generating the subkeys. The subkeys are calculated using the Blowfish
algorithm:
1. Initialize first the P-array and then the four S-boxes, in order, with a
fixed string consisting of the hexadecimal digits of pi (less the initial 3):
P1=0x243f6a88, P2=0x85a308d3, P3=0x13198a2e, P4=0x03707344, and so on.
2. XOR P1 with the first 32 bits of the key, XOR P2 with the second 32 bits of
the key, and so on for all bits of the key (possibly up to P14). Repeatedly
cycle through the key bits until the entire P-array has been XORed with key
bits. (For every short key, there is at least one equivalent longer key; for
example, if A is a 64-bit key, then AA, AAA, and so on, are equivalent keys.)
3. Encrypt the all-zero string with the Blowfish algorithm, using the subkeys
described in steps 1 and 2.
4. Replace P1 and P2 with the output of step 3.
5. Encrypt the output of step 3 using the Blowfish algorithm with the modified
subkeys.
6. Replace P3 and P4 with the output of step 5.
7. Continue the process, replacing all entries of the P-array, and then all
four S-boxes in order, with the output of the continuously changing Blowfish
algorithm. 
In total, 521 iterations are required to generate all the subkeys: the 18
P-array entries plus the 4x256 S-box entries make 1042 32-bit words, and each
iteration produces two of them. Applications can store the subkeys rather than
execute this derivation process multiple times.
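Step 2's key cycling is worth a sketch of its own. This hypothetical helper XORs the key bytes, four at a time and wrapping around a short key, into an 18-entry P-array; InitializeBlowfish in Listing Two does the same job with a bitfield union.

```c
#include <stdint.h>

/* Sketch of step 2: cycle through the key bytes, XORing them
   four at a time into the 18 P-array entries. A short key
   wraps, which is why AA is equivalent to A for a 64-bit
   key A. */
void xor_key_into_p(uint32_t P[18], const uint8_t *key, int keybytes)
{
    int i, j = 0;
    for (i = 0; i < 18; ++i) {
        uint32_t word = 0;
        int k;
        for (k = 0; k < 4; ++k) {
            word = (word << 8) | key[j];  /* big-endian packing */
            j = (j + 1) % keybytes;
        }
        P[i] ^= word;
    }
}
```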
C code. Listings One and Two present C code for Blowfish that has been
improved and corrected over that published in April 1994. Listing Three
provides sample test vectors for the code.


Cryptanalysis of Blowfish


When I first presented Blowfish, DDJ sponsored a cryptanalysis contest. I am
pleased to present the most interesting submissions here.
John Kelsey developed an attack that could break 3-round Blowfish, but was
unable to extend it. This attack exploits the F function and the fact that
addition mod 2^32 and XOR do not commute. Vikramjit Singh Chhabra looked at
ways of efficiently implementing a brute-force keysearch machine.
Serge Vaudenay examined a simplified variant of Blowfish, with the S-boxes
known and not key dependent. For this variant, a differential attack can
recover the P-array with 2^(8r+1) chosen plaintexts (r is the number of
rounds).
This attack is impossible for 8-round Blowfish and higher, since more
plaintext is required than can possibly be generated with a 64-bit block
cipher. 
For certain weak keys that generate weak S-boxes (the odds of getting them
randomly are 1 in 2^14), the same attack requires only 2^(4r+1) chosen
plaintexts
to recover the P-array (again, assuming the S-boxes are known). With unknown
S-boxes, this attack can detect whether a weak key is being used, but cannot
determine what it is (neither the S-boxes, the P-array, nor the key itself).
This attack only works against reduced-round variants; it is completely
ineffective against 16-round Blowfish.
Even so, the discovery of weak keys in Blowfish is significant. A weak key is
one for which two entries for a given S-box are identical. There is no way to
check for weak keys before doing the key expansion. If you are worried about
it, you have to do the key expansion and check for identical S-box entries
after you generate a Blowfish key. I don't think it's necessary, though.


Conclusion


As of yet, no one has developed an attack that breaks Blowfish. Even so, more
cryptanalysis is required before pronouncing the algorithm secure. I invite
you to continue analyzing the algorithm.


Listing One
 
/*********************blowfish.h********************/ 
/* $Id: blowfish.h,v 1.3 1995/01/23 12:38:02 pr Exp pr $*/ 
 
#define MAXKEYBYTES 56 /* 448 bits */ 
#define bf_N 16 
#define noErr 0 
#define DATAERROR -1 
#define KEYBYTES 8 
#define subkeyfilename "Blowfish.dat" 
 
#define UWORD_32bits unsigned long 
#define UWORD_16bits unsigned short 
#define UBYTE_08bits unsigned char 
 
/* choose a byte order for your hardware. ABCD - big endian - motorola */ 
#ifdef ORDER_ABCD 
union aword { 
 UWORD_32bits word; 
 UBYTE_08bits byte [4]; 
 struct { 
 unsigned int byte0:8; 
 unsigned int byte1:8; 
 unsigned int byte2:8; 
 unsigned int byte3:8; 
 } w; 
}; 
#endif /* ORDER_ABCD */ 
 
/* DCBA - little endian - intel */ 
#ifdef ORDER_DCBA 
union aword { 
 UWORD_32bits word; 
 UBYTE_08bits byte [4]; 
 struct { 
 unsigned int byte3:8; 
 unsigned int byte2:8; 
 unsigned int byte1:8; 
 unsigned int byte0:8; 
 } w; 
}; 
#endif /* ORDER_DCBA */ 
 
/* BADC - vax */ 
#ifdef ORDER_BADC 
union aword { 
 UWORD_32bits word; 
 UBYTE_08bits byte [4]; 
 struct { 
 unsigned int byte1:8; 
 unsigned int byte0:8; 
 unsigned int byte3:8; 
 unsigned int byte2:8; 
 } w; 
}; 
#endif /* ORDER_BADC */ 
 
short opensubkeyfile(void); 

unsigned long F(unsigned long x); 
void Blowfish_encipher(unsigned long *xl, unsigned long *xr); 
void Blowfish_decipher(unsigned long *xl, unsigned long *xr); 
short InitializeBlowfish(unsigned char key[], short keybytes); 
 
 

Listing Two 
 
/*********************blowfish.c*********************/ 
/* TODO: test with zero length key */ 
/* TODO: test with a through z as key and plain text */ 
/* TODO: make this byte order independent */ 
 
#include <stdio.h> /* used for debugging */ 
#ifdef MACINTOSH 
 #include <Types.h> /* FIXME: do we need this? */ 
#endif 
 
#include "blowfish.h" 
#include "bf_tab.h" /* P-box P-array, S-box */ 
 
#define S(x,i) (bf_S[i][x.w.byte##i]) 
#define bf_F(x) (((S(x,0) + S(x,1)) ^ S(x,2)) + S(x,3)) 
#define ROUND(a,b,n) (a.word ^= bf_F(b) ^ bf_P[n]) 
 
inline 
void Blowfish_encipher(UWORD_32bits *xl, UWORD_32bits *xr) 
{ 
 union aword Xl; 
 union aword Xr; 
 
 Xl.word = *xl; 
 Xr.word = *xr; 
 
 Xl.word ^= bf_P[0]; 
 ROUND (Xr, Xl, 1); ROUND (Xl, Xr, 2); 
 ROUND (Xr, Xl, 3); ROUND (Xl, Xr, 4); 
 ROUND (Xr, Xl, 5); ROUND (Xl, Xr, 6); 
 ROUND (Xr, Xl, 7); ROUND (Xl, Xr, 8); 
 ROUND (Xr, Xl, 9); ROUND (Xl, Xr, 10); 
 ROUND (Xr, Xl, 11); ROUND (Xl, Xr, 12); 
 ROUND (Xr, Xl, 13); ROUND (Xl, Xr, 14); 
 ROUND (Xr, Xl, 15); ROUND (Xl, Xr, 16); 
 Xr.word ^= bf_P[17]; 
 
 *xr = Xl.word; 
 *xl = Xr.word; 
} 
void Blowfish_decipher(UWORD_32bits *xl, UWORD_32bits *xr) 
{ 
 union aword Xl; 
 union aword Xr; 
 
 Xl.word = *xl; 
 Xr.word = *xr; 
 
 Xl.word ^= bf_P[17]; 
 ROUND (Xr, Xl, 16); ROUND (Xl, Xr, 15); 

 ROUND (Xr, Xl, 14); ROUND (Xl, Xr, 13); 
 ROUND (Xr, Xl, 12); ROUND (Xl, Xr, 11); 
 ROUND (Xr, Xl, 10); ROUND (Xl, Xr, 9); 
 ROUND (Xr, Xl, 8); ROUND (Xl, Xr, 7); 
 ROUND (Xr, Xl, 6); ROUND (Xl, Xr, 5); 
 ROUND (Xr, Xl, 4); ROUND (Xl, Xr, 3); 
 ROUND (Xr, Xl, 2); ROUND (Xl, Xr, 1); 
 Xr.word ^= bf_P[0]; 
 
 *xl = Xr.word; 
 *xr = Xl.word; 
} 
/* FIXME: Blowfish_Initialize() ??? */ 
short InitializeBlowfish(UBYTE_08bits key[], short keybytes) 
{ 
 short i; /* FIXME: unsigned int, char? */ 
 short j; /* FIXME: unsigned int, char? */ 
 UWORD_32bits data; 
 UWORD_32bits datal; 
 UWORD_32bits datar; 
 union aword temp; 
/* fprintf (stderr, "0x%x 0x%x ", bf_P[0], bf_P[1]); DEBUG */ 
/* fprintf (stderr, "%d %d\n", bf_P[0], bf_P[1]); DEBUG */ 
 j = 0; 
 for (i = 0; i < bf_N + 2; ++i) { 
 temp.word = 0; 
 temp.w.byte0 = key[j]; 
 temp.w.byte1 = key[(j+1)%keybytes]; 
 temp.w.byte2 = key[(j+2)%keybytes]; 
 temp.w.byte3 = key[(j+3)%keybytes]; 
 data = temp.word; 
 bf_P[i] = bf_P[i] ^ data; 
 j = (j + 4) % keybytes; 
 } 
 datal = 0x00000000; 
 datar = 0x00000000; 
 for (i = 0; i < bf_N + 2; i += 2) { 
 Blowfish_encipher(&datal, &datar); 
 
 bf_P[i] = datal; 
 bf_P[i + 1] = datar; 
 } 
 for (i = 0; i < 4; ++i) { 
 for (j = 0; j < 256; j += 2) { 
 
 Blowfish_encipher(&datal, &datar); 
 
 bf_S[i][j] = datal; 
 bf_S[i][j + 1] = datar; 
 } 
 } 
 return 0; 
} 
=============== bf_tab.h ============== 
/* bf_tab.h: Blowfish P-box and S-box tables */ 
 
static UWORD_32bits bf_P[bf_N + 2] = { 
 0x243f6a88, 0x85a308d3, 0x13198a2e, 0x03707344, 
 0xa4093822, 0x299f31d0, 0x082efa98, 0xec4e6c89, 

 0x452821e6, 0x38d01377, 0xbe5466cf, 0x34e90c6c, 
 0xc0ac29b7, 0xc97c50dd, 0x3f84d5b5, 0xb5470917, 
 0x9216d5d9, 0x8979fb1b, 
}; 
static UWORD_32bits bf_S[4][256] = { 
 0xd1310ba6, 0x98dfb5ac, 0x2ffd72db, 0xd01adfb7, 
 0xb8e1afed, 0x6a267e96, 0xba7c9045, 0xf12c7f99, 
 0x24a19947, 0xb3916cf7, 0x0801f2e2, 0x858efc16, 
 0x636920d8, 0x71574e69, 0xa458fea3, 0xf4933d7e, 
 0x0d95748f, 0x728eb658, 0x718bcd58, 0x82154aee, 
 0x7b54a41d, 0xc25a59b5, 0x9c30d539, 0x2af26013, 
 0xc5d1b023, 0x286085f0, 0xca417918, 0xb8db38ef, 
 0x8e79dcb0, 0x603a180e, 0x6c9e0e8b, 0xb01e8a3e, 
 0xd71577c1, 0xbd314b27, 0x78af2fda, 0x55605c60, 
 0xe65525f3, 0xaa55ab94, 0x57489862, 0x63e81440, 
 0x55ca396a, 0x2aab10b6, 0xb4cc5c34, 0x1141e8ce, 
 0xa15486af, 0x7c72e993, 0xb3ee1411, 0x636fbc2a, 
 0x2ba9c55d, 0x741831f6, 0xce5c3e16, 0x9b87931e, 
 0xafd6ba33, 0x6c24cf5c, 0x7a325381, 0x28958677, 
 0x3b8f4898, 0x6b4bb9af, 0xc4bfe81b, 0x66282193, 
 0x61d809cc, 0xfb21a991, 0x487cac60, 0x5dec8032, 
 0xef845d5d, 0xe98575b1, 0xdc262302, 0xeb651b88, 
 0x23893e81, 0xd396acc5, 0x0f6d6ff3, 0x83f44239, 
 0x2e0b4482, 0xa4842004, 0x69c8f04a, 0x9e1f9b5e, 
 0x21c66842, 0xf6e96c9a, 0x670c9c61, 0xabd388f0, 
 0x6a51a0d2, 0xd8542f68, 0x960fa728, 0xab5133a3, 
 0x6eef0b6c, 0x137a3be4, 0xba3bf050, 0x7efb2a98, 
 0xa1f1651d, 0x39af0176, 0x66ca593e, 0x82430e88, 
 0x8cee8619, 0x456f9fb4, 0x7d84a5c3, 0x3b8b5ebe, 
 0xe06f75d8, 0x85c12073, 0x401a449f, 0x56c16aa6, 
 0x4ed3aa62, 0x363f7706, 0x1bfedf72, 0x429b023d, 
 0x37d0d724, 0xd00a1248, 0xdb0fead3, 0x49f1c09b, 
 0x075372c9, 0x80991b7b, 0x25d479d8, 0xf6e8def7, 
 0xe3fe501a, 0xb6794c3b, 0x976ce0bd, 0x04c006ba, 
 0xc1a94fb6, 0x409f60c4, 0x5e5c9ec2, 0x196a2463, 
 0x68fb6faf, 0x3e6c53b5, 0x1339b2eb, 0x3b52ec6f, 
 0x6dfc511f, 0x9b30952c, 0xcc814544, 0xaf5ebd09, 
 0xbee3d004, 0xde334afd, 0x660f2807, 0x192e4bb3, 
 0xc0cba857, 0x45c8740f, 0xd20b5f39, 0xb9d3fbdb, 
 0x5579c0bd, 0x1a60320a, 0xd6a100c6, 0x402c7279, 
 0x679f25fe, 0xfb1fa3cc, 0x8ea5e9f8, 0xdb3222f8, 
 0x3c7516df, 0xfd616b15, 0x2f501ec8, 0xad0552ab, 
 0x323db5fa, 0xfd238760, 0x53317b48, 0x3e00df82, 
 0x9e5c57bb, 0xca6f8ca0, 0x1a87562e, 0xdf1769db, 
 0xd542a8f6, 0x287effc3, 0xac6732c6, 0x8c4f5573, 
 0x695b27b0, 0xbbca58c8, 0xe1ffa35d, 0xb8f011a0, 
 0x10fa3d98, 0xfd2183b8, 0x4afcb56c, 0x2dd1d35b, 
 0x9a53e479, 0xb6f84565, 0xd28e49bc, 0x4bfb9790, 
 0xe1ddf2da, 0xa4cb7e33, 0x62fb1341, 0xcee4c6e8, 
 0xef20cada, 0x36774c01, 0xd07e9efe, 0x2bf11fb4, 
 0x95dbda4d, 0xae909198, 0xeaad8e71, 0x6b93d5a0, 
 0xd08ed1d0, 0xafc725e0, 0x8e3c5b2f, 0x8e7594b7, 
 0x8ff6e2fb, 0xf2122b64, 0x8888b812, 0x900df01c, 
 0x4fad5ea0, 0x688fc31c, 0xd1cff191, 0xb3a8c1ad, 
 0x2f2f2218, 0xbe0e1777, 0xea752dfe, 0x8b021fa1, 
 0xe5a0cc0f, 0xb56f74e8, 0x18acf3d6, 0xce89e299, 
 0xb4a84fe0, 0xfd13e0b7, 0x7cc43b81, 0xd2ada8d9, 
 0x165fa266, 0x80957705, 0x93cc7314, 0x211a1477, 
 0xe6ad2065, 0x77b5fa86, 0xc75442f5, 0xfb9d35cf, 

 0xebcdaf0c, 0x7b3e89a0, 0xd6411bd3, 0xae1e7e49, 
 0x00250e2d, 0x2071b35e, 0x226800bb, 0x57b8e0af, 
 0x2464369b, 0xf009b91e, 0x5563911d, 0x59dfa6aa, 
 0x78c14389, 0xd95a537f, 0x207d5ba2, 0x02e5b9c5, 
 0x83260376, 0x6295cfa9, 0x11c81968, 0x4e734a41, 
 0xb3472dca, 0x7b14a94a, 0x1b510052, 0x9a532915, 
 0xd60f573f, 0xbc9bc6e4, 0x2b60a476, 0x81e67400, 
 0x08ba6fb5, 0x571be91f, 0xf296ec6b, 0x2a0dd915, 
 0xb6636521, 0xe7b9f9b6, 0xff34052e, 0xc5855664, 
 0x53b02d5d, 0xa99f8fa1, 0x08ba4799, 0x6e85076a, 
 0x4b7a70e9, 0xb5b32944, 0xdb75092e, 0xc4192623, 
 0xad6ea6b0, 0x49a7df7d, 0x9cee60b8, 0x8fedb266, 
 0xecaa8c71, 0x699a17ff, 0x5664526c, 0xc2b19ee1, 
 0x193602a5, 0x75094c29, 0xa0591340, 0xe4183a3e, 
 0x3f54989a, 0x5b429d65, 0x6b8fe4d6, 0x99f73fd6, 
 0xa1d29c07, 0xefe830f5, 0x4d2d38e6, 0xf0255dc1, 
 0x4cdd2086, 0x8470eb26, 0x6382e9c6, 0x021ecc5e, 
 0x09686b3f, 0x3ebaefc9, 0x3c971814, 0x6b6a70a1, 
 0x687f3584, 0x52a0e286, 0xb79c5305, 0xaa500737, 
 0x3e07841c, 0x7fdeae5c, 0x8e7d44ec, 0x5716f2b8, 
 0xb03ada37, 0xf0500c0d, 0xf01c1f04, 0x0200b3ff, 
 0xae0cf51a, 0x3cb574b2, 0x25837a58, 0xdc0921bd, 
 0xd19113f9, 0x7ca92ff6, 0x94324773, 0x22f54701, 
 0x3ae5e581, 0x37c2dadc, 0xc8b57634, 0x9af3dda7, 
 0xa9446146, 0x0fd0030e, 0xecc8c73e, 0xa4751e41, 
 0xe238cd99, 0x3bea0e2f, 0x3280bba1, 0x183eb331, 
 0x4e548b38, 0x4f6db908, 0x6f420d03, 0xf60a04bf, 
 0x2cb81290, 0x24977c79, 0x5679b072, 0xbcaf89af, 
 0xde9a771f, 0xd9930810, 0xb38bae12, 0xdccf3f2e, 
 0x5512721f, 0x2e6b7124, 0x501adde6, 0x9f84cd87, 
 0x7a584718, 0x7408da17, 0xbc9f9abc, 0xe94b7d8c, 
 0xec7aec3a, 0xdb851dfa, 0x63094366, 0xc464c3d2, 
 0xef1c1847, 0x3215d908, 0xdd433b37, 0x24c2ba16, 
 0x12a14d43, 0x2a65c451, 0x50940002, 0x133ae4dd, 
 0x71dff89e, 0x10314e55, 0x81ac77d6, 0x5f11199b, 
 0x043556f1, 0xd7a3c76b, 0x3c11183b, 0x5924a509, 
 0xf28fe6ed, 0x97f1fbfa, 0x9ebabf2c, 0x1e153c6e, 
 0x86e34570, 0xeae96fb1, 0x860e5e0a, 0x5a3e2ab3, 
 0x771fe71c, 0x4e3d06fa, 0x2965dcb9, 0x99e71d0f, 
 0x803e89d6, 0x5266c825, 0x2e4cc978, 0x9c10b36a, 
 0xc6150eba, 0x94e2ea78, 0xa5fc3c53, 0x1e0a2df4, 
 0xf2f74ea7, 0x361d2b3d, 0x1939260f, 0x19c27960, 
 0x5223a708, 0xf71312b6, 0xebadfe6e, 0xeac31f66, 
 0xe3bc4595, 0xa67bc883, 0xb17f37d1, 0x018cff28, 
 0xc332ddef, 0xbe6c5aa5, 0x65582185, 0x68ab9802, 
 0xeecea50f, 0xdb2f953b, 0x2aef7dad, 0x5b6e2f84, 
 0x1521b628, 0x29076170, 0xecdd4775, 0x619f1510, 
 0x13cca830, 0xeb61bd96, 0x0334fe1e, 0xaa0363cf, 
 0xb5735c90, 0x4c70a239, 0xd59e9e0b, 0xcbaade14, 
 0xeecc86bc, 0x60622ca7, 0x9cab5cab, 0xb2f3846e, 
 0x648b1eaf, 0x19bdf0ca, 0xa02369b9, 0x655abb50, 
 0x40685a32, 0x3c2ab4b3, 0x319ee9d5, 0xc021b8f7, 
 0x9b540b19, 0x875fa099, 0x95f7997e, 0x623d7da8, 
 0xf837889a, 0x97e32d77, 0x11ed935f, 0x16681281, 
 0x0e358829, 0xc7e61fd6, 0x96dedfa1, 0x7858ba99, 
 0x57f584a5, 0x1b227263, 0x9b83c3ff, 0x1ac24696, 
 0xcdb30aeb, 0x532e3054, 0x8fd948e4, 0x6dbc3128, 
 0x58ebf2ef, 0x34c6ffea, 0xfe28ed61, 0xee7c3c73, 
 0x5d4a14d9, 0xe864b7e3, 0x42105d14, 0x203e13e0, 

 0x45eee2b6, 0xa3aaabea, 0xdb6c4f15, 0xfacb4fd0, 
 0xc742f442, 0xef6abbb5, 0x654f3b1d, 0x41cd2105, 
 0xd81e799e, 0x86854dc7, 0xe44b476a, 0x3d816250, 
 0xcf62a1f2, 0x5b8d2646, 0xfc8883a0, 0xc1c7b6a3, 
 0x7f1524c3, 0x69cb7492, 0x47848a0b, 0x5692b285, 
 0x095bbf00, 0xad19489d, 0x1462b174, 0x23820e00, 
 0x58428d2a, 0x0c55f5ea, 0x1dadf43e, 0x233f7061, 
 0x3372f092, 0x8d937e41, 0xd65fecf1, 0x6c223bdb, 
 0x7cde3759, 0xcbee7460, 0x4085f2a7, 0xce77326e, 
 0xa6078084, 0x19f8509e, 0xe8efd855, 0x61d99735, 
 0xa969a7aa, 0xc50c06c2, 0x5a04abfc, 0x800bcadc, 
 0x9e447a2e, 0xc3453484, 0xfdd56705, 0x0e1e9ec9, 
 0xdb73dbd3, 0x105588cd, 0x675fda79, 0xe3674340, 
 0xc5c43465, 0x713e38d8, 0x3d28f89e, 0xf16dff20, 
 0x153e21e7, 0x8fb03d4a, 0xe6e39f2b, 0xdb83adf7, 
 0xe93d5a68, 0x948140f7, 0xf64c261c, 0x94692934, 
 0x411520f7, 0x7602d4f7, 0xbcf46b2e, 0xd4a20068, 
 0xd4082471, 0x3320f46a, 0x43b7d4b7, 0x500061af, 
 0x1e39f62e, 0x97244546, 0x14214f74, 0xbf8b8840, 
 0x4d95fc1d, 0x96b591af, 0x70f4ddd3, 0x66a02f45, 
 0xbfbc09ec, 0x03bd9785, 0x7fac6dd0, 0x31cb8504, 
 0x96eb27b3, 0x55fd3941, 0xda2547e6, 0xabca0a9a, 
 0x28507825, 0x530429f4, 0x0a2c86da, 0xe9b66dfb, 
 0x68dc1462, 0xd7486900, 0x680ec0a4, 0x27a18dee, 
 0x4f3ffea2, 0xe887ad8c, 0xb58ce006, 0x7af4d6b6, 
 0xaace1e7c, 0xd3375fec, 0xce78a399, 0x406b2a42, 
 0x20fe9e35, 0xd9f385b9, 0xee39d7ab, 0x3b124e8b, 
 0x1dc9faf7, 0x4b6d1856, 0x26a36631, 0xeae397b2, 
 0x3a6efa74, 0xdd5b4332, 0x6841e7f7, 0xca7820fb, 
 0xfb0af54e, 0xd8feb397, 0x454056ac, 0xba489527, 
 0x55533a3a, 0x20838d87, 0xfe6ba9b7, 0xd096954b, 
 0x55a867bc, 0xa1159a58, 0xcca92963, 0x99e1db33, 
 0xa62a4a56, 0x3f3125f9, 0x5ef47e1c, 0x9029317c, 
 0xfdf8e802, 0x04272f70, 0x80bb155c, 0x05282ce3, 
 0x95c11548, 0xe4c66d22, 0x48c1133f, 0xc70f86dc, 
 0x07f9c9ee, 0x41041f0f, 0x404779a4, 0x5d886e17, 
 0x325f51eb, 0xd59bc0d1, 0xf2bcc18f, 0x41113564, 
 0x257b7834, 0x602a9c60, 0xdff8e8a3, 0x1f636c1b, 
 0x0e12b4c2, 0x02e1329e, 0xaf664fd1, 0xcad18115, 
 0x6b2395e0, 0x333e92e1, 0x3b240b62, 0xeebeb922, 
 0x85b2a20e, 0xe6ba0d99, 0xde720c8c, 0x2da2f728, 
 0xd0127845, 0x95b794fd, 0x647d0862, 0xe7ccf5f0, 
 0x5449a36f, 0x877d48fa, 0xc39dfd27, 0xf33e8d1e, 
 0x0a476341, 0x992eff74, 0x3a6f6eab, 0xf4f8fd37, 
 0xa812dc60, 0xa1ebddf8, 0x991be14c, 0xdb6e6b0d, 
 0xc67b5510, 0x6d672c37, 0x2765d43b, 0xdcd0e804, 
 0xf1290dc7, 0xcc00ffa3, 0xb5390f92, 0x690fed0b, 
 0x667b9ffb, 0xcedb7d9c, 0xa091cf0b, 0xd9155ea3, 
 0xbb132f88, 0x515bad24, 0x7b9479bf, 0x763bd6eb, 
 0x37392eb3, 0xcc115979, 0x8026e297, 0xf42e312d, 
 0x6842ada7, 0xc66a2b3b, 0x12754ccc, 0x782ef11c, 
 0x6a124237, 0xb79251e7, 0x06a1bbe6, 0x4bfb6350, 
 0x1a6b1018, 0x11caedfa, 0x3d25bdd8, 0xe2e1c3c9, 
 0x44421659, 0x0a121386, 0xd90cec6e, 0xd5abea2a, 
 0x64af674e, 0xda86a85f, 0xbebfe988, 0x64e4c3fe, 
 0x9dbc8057, 0xf0f7c086, 0x60787bf8, 0x6003604d, 
 0xd1fd8346, 0xf6381fb0, 0x7745ae04, 0xd736fccc, 
 0x83426b33, 0xf01eab71, 0xb0804187, 0x3c005e5f, 
 0x77a057be, 0xbde8ae24, 0x55464299, 0xbf582e61, 

 0x4e58f48f, 0xf2ddfda2, 0xf474ef38, 0x8789bdc2, 
 0x5366f9c3, 0xc8b38e74, 0xb475f255, 0x46fcd9b9, 
 0x7aeb2661, 0x8b1ddf84, 0x846a0e79, 0x915f95e2, 
 0x466e598e, 0x20b45770, 0x8cd55591, 0xc902de4c, 
 0xb90bace1, 0xbb8205d0, 0x11a86248, 0x7574a99e, 
 0xb77f19b6, 0xe0a9dc09, 0x662d09a1, 0xc4324633, 
 0xe85a1f02, 0x09f0be8c, 0x4a99a025, 0x1d6efe10, 
 0x1ab93d1d, 0x0ba5a4df, 0xa186f20f, 0x2868f169, 
 0xdcb7da83, 0x573906fe, 0xa1e2ce9b, 0x4fcd7f52, 
 0x50115e01, 0xa70683fa, 0xa002b5c4, 0x0de6d027, 
 0x9af88c27, 0x773f8641, 0xc3604c06, 0x61a806b5, 
 0xf0177a28, 0xc0f586e0, 0x006058aa, 0x30dc7d62, 
 0x11e69ed7, 0x2338ea63, 0x53c2dd94, 0xc2c21634, 
 0xbbcbee56, 0x90bcb6de, 0xebfc7da1, 0xce591d76, 
 0x6f05e409, 0x4b7c0188, 0x39720a3d, 0x7c927c24, 
 0x86e3725f, 0x724d9db9, 0x1ac15bb4, 0xd39eb8fc, 
 0xed545578, 0x08fca5b5, 0xd83d7cd3, 0x4dad0fc4, 
 0x1e50ef5e, 0xb161e6f8, 0xa28514d9, 0x6c51133c, 
 0x6fd5c7e7, 0x56e14ec4, 0x362abfce, 0xddc6c837, 
 0xd79a3234, 0x92638212, 0x670efa8e, 0x406000e0, 
 0x3a39ce37, 0xd3faf5cf, 0xabc27737, 0x5ac52d1b, 
 0x5cb0679e, 0x4fa33742, 0xd3822740, 0x99bc9bbe, 
 0xd5118e9d, 0xbf0f7315, 0xd62d1c7e, 0xc700c47b, 
 0xb78c1b6b, 0x21a19045, 0xb26eb1be, 0x6a366eb4, 
 0x5748ab2f, 0xbc946e79, 0xc6a376d2, 0x6549c2c8, 
 0x530ff8ee, 0x468dde7d, 0xd5730a1d, 0x4cd04dc6, 
 0x2939bbdb, 0xa9ba4650, 0xac9526e8, 0xbe5ee304, 
 0xa1fad5f0, 0x6a2d519a, 0x63ef8ce2, 0x9a86ee22, 
 0xc089c2b8, 0x43242ef6, 0xa51e03aa, 0x9cf2d0a4, 
 0x83c061ba, 0x9be96a4d, 0x8fe51550, 0xba645bd6, 
 0x2826a2f9, 0xa73a3ae1, 0x4ba99586, 0xef5562e9, 
 0xc72fefd3, 0xf752f7da, 0x3f046f69, 0x77fa0a59, 
 0x80e4a915, 0x87b08601, 0x9b09e6ad, 0x3b3ee593, 
 0xe990fd5a, 0x9e34d797, 0x2cf0b7d9, 0x022b8b51, 
 0x96d5ac3a, 0x017da67d, 0xd1cf3ed6, 0x7c7d2d28, 
 0x1f9f25cf, 0xadf2b89b, 0x5ad6b472, 0x5a88f54c, 
 0xe029ac71, 0xe019a5e6, 0x47b0acfd, 0xed93fa9b, 
 0xe8d3c48d, 0x283b57cc, 0xf8d56629, 0x79132e28, 
 0x785f0191, 0xed756055, 0xf7960e44, 0xe3d35e8c, 
 0x15056dd4, 0x88f46dba, 0x03a16125, 0x0564f0bd, 
 0xc3eb9e15, 0x3c9057a2, 0x97271aec, 0xa93a072a, 
 0x1b3f6d9b, 0x1e6321f5, 0xf59c66fb, 0x26dcf319, 
 0x7533d928, 0xb155fdf5, 0x03563482, 0x8aba3cbb, 
 0x28517711, 0xc20ad9f8, 0xabcc5167, 0xccad925f, 
 0x4de81751, 0x3830dc8e, 0x379d5862, 0x9320f991, 
 0xea7a90c2, 0xfb3e7bce, 0x5121ce64, 0x774fbe32, 
 0xa8b6e37e, 0xc3293d46, 0x48de5369, 0x6413e680, 
 0xa2ae0810, 0xdd6db224, 0x69852dfd, 0x09072166, 
 0xb39a460a, 0x6445c0dd, 0x586cdecf, 0x1c20c8ae, 
 0x5bbef7dd, 0x1b588d40, 0xccd2017f, 0x6bb4e3bb, 
 0xdda26a7e, 0x3a59ff45, 0x3e350a44, 0xbcb4cdd5, 
 0x72eacea8, 0xfa6484bb, 0x8d6612ae, 0xbf3c6f47, 
 0xd29be463, 0x542f5d9e, 0xaec2771b, 0xf64e6370, 
 0x740e0d8d, 0xe75b1357, 0xf8721671, 0xaf537d5d, 
 0x4040cb08, 0x4eb4e2cc, 0x34d2466a, 0x0115af84, 
 0xe1b00428, 0x95983a1d, 0x06b89fb4, 0xce6ea048, 
 0x6f3f3b82, 0x3520ab82, 0x011a1d4b, 0x277227f8, 
 0x611560b1, 0xe7933fdc, 0xbb3a792b, 0x344525bd, 
 0xa08839e1, 0x51ce794b, 0x2f32c9b7, 0xa01fbac9, 

 0xe01cc87e, 0xbcc7d1f6, 0xcf0111c3, 0xa1e8aac7, 
 0x1a908749, 0xd44fbd9a, 0xd0dadecb, 0xd50ada38, 
 0x0339c32a, 0xc6913667, 0x8df9317c, 0xe0b12b4f, 
 0xf79e59b7, 0x43f5bb3a, 0xf2d519ff, 0x27d9459c, 
 0xbf97222c, 0x15e6fc2a, 0x0f91fc71, 0x9b941525, 
 0xfae59361, 0xceb69ceb, 0xc2a86459, 0x12baa8d1, 
 0xb6c1075e, 0xe3056a0c, 0x10d25065, 0xcb03a442, 
 0xe0ec6e0e, 0x1698db3b, 0x4c98a0be, 0x3278e964, 
 0x9f1f9532, 0xe0d392df, 0xd3a0342b, 0x8971f21e, 
 0x1b0a7441, 0x4ba3348c, 0xc5be7120, 0xc37632d8, 
 0xdf359f8d, 0x9b992f2e, 0xe60b6f47, 0x0fe3f11d, 
 0xe54cda54, 0x1edad891, 0xce6279cf, 0xcd3e7e6f, 
 0x1618b166, 0xfd2c1d05, 0x848fd2c5, 0xf6fb2299, 
 0xf523f357, 0xa6327623, 0x93a83531, 0x56cccd02, 
 0xacf08162, 0x5a75ebb5, 0x6e163697, 0x88d273cc, 
 0xde966292, 0x81b949d0, 0x4c50901b, 0x71c65614, 
 0xe6c6c7bd, 0x327a140a, 0x45e1d006, 0xc3f27b9a, 
 0xc9aa53fd, 0x62a80f00, 0xbb25bfe2, 0x35bdd2f6, 
 0x71126905, 0xb2040222, 0xb6cbcf7c, 0xcd769c2b, 
 0x53113ec0, 0x1640e3d3, 0x38abbd60, 0x2547adf0, 
 0xba38209c, 0xf746ce76, 0x77afa1c5, 0x20756060, 
 0x85cbfe4e, 0x8ae88dd8, 0x7aaaf9b0, 0x4cf9aa7e, 
 0x1948c25c, 0x02fb8a8c, 0x01c36ae4, 0xd6ebe1f9, 
 0x90d4f869, 0xa65cdea0, 0x3f09252d, 0xc208e69f, 
 0xb74e6132, 0xce77e25b, 0x578fdfe3, 0x3ac372e6, 
}; 
 

Listing Three 
 
************** TEST VECTORS *********************************** 
This is a test vector. 
Plaintext is "BLOWFISH". 
The key is "abcdefghijklmnopqrstuvwxyz". 
 
#define PL 0x424c4f57l 
#define PR 0x46495348l 
#define CL 0x324ed0fel 
#define CR 0xf413a203l 
 static char keey[]="abcdefghijklmnopqrstuvwxyz"; 
 
This is another test vector. 
The key is "Who is John Galt?" 
 
#define PL 0xfedcba98l 
#define PR 0x76543210l 
#define CL 0xcc91732bl 
#define CR 0x8022f684l 
 
 
PROGRAMMER'S BOOKSHELF


Nontraditional Education Alternatives




Jonathan Erickson


As with just about every other part of society, digital communications is
changing the face of higher education. In fact, no less an authority than
Stanford University President Gerhard Casper has said that, as the electronic
revolution shortens the course of studies and lowers the cost, educators,
students, and taxpayers have to wonder whether the "physical" university will be
worth attending in the future. 
Digital classroom instruction has become commonplace at most universities
around the country. As an example, Casper points to medical schools where
CD-ROMs with "virtual corpses" have replaced real cadavers for anatomical
instruction. There's little doubt, Casper said in a recent annual State of the
University address, that some CD-ROMs are superior to classroom "talking
heads." Still, the real impact of digital communication is being felt in the
area of "distance learning," where teachers, students, and resources are
dispersed--not just in location, but also in time. In typical
distance-learning scenarios, students "attend" courses at universities
hundreds or thousands of miles away. Instead of attending lectures with other
bleary-eyed classmates, distance-learning students attend their classes where
and when they want to. And increasingly, they are using the Internet or BBSs
to communicate with instructors, submit homework, and take exams. The college
credits they earn are just as valid as those granted to their traditional,
on-campus counterparts.
There are those who would argue that for students leaving home for the first
time, the university environment--from dorm life to football in the fall--is
often as important as an introduction to, say, Hegel. But Internet-based
distance learning can expand the scope of university programs (especially at
smaller colleges) by providing access to individuals and resources around the
world.
With this in mind, perhaps the greatest potential for distance learning exists
for individuals who want (or need) to return to school. Perhaps you want to
hone existing skills--or acquire new ones--and don't have time to attend a
traditional university program. Maybe you have the time, but an acceptable
university program isn't conveniently accessible. Maybe you just enjoy
structured learning programs. In such cases, nontraditional educational
programs may be your best option. 
In the past, nontraditional education programs were associated with
trade-school correspondence courses advertised on the inside of matchbook
covers. Now, however, independent study programs are offered by most colleges
and universities in the country. Need a few credits to supplement your work
with user interfaces? The Rochester Institute of Technology offers bachelor's
and master's degrees by nontraditional methods--videocassettes, e-mail, cable
TV, courses on disk, and the like--in everything from graphic arts to computer
science. Want to pick up some business credits so that you can slide into
management? Check out the MBA programs offered by Syracuse University or the
University of Pittsburgh. Want a master's degree in engineering from Stanford
University, an MS in computer science from Rensselaer Polytechnic Institute,
or a master's in mathematics from the University of Massachusetts at Amherst?
You can do it, even if you live on the other side of the world. 
So where do you go to find out about nontraditional programs offered by
universities? The best place to start is Bear's Guide to Earning College
Degrees Nontraditionally, by John Bear and Mariah Bear. First published in
1974, this book provides information ranging from equivalency exams to
financial aid. The heart of the book, however, is its descriptions of courses
and programs offered by universities and colleges around the world; see
Example 1.
Pay particular attention to the term "accreditation." Reputable school
programs are accredited (validated) by agencies recognized by the U.S.
Department of Education. Among the regional accrediting agencies are the North
Central Association of Colleges and Schools and the Southern Association of
Colleges and Schools. Professional accrediting agencies include the Computer
Science Accreditation Board and the Accreditation Board for Engineering and
Technology. Accrediting agencies typically evaluate schools or programs on the
basis of their curricula, faculty, facilities, program length, tuition, fees,
academic objectives, credit received, and so on. Not all programs covered by
Bear and Bear are accredited; others have been accredited by some pretty
obscure agencies.
Not being accredited doesn't mean a program isn't any good (it may be too new
for accreditation). However, accreditation does provide you with an official
stamp of approval. It is significant that Bear and Bear devote a 15-page
chapter to the subject. If you're serious about completing a program of study
for an advanced degree, you'll be making a significant investment of time,
effort, and money. Investing your resources in a fly-by-night diploma mill may
not pay off in the long run. 
High-Technology Degree Alternatives, by Joel Butler, also devotes a chapter to
the topic of accreditation. Butler, however, describes only programs
accredited by recognized agencies. In general, High-Technology Degree
Alternatives is more focused than Bear's Guide. Butler zeroes in on strategies
for earning a degree while working. As the title suggests, he also limits
coverage to high-tech careers--engineering, programming, and the like. While
Butler's book covers much of the same ground as the Bears', his discussion of
turning company-sponsored training sessions into college credit, or of earning
credit from professional certificates and licenses, is particularly valuable if
you're already in the work force. And, as you'd expect from a more narrowly
focused book, Butler presents dozens of program descriptions in a standardized
format; see Example 2. 
Both books suffer when it comes to timeliness. Distance-learning programs are
rapidly evolving, and books such as these have a difficult time keeping up.
Since new programs come online and existing ones disappear all the time, you
should use these books as a pointer rather than the final word. In particular,
the recent surge in Internet-based programs isn't reflected in either book.
For instance, an art-appreciation course offered by Penn State via the World
Wide Web is not mentioned. Also, I called numerous listings in the Bear book
and found a number of minor discrepancies in phone or fax numbers, programs
offered, and the like. 
Similarly, there are programs that, for whatever reason, both books fail to
mention. For instance, the University of Missouri at Columbia's Independent
Study Program isn't discussed at all. Offering dozens of courses across all
disciplines, the MU program provides a dial-up BBS for communicating with
faculty, submitting homework, and taking exams. Individual professors at the
university are also interacting via the Internet with students as far away as
Germany. 
If there's any question in your mind about whether nontraditional distance
learning works, be assured that it does. One frequent DDJ contributor is about
to finish a PhD through Nova Southeastern University, and he's loved every
minute of it. For my part, I've completed over 30 credits (the equivalent of a
full year of course work) through the aforementioned University of Missouri
Center for Independent Study. I found that when I took the courses I wanted at
my own speed and on my own time, I enjoyed the learning process much more than
in my days as a full-time student.
Stanford's Gerhard Casper was only partially right when he said that the
beauty of the Internet is that it "makes it unnecessary for students to travel
long distances." The beauty of digital communication in general, and the
Internet in particular, is that it brings education to students, letting us
explore, learn, and grow in ways that are more meaningful to ourselves.
High-Technology Degree Alternatives
Joel Butler
Professional Publications, 1994; 182 pp., $21.95 
ISBN 0-912045-61-2
Bear's Guide to Earning College Degrees Nontraditionally
John B. Bear and Mariah P. Bear
C&B Publishing (Ten Speed Press), 1995; 336 pp., $27.95 
ISBN 0-9629312-3-3
Example 1: Typical program description from Bear's Guide.
Nova Southeastern University
3301 College Ave.
Fort Lauderdale, FL 33314

Stephen Feldman, President

Education, administration, business, computer systems, social and systemic
studies, liberal studies, psychology, speech and language, law

1964

Nonprofit, independent $$$$

(305) 475-7300, (800) 541-6682
(305) 475-7621 fax

Nova University has one of the more nontraditional doctoral programs ever to
achieve regional accreditation. The typical student attends one group meeting
a month (generally two or three days), plus two one-week residential sessions,
and from three to six practica which emphasize direct application of research
to the workplace. Total time: about three-and-a-half years. The university
also offers a Doctor of Arts in information science in which students use
interactive computers. A major part of instruction in this program is through
teleconferencing, TELNET, and TYMNET. Residential work has been offered in 23
states. Nova will consider offering the program in the continental United
States wherever a cluster of 20-25 students can be formed. Formerly Nova
University; they recently merged with Southeastern Medical School, hence the
name change.
Example 2: Typical program description from High-Tech Degree Alternatives.
Colorado State University
SURGE: Division of Continuing Education
Spruce Hall
Fort Collins, CO 80523

(303) 491-5288, (800) 525-4950
Fax: (303) 491-7886


Degrees Offered: MS in chemical engineering, civil engineering, computer
science, electrical engineering, interdisciplinary engineering, mechanical
engineering

Credit-Earning Methods Offered/Accepted: video-based classroom study, transfer
credit

Residency Required or External? 100% external.

Cost Basis/Major Costs:

- $250/$350 (in-state/out-of-state) per credit hour

Comments: Colorado SURGE has a very strong set of graduate degree programs
with real academic standards. Students view video classes at existing SURGE
sites (corporate or open) or by establishing their own SURGE site (which is
easy according to the bulletin). SURGE is fully electronic, offering e-mail
and fax communications with faculty and staff as well as video courses. SURGE
is also affiliated with many consortia and leading-edge corporate
organizations.

Accredited by: North Central Association of Colleges and Schools


SWAINE'S FLAMES


Vanity Net Addresses


You laugh at me behind my back because I sometimes use an eWorld e-mail
address. Oh, I know you do. What you don't know is that I do have a real
Internet address, only I'm not telling you what it is. That's right: I use an
eWorld e-mail address to hide my real net address. I understand that other
people sometimes do just the reverse: They use a prestige e-mail address to
mask their real one.
Makes you wonder: Are there vanity e-mail addresses, like vanity license
plates? If you think of any, send them along and I'll print the best. In this
case phony is better than real, because (I hasten to add) I don't want to
publish real people's real Internet addresses here. Another challenge: What's
the shortest Internet address you've seen? I have a friend in Germany whose
address is eight characters long, including the @ sign and the period. Surely
that's not the shortest. Send me your candidate for briefest actual e-mail
address. I won't publish them, but I'll verify them and award the customary
no-prize to any response that I arbitrarily decide to favor.
By the way, in case you don't realize how important winning these no-prizes
is, a past winner, Mike Morton, recently was a runner-up on NPR's Sunday
morning puzzle contest. That contest is designed by Will Shortz, and Shortz is
the puzzle expert who supplied all the riddles for the Riddler in Batman
Forever. And that movie is expected to gross eight trillion dollars this year,
so I think you can see that these no-prizes are pretty significant.
And now for something completely different: From Danyll, a technology writer
for the South China Morning Post and an old pal of Penn & Teller's Teller,
come thanks for reminding him about that fine British weekly, New Scientist.
Also, the following bulletin:
The programming language C++ is called C purasu purasu in Japanese and C ga ga
in Cantonese. Ga ga, in Cantonese, is simply the Chinese character for "+"
twice. The Cantonese know more about programming perhaps than one would have
thought.
Brandon J. Rickman recently posted a guide to MRML on the World Wide Web.
MRML, for those who have been living in a cave the past three months, is the
breakthrough Mind Reading Markup Language, a proprietary extension of the
HyperText Markup Language of the World Wide Web. You might want to stop
reading at this point if that proprietary business bothers you, because
Brandon claims rights to "any ideas you come up with while reading this
information."
MRML includes such tags as <BRAINSCAN> and <THOUGHTSUCK>, which are of
self-evident usefulness.
Here are Brandon's descriptions of the handy tags BELIEVE and FORGET:

<BELIEVE>text</BELIEVE>
Explicit thoughts to be planted in the client's mind. Beware of contradictory
programming! Try to remove previous conceptions before reprogramming. To
reprogram someone that thinks that Pepsi is better than Coke:
1. <BELIEVE>You have no opinions about the relationship between Coke and
Pepsi.</BELIEVE>
2. <BELIEVE>Coke is better than Pepsi.</BELIEVE>
3. <BELIEVE>You are thirsty.</BELIEVE>
 <FORGET>text</FORGET>
Things you want the client to forget. It may be desirable to have the client
forget the URL of your MRML documents.

Michael Swaine
editor-at-large
MikeSwaine@eworld.com

OF INTEREST
Microsoft has licensed MPEG software-playback technology from Mediamatics for
inclusion in future versions of Windows. This will enable users to experience
TV-like video and CD-quality sound without special add-on hardware. MPEG is a
compression/decompression (codec) system for compressing full-screen,
VHS-quality digital video and CD-quality audio into small files. Currently,
Windows 95-based MPEG video playback performance from CD-ROMs on 90-MHz
Pentium-class computers is approximately 24 frames per second with 11-kHz
audio. 
Microsoft 
1 Microsoft Way
Redmond, WA 98052
206-882-8080
SQA has announced object-level support for the testing of all Visual Basic
objects and Visual Basic custom controls (VBXs) in its SQA TeamTest tool. With
SQA TeamTest's support for VBXs, you simply click on the VBX while running SQA
TeamTest. All properties and data of that VBX are displayed, allowing you to
create a baseline for testing subsequent builds of the application. SQA's VBX
testing technology requires no modifications to the application being tested.
In addition, SQA TeamTest allows users to edit the baseline data during test
recording, changing any data or properties.
SQA Inc.
10 State Street
Woburn, MA 01801
617-932-0110
Zinc Software has released Version 4.1 of Zinc Application Framework, a C++
class library and visual development tool. Zinc 4.1 lets you create globally
enabled, object-oriented, cross-platform applications with one set of source
code. Among the additions and enhancements, Version 4.1 includes a new Image
object, allowing the display of large bitmaps from native file formats; a new
File object, allowing Zinc applications to read/write portable binary files on
any operating system; new CTL3D support for Windows, improving the look of
Windows applications; an improved Help display; support for dot-matrix
printers in MS-DOS applications; easier function names; and improved
performance. Zinc Application Framework is licensed on a per-developer,
per-platform basis. 
Zinc Software 
405 South 100 East
Pleasant Grove, UT 84062 
801-785-8900
S3 Inc. has announced a new multimedia chipset based on its "cooperative
accelerator architecture" which, the company claims, offers TV-quality video
and CD-quality audio for desktop applications. S3's three-chip offering is
hardware that integrates graphics, audio, and MPEG onto the PC motherboard.
S3's Cooperative Accelerator Architecture divides the work between dedicated
S3 hardware and software executing on the PC's CPU to provide enhanced
graphics and MPEG-based video and audio performance. The result is TV-quality,
full-screen 30 frames-per-second (FPS) live video or MPEG video, and
CD-quality audio. The S3 Streams Processor uses parallel processing to provide
the broadcast-quality special effects and greatly reduced response times
required for graphics-packed interactive games and multimedia titles.
S3 Inc.
2770 San Tomas Expwy.
Santa Clara, CA 95051
408-980-5400
Microsoft has announced that Windows NT Workstation and Windows NT Server
Version 3.51 includes support for PowerPC-based systems. PowerPC becomes the
fourth platform supported by Windows NT (others include Intel, DEC Alpha AXP,
and MIPS). Hardware vendors planning to sell Windows NT on the PowerPC include
FirePower Systems, IBM, IPC Technologies, Motorola Computer Group, and Reply
Corp. During the next few months, Microsoft plans to release development tools
for the PowerPC, including Visual C++, Microsoft Test, and SNA Server. 
Windows NT Server 3.51 also contains the upgraded Network Client Administrator
utility, making it easier for customers to deploy Windows 95. The expanded
utility allows administrators to create a boot disk to install Windows 95 over
the network.
Microsoft 
1 Microsoft Way
Redmond, WA 98052
206-882-8080
The Multimedia PC Working Group has released the Multimedia PC Level 3
Specification, an updated standard for Multimedia PCs. The new standard
includes improved sound and video performance requirements. 
Minimum requirements for MPC3 include support for MPEG1 plus
software-implemented video codecs, a 75-MHz Pentium or similar processor,
quadruple-speed CD-ROM drive, and wavetable sound. The new requirements will
provide hardware that delivers full-screen, full-motion video, and enhanced
CD-quality sound while playing demanding multimedia programs. MPC3 does not
replace MPC2, which was released in 1993. Software that runs on MPC- and
MPC2-compliant hardware will also run on MPC3-compliant hardware. 
The Multimedia PC Working Group will provide hardware test suites to measure
MPC3 compliance. The test suites, produced in cooperation with the National
Software Testing Laboratories, check for individual component compliance as
well as total system performance. The tests establish whether the computer is
delivering MPC compliance in the key areas of processing speed, video
playback, graphics performance, and audio. 
Multimedia PC 
Software Publishers Association
1730 M Street NW, Suite 700
Washington, DC 20036-4510
202-452-1600
California Software is shipping InterAp, a set of integrated and customizable
Windows applications for accessing services and information on the Internet.
InterAp includes several "intelligent agents" that automate many of the
processes for searching and retrieving information on the Internet and then
publishing the results in useful ways. These agents may also be
integrated with existing Windows applications, or with other InterAp
applications, including the World Wide Web Navigator, Telnet, FTP, and secure
mail messaging.
InterAp's intelligent agents are based on the published NetScripts API and are
designed to do many of the repetitive and time-consuming Internet tasks. They
work in conjunction with the scheduler to perform these tasks in attended or
unattended modes. Once the information is found, it can be received in the
form of an e-mail message or formatted and published as a Word document or
Excel spreadsheet. Four intelligent agents ship with the toolkit: NetNews
Wizard, which enables the user to conduct keyword searches through NetNews
articles; Web Publisher, which provides users with the ability to do research
on the Internet by automatically accessing information published on the World
Wide Web; FTP Publisher, which automates the FTP process and allows users to
intelligently move binary and text files around the Internet; and Mail
Publisher, which scans the to, from, subject, and text fields of incoming
messages and then redistributes messages to a defined list of recipients.
The intelligent agents supplied within InterAp are Visual Basic programs that
can be further integrated with existing applications or used to create new
agents using Visual Basic, Visual C++, and other programming languages.
Programmers can develop customized agents that match their own business model
using the NetScripts API reference guide. InterAp sells for $99.95. 
California Software Inc.
2121 E. Pacific Coast Hwy., Suite 120a
Corona del Mar, CA 92625
714-729-2270
http://www.calsoft.com
Wollongong has announced a new family of Internet-access applications called
"Emissary" that combines a range of Internet facilities such as Web browsing,
mail, news reading, file retrieval, and interactive access into a single,
integrated Windows application. The package is based on an open API that
allows components to be "snapped in" easily. New components can take advantage
of dozens of existing shared services provided by Emissary. In addition, new
services can be added or existing ones updated. As a result, you don't have to
reinvent the wheel each time you add a new capability. The package costs
$99.00.
Wollongong
1129 San Antonio Road
Palo Alto, CA 94303
415-962-7156
Tower Technology has released a Windows NT implementation of its TowerEiffel
system. Previously, the system supported SunOS, Solaris, HP-UX, NextStep, and
OS/2. The system now includes an Eiffel 3 compiler, development environment,
and programming tools. TowerEiffel interoperates between Eiffel and C++,
allowing you to utilize existing C++ libraries or add memory management to C++
programs. A prerelease version of the TowerEiffel for Windows sells for
$249.00.
Tower Technology
1501 West Koenig Lane
Austin, TX 78765
512-452-9455
http://www.ca.cf.cm.uk/Tower/
Ryan McFarland has announced VanGui for RM/Cobol, an interface builder that
uses Visual Basic custom controls (VBXs) for building Windows-based front ends
for Cobol applications. VanGui consists of two major components: a design tool
and a run-time system. The design tool is a Windows app that provides Cobol
developers with the capability to define windows, populate those windows with
standard Windows and VBX controls, adjust the properties of the controls, and
attach Cobol event-handling logic to the events. The run-time system is a
Windows DLL that manages Windows messages, provides run-time support for the
controls, and provides a Cobol interface to the Windows API. 
Ryan McFarland

8911 N. Capital of Texas Hwy.
Austin, TX 78759
512-343-1010
info@liant.com
DDC-I has announced DACS-95, an Ada 95-compliant version of its Ada
development system. The system also includes all the surrounding mechanics for
program-library management, syntax analysis, automatic recompilation,
internal program representation, and run-time system interface. 
DDC-I
410 N. 44th Street
Phoenix, AZ 85008
602-275-7172
Orion Instruments is providing a free booklet entitled Contemporary Debugging
Strategies: Choosing the Right Tool for Embedded System Development and
Testing. The booklet, written by Dick Jensen and Jan Liband, covers a variety
of topics, including: logical, physical, and real-time debugging domains;
targetless debugging; emulation; modularized reusable code; and more.
Orion Instruments
1376 Borregas Ave.
Sunnyvale, CA 94089-1004
408-747-0686
http://www.oritools.com

Rogue Wave has announced a library called Rogue Wave Tools for MFC, which
includes data structures that enhance the Microsoft Foundation Class library.
The toolkit includes a sorted binary-tree collection, as well as string
classes that support regular expressions. Additionally, the kit provides
advanced date and time functions and a timer class. Rogue Wave Tools for MFC
sells for $150.00 (including source code).
Rogue Wave Software
260 SW Madison
Corvallis, OR 97339
503-754-3010
Visual Systems has announced GUI-Kit 1.1, a 32-bit, cross-platform GUI C/C++
toolkit for Win32s and NT development. Release 1.1 provides additional
compiler support (Visual C++, Borland C++, Watcom C++, and Symantec C++),
hypertext-based online programmer's reference, and improved performance.
GUI-Kit supports 32-bit, cross-platform development under Windows 3.x (Win32s)
and Windows NT. The toolkit costs $495.00, and there are no run-time fees. 
Visual Systems 
2512 Crosstown Blvd., NE
Ham Lake, MN 55304
612-434-6382
C-Fog is an application for obfuscating C programs for developers who need to
distribute source code to multiple platforms. C-Fog uses the C preprocessor to
incorporate the contents of include files into the source file and execute all
preprocessor directives. It also replaces all identifier names (including
function names) with a generic style of name. Furthermore, C-Fog removes the
physical structure of the program, removes comments, and "octalizes" strings.
C-Fog sells for $24.00. A shareware version of C-Fog is available at the
vendor's WWW home page.
Jayar Systems
253 College Street, Suite 263
Toronto, ON
Canada M5T 1R5
416-751-5678
http://www.io.org/~jrsys
SftTree 1.0 from Softel vdm is a custom tree control for Windows. SftTree
supports in-place editing of tree-item data using any Windows control. SftTree
offers hierarchical data display (multiple data columns, multiple text lines
per item, built-in column heads, and so on), 3-D item display, drag-and-drop,
and more. The package sells for $249.00 (single user) and is royalty free. 
Softel vdm
11 Michigan Ave.
Wharton, NJ 07885
201-366-9618
Syware has released its Dr. DeeBee ODBC Driver Kit, which enables legacy and
proprietary databases to be used with database tools such as Microsoft Access,
Visual Basic, PowerBuilder, and Crystal Reports. The ODBC kit contains source
code for a fully functional ODBC driver that provides an SQL parser,
semanticizer, query processor, and all ODBC housekeeping functions. The
royalty-free ODBC Driver Kit features ODBC 2.0 conformance and character and
numeric data types. The Dr. DeeBee ODBC Driver Kit sells for $1800.00; a dBase
ODBC driver created from this kit for Windows 3.1/NT/95 is available at no
charge. 
Syware
P.O. Box 91 Kendall
Cambridge, MA 02142
617-497-1376




















EDITORIAL


Future Tense


Sometimes you just can't help but worry about the future. One day you spot a
gray hair on your head, the next day an AARP membership application arrives in
the mail. One thing it doesn't do any good to worry about, however, is Social
Security. If you believe what financial wizards like Newt Gingrich say, Social
Security will have dried up and blown away by the time most of us are ready
for it. Between one Senator's pork barrel and another's legal-defense fund,
Congress has surreptitiously siphoned Social Security for all it's worth.
Still, by cutting costs here and there, the government hopes to squeeze a few
more years out of the Social Security kitty. If the recently proposed HR 1698
Mandatory Electronic Funds Transfer Expansion Act of 1995, sponsored by
Congressman Jim Lightfoot (R-Iowa), becomes law, anyone filing for Social
Security (or any government benefits, for that matter) would be required to
use electronic fund transfers to move money from the U.S. Treasury to their
local bank. The upside of this proposal is big--really big--savings in money,
maybe as much as $175 million a year. The Treasury currently issues about 850
million payments a year, 580 million of them to Social Security recipients.
Printing and mailing checks for surface-mail delivery costs from 39 to 43
cents each; doing the same thing electronically costs only about 1.1 cents per
check. The bill would require recipients who currently have a bank account to
start receiving electronic funds early next year. Everyone would have to
accept electronic funds by 2001.
There are indirect benefits from the changeover, as well. One of the biggest
headaches for both the Feds and recipients involves lost or stolen checks. For
every 380 checks the Social Security Administration currently mails out, one
recipient claims never to have received it. With direct deposit, that number
falls to one in every 7700 transactions.
Of course, taxpayers won't be the only beneficiaries of government-mandated
electronic fund transfers. Banks see proposals such as Lightfoot's as a
windfall--and there's the rub. Banks aren't exactly user friendly, as
evidenced by the First National Bank of Chicago's recent attempt to extort
$3.00 from its customers for the privilege of talking to a human teller.
Likewise, charges for basic bank services are skyrocketing at an alarming
rate--$1.50 for an ATM account-balance inquiry, for instance. Government
officials are admittedly concerned about banks gouging customers who are
required to establish bank accounts. (With all these digital dollar signs
floating around, it's no wonder Microsoft wanted Intuit in the fold.)
As an aside: In the kind of wisdom we've come to expect from Capitol Hill,
Congress has also come up with another way of saving senior-citizen-related
money--cutting off federal funds to groups that criticize Congressional
efforts, such as the National Council of Senior Citizens. A proposal sponsored
by Congressman David McIntosh (R-Indiana) would prohibit lobbying by any group
that receives federal grants, but not by deep-pocket suppliers who feed
congressional reelection coffers.
Social Security is moving forward electronically in other areas: All (or at
least most) forms required by the Social Security Administration are available
in both .PDF (Adobe Acrobat) and PostScript formats at the Administration's Web
site (http://www.ssa.gov). 


Dr. Dobb's Journal in 1996


And speaking of the future, we're happy to present the Dr. Dobb's Journal 1996
editorial calendar. As you can see, the calendar includes both familiar topics
and emerging subjects that deserve examination. Among the topics we'll be
paying particular attention to in the coming year are programming for Windows
95, the PowerPC, and the World Wide Web. We'll also be looking at visual
programming, games development, and intelligent software. In short, from
design to implementation, Dr. Dobb's Journal will be looking at anything--and
everything--that's important to the art and science of computer programming.
If you have an article to share with your fellow programmers, give us a call
or drop a line; phone 415-655-4178 or send e-mail to editors@ddj.com or
76704.50@compuserve.com. DDJ author guidelines are available at both our Web
and ftp sites (http://www.ddj.com and ftp.mv.com in the /pub/ddj directory,
respectively). We'll also be happy to fax, e-mail, or surface mail a copy
directly to you. Remember that DDJ comes out about a month prior to its cover
date (some of you may actually be reading the October issue at the end of
August), so plan accordingly if you are targeting a specific issue.


A Tense Future


After investing the time, trouble, and money to get an advanced university
degree, grads usually expect, at minimum, a good job and a solid future.
Computer science and engineering grads, in particular, have always been in
high demand. Unfortunately, says Stanford University professor William Massey,
this trend is changing.
In a recent $250,000 study (which involved more than 200 universities and 1000
employers), Massey discovered that the U.S. is producing more technical PhDs
than the job market can absorb. In computer science, for example, only 50
percent of new doctorates are finding jobs that require a PhD. This isn't to
say that advanced-degree holders are standing in unemployment lines, but that
the jobs they find (and there seem to be a lot of them) don't require a
doctorate. 


Dr. Dobb's Journal 1996 Editorial Calendar


January Encoding: Encryption, Compression, and Error Correction
February Data Communications and Internet Development
March Little Languages
April Algorithms
May Operating Systems
June Patterns and Software Design
July Graphics Programming
August C/C++ Programming
September User Interfaces
October Object-Oriented Programming
November Client/Server Development
December Portability and Cross-Platform Development
Jonathan Erickson, editor-in-chief

















LETTERS


GOST Encryption


Dear DDJ,
I really enjoyed Bruce Schneier's article "The GOST Encryption Algorithm"
(DDJ, "Algorithm Alley" January 1995). Indeed, after taking a cryptography
course, I decided to further investigate GOST. In a copy of (the translation
of) the Russian standard, I noticed a slight error in Schneier's description
of the algorithm. Schneier notes that:
...the outputs of the eight S-boxes are recombined into a 32-bit word, and
then the entire word undergoes an 11-bit circular shift left, towards the
higher-order bits. Finally, the result is added modulo 2^32 to the left half
to become the new right half, and the right half becomes the new left half.
And later on, that:
...DES uses XOR to add the key to the right half and to add the right half to
the left half; GOST uses addition modulo 2^32 for both these operations.
However, according to the standard, although addition modulo 2^32 is indeed
used to add the key to the right half, the result of the S-box substitution
rotated left is XORed with the left half, and not added modulo 2^32.
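For readers following along, one round with the corrected operation order can be sketched as below. This is my own illustration, not Schneier's code or the standard's; in particular, the S-box tables are a parameter of GOST rather than part of the standard, so identity tables stand in for real ones here.

```cpp
#include <cassert>
#include <cstdint>

// Placeholder S-boxes: real GOST S-box tables are supplied separately.
static uint8_t sbox[8][16];

void init_placeholder_sboxes() {
    for (int i = 0; i < 8; i++)
        for (int j = 0; j < 16; j++)
            sbox[i][j] = uint8_t(j);          // identity, for illustration only
}

uint32_t substitute(uint32_t x) {
    uint32_t out = 0;
    for (int i = 0; i < 8; i++)               // eight 4-bit groups
        out |= uint32_t(sbox[i][(x >> (4 * i)) & 0xF]) << (4 * i);
    return out;
}

// One round, per the letter: add the subkey to the right half modulo 2^32,
// substitute through the S-boxes, rotate left 11 bits, then XOR (not add)
// the result with the left half; finally the halves swap.
void gost_round(uint32_t &left, uint32_t &right, uint32_t subkey) {
    uint32_t t = right + subkey;              // addition modulo 2^32
    t = substitute(t);
    t = (t << 11) | (t >> 21);                // 11-bit circular shift left
    uint32_t new_right = left ^ t;            // XOR with the left half
    left = right;
    right = new_right;
}
```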
Please note that the code included with the article is correct. A copy of the
standard is available at gopher://idea.sec.dsi.unimi.it/11/crypt/docs
Riccardo Pucella
McGill University
pucella@cs.mcgill.ca


Creative Concepts


Dear DDJ,
I enjoyed Michael Swaine's "Programming Paradigms" column (DDJ, June 1995)
that discussed Douglas Hofstadter's book Fluid Concepts and Creative
Analogies. I am looking forward to finding enough time to read the book. I
have wondered for decades about the significance of problems which have an
infinite number of logically correct solutions but only one "agreeable"
solution. My favorite is to ask someone to give the next number in the
sequence: 1,2,4,8,16,.... If your subjects give the agreeable solution 32, you
then tell them that their answer is pretty good, but you will help them by
giving them the generator for this sequence so that they may work it out for
themselves.
The generator for this sequence is the "cutting the cake problem." You begin
with a perfectly round cake. You mark off n equispaced points on the
perimeter, then make perfectly straight cuts between every pair of points,
then count the number of pieces of cake.
If your subjects bother to do this, they will get the solution:
Number of points:   1   2   3   4   5   6
Number of pieces:   1   2   4   8  16  30
It isn't difficult to prove that 32 is the wrong solution; simple symmetry
requires that the solution be divisible by 6.
W.R. Ayers
Los Altos Hills, California


Ada Corrections


Dear DDJ,
I entered, compiled, and executed the PList program by Paul Pukite published
in the "Letters" section (DDJ, July 1995) and found line 4 in error. It should
read: type Element is access all Item'Class; in the original, all was missing.
I look forward to seeing Ada 95 articles and programs in future issues.
John J. Cupak, Jr.
Nashua, New Hampshire 
jcupak@isd99.sanders.LOCKHEED.COM
DDJ responds: Thanks for the correction, John. For information on Ada 95, be
sure to check out David Moore's article in this issue.


Lightweight Tasks


Dear DDJ,
I would like to make a few comments about Jonathan Finger's article
"Lightweight Tasks in C" (DDJ, May 1995). Although the approach Jonathan used
will probably be sufficient in most cases, I find it to have some major
disadvantages:
The stack is changed by copying the stack of each thread to and from the main
stack. On many platforms this excessive copying is inefficient and will result
in poor performance, especially since lightweight threads are often tightly
coupled and task switching is therefore frequent.
As Jonathan also notes, the implementation is not portable, since it depends
on the stack growing in a particular direction.
Objects cannot be shared between threads if they are residing in a stack. For
instance, if a thread passes a pointer to an object in its own stack to
another thread, the object will not be referenced correctly because the stack
has been moved when the other thread is running.
Example 1 shows an alternative method that solves these problems. The idea is
to let all stacks reside in the main stack area. When a new thread is to be
started, a recursive function is used to wind down the stack and reserve room
for the stack of the currently running thread. The size of the original main
stack must be set large enough to accommodate all of the stacks in the
program. Example 1(a) shows a program where the main thread starts another
thread using this method. The program then performs a few task switches and
terminates. Example 1(b) is the output of the program.
This scheme may be elaborated into a complete module for starting and stopping
threads with stacks of user-definable sizes. In fact, I have used it to
implement a portable version of a C++-based multitasking kernel.
Stig Kofoed
Herlev, Denmark
stk@craycom.dk

DDJ responds: Watch for Stig's article "Portable Multitasking in C++" in an
upcoming issue of DDJ.


Combinatorial Problems 


Dear DDJ,
In the August 1995 "Algorithm Alley," Peter Pearson reports on Leonard M.
Adleman's article "Molecular Computation of Solutions to Combinatorial
Problems" (Science, November 11, 1994). From my perspective, there appear to
be several holes in Adleman's research, at least as described by Pearson.
For instance, Adleman's approach to the Directed Hamiltonian Path Problem is
described as: "Given a map showing many cities and many one-way roads
connecting cities, find the shortest itinerary that starts at City A, ends at
City Z, and passes through every other city exactly once." The usual
assumption is that the one-way roads are of different lengths. While this is
not always true (consider subway or bus fares as the "length"), without it the
task becomes almost trivial: Any complete path is the same length as any
other. There is no mention of encoding length in the DNA (might "introns" be
used? [Is the giant sea slug the solution to a problem?]). If so, Adleman would
have to do the steps in reverse order: first isolate the strings that pass
through all the cities, then use electrophoresis to find the shortest. Another
step would be needed to check that no city was duplicated.
Another difficulty is that, at heart, the procedure is still a search through
randomly generated cases. If all of the DNA fails the testing, it may be that
incomplete stirring is to blame and not the lack of a solution (pun intended).
The large number of molecules simply provides a feasible way of generating an
immense number of cases quickly.
Finally, doesn't the difficulty still favor the code maker over the code
breaker? If the code maker uses one gram of DNA, wouldn't the code breaker
need 10**18 grams?
Maybe the secret of the Universe is 43.
James Pendzick
nsrjwp4@mgic.com
Dear DDJ,
If I correctly understand Peter Pearson's article "Biochemical Techniques Take
on Combinatorial Problems" ("Algorithm Alley," DDJ, August 1995), the
technique might be summarized as "make really a lot of candidate molecules,
and then select the ones that might be solutions." To quote from the article,
"the number of guesses that can be tested...is limited....by the number of DNA
molecules you can handle. Since a gram of DNA might contain 1018 smallish
molecules..." it sounds like a lot, but the Knapsack Problem mentioned has
1030 or so possible solutions, meaning 109 kilograms of DNA for one solution,
given an even distribution. I don't know the density of a DNA solution, but
109 kilograms of water would be a cube 100 meters on a side if I recall the
physical constants approximately. It might not be enough for your pet whale,
but probably too big for the back yard.
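The letter's arithmetic checks out and can be verified mechanically; the inputs below are the letter's own estimates, not measured data:

```cpp
#include <cassert>
#include <cmath>

// 10^30 candidates at 10^18 molecules per gram -> grams -> kilograms of DNA.
double dna_kilograms(double candidates, double molecules_per_gram) {
    return candidates / molecules_per_gram / 1000.0;
}

// Water is about 1000 kg per cubic meter, so volume = kg / 1000 m^3;
// the side of an equivalent cube is the cube root of that volume.
double water_cube_side_m(double kilograms) {
    return std::cbrt(kilograms / 1000.0);
}
```

Plugging in the letter's figures gives 10^9 kilograms and a cube roughly 100 meters on a side, as stated.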
Of course, the Knapsack Problem for 100 points is small as such problems go.
How about a Traveling Salesman problem for 1000 points (easily found in
circuit-board assembly)? This is roughly 4e2567; that is, "4" followed by 2567
zeros. This problem can routinely be solved within a few percent of optimality
by the best algorithms on a fairly fast computer in a few seconds.
David X. Callaway
dxc@mitron.com


E-mail Correction


Dear DDJ,
I am a hardware design engineer from Topmax and an avid reader of DDJ. Our
company develops ICE, plotters, printers, and network boards.
I particularly enjoyed the article "68HC05-based System Design," by Willard J.
Dickerson (DDJ, August 1995). However, when I tried to get in touch with
Willard using the e-mail address in his bio, mail returned the message
"Unknown user." Could you please give me Willard's correct e-mail address.
Joel Carvajal
Topmax Philippines Inc.
joel@polgas.tds.pfi.net
DDJ responds: Thanks for pointing out that we dropped an "l" from the e-mail
address, Joel. Our apologies to you, Willard, and to other readers. Willard's
correct e-mail address is willd@amcu-tx.sps.mot.com.


Writer's Block (WB)




Dear DDJ (DDDJ),


Lately, I see an increase (INC) in the use of nonsensical abbreviations (NAs)
in computer "literature" (CL). I read an issue of Hewlett-Packard (HP)
Professional (a BS sales 'zine): I (I) counted about 43 uses of NAs on two (2)
pages. The writer actually defined an NA without using it subsequently!
For examples, MDD (multidimensional database), non-MDD (nonmultidimensional
database), SI (systems integrator), IT (undefined), VAR (undefined), ISV
(undefined), GUI (graphical/generic? user interface), BPR (business process
reengineering), and SFA (sales force automation).
My question is this: Are these articles not meant to (2 too (2 2 eh ...)) be
read, or are these people just showing off? And if the latter, what is it they
are showing off?
I'm not sure why people do this, but I will suggest a few answers:
1. The era of the White-coated ENIAC Priests (WEPs, or is it WENIACPs? sorry,
English is my fourth (4th) language), is over: Computers (Cs) have become very
common, everybody thinks they know about Cs. WEPs have become ordinary people
(OP). So these OP are trying 2 find a way 2 become WEPs again: by inventing a
New Language Nobody Speaks, Nor Understands (NLNSNU, what do I do with the
comma?).
2. The price of a PKZIP license reached seven (7) figures so the typesetter
(TS) doesn't unzip the text files anymore he got e-mailed from the writers.
3. The OP-writer wants 2 hide the fact that she doesn't really understand the
subject either behind Self-Made Mumbo-Jumbo (SMMJ (= NLNSNU)).
4. Paper = gold.
5. OPs are lazy and as word processors (WP) are 2 complex 2 fully use, OPs
still can't figure out how 2 build a macro 2 insert a Frequently Used Phrase
(FUP).
6. Most OP-writers stole their WP and can't call the helpdesk (HD) 2 figure
out how 2 build a FUP-macro.
7. We are not supposed 2 read the articles, just 2 stand in awe and swallow
the SOB's conclusion.
8. These same writers have written a Dictionary of Computer Terms and
Abbreviations and are generating their own market.
DDDJ, may I make some suggestions for (4) OP writers? 
1. Just keep using NAs 4 FUPs, but, when ready, use the FaR function of your
WP 2 expand all FUPs.
2. Do not use PD PKZIP any more.
3. If you steal a WP, steal a good book about the WP 2, 2 learn macros. Steal
the serial number 2. 4 AT&Ting the HD.
4. if (WEP && write(CL)) { get_a_real_job(time(NULL)); };
Erik Terwiel (ET)

Utrecht, Holland (NL) 
E.H.Terwiel@inter.NL.net


Cover Credits


Dear DDJ,
We really enjoyed seeing a photo of our game Blood on the cover of Dr. Dobb's
Sourcebook of Games Programming (May/June 1995). As you can imagine, Kevin
Kilstrom, our senior artist, was particularly thrilled.
Readers might be interested in details about Blood (which should be published
by Apogee's 3D Realms Division and FormGen later this year). Blood is a 3-D
action/horror game using rendering technology similar to DOOM's, but with
overlapping map areas, bridges, sliding and rotating walls and floors,
translucencies, mirrors, and numerous other effects. The commercial version of
the game will likely contain over 15 MB of 256-color artwork. Blood will also
have a top-end resolution of 1600x1200 for those happy Pentium owners.
Nick Newhard 
Q Studios 
Redmond, WA 98052
nnewhard@qstudios.com
Example 1: Lightweight tasks.
(a)
#include <setjmp.h>
#include <stdio.h>
jmp_buf jmp_main, jmp_cor;
void cor_to_main( void )
{
    if( setjmp( jmp_cor ) == 0 )
    {
        longjmp( jmp_main, 1 );
    }
}
void main_to_cor( void )
{
    if( setjmp( jmp_main ) == 0 )
    {
        longjmp( jmp_cor, 1 );
    }
}
/* Winds down the stack n levels before entering the coroutine body,
   reserving stack room for the currently running thread. */
void coroutine( int n )
{
    int i;
    if( n > 0 )
    {
        coroutine( n - 1 );
    }
    else
    {
        i = 0;
        printf( "coroutine start\n" );
        for( ;; )
        {
            cor_to_main();
            printf( "%d. coroutine\n", i++ );
        }
    }
}
int main( void )
{
    int i;
    printf( "main start\n" );
    if( setjmp( jmp_main ) == 0 )
    {
        coroutine( 200 );
    }
    else
    {
        for( i = 0; i < 5; i++ )
        {
            printf( "%d. main\n", i );
            main_to_cor();
        }
        printf( "main stop\n" );
    }
    return( 0 );
}
(b)
main start
coroutine start
0. main
0. coroutine
1. main
1. coroutine
2. main
2. coroutine
3. main
3. coroutine
4. main
4. coroutine
main stop





































Automating Association Implementation in C++


Pointer-based association implementation




David M. Papurt


David, who is chief technologist at Terasoft Technology (Milford, MA), is the
author of Inside the Object Model: The Sensible Use of C++ (SIGS Books, 1995).
David can be contacted on CompuServe at 75310,1621.


Most elements of the object model have parallel C++ language counterparts: The
class mechanism maps to abstract data type, const and reference types enforce
immutability, inheritance implements generalization, virtual functions
correspond to polymorphism, and templates realize metatype. But association, a
fundamental component of the object model, has no direct C++ language
counterpart. Consequently, you have to implement this capability yourself.
In this article, I'll describe and compare several unidirectional and
bidirectional pointer-based alternatives for implementing one-to-one
associations: a direct, handwritten implementation; a modular approach that
exploits inheritance; and a template-based implementation that eliminates
replication of programming steps and corresponds to association implementation
by declaration. 
My examples implement an analysis model of a one-to-one association between
type Inventor (playing the role of creator) and type Contraption (playing the
role of invention); see Figure 1. (All figures in this article follow the
Object Modeling Technique, or OMT, notation described by Rumbaugh et al. in
Object-Oriented Modeling and Design, Prentice-Hall, 1991.) Figure 1 indicates
that exactly one Contraption object participates (or links) with every
Inventor object, and vice versa.


Unidirectional Implementation


The first implementation of the Inventor/Contraption one-to-one association is
unidirectional. Only one class (Inventor) contains a pointer to the other
(Contraption); no reverse pointer is present. Figure 2 shows the design model
for the unidirectional implementation using pointer-design notation. Class
definitions and implementations appear in Listing One.
In the unidirectional implementation, association traversal is convenient in
the forward direction only, from Inventor to Contraption. The benefits of this
implementation include simplicity, minimal storage, and minimal link-update
cost (link creation and termination overhead). The class without a pointer is
decoupled from the class with a pointer, and maintenance of the reverse
pointer is unnecessary.
Backward traversal is complex and inefficient: It requires maintaining and
searching a list of all Inventor objects. If backward traversal never occurs, of
course, the Inventor list is unnecessary.
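Listing One is not reproduced here, but a rough sketch of the unidirectional scheme (the member and function names are my own, not the listing's) might look like:

```cpp
#include <cassert>
#include <vector>

class Contraption { };                     // invention role: no back pointer

class Inventor {                           // creator role: forward pointer only
    Contraption *invention_ = nullptr;
public:
    void set(Contraption *c) { invention_ = c; }
    Contraption *get() const { return invention_; }
};

// Forward traversal is one pointer dereference; backward traversal must
// maintain and search a list of all Inventor objects.
Inventor *creator_of(const Contraption *c, const std::vector<Inventor*> &all) {
    for (Inventor *i : all)
        if (i->get() == c)
            return i;
    return nullptr;
}
```

The cost asymmetry is plain: forward traversal is O(1), while creator_of() is a linear search that only works if the program keeps the Inventor list current.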


Bidirectional Implementation


In the bidirectional implementation of the Inventor/Contraption one-to-one
association, each class contains a pointer to the other participant, making
traversal convenient and efficient in either direction; see Figure 3. No
artifactual object list must be maintained or searched. Class definitions and
implementations appear in Listing Two.
Despite containing the necessary data members, this association's functional
interface is poorly designed; simply shielding the association-implementation
member pointers will not preserve referential integrity: When an Inventor's
pointer addresses
a Contraption, there's no mechanism to ensure the Contraption's pointer
addresses that Inventor and no other. Any benefits of bidirectional
implementation pale compared with the complexity of link management and the
near certainty of incorrect link termination, dangling pointers, and corrupt
programs.
These problems can, however, be minimized by adhering to the following
criteria:
Users should not have direct access to pointer set() functions.
In place of accessible pointer set() functions, the association should have a
link-management interface with a function for creating a link between two
otherwise unconnected objects and functions for terminating existing links.
Users should only be able to modify pointers--in pairs--via link-management
functions.
Functions should be available to test for link existence and to traverse the
link.
A complete implementation should manage the link correctly upon linked-object
construction, destruction, copy construction, and assignment.


Bidirectional Implementation with Link Management


The third implementation of the Inventor/Contraption one-to-one association is
bidirectional with link-management functions. Each class contains a pointer to
the other, but functions guaranteed to maintain referential integrity
manipulate the pointers.
The design model of this managed implementation is identical to that of the
unmanaged implementation in Figure 3; language features not easily expressed
by the graphical notation realize the aforementioned design criteria. Class
definitions and member-function implementations appear in Listing Three with
link-management interface-function prototypes. Pointer set() functions are
private, so they are inaccessible to users, but the friend link-management
functions link() and unlink() manage the pointers and create and terminate
links. Pointer get() functions are public, enabling tests for link existence
and traversal. Constructors initialize objects as unlinked, and destructors
call unlink(), thus preserving referential integrity upon object termination.
Example 1 illustrates creating, traversing, and terminating a link.
Link-management function implementation. If, at entry, both argument objects
are unlinked, link() sets pointers; if the argument is linked, the two
unlink() functions reset pointers; see Listing Four. All three link-management
functions are written in terms of Inventor and Contraption member functions
and do not manipulate pointers directly. This convention enhances order and
minimizes the costs of an implementation change.
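A minimal sketch of such a managed interface follows. It borrows the function names the article uses (link(), unlink(), get(), set()) but the bodies are my own reconstruction, not Listings Three and Four:

```cpp
#include <cassert>

class Inventor;
class Contraption;
void link(Inventor &i, Contraption &c);
void unlink(Inventor &i);

class Inventor {
    Contraption *invention_ = nullptr;
    void set(Contraption *c) { invention_ = c; }  // private: no direct access
    friend void link(Inventor&, Contraption&);
    friend void unlink(Inventor&);
public:
    Contraption *get() const { return invention_; }
    ~Inventor() { unlink(*this); }                // integrity on destruction
};

class Contraption {
    Inventor *creator_ = nullptr;
    void set(Inventor *i) { creator_ = i; }
    friend void link(Inventor&, Contraption&);
    friend void unlink(Inventor&);
public:
    Inventor *get() const { return creator_; }
    ~Contraption() { if (creator_) unlink(*creator_); }
};

void link(Inventor &i, Contraption &c) {
    if (!i.get() && !c.get()) {    // only two unlinked objects may link
        i.set(&c);
        c.set(&i);
    }
}

void unlink(Inventor &i) {         // pointers are always reset as a pair
    if (Contraption *c = i.get()) {
        c->set(nullptr);
        i.set(nullptr);
    }
}
```

Because set() is private and link()/unlink() always update both pointers together, no sequence of calls available to users can leave one half of the link dangling.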
Copy functions. Generated copy functions violate the inverse-mapping
constraint. Copy functions can be overridden three different ways. The
simplest is to make copy functions private and not write implementations; see
Listing Five. If an object participates in associations, it may be appropriate
to eliminate assignment, pass by value, and return by value.
Example 2 shows that a multiplicity of one prevents copy target and source
from linking to the same object. Consequently, the second approach implements
public copy functions that do not disturb links; see Listing Six. The
functions copy other source-object information--besides the link--to the
target.
In the third method, copy functions copy an entire constellation of linked
objects; for example, copying an Inventor also copies its linked Contraption.
(This strategy incorporates "deep copy," a topic beyond the scope of the
present discussion. See Data Abstraction and Object-Oriented Programming in
C++, by Gorlen, Orlow, and Plexico; John Wiley & Sons, 1990.)
These methods of overriding copy functions for one-to-one associations hold
true for any multiplicity association: No matter what the association
multiplicity, participating classes can prohibit object copy, implement copy
so that links are undisturbed, or copy an entire constellation of linked
objects.
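The first two strategies might be sketched as follows; this uses the modern C++ spelling (deleted functions) where the article's first approach uses private, unimplemented copy functions, and the member names are mine:

```cpp
#include <cassert>
#include <string>

class Contraption;

// Approach 1: prohibit object copy outright.
class InventorNoCopy {
    Contraption *invention_ = nullptr;
public:
    InventorNoCopy() = default;
    InventorNoCopy(const InventorNoCopy&) = delete;
    InventorNoCopy &operator=(const InventorNoCopy&) = delete;
};

// Approach 2: public copy functions that copy the object's other data
// but deliberately leave links undisturbed.
class Inventor {
    std::string name_;
    Contraption *invention_ = nullptr;
public:
    explicit Inventor(std::string n = "") : name_(std::move(n)) {}
    Inventor(const Inventor &other)
        : name_(other.name_), invention_(nullptr) {}  // target starts unlinked
    Inventor &operator=(const Inventor &other) {
        name_ = other.name_;                          // link untouched
        return *this;
    }
    const std::string &name() const { return name_; }
    Contraption *get() const { return invention_; }
};
```

Approach 2 respects the multiplicity-of-one constraint: the copy target never steals or shares the source's linked Contraption.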
Benefits and costs. A managed, bidirectional implementation maintains
referential integrity by eliminating the possibility of incorrect pointer
manipulation. Traversal is efficient in both directions. Of course,
bidirectional implementation has higher storage and link-update costs than
unidirectional implementation, but it is more powerful.


Modularization via Inheritance


In general, isolating functionally distinct parts of a program engenders a
modular, well-organized program. In this case, it also prepares for type
parameterization, a more powerful association-implementation technique
described later. 
Because an association is independent of the functionality in its associated
classes, its implementation can usually be separate, as well. I do this via
public inheritance. Normally, the separation is introduced during design:
Derived classes and allied friends implement only the association, and base
classes implement all other functionality.

In Figure 4, the attributes name and title demonstrate placement of
non-association-related functionality in the analysis model. In Figure 5, the
analysis-model characteristics are distributed over base and derived classes,
and the association is implemented bidirectionally with link-management
functions. All non-association-related characteristics (here, name and title)
go in base classes. Pointers and association functionality are immediate
members and friends of derived classes.
Base functionality. Base classes IOther and COther ("Inventor Other" and
"Contraption Other"), which implement non-association-related functionality,
appear in Listing Seven, along with a simple class String containing get() and
set() access functions. These classes implement name() and title() attributes.
Any functionality, including participation in other associations, is possible
in classes IOther and COther.
The objective is to engage IOther and COther in an association without
disturbing them, while preserving access to IOther and COther functionality.
Derived classes. The derived classes Inventor and Contraption appear in
Listing Eight with link-management, interface-function prototypes. Pointer
set() functions are private, so they are inaccessible to users. But friends
link() and unlink() manage the pointers and create and terminate links.
Pointer follow() functions are public, enabling tests for link existence and
traversal. Constructors initialize objects as unlinked, and destructors
unlink() objects at termination, before embedded-base-class-object
deinitialization.
Derived classes store (and member functions set() and follow() take and
return) derived-class pointers, not base-class pointers. Public inheritance
enables the selection of base-class members through derived-class pointers, so
base-class and derived-class functionality is accessible upon link traversal.
For example, Inventor::follow() yields a Contraption *, through which both
COther and Contraption member functions can be called.
Base-class constructor arguments propagate and become derived-class
constructor arguments. Derived-class constructors simply pass the propagated
arguments to base-class constructors in the initialization list.
Link-management-function implementation. Link-management-function
implementations appear in Listing Nine. If, at entry, both argument objects
are unlinked, link() sets pointers; if the argument is linked, the two
unlink() functions reset pointers. All three link-management functions are
written in terms of Inventor and Contraption member functions; they do not
manipulate pointers directly.
Use. Example 3 demonstrates the Inventor/Contraption one-to-one
association-independent implementation. Function print() in Example 3(a) calls
base class IOther and derived class Inventor member functions on derived class
Inventor reference argument I. I.name() executes IOther::name(). I.follow()
executes Inventor::follow(), and yields a Contraption* derived-class pointer.
Base class COther member function title() is called through this pointer.
Function main() in Example 3(b) builds a constellation of objects and calls
print(). The program generates the output in Example 3(c).
The scheme described here can append a one-to-one association onto any pair of
classes, without disturbing existing classes, while maintaining access to
existing functionality. Similar schemes can implement other multiplicity and
typed associations.


Association by Declaration with Templates


An even more modular and reusable method of association implementation
incorporates templates. As in the previous approach, association
implementation is separated from other functionality during design. But,
whereas previously each association implementation was written by hand, here
the association functionality is written once in cooperating templates and
instantiated later.
This powerful method enables association implementation by declaration. Once
preserved in modules of cooperating templates, an association can be
implemented as simply as other object-model components that have direct
language support, like generalization or polymorphism, for example.
Figure 6 shows analysis-model characteristics distributed over base and
derived classes. All Inventor/Contraption characteristics except the
association go in base classes. Pointers and association functionality are
immediate members and friends of derived classes. Here, Inventor and
Contraption are instances of the Left and Right association templates, as well
as derived classes.
Base functionality. Revised base classes IOther and COther, which implement
non-association-related functionality, appear in Listing Ten. Except for the
required default constructors (discussed later), these classes are identical
to Listing Seven. Again, any functionality is possible in classes IOther and
COther.
The objective is to engage IOther and COther in an association by the two-line
declaration in Example 4(a), without disturbing IOther and COther, while
preserving access to IOther and COther functionality. In the code, Inventor
and Contraption are typedefs of cooperating template instances.
Parameterized derived-class templates. Cooperating class templates L1to1 and
R1to1 (short for left and right one-to-one) and link-management
interface-function template prototypes appear in Listing Eleven. L1to1, R1to1,
link(), and unlink() are parameterized versions of the nontemplates in Listing
Eight. Besides parameterization, the templates work like the nontemplates.
Declarations that bind actual types to formal parameters LB and RB (short for
left and right base class) instantiate the class templates. An actual class
L1to1<LB,RB> inherits from the actual class bound to LB. Similarly,
R1to1<LB,RB> inherits from RB. L1to1<LB,RB> contains a pointer member to
R1to1<LB,RB>, and vice versa. Derived-class pointers enable access to derived-
and base-class functionality.
The class templates constrain the actual types that can bind to formals LB and
RB: LB and RB must be class types and have default constructors and
destructors. Since the class templates are written separately from any base
class eventually used for instantiation, base-class constructor arguments
cannot be propagated into derived-class constructor signatures.
Parameterized link-management-function implementation. Implementations of
link-management interface-function templates appear in Listing Twelve. If, at
entry, both argument objects are unlinked, function template link() sets
pointers; if the argument is linked, the two unlink() function templates reset
pointers.
Use. Any pair of classes bound to formals LB and RB that have default
constructors and destructors can instantiate L1to1 and R1to1. Thus, typedefs
like those in Example 4(a) can create an association between an arbitrary pair
of classes. Since Inventor is an alternate name for actual class
L1to1<IOther,COther>, it inherits from IOther and implements one side of the
Inventor/Contraption association. Similarly, Contraption inherits from COther,
and implements the other side of the association.
The print() function in Example 4(b) executes base class IOther functionality
on derived class Inventor reference argument I. The function tests I.follow()
and traverses the link. The member function title() of the base class COther
is called through the derived class Contraption * pointer returned from
I.follow(). The main() function in Example 4(c) builds a constellation of
objects and calls print(). The program generates the output in Example 4(d).


Conclusion


Template-based associations yield well-designed, declaration-generated
bidirectional implementations. Once a template module is written, the
templates can implement an arbitrary number of associations between arbitrary
classes. The scheme can even append multiple associations onto a single class.
On the downside, the actual types that bind to template formal parameters are
constrained: Any base class to which an association is appended must have a
default constructor, and type-specific base-class constructor arguments cannot
be propagated into derived-class constructor signatures. This, in turn, forces
separate "initialization" after associated-object creation, reducing the
utility of constructors and effectively returning the responsibility of object
initialization to the programmer.
The scheme I've described here can append a one-to-one typed association onto
arbitrary classes. Other multiplicity and style associations can be
implemented similarly. The approach is useful for quickly solving many
programming problems in a variety of contexts.
Figure 1: Analysis model of sample one-to-one association for demonstrating
association implementation.
Figure 2: Design model of Inventor/Contraption one-to-one association
unidirectional implementation.
Figure 3: Design model of Inventor/Contraption one-to-one association
bidirectional implementation.
Figure 4: Analysis model of Inventor/Contraption one-to-one association for
demonstrating independent association implementation.
Figure 5: Design model of Inventor/Contraption one-to-one
association-independent implementation.
Figure 6: Design model of Inventor/Contraption one-to-one association
implementation by declaration.
Example 1: Sample code demonstrating link creation, traversal, and
termination.
int main()
{
 Inventor edison; // unlinked
 Contraption lightbulb; // unlinked
 link( edison, lightbulb ); // create
 Contraption * p = edison.invention();
 // traverse
 . . .
 unlink( edison ); // terminate
 return 0;
}
Example 2: Sample code illustrating that copy target and source cannot link to
the same object.
Inventor edison;
Contraption lightbulb;
link( edison, lightbulb );
Inventor bell = edison;
 // bell and edison cannot both link to
 // lightbulb after copy construction
Inventor salk;

Contraption polioVaccine;
link( salk, polioVaccine );
salk = edison;
 // salk and edison cannot both link to
 // lightbulb after assignment
Example 3: Sample code demonstrating use of inheritance-based association.
(a)
void print( Inventor & I )
{
 cout << "Inventor: " << I.name() // Base class member
      << '\n' << "Invention: ";
 if ( I.follow() )
 {
 cout << I.follow()->title();
 // Base class member function
 // title() called through pointer
 }
 else
 {
 cout << "None!";
 }
 cout << "\n\n";
}

(b)
int main()
{
 Inventor i( "Aaron" );
 Inventor j( "Dave" );
 Contraption c( "Measurement Device" );
 link( i, c );
 print( i );
 print( j );
 return 0;
}

(c)
Inventor: Aaron
Invention: Measurement Device

Inventor: Dave
Invention: None!
Example 4: Sample code demonstrating use of template-based association.
(a)
typedef L1to1<IOther,COther> Inventor;
typedef R1to1<IOther,COther> Contraption;

(b)
void print( Inventor & I )
{
 cout << "Inventor: " << I.name() // Base class member
      << '\n' << "Invention: ";
 if ( I.follow() )
 {
 cout << I.follow()->title();
 // Base class member function
 // title() called through pointer
 }
 else
 {
 cout << "None!";
 }
 cout << "\n\n";
}

(c)
int main()
{
 Inventor i, j;
 i.name( "Dave" );
 j.name( "Aaron" );
 Contraption c;
 c.title( "Protective Device" );
 link( i, c );
 print( i );
 print( j );
 return 0;
}

(d)
Inventor: Dave
Invention: Protective Device

Inventor: Aaron
Invention: None!

Listing One
// *****************************************
// Unidirectional association implementation
// *****************************************
class Contraption;
class Inventor
{
 Contraption * c;
public:
 Inventor( Contraption * cr ) : c(cr) {}
 // Forward traversal
 Contraption * invention() { return c; }
 void invention(Contraption * cr) { c=cr; }
};
class Contraption
{
 . . .
public:
 Contraption();
 // Reverse traversal difficult
 // Inventor * creator();
 // void creator( Inventor * );
};

Listing Two
// ****************************************
// Bidirectional association implementation
// ****************************************
class Contraption;
class Inventor
{
 Contraption * c;

public:
 Inventor() : c(0) {}
 // Forward traversal
 Contraption * invention() { return c; }
 void invention(Contraption * cr) { c=cr; }
};
class Contraption
{
 Inventor * i;
public:
 Contraption() : i(0) {}
 // Reverse traversal
 Inventor * creator() { return i; }
 void creator( Inventor * ir ) { i = ir; }
};

Listing Three
// *************************************************************
// Bidirectional association implementation with link management
// *************************************************************
class Contraption;
class Inventor
{
 Contraption * c;
 void invention(Contraption * cr) { c=cr; }
public:
 Inventor() : c(0) {}
 ~Inventor() { unlink(*this); }
 Contraption * invention() { return c; }
 // link management interface
 friend void link( Inventor &, Contraption & );
 friend void unlink( Inventor & );
 friend void unlink( Contraption & );
};
class Contraption
{
 Inventor * i;
 void creator( Inventor * ir ) { i = ir; }
public:
 Contraption() : i(0) {}
 ~Contraption() { unlink(*this); }
 Inventor * creator() { return i; }
 // link management interface
 friend void link( Inventor &, Contraption & );
 friend void unlink( Inventor & );
 friend void unlink( Contraption & );
};

Listing Four
// ****************************************
// Link management function implementations
// ****************************************
void link( Inventor & I, Contraption & C )
{
 if( !I.invention() && !C.creator() )
 {
 I.invention( &C );
 C.creator( &I );
 }

}
void unlink( Inventor & I )
{
 if ( I.invention() )
 {
 I.invention()->creator(0);
 I.invention(0);
 }
}
void unlink( Contraption & C )
{
 if ( C.creator() )
 {
 C.creator()->invention(0);
 C.creator(0);
 }
}

Listing Five
// *****************************
// Prohibition of copy functions
// *****************************
class Inventor
{
private:
 Inventor( const Inventor & );
 Inventor & operator=( const Inventor & );
 // COPY FUNCTIONS NOT IMPLEMENTED!
 . . .
};

Listing Six
// *******************************************
// Copy functions that leave links undisturbed
// *******************************************
Inventor::Inventor( const Inventor & r )
: c(0)
 // Initialize other members
{}
Inventor & Inventor::operator=( const Inventor & r )
{
 if ( this != &r )
 {
 // Do not call link(), or unlink(), or otherwise disturb c or r.c.
 //
 // Assign other members
 }
 return *this;
}

Listing Seven
// *******************************************************************
// Base functionality for inheritance based association implementation
// *******************************************************************
class String
{
 . . .
public:
 String( char * );

 ~String();
 char * get();
 void set( char * );
};
class IOther // Inventor Other functionality
{
 String nm;
 . . .
public:
 IOther(char * p) : nm(p) {}
 ~IOther() {}
 char * name() { return nm.get(); }
 void name(char * p) { nm.set(p); }
 . . .
};
class COther // Contraption Other functionality
{
 String tl;
 . . .
public:
 COther(char * p) : tl(p) {}
 ~COther() {}
 char * title() { return tl.get(); }
 void title(char * p) { tl.set(p); }
 . . .
};

Listing Eight
// ******************************************************
// Derived classes implementing association functionality
// ******************************************************
class Contraption;
class Inventor : public IOther
{
 Contraption * c;
 void set(Contraption * cr) { c = cr; }
public:
 // Base class constructor arguments
 // propagate to derived class constructor
 Inventor(char * p)
 : IOther(p),
 c(0)
 {}
 ~Inventor() { unlink(*this); }
 Contraption * follow() { return c; }
 // link management interface
 friend void link( Inventor &, Contraption & );
 friend void unlink( Inventor & );
 friend void unlink( Contraption & );
};
class Contraption : public COther
{
 Inventor * i;
 void set( Inventor * ir ) { i = ir; }
public:
 Contraption(char * p)
 : COther(p),
 i(0)
 {}

 ~Contraption() { unlink(*this); }
 Inventor * follow() { return i; }
 // link management interface
 friend void link( Inventor &, Contraption & );
 friend void unlink( Inventor & );
 friend void unlink( Contraption & );
};

Listing Nine
// ********************************************
// Link management function implementations for
// inheritance based association implementation
// ********************************************
void link( Inventor & I, Contraption & C )
{
 // link only if both arguments are
 // unlinked at entry
 if( !I.follow() && !C.follow() )
 {
 I.set( &C );
 C.set( &I );
 }
}
void unlink( Inventor & I )
{
 // unlink only if linked
 if ( I.follow() )
 {
 I.follow()->set(0);
 I.set(0);
 }
}
void unlink( Contraption & C )
{
 // unlink only if linked
 if ( C.follow() )
 {
 C.follow()->set(0);
 C.set(0);
 }
}

Listing Ten
// ****************************************************************
// Base functionality for template-based association implementation
// ****************************************************************
class IOther // Inventor Other functionality
{
 String nm;
 . . .
public:
 IOther(char * p = "") : nm(p) {}
 ~IOther() {}
 char * name() { return nm.get(); }
 void name(char * p) { nm.set(p); }
 . . .
};
class COther // Contraption Other functionality
{

 String tl;
 . . .
public:
 COther(char * p = "") : tl(p) {}
 ~COther() {}
 char * title() { return tl.get(); }
 void title(char * p) { tl.set(p); }
 . . .
};

Listing Eleven
// **************************************************************
// Derived class templates implementing association functionality
// **************************************************************
template <class LB, class RB> class L1to1;
template <class LB, class RB> class R1to1;
 // Forward declarations
template <class LB, class RB>
void link(L1to1<LB,RB> & L,R1to1<LB,RB> & R);
template <class LB, class RB>
void unlink( L1to1<LB,RB> & L );
template <class LB, class RB>
void unlink( R1to1<LB,RB> & R );
 // Function template prototypes
template <class LB, class RB>
class L1to1 : public LB
{
 R1to1<LB,RB> * r;
 void set( R1to1<LB,RB> * p ) { r = p; }
public:
 L1to1() : LB(), r(0) {}
 ~L1to1() { unlink(*this); }
 R1to1<LB,RB> * follow() { return r; }
 friend void link( L1to1<LB,RB> &, R1to1<LB,RB> & );
 friend void unlink( L1to1<LB,RB> & );
 friend void unlink( R1to1<LB,RB> & );
};
template <class LB, class RB>
class R1to1 : public RB
{
 L1to1<LB,RB> * l;
 void set( L1to1<LB,RB> * p ) { l = p; }
public:
 R1to1() : RB(), l(0) {}
 ~R1to1() { unlink(*this); }
 L1to1<LB,RB> * follow() { return l; }
 friend void link( L1to1<LB,RB> &, R1to1<LB,RB> & );
 friend void unlink( L1to1<LB,RB> & );
 friend void unlink( R1to1<LB,RB> & );
};

Listing Twelve
// ******************************************************
// Parameterized link management function implementations
// for template based association implementation
// ******************************************************
template <class LB, class RB>
void link(L1to1<LB,RB> & L, R1to1<LB,RB> & R)
{

 // link only if both objects are unlinked at entry
 if( !L.follow() && !R.follow() )
 {
 L.set( &R );
 R.set( &L );
 }
}
template <class LB, class RB>
void unlink( L1to1<LB,RB> & L )
{
 // unlink only if linked at entry
 if ( L.follow() )
 {
 L.follow()->set(0);
 L.set(0);
 }
}
template <class LB, class RB>
void unlink( R1to1<LB,RB> & R )
{
 // unlink only if linked at entry
 if ( R.follow() )
 {
 R.follow()->set(0);
 R.set(0);
 }
}




































Object-Oriented Facilities in Ada 95


Safety plus object-oriented capabilities




David L. Moore


David is a compiler writer of long standing. He was a proselytizer of Pascal,
worked on the Apex Ada development environment at Rational Corp., and is the
author of FTL Modula-2.


After programming in both Ada and C++, I've come to appreciate the power,
safety, and maintainability of Ada programs, as well as the advantages of the
C++ object-oriented paradigm. With the Ada 95 standard, I no longer have to
choose between the two.
C++ programmers may not realize that the new Ada standard provides
object-oriented facilities. In Ada 95, polymorphism is implicit: It is a
result of the way you call functions, not how you declare them. Also, there
are no "classes"--object types now appear as "Tagged Records" (structures),
while subclasses are created using type derivation.
In Example 1, for instance, Point is an old-fashioned, non-object-oriented Ada
record from which a type Size has been derived. Size is identical to Point,
but it is a different type. If you attempted to assign a Point to a Size, you
would get a compilation error. To make the assignment work, you would have to
cast the Point to a Size, like so: s:=Size(p);.
Type derivation is very useful for separating objects that are represented the
same way but are logically different. In most languages, when building tables
in arrays, the only protection against using the wrong table's index is to
abandon indices for pointers. In Ada 95, deriving distinct index types from
integer achieves the same protection. The new standard extends this
type-derivation concept to provide full object orientation.
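The index-derivation safeguard might be sketched like this (a hypothetical fragment; the table and index names are illustrative, not from the article):

```ada
type Employee_Index is new integer range 1 .. 100;
type Room_Index     is new integer range 1 .. 50;

Salaries : array (Employee_Index) of float;
Areas    : array (Room_Index) of float;

E : Employee_Index := 3;
R : Room_Index     := 3;
...
Salaries(E) := 1000.0;     -- fine
-- Salaries(R) := 1000.0;  -- compilation error: wrong index type
```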


Ada's OO Facilities


Example 2(a) defines a type Object with two varieties of a Move routine and a
function that determines the extent (in x and y) of an Object. The functions
are primitive operations of the tagged type because they immediately follow
the declaration. (Actually, the rule is a little looser, but it's good
practice to declare all user-defined primitive operations immediately after
the type declaration.)
As Example 2(b) illustrates, Ada supports named parameters. This is useful for
procedures with default parameters. The last of the calls in Example 2(b) uses
the construct Size'(1,1), the Ada syntax for an aggregate constant. That is,
it creates a constant of type Size with field values of 1.
Example 2(c) shows the derivation of a Line type from the Object type. The
record type Line contains the fields of Object, together with a new field. The
Move and Extent procedures are primitive operations of the Object type because
they are defined immediately after the type declaration. The Move routines
will work just as well for a Line as for a base Object, but the extent of a
Line is different, so you have to override the Extent function.


Class-Wide Objects


Suppose that the implementation of Move needed to determine the extent of an
object. If, inside Move, you call Extent(obj), the extent for Object will be
called. You want to call the Extent belonging to the type of object actually
passed to Move. You do this with a class-wide type; see Example 3(a).
The construct Object'class(obj) converts obj to the class-wide type
Object'class, which covers all the types in the derivation tree of which
Object is a member. Note
that Line'class and Object'class represent the same type; there is one
class-wide type for any tree of derivations, rather than one for every
subtree.
Whenever a parameter in a call to a primitive operation has a class-wide type,
as in the call to Extent in Example 3(a), the procedure call becomes a
dispatching call and the function associated with the actual type of obj will
be called. Therefore, you could write it like Example 3(b), where o_c is a
local variable of class-wide type. Because o_c has a class-wide type and is of
indeterminate size until given a value, the variable must be initialized at
the time of declaration. Any calls that use o_c will be dispatching. (Notice
that o_c is a stack-based variable, not a pointer to some object on the heap.)
Of course, you can also declare a parameter to a subroutine to be a class-wide
type; in this case, an object of any type in the derivation tree can be passed
as a parameter.


Private Types and Child Packages


All of the fields and subprograms I've declared to this point have been
public. Ada also supports private fields and subprograms. In Ada, a private
object is private to the package in which it is declared, not to the type
itself. To make all the fields of Object and Line private, the previous
declarations should look like Example 4(a), which is a complete specification
package for these types. In addition, a package body would contain
implementations for all the procedures declared in the specification. Line is
derived directly from Object in both the public and the private parts of the
package specification. In fact, you could derive Line from a type derived from
Object in the private part, even though Line was declared as derived directly
from Object in the public part. This means that the actual hierarchy of
derivation, and the internals of the types themselves, can be hidden from the
package's clients.
You could declare additional primitive subprograms immediately following the
completion of the declarations in the private parts. The subprograms would be
private to the package, but primitive on the declared types and so would take
part in dispatching when appropriate.
In Ada, it is an error if there is no implementation in the package body for
anything declared in the package specification: You never have to hunt for a
missing definition when a program fails to link. In some languages, if the
parameter profiles in the declaration and definition of a procedure do not
match, you will also get an undefined external at link time. In Ada, the
problem will be found when you attempt to compile the package body.
However, because private parts of a record are visible only in the package in
which they are declared, package bodies can grow very large. Further, any
derived type that needs access to those private details must be in the same
package as the original declaration. This would also tend to make packages
large and produce large recompilations every time a new type was added or any
derived type was changed.
Ada 95 overcomes this with the concept of a "child package," which is
physically separate from its parent package, but logically part of it.
For example, suppose you want to derive a type for a circle from the Object
type, and you need access to the private part of Object. Without touching the
Graphical_Objects package, you can declare the type as in Example 4(b). This
child package will act as if it were a package nested inside the
Graphical_Objects package. Nested packages are common, but in this case, both
the specification and the body can be in separate files. A change made in the
specification of this child package will not cause recompilation of packages
that use only the parent package.
The child package also provides convenient support for third-party libraries.
A user can extend a supplied package without the entire source.


Constructors and Destructors


The types I've examined so far do not have constructors or destructors. Types
that have the Ada equivalent of constructors and destructors are called
"controlled types."
To declare a controlled type, you derive from a predefined type from the
standard package System.Finalization. Suppose you want to keep all instances
of a type chained together. The code would look like Example 5. For the
moment, examine the public part of the specification that derives a new type
from the type Controlled, which is imported from the predefined package
Finalization. In Ada, all classes that have constructors and destructors are
derived from a type in Finalization that is a child package of the System
package, which contains information about a particular implementation of Ada
and is always implicitly available. 
I've declared three procedures: Initialize, called immediately after an object
is initialized with default values; Duplicate, called immediately after an
object is assigned a new value; and Finalize, called immediately before an
object is destroyed. 
Consequently, if both a and b are chained objects and you assign a to b
(b:=a;), Finalize will first be called for b to finalize the value about to be
overwritten. The value is then copied, and Duplicate is called to fix up the
new copy. The procedures are, of course, called implicitly by the compiler.
Remember that in Ada you can specify a record statically, as illustrated in
Example 2(b) in the last call to Move, which passed a constant Size record as
a parameter using the construct Size'(1,1). You could also do this for the
Object and Line types. In Example 6, for instance, you can omit the type mark
(Object') when it is known from the context. In this case, the values would be
assigned to the object, and then Initialize would be called. Obviously,
Duplicate and Initialize will often be the same (as in Example 6), so you can
just rename Duplicate to Initialize. The ability to declare a procedure and
define it as a rename of another was added as part of the new standard.

Alternatively, you can give fields in records default values. This is also
done before Initialize is called. 


Multiple Inheritance


Multiple inheritance is not part of the Ada 95 standard because it is
expensive. It can also cause unexpected surprises with dispatching.
However, some benefits of multiple inheritance can be achieved in other ways.
One example is "mixins." If you want a list of some object, you mix in the
object with a list class to produce a new class that is both a list and the
original class.
Even in C++, this is better done with a template than with multiple
inheritance, because types derived from the original type can also be included
in the list. With a mixin, only objects inherited from the mixin can be
contained in the list. Also, the original type can be held in many different
containers.
Another facility that can model some of the semantics of multiple inheritance
is an "access discriminant." This allows an object that is a field in a record
to access that record. It does not mix the primitive operations of two types
together, but this can be made easier with a suitable generic.
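As a rough sketch of the access-discriminant idiom (hypothetical type names; this condenses the usual Ada 95 self-reference pattern and is not taken from the article):

```ada
type Outer;
type Mixin_Part (Encl : access Outer) is limited record
   Count : integer := 0;
end record;
type Outer is limited record
   Part : Mixin_Part (Outer'Access);
   -- Inside the record, Outer denotes the current instance, so
   -- Part.Encl points back at the enclosing Outer object.
end record;
```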


Other Ada Additions


While object orientation is the Ada 95 feature that's likely to be of greatest
interest to non-Ada programmers, the language has also improved in other ways:
Access types. "Access type" is the Ada term for a pointer. Example 5 uses an
access type to reference a variable, as in chn.prev.next. You do not need a
special symbol between the fields (as with C and Pascal), so you can change an
object with a record type into an object containing a pointer to that record
type without editing large amounts of code. 
This "locality of maintenance" principle is important in Ada, especially in
large projects. The goal is that local changes to code should be both easy and
safe. In some cases, this happens automatically. In others, it requires extra
work when the code is originally written; for example, using named parameters
to subroutines, or avoiding "others" clauses in case statements and array
definitions, so that a compilation error will be raised when new arms need to
be added.
An alternative to referencing a variable is to create an access variable for a
procedure. This is an improvement over Ada 83, where to pass a procedure as a
parameter, you created a generic and instantiated it with that procedure as a
parameter. (This was fine for Ada code but afforded no standard way to pass
callback procedures to APIs like X.)
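An access-to-subprogram callback might be declared along these lines (illustrative names, not a real API):

```ada
type Event_Handler is access procedure (Event_Code : integer);

procedure Register_Callback (Handler : Event_Handler);

procedure My_Handler (Event_Code : integer);
...
Register_Callback (My_Handler'Access);
```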
Finally, some changes support object orientation. You can declare access types
to class-wide types (type T is access Object'class). You can also declare
access parameters to procedures (obj : access Object) and have dispatching
occur on
the actual operand. In other words, dispatching will work for procedures that
pass around pointers, as well as procedures that pass the actual objects.
Decimal Types. Ada has always had fixed-point types that allow the expression
of fractional numbers without using floating point. A typical declaration
looked like this: type hundredths is delta 0.01 range -10.0 .. 10.0, which
seems ideal for money calculations.
Unfortunately, the values of this type were represented in binary, essentially
as an integer with an implied binary point seven bits from the right. So the
value 0.01 was represented as 1/128. From here on, it is steadily downhill if
you want results accurate to the penny.
With the new standard, you can declare types that do decimal arithmetic
accurately. Instead of the previous declaration, you would write: type Money
is delta 0.01 digits 8;, which will perform money calculations correctly. It
will hold values up to (but not including) one million dollars. Multiplication
of money values can be either truncated (the default) or rounded using the
form Money'round(a*b), where a and b can be any decimal types.
Such values can also be formatted using pictures similar to those found in
Cobol, making Ada a serious alternative to Cobol for business applications.
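A decimal money calculation along the lines described might look like this (a hypothetical fragment; the tax rate is illustrative):

```ada
type Money is delta 0.01 digits 8;

Price : Money := 19.99;
Tax   : Money := Money'Round (Price * 0.0825);  -- rounded, not truncated
Total : Money := Price + Tax;
```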
Modular Types. Traditional Ada lacked unsigned values that could take on any
value allowed by the size of a word. That is, unsigned types could only be a
subrange of integer values. With Ada 95, full unsigned types are available.
Because their main use is in system programming, they differ from traditional
integral types in that:
They do not overflow--they wrap around. Adding 1 to the largest value will
produce 0, rather than raising an exception.
And, Or, and Xor are defined so that bit operations can be performed. Ada has
always had other facilities to do bit operations. For example, you could map
record structures to bit-exact addresses to directly access hardware-control
registers. Even so, these new types are a useful addition.
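A modular type might be used like this (hypothetical sketch):

```ada
type Byte is mod 256;

B : Byte := 255;
...
B := B + 1;         -- wraps around to 0; no exception is raised
B := B and 16#0F#;  -- bitwise operations are defined
```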
Generic formal packages. Ada has always had a powerful generic capability, but
the number of parameters to a generic could grow very long when a generic was
used to extend the operations on a type created by instantiating another
generic. You sometimes ended up passing as a parameter every subroutine
exported from the first generic into the second.
Ada 95 allows a generic package to be passed as a parameter to another
generic. When this second generic is instantiated, the name of an
instantiation of the first generic is supplied for the parameter. 
New tasking constructs. Ada has always had tasking built into the language.
This makes it attractive for applications that need to take advantage of
multithreaded operating systems. The original tasking was rather cumbersome,
but a new "protected type" simplifies control of concurrency in tasking
programs. Although it has its own syntax, it is essentially a package; it has
a specification and a body. 
In a package, any number of procedures and functions can be executed
simultaneously by different tasks. In a protected type, one procedure must
complete before another can be entered, so any task that tries to make such a
call will be queued until the procedure can be entered. Any number of
functions in the protected type can be active at once, but a function cannot
be active while a procedure is active. Hence, protected types allow us to
implement the classic "n readers, 1 writer" protection mechanism.
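A minimal protected type might look like this (a hypothetical sketch, not from the article):

```ada
protected type Shared_Counter is
   procedure Increment;            -- exclusive: one task at a time
   function  Value return integer; -- concurrent readers permitted
private
   Count : integer := 0;
end Shared_Counter;

protected body Shared_Counter is
   procedure Increment is
   begin
      Count := Count + 1;
   end Increment;
   function Value return integer is
   begin
      return Count;
   end Value;
end Shared_Counter;
```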
A second new tasking facility is the ability to execute code that is aborted,
either after a given interval or when some other event occurs. Example 7, for
instance, spends ten seconds trying to prove Fermat's last theorem, then
aborts and grumbles about the tightness of the margin.


Conclusion


This article covers only the highlights of the new standard. There are also
detailed interfaces to other languages, including C, Fortran, and Cobol. Other
annexes include distributed computing, real time, systems programming, and
numerics for scientific programming.


References


Barnes, J.G.P. Programming in Ada, Fourth Edition. Reading, MA:
Addison-Wesley, 1994.
GNU Ada compiler (GNAT). Available from cs.nyu.edu in /pub/gnat.
http://lgl-www.epfl.ch/Ada/.
Information Technology: Programming Language Ada, ISO/IEC 8652:1995.
Netnews conference on Ada: comp.lang.ada.
Example 1: An old-fashioned Ada record (not object oriented).
type Point is record
 x : integer;
 y : integer;
end record;
type Size is new Point;
Example 2: (a) Defining a type Object; (b) supporting named parameters; (c)
deriving a Line type from the Object type.
(a)
-- Base type for objects displayed on screen
type Object is tagged record
 pos : Point;
end record;
procedure Move(Offset : Size; obj : in out Object);
procedure Move(x : integer := 0; y : integer := 0; obj : in out Object);
function Extent(obj : Object) return Size;

(b)
origin : Object;
...
Move(1, 1, origin);
Move(obj => origin, x => 1, y => 1);
Move(obj => origin, y => 1);  -- x defaults to 0
Move(Size'(1,1), origin);


(c)
type Line is new Object with record
 offset : Size;
end record;
function Extent(obj : Line) return Size;
Example 3: (a) Class-wide type; (b) an alternative to (a).
(a)
procedure Move (Offset : Size; obj : in out Object) is
 s : Size := Extent(Object'class(obj));
 ...

(b)
procedure Move (Offset : Size; obj : in out Object) is
 o_c : Object'class := obj;
 s : Size := Extent(o_c);
 ...
Example 4: (a) A specification package; (b) declaring a child package.
(a)
package Graphical_Objects is
   -- declarations of Point and Size omitted
   type Object is tagged private;
   -- procedure declarations here
   type Line is new Object with private;
   -- Overriding definition of Extent here
private
   type Object is tagged record
      pos : Point;
   end record;
   type Line is new Object with record
      offset : Size;
   end record;
end Graphical_Objects;

(b)
package Graphical_Objects.Conics is
   type Circle is new Object with private;
private
   type Circle is new Object with record
      radius : integer;
   end record;
end Graphical_Objects.Conics;
Example 5: Declaring a controlled type.
with Finalization;
package Chained_Objects is
 type Chained_Object is new Finalization.Controlled with private;
 procedure Initialize(Object:in out Chained_Object);
 procedure Duplicate (Object:in out Chained_Object);
 procedure Finalize (Object:in out Chained_Object);
private
 type Chain;
 type Chain_Ptr is access Chain;
 type Chain is record
 next, prev : Chain_Ptr;
 end record;
 type Chained_Object is new Finalization.Controlled with record
 The_Chain: aliased Chain;
 end record;
end Chained_Objects;
package body Chained_Objects is
 head : Chain_Ptr := null; -- global variables
 tail : Chain_Ptr:=null;
procedure Initialize(Object:in out Chained_Object) is
 chn:chain renames object.the_chain;
begin
 chn.next:=null;
 chn.prev:=tail;
 if tail/=null then
 tail.next:=chn'access;
 else
 head:=chn'access;
 end if;
 tail:=chn'access;
 end Initialize;
procedure Duplicate(Object:in out Chained_Object) renames Initialize;
procedure Finalize(Object:in out Chained_Object) is
 chn:chain renames object.the_chain;
begin
 if chn.prev=null then
 head:=chn.next;
 else
 chn.prev.next:=chn.next;
 end if;
 if chn.next=null then
 tail:=chn.prev;
 else

 chn.next.prev:=chn.prev;
 end if;
end Finalize;
end Chained_Objects;
Example 6: Omitting the type mark (Object') when it is known from the context.
O : Object := (Pos => (1, 1));
L : Line := (O with offset => (1, 1));
Example 7: Code that will abort unless some other event occurs.
select
 delay 10.0;
 Put_Line("Margin too small");
then abort
 Prove_Fermats_Last_Theorem;
end select;

















































Partial Revelation and Modula-3


Importing only necessary class features 




Steve Freeman


Steve is a Research Scientist at the Rank Xerox Research Centre in Grenoble,
France, working on the implementation of a rule-based coordination language.
He can be reached at freeman@xerox.fr.


Proponents of object-oriented programming claim that it provides at least two
benefits: code reuse and encapsulation. These features, however, often
conflict, as it is difficult to reuse something that is hidden, and most
statically typed, object-oriented languages (those in which the type of
expressions and variables can be determined at compile time) require that
visibility and inheritance information be described early in the class
hierarchy. As experienced software writers know, it is hard to predict from
first principles how a class will eventually be reused, so it is common to
expose more features of a class than necessary to avoid changing its
definition once in use. If you are not extremely careful, this approach
increases the dependencies on the initial implementation and makes the whole
system more brittle. 
Modula-3 lets you control reuse with a degree of flexibility uncommon in
strongly typed languages. The key concept is that an object type may be
divided up into partial types, each of which describes some aspect of the
object. A source file may then import, or reveal, only those aspects of an
object relevant to the task at hand. Thus, the parts of a system that depend
on a particular object feature can be isolated and found automatically when a
change is necessary.
In this article, I'll show how Modula-3's type system, with its partial
revelations, makes it easier to combine code reuse and encapsulation than in
other statically typed, object-oriented languages. I'll then work through an
example showing the Modula-3 type system in use.


So What's the Problem? 


Statically typed, object-oriented languages generally support encapsulation by
distinguishing between public and private features in a type, although there
may also be intermediate states. There is a hierarchy of access in which
privileged code can see the whole structure of an object but external code can
only see the public features; see Figure 1.
The problem with this approach is that it treats all users of each level in
the class the same way, and all-purpose code is always difficult to write. The
class designer must draw a fine line in the class definition between exposing
too little to be useful and exposing too much to be safe. Even escape
mechanisms (such as the friend construct in C++) included in the visible
definition are liable to acquire dependencies. Once a design is achieved, its
details are embedded in every piece of code that uses the type, so a change in
a high-level class must be propagated throughout a system; if the class is
part of, say, a commercial library, then users just have to live with the
designer's decisions. Class internals may not need to be accessed frequently,
but when they are, it is likely to be important, and such prohibitions are
doubly frustrating if the required features are visible, but inaccessible, in
the class definition. 


An Associations Example


I'll illustrate my point using the "many-to-one associations" example in
Object-Oriented Modelling and Design, by James Rumbaugh et al. (Prentice-Hall,
1991), which shows two classes: Item and Group. When an Item is added to a
Group, the Group object is updated to include the new Item and the Item object
is updated so that it refers to the Group. The cross-reference attributes in
both classes, however, should be accessible only to the code that maintains
the association. If the attributes were freely available, they could be
updated separately, thus allowing inconsistencies. 
Most statically typed languages have some mechanism for limited relaxation of
encapsulation. C++ has the friend construct, whereby a class gives other
classes or procedures access to its private fields. The friend declarations in
Listing One (a) allow the Group's add and remove methods to update both sides
of the association together when links are added or removed, as in Listing One
(b).
The disadvantage of this approach is that the details of the implementation,
although protected from inconsistent access, are specified in the class
definition. This increases the cost of changing a class once it is in use and
limits the range of possible extensions. For example, to adapt the library for
multithreading, it might be necessary to add a mutex to the Item class that is
locked while the group_ field is being set. Propagating a change in a
low-level class, such as Item, can require rebuilding large amounts of code.
Similarly, if you want to extend the Group class so that, for example, changes
in membership are logged, the original author must have declared the add and
remove methods as virtual (that is, eligible to be overridden in a subclass).
This approach constantly requires class designers to decide whether to make a
method virtual (and so more flexible) or not (and so more efficient) before
the class has been put to use.
To defer such decisions, it's common to define an abstract type (which
includes methods but no data fields) that describes what you can do with it,
not how it is implemented; see Listing Two (a). You then write a concrete
subclass that has data fields (that is, state) and implements the methods
defined in the parent abstract type, as in Listing Two (b). It's also common
to provide a function that you call to create new instances of the object.
Users see objects of the abstract parent type, but these are actually
implemented by objects of the concrete child.
This isolates the users of a class from its implementation, but makes reuse by
inheritance difficult, as a subclass of the public AbstractItem will not
inherit from the private ConcreteItem. Class designers may also use
delegation, in which the Item behavior is managed by an abstract object in the
public class to which the method calls are forwarded; see Listing Two (c).
In addition, methods such as AddItem in the (public) abstract classes must be
defined in terms of other abstract classes because the (private) concrete
classes are invisible at this level. This means that the implementations of
these methods must receive parameters as abstract types and cast them down to
their concrete subclass, which either requires an extra consistency check or
provides a potential source of pointer errors.
Other languages are more flexible. Eiffel, for example, provides a redefine
keyword that allows a subclass to override any of its ancestor's public
methods--unlike C++, in which the ancestor class must declare the methods that
may be overridden. Eiffel, however, also requires the ancestor class to
declare which other classes can access its private fields, so our Item class
is defined as Listing Three. Again, specifying exported features in the class
definition makes it more difficult to restructure an installed library or
program. Furthermore, the controls can be overridden by defining a new
subclass of GROUP to gain arbitrary access to the ITEM internals. 
Ada 95 also distinguishes between public and private features of a type and
uses its support for structured libraries to control access. Briefly, an Ada
program unit (or package) has a specification, divided into visible and
private sections, and a body. The logical interface to a package, its abstract
types and procedure declarations, is defined in the visible section of the
specification; this is what clients of the package use. The implementation of
a package is split between the private section of the specification, which
expands the definitions of the data types, and the package body, which
contains the procedure code; the private section is accessible from the body,
but not from clients of the package. An Ada 95 specification of the example
might look like Listing Four. 
Ada 95 supports hierarchies of packages. That is, given a package parent, you
can write a package parent.child to extend it; the visible specification of
the new package will inherit the visible features of the parent. More
importantly, the (private) implementation of a child can see all the private
features of its ancestors, whereas the public interface to a child sees only
the ancestors' visible sections. The Ada 95 approach is a little like a street
of theaters: The public sees only what's on stage, while some of the backstage
crew have access to everything in the building. Each theater (or library),
however, is independent; the crew of one theater is not allowed backstage next
door, but can still buy tickets to see the show. This approach supports the
way software libraries are built (or should be), but suffers from two
limitations: first, the concrete part of a data type is still held in the
interface file, so it may be difficult to predict what will need to be rebuilt
when an implementation changes; second, the package designer must still decide
exactly how it may be reused before it is put into service, so changes to an
installed library may be painful.
While it is possible to separate interface and implementation in most
statically typed OO languages, engineering such a design for long-term reuse
requires skill and experience because too many details must be committed to
too early. In Modula-3, programmers do not have to put all the details of a
class in a single place, yet the benefits of a strong type system are
retained. There are two main techniques: 
Classes can be subclassed when not all of the parent class is visible, so the
private features of a class really are private, not just inaccessible. 
Class definitions can be divided across multiple files, so a program unit
imports only the relevant aspects of a class; this allows fine-grain control
over the visibility of the parts of an object. 


Modula-3 Basics


Modula-3 is a statically typed, object-oriented language with single
inheritance, integrated threads, garbage collection and exception handling,
and separate interface and implementation files. To illustrate basic Modula-3
concepts, I'll first rewrite the Associations example in Listing One. Listing
Five is the interface file Association.i3 (interface files define a namespace
that can contain type definitions and procedure and variable declarations, but
no implementation code). The IMPORT statement makes the contents of another
interface file available; here, I import the type List from the interface List
to define the value returned from the items method of the Group object.
Modula-3 objects are records that may contain variables and methods; here, of
course, the abstract types have no variable fields because these are only
defined in the implementation. 
The phrase type1 <: type2 is an example of partial revelation. It says that
type1 is a subtype of type2 but that this is not the entire definition of
type1; the rest will be revealed in other places, possibly in other files. The
Association interface, for example, defines an Item as a subtype of an
AbstractItem, so it includes a memberOf method, but provides no other
information about its features. The AbstractItem and AbstractGroup names are
not strictly necessary, but are introduced to save retyping the object details
in the implementation file Association.m3, as in Listing Six. 
The phrase REVEAL type1=type2 provides more information about the structure of
type1. The use of "=" shows that this is the final revelation in the
declaration of an object type; there can be multiple partial revelations about
the structure of an object. In this case, an Item object contains a group data
field, and the memberOf method has been overridden by the procedure
ItemMemberOf; in other words, I provide a concrete implementation. The
implementation is entirely hidden from the rest of the library (it could be
replaced by relinking the application with a new implementation module), but
you can still meaningfully subclass from the Item and Group types in the
Association interface. For example, to add logging to the add method of Group,
write the subclass in Listing Seven.
With the addition of our new logging feature, the new LogGroup object inherits
all the hidden behavior of the Association.Group object. This demonstrates
subtyping the parent type when not all is visible. The implementation of
Association.Group is entirely hidden behind its interface, so program units
that use the public type do not depend on its private implementation. To
subtype in this way in C++ or Eiffel, for example, you would either have to
write a concrete class that exposes some implementation details or write a
delegation class that includes an abstract object to which method calls are
forwarded. 


Partitioning an Object Definition


Class designers use partial revelation to split the definition of an object
type across multiple files, both interface and implementation. A revelation in
an interface is a mechanism for access control; it makes some feature of an
object accessible wherever the interface is imported. A revelation in an
implementation, on the other hand, is a mechanism for encapsulation; it is an
addition to the object definition that is visible only within that
implementation. 



Beyond Simple Encapsulation: Associations Revisited


Partial revelation really shines when used to support sophisticated access
control to the features of an object type. To demonstrate this, I'll present
an unreasonable extension of the Association objects. We have a hypothetical
new requirement that Associations work with multiple threads, so the objects
must be protected against concurrent changes. We also want to allow a thread
to be notified when another thread has added a new item to a group. This new
requirement, however, says nothing about the implementation of associations,
so we can divide our new objects into three aspects: The first deals with the
data structures for associating items and groups, the second deals with
multithreading issues and handles locking and notification, and the third
deals with class-wide code that is independent of either of these specific
issues. These aspects are independent of each other, so a change of data
structure need not affect the locking scheme, and we can partition our object
types to reflect this.
The first interface (see Listing Eight) provides opaque types; it reveals only
that the objects have identifiable types and the operations the objects
respond to. This interface is for normal clients who just want to use the
types without knowing about their implementation. Next, I define abstract
types for the item and group objects; any implementation must provide code for
the methods of the object types in Listing Nine. This interface is similar to
the original version in Listing Five, except that it now includes the opaque
types ItemPrivate and GroupPrivate; I'll explain the purpose of these types
shortly. Once the library is installed, however, I might want to write the
logging subtype from Listing Seven; I can do this with the interfaces provided
so far. If I import AssocClass, I know that any group object includes an
addItem method that I can override. I can subclass knowing very little about
how the types are implemented and pass the new type to the procedures in
Association. Note that, so far, the multithreading features of the objects are
completely invisible.
The next stage is to provide an implementation of items and groups, so I write
an interface AssocImpl as a default version; see Listing Ten (a). The subtypes
declared here are still opaque, so a client knows only that they can be used
with the procedures declared in Associations. The module AssocImpl.m3 in
Listing Ten (b) reveals more detail about these types--we are not defining a
further level of subtyping here, but adding more detail to an existing type.
This module reveals that the items are stored using a standard Set.T type and
provides implementations for the methods for the Item and Group objects. For
example, the AddItem procedure in Listing Ten (c) ensures that the Set field
of the Group object has been initialized and adds the new item to it. I don't
need to call the setGroup method of the item because, as you'll see below,
this will be done elsewhere. 


Exposing Unsafe Features 


More-sophisticated clients know that each call of a procedure in the
Association interface involves acquiring and freeing a lock, which may be
expensive. If they had access to an appropriate level of detail, they could
batch a set of calls within a single lock, rather than locking each time;
thus, the code in Listing Eleven (a) might change to that in Listing Eleven
(b). There's also an explicit call to notify any other waiting threads that
new items have arrived, as this is also normally provided by
Association.AddItem. Clearly, programmers working at this level must be more
careful because they can no longer rely on the implementation to ensure, for
example, that a group is locked before an item is added to it.
Modula-3 distinguishes between safe and unsafe program units; unsafe
interfaces and modules allow the programmer more freedom but provide fewer
guarantees that errors will be caught by compiler analysis of the code. An
unsafe interface may only be imported by unsafe modules and other unsafe
interfaces, but an unsafe module may implement a safe interface--which is how
Modula-3 provides safe access to external (and so uncheckable) libraries.
Modula-3 supports this controlled relaxation of its type safety to help
programmers isolate and identify code that is "dangerous."
I have made the interface UnsafeAssoc unsafe because it allows a programmer to
bypass the thread-safe code provided through the Association interface; this
code is shown in Listing Twelve. UnsafeAssoc reveals that both the item and
group object types are derived from MUTEX, a built-in mutual exclusion type,
so any code that imports this interface can lock item or group objects, as in
Listing Eleven (b). The interface also declares unsafe versions of the
thread-safe procedures that implement class-wide behavior. We know that these
procedures are not concerned with thread safety because item and group locking
is done separately. Data-structure management is defined in the AssocImpl
module, so the unsafe procedures have common behavior across all
implementations of Associations. I also use this interface to declare a
procedure that supports the notification of the arrival of new items.
Now I can write the module Association.m3, which provides the final revelations
about the item and group types and implements the procedures declared in
Association and UnsafeAssoc. Listing Thirteen (a) shows the start of the
module; the EXPORTS clause says that this module implements, and automatically
imports, those interfaces. Note that Association and AssocClass are safe,
whereas UnsafeAssoc is not. The final revelations for the item and group
object types show that there is nothing more to say about item objects, but
that groups need some extra state that will be used for the thread-specific
code. Listing Thirteen (b) shows the implementation of two procedures from the
UnsafeAssoc interface. NotifyNewItem simply sets the group's lastAddition field and
wakes up all the threads waiting on the group's condition variable, while
UnsafeAddItem adds the item to the group and sets the group in the item.
Finally, Listing Thirteen (c) shows the code for two of the procedures in the
Association interface; this is where all the code associated with thread
safety belongs. AddItem provides a safe wrapper for the UnsafeAddItem call,
and NewItem blocks its calling thread until a new item has been added to the
group.
The Association example, although contrived, shows how partial revelation can
be used to partition an object type by feature, rather than by the designer's
conception of how it will be used. Clients of the library can import just
those features they need, perhaps to reimplement the default implementation or
to optimize a set of calls. In this case, the object types were divided into
concerns about data structures, thread-safety, and class-wide behavior. I can
change each of these independently, and Modula-3's import system makes
dependencies on existing code easy to find.


Summary


Partial revelation is a powerful technique for providing controlled access to
the features of an object type. A class definition can be split into a set of
distinct features made accessible by importing interfaces. There is no
hierarchy of access imposed at the interface level, so each feature can be
imported independently into a program unit; see Figure 2. The definitive
structure of the type is then finally revealed in the implementation.
The division of an object type can be based on its essential features, rather
than on the designer's expectations about its use and implementation concerns.
Furthermore, importing the interface localizes the dependence on features of
the supertype and makes such dependencies easy to find automatically. When an
interface is intended for use within a library's implementation but not by its
clients, privacy can be achieved by controlling access to the interface file.
Modula-3 reduces the tension between reuse and encapsulation by allowing
programmers to avoid embedding implementation details in the visible
definition of a class. A designer can concentrate on extracting the essential
features of a class, rather than declaring in advance how it relates to other
classes and procedures. Users of a class can select only those features they
need and still reuse them by inheritance; the hidden parts of the class are
still available and can be revealed elsewhere in the program. Partial
revelation avoids the rigidity of a hierarchical access system and provides
inheritance with true encapsulation, as opposed to simple access control,
without the implementation costs and risks of delegation. 
The ability to import only the necessary features of a class means that
dependencies are limited to the code that uses them and are made explicit when
they occur. This reduces code fragility and makes it easier to develop
software-management tools. Furthermore, Modula-3's distinction between safe
and unsafe program units highlights vulnerable code and provides language
support for multiple roles in a software team.
Of course, writing reusable objects is still hard and no language can be a
substitute for good ideas, but the right language can assist in their
design and implementation. The many libraries available with the Modula-3
distribution provide both models for how to write well-structured software and
a mine of reliable code for reuse. There's an excellent discussion of how this
technique was used for the Modula-3 I/O library in Systems Programming with
Modula-3, edited by Greg Nelson (Prentice-Hall, 1991). "Adding Digital Video
to an Object-Oriented User Interface Toolkit," by Mark Manasse and myself
(Object-Oriented Programming, ECOOP 1994, Springer-Verlag, 1994), describes
how the type system helped during a major modification of Trestle, the
Modula-3 user interface toolkit. Modula-3 is freely available for a wide range
of platforms. Its home page is at
http://www.research.digital.com/SRC/modula-3/html/home.html, and the
distribution is available via ftp from gatekeeper.dec.com. There is also a
newsgroup, comp.lang.modula3.
Modula-3's partial revelation, especially when combined with its interface and
module structure, provides a powerful and unusual tool for managing the
trade-off between reuse and encapsulation. It doesn't impose the hierarchical
approach to encapsulation of other statically typed object-oriented languages,
so there's less need either to subvert the type system, or to increase code
dependencies to achieve the flexibility that any substantial library or
application needs. It's an effective programming-language technology that
deserves to be better known and widely used.


Acknowledgments 


Many thanks to Farshad Nayeri for his generous help with this article.
Figure 1: Access control in C++. Access to each level of privacy includes
access to all the less private levels.
Figure 2: The Group type hierarchy as defined in the interfaces; subtypes are
shown above their parent types.

Listing One
(a)
class Item {
public:
 Group* MemberOf() const {return group_; };
private:
 Group* group_;
 friend void Group::AddItem(Item*); // allows methods to access 
 friend void Group::RemoveItem(Item*); // the group_ field
};
class Group {
public:
 void AddItem(Item* item);
 void RemoveItem(Item* item);
 const ItemSet& Items() const {return itemSet_; };
private:
 ItemSet itemSet_; // a collection class to hold a set of items
};

(b)
void Group::AddItem(Item* item) {
 item->group_ = this;

 itemSet_.Add(item);
}

Listing Two 
(a)
class AbstractItem {
public:
 virtual AbstractGroup* MemberOf() const = 0;
};
AbstractItem* CreateItem();
class AbstractGroup {
public:
 virtual void AddItem(AbstractItem* item) = 0;
 virtual void RemoveItem(AbstractItem* item) = 0;
 virtual const ItemSet& Items() const = 0;
};
AbstractGroup* CreateGroup(...);

(b)
class ConcreteItem: public AbstractItem {
public:
 AbstractGroup* MemberOf() const {return group_; };
private:
 AbstractGroup* group_;
 friend void ConcreteGroup::AddItem(AbstractItem*);
 friend void ConcreteGroup::RemoveItem(AbstractItem*);
};
AbstractItem* CreateItem() {
 return (AbstractItem*)new ConcreteItem;
};

(c)
class Item: public AbstractItem {
public:
 Item() { impl_ = CreateItem(); };
 ~Item() { delete impl_; };
 Group* MemberOf() const {return impl_->MemberOf(); };
 AbstractItem* Implementation() const {return impl_;};
private:
 AbstractItem* impl_;
};

Listing Three
class ITEM
export
 member_of -- this method available to everyone
 set_group{GROUP}, clear_group{GROUP} -- methods available to class GROUP
feature
 mygroup: GROUP;
 member_of: GROUP is do Result := mygroup end;
 set_group(g: GROUP) is do mygroup := g end;
 clear_group is do mygroup := Void end;
end -- ITEM

Listing Four
with Items; use Items;
package Association is
 -- visible section
 type Item is tagged private;

 type Item_ptr is access all Item'Class;
 type Group is tagged private;
 type Group_ptr is access all Group'Class;
 procedure AddItem(G: Group_ptr; I: Item_ptr);
 procedure RemoveItem(G: Group_ptr; I: Item_ptr);
 function Items(G: in Group'Class) return Items.List;
private
 -- private to implementation
 type Item is tagged record
 group: Group_ptr;
 end record;
 type Group is tagged record
 Items : Items.Set;
 end record;
end Association;

Listing Five
INTERFACE Association;
IMPORT List;
TYPE
 Item <: AbstractItem;
 AbstractItem = OBJECT
 METHODS
 memberOf(): Group;
 END;
 Group <: AbstractGroup;
 AbstractGroup = OBJECT
 METHODS
 addItem(item: Item);
 removeItem(item: Item);
 items(): List.T;
 END;
END Association.

Listing Six
MODULE Association;
IMPORT List, Set;
REVEAL Item = AbstractItem BRANDED OBJECT
 group: Group;
OVERRIDES
 memberOf := ItemMemberOf;
END;
PROCEDURE ItemMemberOf(self: Item): Group =
 BEGIN RETURN self.group; END ItemMemberOf;
REVEAL Group = AbstractGroup BRANDED OBJECT
 itemSet: Set.T;
OVERRIDES
 addItem := GroupAddItem;
 removeItem := GroupRemoveItem;
 items := GroupItems;
END;
PROCEDURE GroupAddItem(self: Group; item: Item) =
 BEGIN self.itemSet.add(item); END GroupAddItem;
PROCEDURE GroupRemoveItem(self: Group; item: Item) =
 BEGIN self.itemSet.remove(item); END GroupRemoveItem;
PROCEDURE GroupItems(self: Group): List.T =
 BEGIN RETURN self.itemSet.makeList(); END GroupItems;
END Association.


Listing Seven
IMPORT Association;
TYPE LogGroup = Association.Group OBJECT
 OVERRIDES
 addItem := LogGroupAddItem;
 END;
PROCEDURE LogGroupAddItem(self: LogGroup; item: Association.Item) =
 BEGIN (* log the addition, then directly call the parent method *)
 LogAddition(self, item);
 Association.Group.addItem(self, item);
 END;

Listing Eight
INTERFACE Association;
IMPORT List;
TYPE Item <: ROOT; (* ROOT is the ancestor of all garbage-collected objects *)
PROCEDURE MemberOf (i: Item): Group;
TYPE Group <: ROOT;
PROCEDURE AddItem (g: Group; i: Item);
PROCEDURE RemoveItem (g: Group; i: Item);
PROCEDURE ItemList (g: Group): List.T;
PROCEDURE NewItem (g: Group): Item; (* blocks until a new item arrives, 
 then returns most recent addition *)
END Association.

Listing Nine
INTERFACE AssocClass;
IMPORT Association, List;
TYPE ItemPrivate <: ROOT;
REVEAL Association.Item = ItemPrivate BRANDED OBJECT
METHODS
 memberOf(): Association.Group;
 setGroup(g: Association.Group);
END;
TYPE GroupPrivate <: ROOT;
REVEAL Association.Group = GroupPrivate BRANDED OBJECT
METHODS
 addItem(i: Association.Item);
 removeItem(i: Association.Item);
 itemList(): List.T;
END;
END AssocClass.

Listing Ten
(a)
INTERFACE AssocImpl;
IMPORT Association;
TYPE Item <: Association.Item;
TYPE Group <: Association.Group;
END AssocImpl.

(b)
MODULE AssocImpl;
IMPORT Association, AssocClass, List, Set;
REVEAL Item = Association.Item BRANDED OBJECT
 group: Association.Group := NIL;
OVERRIDES
 memberOf := MemberOf;
 setGroup := SetGroup;

END;
REVEAL Group = Association.Group BRANDED OBJECT
 set: Set.T := NIL;
OVERRIDES
 addItem := AddItem;
 removeItem := RemoveItem;
 itemList := ItemList;
END;

(c)
PROCEDURE AddItem (self: Group; item: Association.Item) =
BEGIN
 IF self.set = NIL THEN self.set := NEW(Set.T); END;
 self.set.add(item);
END AddItem;

Listing Eleven
(a)
Association.AddItem(group, item1);
Association.AddItem(group, item2);
Association.AddItem(group, item3);

(b)
LOCK group DO
 UnsafeAddItem(group, item1);
 UnsafeAddItem(group, item2);
 UnsafeAddItem(group, item3);
 NotifyNewItem(group, item3);
END;

Listing Twelve
UNSAFE INTERFACE UnsafeAssoc;
IMPORT Association, List;
REVEAL Association.Item <: MUTEX;
PROCEDURE UnsafeMemberOf (i: Association.Item): Association.Group;
PROCEDURE UnsafeSetGroup (i: Association.Item; g: Association.Group);
REVEAL Association.Group <: MUTEX;
PROCEDURE UnsafeAddItem (g: Association.Group; i: Association.Item);
PROCEDURE UnsafeRemoveItem (g: Association.Group; i: Association.Item);
PROCEDURE UnsafeItemList (g: Association.Group): List.T;
PROCEDURE NotifyNewItem(g: Association.Group; i: Association.Item);
END UnsafeAssoc.

Listing Thirteen
(a)
UNSAFE MODULE Association EXPORTS Association, AssocClass, UnsafeAssoc;
IMPORT Thread;
(* these two revelations are extensions of those in UnsafeAssoc, so
 they start with a mutex *)
REVEAL ItemPrivate = MUTEX BRANDED OBJECT END; (* no further features *)
REVEAL GroupPrivate = MUTEX BRANDED OBJECT
 cond: Thread.Condition := NIL; (* for threads waiting for new items *)
 lastAddition: Item := NIL; (* the last item added *)
END;

(b)
PROCEDURE UnsafeAddItem(group: Association.Group; item: Association.Item) =
 BEGIN 
 group.addItem(item);
 item.setGroup(group);
 END UnsafeAddItem;
(* etc *)
PROCEDURE NotifyNewItem (group: Association.Group; item: Association.Item) =
 BEGIN 
 group.lastAddition := item;
 Thread.Broadcast(group.cond);
 END NotifyNewItem;

(c)
PROCEDURE AddItem (group: Group; item: Item) =
BEGIN
 LOCK group DO
 UnsafeAddItem(group, item);
 NotifyNewItem(group, item);
 END;
END AddItem;
(* etc *)
PROCEDURE NewItem (group: Group): Item =
VAR oldItem: Item;
BEGIN
 LOCK group DO
 oldItem := group.lastAddition;
 (* wait until a new item has been added to the group *)
 WHILE oldItem = group.lastAddition DO Thread.Wait(group.cond, group); END;
 RETURN group.lastAddition;
 END;
END NewItem;
BEGIN (* empty module initialisation block *)
END Association.


































Object-Oriented Programming in S


More than just data analysis




Richard Calaway


Rich can be contacted at rich@amtec.com.


The S language is a high-level, object-oriented system designed for data
analysis and graphics. Originally written by Richard A. Becker, John M.
Chambers, and Allan R. Wilks of AT&T Bell Laboratories' Statistics Research
Department, the S language is useful for a wide range of applications. In
fact, most current S users aren't involved with statistics, and most S
applications focus on basic quantitative computations and graphics.
S is relatively easy to work with. In its simplest form, you type an
expression, and S evaluates it and displays the answer (something like a desk
calculator). However, S can operate with large collections of data at once, so
one expression might produce a graph, fit a line to a set of points, or carry
out similarly complex operations.
A commercial implementation of the S language can be found in the S-Plus data
analysis and statistics software from Mathsoft's StatSci Division (Seattle,
WA). (The source code for S is licensed by Bell Labs, but distributed
exclusively by StatSci.) S is available for systems ranging from Windows-based
PCs to UNIX-based workstations (HP, SGI, NeXT, Sun, and others).
The S-Plus system consists of the S language, some 1200 language extensions
that deal with statistical and mathematical analysis functions, and a
development environment. More specifically, the major areas in which S-Plus
extends S are time series, survival analysis, "modern regression" (including
LMS regression and projection-pursuit regression), classical statistical
tests, graphic-device drivers, and dynamic loading. All of the examples
presented in this article are based on the S-Plus implementation.
The real advantage of the object-oriented approach is evident when designing a
large system that will do similar, but not identical, things to a variety of
data objects. By specifying classes of data objects for which identical
effects will occur, you can define a single generic function that embraces the
similarities across object types, but permits individual implementations or
methods for each defined class. For example, if you type a print(object)
expression, you expect S to print the object in a suitable format. If all the
various predefined printing routines were combined into a single function, the
print function would need to be modified every time a new class of objects was
created. With object-oriented programming, however, print is truly generic; it
need not be modified to accommodate new classes of objects. Instead, the
objects carry their own methods with them. Thus, when you create a class of
objects, you can also create a set of methods to specify how those objects
will behave with respect to certain generic operations.
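In a language without built-in dispatch, the scheme just described can be sketched as a lookup table keyed by generic name and class. This Python sketch is purely illustrative; the names methods, define_method, and generic are hypothetical, and S resolves functions such as print.factor internally:

```python
# A minimal model of S-style generic dispatch: the generic looks for a
# method registered for the object's class and falls back to a default.

methods = {}

def define_method(action, cls, fn):
    methods[(action, cls)] = fn

def generic(action, obj, *args):
    # S objects carry a class attribute; model it as a plain attribute
    cls = getattr(obj, "s_class", "default")
    fn = methods.get((action, cls), methods[(action, "default")])
    return fn(obj, *args)

class Factor:
    s_class = "factor"
    def __init__(self, values):
        self.values = values

define_method("print", "default", lambda o: repr(o))
define_method("print", "factor", lambda o: " ".join(o.values))

print(generic("print", Factor(["White", "Black", "Gray"])))  # White Black Gray
print(generic("print", [1, 2]))  # no class attribute: the default method runs
```

Adding a new class means registering a new method; the generic itself never changes, which is the point the paragraph above makes about print.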
In S, both character vectors and factors are originally created from vectors
of character strings, and when printed, both give essentially the same
information; see Listing One. The distinct look of the printed factor arises
because factors are a distinct class of object, with their own printing
method, the function print.factor. 


Generic Functions 


Generic functions (that is, functions such as print or plot that take an
arbitrary object as argument) in S tend to be extremely simple thanks to the
utility function UseMethod, an internally implemented function that finds the
appropriate method and evaluates a call to it. As shown in Listing Two (a),
the typical generic function consists of a single call to UseMethod. When the
generic function is called, UseMethod determines the class of the argument x,
finds the appropriate method, then constructs and evaluates a call of the form
method(x, ...), where "..." represents additional arguments to the method.
Although most generic functions have the simple structure shown in Listing Two
(a) for print and plot, a slightly more complicated definition is sometimes
needed. For example, the assign generic function stores objects in different
classes of databases; dispatch must therefore be based not on the class of the
object being assigned, but on the class of the database it is assigned to. The
call to UseMethod takes a second argument specifying which argument is to be
searched for its class attribute; see Listing Two (b).
The browser function acts generically when called with an argument, but has a
specific action when called with no arguments (in part because you need an
argument to find a method). This is embodied in its definition, as in Listing
Two (c).


Classes and Methods 


In S, an object's class attribute determines its method. If the class
attribute is missing, the default class is assumed. For example, factors are
of class factor, while vectors, having no class attribute, are of class
default. (Data types that existed before S-Plus 3.0 have no class attribute,
because classes and methods were new with that release. Thus, vectors,
matrices, arrays, lists, and time-series objects are classless.) 
A class attribute is just a character vector of any length. The first element
in the class attribute is the most-specific class of the object. For example,
an ordered factor has class attribute c("ordered", "factor"), and is said to
be class "ordered." (Ordered factors have a specific level ordering.)
Subsequent elements in the class attribute denote classes from which the
specific class inherits.
Methods are named using the convention action.class, where action is the name
of the generic function, and class is the class to which the method is
specific. For example, plot.factor is the plot method for factors, and
is.na.data.frame is the missing-value test method for data frames.
If the most-specific class of an object has no method, S searches the classes
from which the object inherits for the appropriate method. Every class
inherits from class default, so the default method is used if no more-specific
method exists.
Inheritance lets you define a new class using only those features that
distinguish it from classes from which it inherits. To take full advantage of
this, you must define methods incrementally so that a specific method can act
like a pre- or postprocessor to a more general method. For example, a method
for printing ordered factors should be able to draw on an existing method for
printing factors. This is done via NextMethod, which finds the next
most-specific method after the current method and creates and evaluates a call
to it. Like UseMethod, NextMethod is internally implemented.
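The NextMethod mechanism can be modeled the same way. In this hypothetical Python sketch (all names illustrative; real S implements this internally), the class attribute is a list of class names, most specific first, and NextMethod re-dispatches on the remaining, less specific classes:

```python
# Sketch of NextMethod: dispatch walks the class vector, and a specific
# method can delegate to the next most-specific method, acting as a
# pre- or postprocessor for it.

def print_factor(obj, rest):
    return "levels: " + " ".join(obj["levels"])

def print_ordered(obj, rest):
    # postprocess the more general factor method, like print.ordered
    return next_method("print", obj, rest) + " (ordered)"

method_table = {("print", "factor"): print_factor,
                ("print", "ordered"): print_ordered}

def dispatch(action, obj):
    return next_method(action, obj, obj["class"])

def next_method(action, obj, classes):
    for i, cls in enumerate(classes):
        fn = method_table.get((action, cls))
        if fn is not None:
            return fn(obj, classes[i + 1:])
    return repr(obj)  # nothing more specific: fall back to a default

ordered = {"class": ["ordered", "factor"], "levels": ["low", "med", "high"]}
print(dispatch("print", ordered))  # levels: low med high (ordered)
```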
For instance, Listing Three (a) is the definition of print.ordered. Like all
print methods, print.ordered returns its first argument. Values for all
methods should be compatible. In this case, the call to NextMethod finds the
function print.factor. print.ordered appends the ordered levels to its output.
The specific method for ordered factors is a postprocessor to the method for
factors in general, and most of print.factor is preprocessing for
print.default; see Listing Three (b). 
To build objects of a specific class, you need to define a constructor (or
generator) function. Typically, generator functions have the name of the
object they create--vector, factor, and so on. Listing Three (c) is the
definition of the factor generator function. Here, the generator function
explicitly sets the class attribute. Not all generator functions produce
objects with a nonnull class attribute. For example, numeric generates numeric
vectors, which have no class attribute. You can view the class of any object
with the class function, as in Listing Three (d), or you can modify it by
using class on the left side of an assignment, as in Listing Three (e).
However, modifying the class attribute should not be done lightly: Assigning a
class implies that the object is a compatible-value object of that class.


Public and Private Views of Methods


Object-oriented programming often distinguishes between the public (or
external) view and the private (or internal) view of a class implementation.
The public view is the conceptual view of the object and the functions that
operate on it. Ideally, the casual user should not be concerned with the
private view--the public view should be adequate for most situations.
When developing new methods, you must be clear at all times about which view
you are using, because the private view, unlike the public view, is
implementation dependent. If the implementation of a class changes, examine
methods that use the private view to see if they are still valid. The private
view is generally more efficient, particularly for the most commonly used
methods, but public methods are easier to maintain.


Defining New S Classes 


New classes in S are created by identifying one or more defining attributes
(or, for objects derived from S's list type, defining components) shared by
all members of the class, and then assigning a class attribute to all objects
containing those attributes. The class attribute allows the new class to use
the S generic dispatch mechanism.
As with many programming tasks, the key to successfully defining new classes
is to abstract the identifying features of a given data object, clearly
distinguishing objects within the class from those outside it. For example, a
data object with the attribute dim is necessarily an array. Testing for this
attribute is equivalent to testing for membership in the class.


S-Plus Classes



To illustrate new S-Plus classes, I'll define a class of graphical shapes. In
this model, shapes are specified as a sequence of points. Open shapes, such as
line segments and arcs, are specified by their endpoints. Closed shapes, such
as circles and squares, are specified by starting points and points that
uniquely determine the shape. A circle is specified as a center and a point on
the circle. A square is specified by one corner and a side length, while a
rectangle is specified by two diagonal corners.
The goal of defining these shapes is to create a rudimentary freehand drawing
using a graphics window. For this reason, I'll define the classes so that
objects can be created easily using a sequence of mouse clicks via the locator
function. Listing Four (a) is a generator function for circles. The circle
function lets you express the circle in several natural ways, thanks to the
helper function as.point, defined in Listing Four (b). You can give the center
as either a list containing x,y components, as you might get from the locator
function, or as an x,y vector. You can give the radius as a
scalar, or give a second point from which the radius can be calculated.
Listing Five (a) shows how to define a simple circle from the S-Plus command
line.
You store the circle as a list for ease of access to individual elements;
however, the default printing for lists seems rather formal for a circle,
where we only need to see a center and radius. Thus, it makes sense to define
a method for use with the print generic function; see Listing Five (b).
Listing Five (c) shows the output this method produces.
When a method is defined, its arguments should match those of the generic. It
may have extra arguments (hence the "..." built into every generic).
You define the draw function as a generic function; you can draw shapes with
draw, and as long as you define appropriate methods for all classes of shapes,
draw will operate correctly; see Listing Five (d). The call to UseMethod
signals the evaluator that draw is generic. The evaluator should therefore
first look for a specific method based on the class of the object, starting
with the most-specific class and moving up through less-specific classes until
the most-general class is reached. All S-Plus objects share the same general
class, class default. Listing Five (e), for example, is a version of the
method draw.circle. If you call draw with an object of class circle as its
argument, the S evaluator finds the appropriate method and draws a circle on
the current graphics device.


Group Methods


Three groups of S functions, all defined as calls to .Internal, are treated
specially by the methods mechanism: the Ops group, containing standard
operators for arithmetic, comparison, and logic; the Math group, containing
the elementary, vectorized mathematics functions (sin, exp, and so on); and
the Summary group, containing functions (such as max and sum) that take a
vector and return a single summary value. Table 1 lists the functions in each
group.
Rather than writing individual methods for each function in a group, you can
define a single method for the group as a whole. There are 17 functions in the
Ops group (19 if you count both the unary and infix forms of + and -) and 26
in the Math group, so the savings in programming can be significant. Of
course, in writing a group method, you must ensure that it gives the
appropriate answer for all functions in the group. 
Group methods have names of the form group.class. Thus, Math.factor is the
Math group method for objects of class factor, and Summary.data.frame is the
Summary group method for objects of class data.frame. If the method handles
all the functions in the group in the same way, it can be quite simple; see
Listing Six (a). (One caution: The Summary group does not include either mean
or median, both of which are implemented as S-Plus code.)
The economy of the group method is still significant even if a few of the
functions need to be handled separately. As an example of a nontrivial group
method, I'll define a group of operators for the finite field Z7, which
consists of the elements {a7=0, b7=1, c7=2, d7=3, e7=4, f7=5, g7=6} (that is,
0 to 6). The usual operations are defined so that any operation on two
elements of the set yields an element of the set; for example, c7*e7=b7=1 and
d7/f7=c7=2. Addition,
subtraction, and multiplication are simply the usual arithmetic operations
performed modulo 7, but division requires extra work to determine each
element's multiplicative inverse. Also, elements of the field can be
meaningfully combined with integers, but not with other real numbers or
complex numbers. 
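The arithmetic itself can be checked with a short sketch before defining the S class. This hypothetical Python version mirrors the approach described above, with division multiplying by a multiplicative inverse found by search (as the inverse function in the article does):

```python
# Z7 arithmetic: +, -, * are ordinary arithmetic mod 7; division
# multiplies by the multiplicative inverse, found by searching the set.

def z7_inverse(x):
    # the element y with (x * y) % 7 == 1; x must be nonzero mod 7
    return next(y for y in range(1, 7) if (x * y) % 7 == 1)

def z7(op, a, b):
    if op == "+": return (a + b) % 7
    if op == "-": return (a - b) % 7
    if op == "*": return (a * b) % 7
    if op == "/": return (a * z7_inverse(b)) % 7
    raise ValueError("operation not defined: " + op)

print(z7("*", 2, 4))  # 1 -- the c7*e7 = b7 example from the text
print(z7("/", 3, 5))  # 2 -- the d7/f7 = c7 example from the text
```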
You can define a new class, zseven, to represent the finite field Z7. Listing
Six (b) shows the generator function for this. Listing Six (c) shows the value
returned by a typical input vector. You suppress the printing of the class
attribute by defining a method for print, as in Listing Six (d). But the
significant part of the work is to define a group method Ops.zseven that will
behave correctly for all 17 functions in the Ops group. Most of these are
binary operations, so you define your method to have two arguments, e1 and e2,
as in Ops.zseven <- function(e1, e2) {...}. While performing calculations, you
want to ignore the class of your operands, so you begin with the assignment
e1 <- unclass(e1). You do not unclass e2 immediately, because the operation may
be one of the unary operators (+, -, and !). You also test that e1 is a value
that makes sense in Z7 arithmetic; see Listing Seven (a). (The object .Generic
is created in the evaluation frame, and contains the name of the function
being called.)
You can now include e2 in your calculations; division must be treated
specially, but everything else passes on to the generic methods incorporated
in S-Plus's internal code; see Listing Seven (b). Finally, ensure that numeric
results are of class zseven, while logical results are passed back unchanged,
as in Listing Seven (c). The complete method looks like Listing Seven (d).
Alternatively, you can ignore the special case of division in the group method
and write an individual method for division, as in Listing Eight.
Individual methods override group methods. In this example, the overhead of
testing makes it simpler to incorporate the special case within the group
method. A working version of the inverse function can be defined as in Listing
Nine. Listings Ten (a) and (b) test a few examples, which produce the expected
answers.


Replacement Methods


Replacement functions typically replace either an element or attribute of
their arguments and appear on the left side of an S assignment arrow. S
interprets the expression f(x) <- value as x <- "f<-"(x, value), so that the
replacement function to be defined has a name of the form "f<-". All
replacement functions act generically: Methods can be written for them. 
In class zseven, you define a replacement to ensure that any new value remains
in the class, that is, that all elements in an object of class zseven are from
the set {0, 1, 2, 3, 4, 5, 6}. The public method in Listing Eleven
accomplishes this; it does not use any special knowledge of the implementation
of the class zseven, just the public view that zseven is simply the integers
mod 7.
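The invariant that replacement method maintains can be sketched in Python (illustrative only; in S the dispatch happens on the name "[<-", whereas here a __setitem__ override plays that role):

```python
# Element assignment keeps every value in {0..6}, mirroring the public
# view of zseven as the integers mod 7. Complex and non-integral values
# are rejected, as in the S replacement method.

class ZSeven(list):
    def __setitem__(self, i, value):
        if isinstance(value, complex) or value % 1 != 0:
            raise ValueError("Replacement not meaningful for this value")
        super().__setitem__(i, int(value) % 7)

x = ZSeven([3, 4, 5])
x[1] = 10      # stored as 10 mod 7 = 3
print(x)       # [3, 3, 5]
```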


References


Becker, R.A., J.M. Chambers, and A.R. Wilks. The New S Language. London, U.K.:
Chapman and Hall, 1988 (the "Blue Book").
Chambers, J.M. and T.J. Hastie. Statistical Models in S. London, U.K.: Chapman
and Hall, 1992 (the "White Book").
Spector, P. An Introduction to S and S-Plus. Belmont, CA: Duxbury Press, 1994.
Table 1: Functions affected by group methods.
Group Functions in Group 
Ops + (unary and infix), - (unary and infix), *, /, !
 (unary not), sign, ^, %%, %/%, <, >, <=, >=, ==, !=, |, &.
Math abs, acos, acosh, asin, asinh, atan, atanh,
 ceiling, cos, cosh, cummax, cumsum, cumprod, exp,
 floor, gamma, lgamma, log, log10, round,
 signif, sin, sinh, tan, tanh, trunc
Summary all, any, max, min, prod, range, sum.

Listing One
> xxx <- c("White", "Black", "Gray","Gray", "White","White")
> yyy <- factor(xxx)
> print(xxx)
[1] "White" "Black" "Gray" "Gray" "White" "White"
> print(yyy)
[1] White Black Gray Gray White White

Listing Two
(a)
> plot
function(x, ...)
UseMethod("plot")
> print
function(x, ...)

UseMethod("print")

(b)
> assign
function(x, value, frame, where = NULL)
UseMethod("assign", where)

(c)
> browser
function(object, ...)
if(nargs()) UseMethod("browser") else {
 nframe <- sys.parent()
 msg <- paste(deparse(sys.call(nframe)), collapse = " ")
 if(nchar(msg) > 30)
 msg <- paste(substring(msg, 1, 30), ". . .")
 browser.default(nframe, 
 message = paste("Called from:", msg))
}

Listing Three
(a)
> print.ordered
function(x, ...)
{
 NextMethod("print")
 cat("\n", paste(levels(x), collapse = " < "), "\n")
 invisible(x)
}

(b)
> print.factor
function(x, quote = F, abbreviate.arg = F, ...)
{
 if(length(xx <- check.factor(x)))
 stop(paste(
 "cannot be interpreted as a factor:\n\t", xx))
 xx <- x
 l <- levels(x)
 class(x) <- NULL
 if(abbreviate.arg)
 l <- abbreviate(l)
 if(any(is.na(x))) {
 l <- c(l, "NA")
 x[is.na(x)] <- length(l)
 }
 else x <- l[x]
 NextMethod("print", quote = quote)
 if(any(is.na(match(l, unique(x))))) {
 cat("Levels:\n")
 print.atomic(l)
 }
 invisible(xx)
}

(c)
> factor
function(x, levels = sort(unique(x)), 
 labels = as.character(levels), exclude = NA)
{

 if(length(exclude) > 0) {
 storage.mode(exclude) <- storage.mode(levels)
 # levels <- complement(levels, exclude)
 levels <- levels[is.na(match(levels,exclude))]
 }
 y <- match(x, levels)
 names(y) <- names(x)
 levels(y) <- if(length(labels) == length(levels))
 labels else if(length(labels) == 1
 )
 paste(labels, seq(along = levels), sep = ""
 )
 else stop(paste("invalid labels argument, length",
 length(labels), "should be", length(
 levels), "or 1"))
 class(y) <- "factor"
 y
}

(d)
> class(kyphosis)
[1] "data.frame"

(e)
> class(myobject) <- "myclass"

Listing Four
(a)
circle <-
function(center, radius, point.on.edge)
{
 center <- as.point(center)
 val <- NULL
 if(length(center$x) == 2) {
 val <- list(center = list(x = center$x[1],
 y = center$y[1]), radius = sqrt(
 diff(center$x)^2 + diff(center$y)^2
 ))
 }
 else if(length(center$x) == 1) {
 if(missing(radius)) {
 point.on.edge <- as.point(point.on.edge)
 }
 else if(is.atomic(radius)) {
 val <- list(center = center, radius = abs(radius))
 }
 else {
 point.on.edge <- as.point(radius)
 }
 if(is.null(val)) {
 val <- list(center = list(x =
 center$x[1], y = center$y[1]), radius = sqrt((
 point.on.edge$x - center$x)^
 2 + (point.on.edge$y - center$y)^2))
 }
 }
 class(val) <- "circle"
 val
}


(b)
as.point <- 
function(p)
{
 if(is.numeric(p) && length(p) == 2)
 list(x = p[1], y = p[2])
 else if(is.list(p) && !is.null(p$x) && !is.null(p$y))
 p
 else if(is.matrix(p))
 list(x = p[, 1], y = p[, 2])
 else stop("Cannot interpret input as point")
}

Listing Five
(a)
> simple.circle <- circle(center = c(0.5, 0.5), radius = 0.25) 
> simple.circle
$center:
$center$x:
[1] 0.5
$center$y:
[1] 0.5
$radius:
[1] 0.25
attr(, "class"):
[1] "circle" "closed"

(b)
print.circle <-
function(x, ...)
{
 cat(" Center: x =", x$center$x, "\n", 
 " y =", x$center$y, "\n", 
 "Radius:", x$radius, "\n")
}

(c)
> simple.circle
 Center: x = 0.5
 y = 0.5
 Radius: 0.25

(d)
draw <-
function(x, ...)
UseMethod("draw")

(e)
 draw.circle <-
function(x, ...)
{
 center <- x$center
 radius <- x$radius
 symbols(center, circles = radius, add = T, inches = F, ...)
}

Listing Six
(a)

> Summary.data.frame
function(x, ...)
{
 x <- as.matrix(x)
 if(!is.numeric(x))
 stop("not defined on a data frame with non-numeric variables")
 NextMethod(.Generic)
}

(b)
zseven <-
function(x)
{
 if(any(x %% 1 != 0)) {
 x <- as.integer(x)
 warning("Non-integral values coerced to integer"
 )
 }
 x <- x %% 7
 class(x) <- "zseven"
 x
}

(c)
> zseven(c(5,10,15))
[1] 5 3 1

(d)
print.zseven <-
function(x,...)
{
 x <- unclass(x)
 NextMethod("print")
}

Listing Seven
(a)
# Test that e1 is a whole number
 if(is.complex(e1) || any(e1 %% 1 != 0)) stop(
 "Operation not defined for e1") #
# Allow for unary operators
 if(missing(e2)) {
 if(.Generic == "+")
 value <- e1
 else if (.Generic == "-")
 value <- - e1
 else value <- !e1
 }

(b)
 else {
 e2 <- unclass(e2) #
# Test that e2 is a whole number
 if(is.complex(e2) || any(e2 %% 1 != 0)) stop(
 "Operation not defined for e2") #
# Treat division as special case
 if(.Generic == "/")
 value <- e1 * inverse(e2, base = 7)
 else value <- NextMethod(.Generic)

 }

(c)
switch(mode(value),
 numeric = zseven(value),
 logical = value)

(d)
Ops.zseven <-
function(e1, e2)
{
 e1 <- unclass(e1) #
# Test that e1 is a whole number
 if(is.complex(e1) || any(e1 %% 1 != 0)) stop(
 "Operation not defined for e1") #
# Allow for unary operators
 if(missing(e2)) {
 if(.Generic == "+")
 value <- e1
 else if(.Generic == "-")
 value <- - e1
 else value <- !e1
 }
 else {
 e2 <- unclass(e2) #
# Test that e2 is a whole number
 if(is.complex(e2) || any(e2 %% 1 != 0)) stop(
 "Operation not defined for e2") #
# Treat division as special case
 if(.Generic == "/")
 value <- e1 * inverse(e2, base = 7)
 else value <- NextMethod(.Generic)
 }
 switch(mode(value),
 numeric = zseven(value),
 logical = value)
}

Listing Eight
"/.zseven" <- 
function(e1, e2)
{
 e1 <- unclass(e1) 
 e2 <- unclass(e2) #
# Test that e1 is a whole number
 if(is.complex(e1) || any(e1 %% 1 != 0)) stop(
 "Operation not defined for e1") #
# Test that e2 is a whole number
 if(is.complex(e2) || any(e2 %% 1 != 0)) stop(
 "Operation not defined for e2") #
 zseven(e1 * inverse(e2, base = 7))
}

Listing Nine
inverse <-
function(x, base = 7)
{
 set <- 1:base
 # Find the element e2 of the set such that e2*x=1

 n <- length(x)
 set <- outer(x, set) %% base
 return.val <- numeric(n)
 for(i in 1:n) {
 return.val[i] <- min(match(1, set[i, ]))
 }
 return.val
}

Listing Ten
(a)
> x7 <- zseven(c(3,4,5))
> y7 <- zseven(c(2,5,6))
> x7 * y7
[1] 6 6 2
> x7 / y7
[1] 5 5 2
> x7 + y7
[1] 5 2 4
> x7 - y7
[1] 1 6 6
> x7 == y7
[1] F F F
> x7 >= y7
[1] T F F
> -x7
[1] 4 3 2

(b)
> -x7 + x7
[1] 0 0 0

Listing Eleven
"[<-.zseven" <- 
function(x, ..., value)
{
 if (is.complex(value) || any(value %% 1 != 0)) 
 stop("Replacement not meaningful for this value")
 x <- NextMethod("[<-")
 x <- x %% 7
 x
}





















Cobol '97: A Status Report


Cobol gets object oriented




Henry Saade and Ann Wallace


Henry is a senior application-development programmer at IBM. He is the lead
designer and an architect of the mainframe IBM VS Cobol II and COBOL/370, and
is presently a participant in the X3J4 Cobol committee. He can be contacted at
IBM's Software Solutions Division, Santa Teresa Laboratory, San Jose, CA. Ann
is a senior programmer in the COBOL Solutions area of IBM's Software Solutions
Division, Santa Teresa Lab, San Jose, CA. She is an active member of X3J4 and
the chairperson of the international Cobol working group.


If you think that Cobol is a language for days past, consider that, according
to the Datapro Information Services Group, an estimated 150 billion lines of
Cobol source code are at work in mission-critical business applications
worldwide, and programmers add about five billion lines each year. Likewise,
International Data Corp. reports that revenues for Cobol desktop development
are expected to increase to $176.4 million by 1998, up from $86.3 million in
1993. These figures indicate a solid average growth rate of about 15.4 percent
a year. In medium-size and large U.S. companies, 42.7 percent of all
applications-development staffs use Cobol. Thirty-five percent of such
companies report that the language is used for more than two-thirds of their
applications.
While languages such as C++ and Smalltalk garner the lion's share of attention
from the object-oriented community, Cobol has also been making object-oriented
strides. In particular, a proposed revision of the Cobol standard includes
object-oriented extensions. The draft standard is being developed jointly by
the International Organization for Standardization (ISO) and Accredited
Standards Committee X3 (ASC X3), the latter operating under the procedures of
the American National Standards Institute (ANSI). The target date for
completion of the proposed standard is 1997. 
While we will focus in this article on the object-oriented extensions to
Cobol, we will also highlight other features proposed in the draft.


Historically Speaking


Cobol's early acceptance can be traced to the fact that it was the first
stable, portable business language. Since the language was conceived in 1959
by the Conference on Data Systems Languages (CODASYL), committees have
continually refined and improved it, incorporating innovative programming
methods. CODASYL, ANSI, and ISO have regularly published the agreed-upon
standards emerging from these committees. 
With the incorporation of structured programming, the 1985 standard (ANSI
X3.23-1985) introduced major enhancements to Cobol. Structured-programming
concepts were part of a movement in programming methodology toward replacing
unwieldy, multibranching "spaghetti" code with a more tightly controlled flow
of logic. As part of Cobol, it gave users more readable and maintainable
programs. 
Standardization has given Cobol a high degree of reliability and portability.
From the beginning, programmers wanted Cobol to be a robust language that they
could use on any platform or computer. This need expanded as multiplatform
installations became more common in the 1980s. Vendors met the challenge by
making standard-compliant implementations of Cobol available on many platforms
and systems, including mainframe and midrange computers, DOS, UNIX, Windows,
and OS/2. (IBM, for example, uses the same Cobol compiler technology on MVS,
VM, VSE, AIX, and OS/2.) 
Today, the 1985 Cobol standard is widely accepted, and most industry and
government organizations rely on adherence to it. In order to bid on
government jobs, for instance, Cobol implementations must conform to the
Federal Information Processing Standard for Cobol (FIPS 21-4), which is based
on Cobol standards. 


A New Standard


The draft 1997 proposal for Cobol incorporates the basic object-oriented
programming capabilities found in C++ and Smalltalk (see Table 1):
inheritance, which allows objects to inherit data and behaviors from other
objects; polymorphism, which simplifies coding by letting programmers use a
single interface to access objects of different classes; and encapsulation,
which hides the implementation of data and methods from clients (user code),
thereby protecting clients from the effects of implementation change. 
However, object-oriented extensions are just one piece of the new standard. It
also includes a common method of handling exceptions, to facilitate error
discovery; an option for increased portability of arithmetic, which lends
consistency and portability to certain computations; bit-string handling, to
allow manipulation of bits of data; compiler directives, for portable
specification of processing options; automatically expanded tables; dynamic
file allocation; and support for large character sets (for applications that
use data in languages other than English). 
The schedule for the new standard depends, among other factors, on the changes
vendors and users request during the review processes. A draft of the proposed
1997 standard underwent informal public review in spring of 1995, and the
resulting comments are presently being considered for incorporation into the
draft. A formal public review will take place in early 1996. 
Meanwhile, the draft 1997 standard is generating interest among vendors.
Hitachi, IBM, Ryan-McFarland, and Micro Focus have already incorporated
subsets, or partial implementations, of the object-oriented programming
proposals into their Cobol products, and IBM is committed to supporting the
final adopted standard. 


Object-Oriented Extensions


To ease the transition to object orientation, the committees are keeping the
standard as close as possible to Cobol 85. The fundamental elements are
classes, methods, interfaces, inheritance, and a few standard system classes.
A class defines the layout of object instance data and the methods for
accessing the data; see Example 1. The code for a method follows the form of a
program; see Example 2. 
The idea is to add just enough features to make Cobol a rich, object-oriented
model, while allowing Cobol shops and their existing skills to transition
easily into the object-oriented environment.
Cobol provides basic classes with methods for creating and initializing
objects (Base class); and saving, retrieving, and deleting persistent objects
(System-Object class).
Vendors are also looking at other emerging standards, such as the Object
Management Group (OMG) Common Object Request Broker Architecture (CORBA), to
maximize object-oriented Cobol's portability and flexibility in client/server
environments. IBM's object-oriented Cobol is based on its
System Object Model (SOM), which implements CORBA. IBM's direct-to-SOM
object-oriented Cobol compiler enables programmers to sidestep learning the
SOM Interface Definition Language (IDL). The distributed features of SOM/DSOM
let applications access objects across multiple systems, enabling users to
create client/server applications with object-oriented Cobol.
Listing One is a simple banking application written in IBM Cobol. It defines
an Account class with four methods: OpenAccount, Balance, Deposit, and
Withdraw. Account uses the INHERIT keyword to derive from SomObject. The
listing also creates SavingAccount, a subclass of Account, which uses the
OVERRIDE keyword to override methods inherited from the parent class; in this
case, SavingAccount overrides the Deposit and Withdraw methods and introduces
one new method, GetInterest. Finally, the listing shows how these methods can
be invoked from a client program.
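For readers who know C++ better than Cobol, the class relationships in Listing One can be sketched in standard C++. This is an illustrative analogue, not IBM's code: the Cobol Deposit override has no ELSE branch, so an ordinary deposit is assumed for the non-penalty case, and monetary values use double for brevity.

```cpp
#include <cstdio>

// Sketch of Listing One's class relationships in C++. Virtual functions
// play the role of Cobol's OVERRIDE keyword; the minimum-balance,
// minimum-deposit, penalty, and interest values mirror the listing.
class Account {
public:
    virtual ~Account() {}
    void OpenAccount(long accNum) { accountNumber = accNum; }
    double Balance() const { return accountBalance; }
    virtual double Deposit(double amount)  { return accountBalance += amount; }
    virtual double Withdraw(double amount) { return accountBalance -= amount; }
protected:
    long accountNumber = 0;
    double accountBalance = 0.0;
};

class SavingAccount : public Account {
public:
    double Deposit(double amount) override {
        if (amount < minDeposit)             // small deposits draw a penalty,
            return Withdraw(penaltyAmount);  // as in the Cobol override
        return Account::Deposit(amount);     // assumed ELSE branch (see above)
    }
    double Withdraw(double amount) override {
        if (Balance() - amount > minBalance)
            return Account::Withdraw(amount);  // INVOKE SUPER "Withdraw"
        std::puts("Transaction was not performed for lack of funds");
        return Balance();
    }
    double GetInterest() const { return Balance() * interestRate; }
private:
    double minBalance = 250.0, minDeposit = 100.0;
    double penaltyAmount = 10.0, interestRate = 0.05;
};
```

The virtual dispatch here is what the Cobol INVOKE statement performs at run time: the method actually executed depends on the object's class, not the reference's declared class.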


Conclusion


Cobol's cautious evolution has allowed it to incorporate innovative
software-development technologies at a pace that protects its integrity and
stability. And with recent object-oriented language extensions and the
availability of visual programming tools, Cobol has become a robust tool for
object-oriented application development. 
For information on obtaining future U.S. review copies of the draft Cobol
standard, contact Don Schricker, Chairman, Technical Committee X3J4 at
das@mfltd.co.uk. For details on future international reviews, contact Ann
Wallace, Convenor, ISO/IEC JTC1/SC22/WG4-Cobol at AnnWallace@vnet.ibm.com. 


Bibliography 



The Desktop 3GL Market: Review and Forecast, 1993-1998. Framingham, MA:
International Data Corp., December 1994.
IBM World Wide Web. URL http://www.torolab.ibm.com/software/ad/ad3gl.html. 
FAQ on Cobol. Micro Focus World Wide Web. URL
http://www.mfltd.co.uk/FAQ/cobol-faq.html.
Internet Usenet Newsgroup: comp.lang.cobol. 
Kain, Brad. "Distributed Architecture is Mission of OMG." Application
Development Trends (August 1994).
McClure, Steve. Object Technology into the Mainstream: The Adoption and
Assimilation of Object Technologies by the MIS Community in the United States.
Framingham, MA: International Data Corp., September 1994. 
Obin, Raymond. Object-Orientation: An Introduction for Cobol Programmers. Palo
Alto, CA: Micro Focus Press, 1993. 
Example 1: Defining a BankAccount class in Cobol. BankAccount inherits from
the standard Cobol system class Base.
CLASS-ID. BankAccount INHERITS Base.
OBJECT.
DATA DIVISION. 
 *> instance data for BankAccount objects
METHOD-ID. Create-BankAccount. 
 *> code for a method to create a BankAccount object
END METHOD Create-BankAccount.
END OBJECT.
END CLASS BankAccount.
Example 2: Defining a Cobol method. The INVOKE statement acts like the call
statement in Cobol 85. The name of the method and the parameters on the
procedure division statement define the interface for the method.
METHOD-ID. Create-BankAccount.
DATA DIVISION.
LINKAGE SECTION.
01 New-Account OBJECT REFERENCE BankAccount.
PROCEDURE DIVISION USING New-Account.
INVOKE SELF "new" RETURNING New-Account.
END METHOD Create-BankAccount.
Table 1: Comparing object-oriented language features.
Feature                   Smalltalk    C++        Cobol
Pure object orientation   Yes          Hybrid     Hybrid
Inheritance               Single       Multiple   Multiple
Encapsulation             Total        Limited    Total
Polymorphism              Yes          Yes        Yes
Classes as objects        Yes          No         Yes
Type checking             No           Yes        Yes
Class libraries           Excellent    Good       Limited
Persistent objects        Yes          No         Simple

Listing One
 ACCOUNT CLASS DEFINITION
 IDENTIFICATION DIVISION.
 CLASS-ID. Account INHERITS SomObject.
 ENVIRONMENT DIVISION.
 CONFIGURATION SECTION.
 *> Linkage to the SOM Interface Repository
 REPOSITORY.
 CLASS SomObject is "SomObject"
 CLASS Customer.
 DATA DIVISION. 
 WORKING-STORAGE SECTION.
 *> Instance (object) data
 01 AccountNumber PIC 9(8) USAGE BINARY.
 01 CustomerObject USAGE OBJECT REFERENCE Customer.
 01 AccountBalance PIC 9(9)V99 VALUE ZERO.
 01 AccountType PIC X(10).
 PROCEDURE DIVISION.
 *> OpenAccount method definition
 IDENTIFICATION DIVISION.
 METHOD-ID. OpenAccount.
 DATA DIVISION.
 LINKAGE SECTION.
 01 CustObj USAGE IS OBJECT REFERENCE Customer. 
 01 AccNum PIC 9(8) USAGE IS BINARY. 
 PROCEDURE DIVISION USING CustObj AccNum. 

 SET CustomerObject TO CustObj 
 MOVE AccNum TO AccountNumber. 
 END METHOD OpenAccount. 
 *> Balance method definition
 IDENTIFICATION DIVISION.
 METHOD-ID. Balance.
 DATA DIVISION.
 LINKAGE SECTION.
 01 AccBal PIC 9(9)V99. 
 PROCEDURE DIVISION RETURNING AccBal. 
 MOVE AccountBalance TO AccBal. 
 END METHOD Balance. 
 *> Deposit method definition
 IDENTIFICATION DIVISION.
 METHOD-ID. Deposit.
 DATA DIVISION.
 LINKAGE SECTION.
 01 DepositAmount PIC 9(9)V99. 
 01 NewBalance PIC 9(9)V99. 
 PROCEDURE DIVISION USING DepositAmount RETURNING NewBalance.
 ADD DepositAmount TO AccountBalance 
 MOVE AccountBalance TO NewBalance. 
 END METHOD Deposit.
 *> Withdraw method definition
 IDENTIFICATION DIVISION.
 METHOD-ID. Withdraw.
 DATA DIVISION.
 LINKAGE SECTION.
 01 TransAmount PIC 9(9)V99. 
 01 NewBalance PIC 9(9)V99. 
 PROCEDURE DIVISION USING TransAmount RETURNING NewBalance.
 SUBTRACT TransAmount FROM AccountBalance 
 MOVE AccountBalance TO NewBalance. 
 END METHOD Withdraw.
 END CLASS Account.
 SUBCLASS OF THE ACCOUNT CLASS DEFINITION
 (SAVINGS ACCOUNT)
 IDENTIFICATION DIVISION.
 CLASS-ID. SavingAccount INHERITS Account.
 ENVIRONMENT DIVISION.
 CONFIGURATION SECTION.
 REPOSITORY.
 CLASS Account is "Account".
 DATA DIVISION.
 WORKING-STORAGE SECTION.
 *> Instance and object data
 01 WithdrawPerDay PIC S9(8) USAGE BINARY.
 01 MinBalance PIC 9(9)V99 VALUE 250.
 01 MinDeposit PIC 9(9)V99 VALUE 100. 
 01 InterestRate PIC 9V99 VALUE 0.05. 
 PROCEDURE DIVISION.
 IDENTIFICATION DIVISION.
 *> Override of the original Deposit method
 METHOD-ID. Deposit IS METHOD OVERRIDE.
 DATA DIVISION.
 WORKING-STORAGE SECTION.
 01 PenaltyAmount PIC 9(9)V99 VALUE 10.
 LINKAGE SECTION.
 01 DepositAmount PIC 9(9)V99.
 01 NewBalance PIC 9(9)V99.
 PROCEDURE DIVISION USING DepositAmount RETURNING NewBalance.
 IF DepositAmount LESS THAN MinDeposit THEN
* SELF refers to the current object; SUPER invokes the parent class's method.
 INVOKE SELF "Withdraw" USING PenaltyAmount RETURNING NewBalance.

 END METHOD Deposit.
 IDENTIFICATION DIVISION.
 *> Override of the original Withdraw method
 METHOD-ID. Withdraw IS METHOD OVERRIDE.
 DATA DIVISION.
 LINKAGE SECTION.
 01 TransAmount PIC 9(9)V99.
 01 NewBalance PIC 9(9)V99.
 PROCEDURE DIVISION USING TransAmount RETURNING NewBalance.
 INVOKE SELF "Balance" RETURNING NewBalance.
 IF NewBalance - TransAmount IS GREATER THAN MinBalance THEN
 INVOKE SUPER "Withdraw" USING TransAmount RETURNING NewBalance.
 ELSE
 DISPLAY "Transaction was not performed for lack of funds".
 END METHOD Withdraw.
 IDENTIFICATION DIVISION.
 *> GetInterest method definition
 METHOD-ID. GetInterest.
 ENVIRONMENT DIVISION.
 DATA DIVISION.
 WORKING-STORAGE SECTION.
 01 NewBalance PIC 9(9)V99.
 LINKAGE SECTION.
 01 TransAmount PIC 9(9)V99.
 01 InterestAmount PIC 9(9)V99.
 PROCEDURE DIVISION RETURNING InterestAmount.
 INVOKE SELF "Balance" RETURNING NewBalance.
 MULTIPLY NewBalance BY InterestRate GIVING InterestAmount.
 END METHOD GetInterest.
 
 END CLASS SavingAccount.
 CLIENT PROGRAM DEFINITION
 This program uses the classes and methods defined above to
 open a savings account.
 IDENTIFICATION DIVISION.
 PROGRAM-ID. Client.
 ENVIRONMENT DIVISION.
 CONFIGURATION SECTION.
 REPOSITORY.
 CLASS Customer
 CLASS SavingAccount IS "Bank-Saving-Account".
 DATA DIVISION.
 WORKING-STORAGE SECTION.
 01 CustomerObj USAGE IS OBJECT REFERENCE Customer.
 01 anAccount USAGE OBJECT REFERENCE SavingAccount.
 01 AccountNum PIC 9(8).
 01 DepositAmount PIC 9(9)V99.
 01 NewAccountBalance PIC 9(9)V99.
 PROCEDURE DIVISION.
 INVOKE SavingAccount "somNew" RETURNING anAccount
 INVOKE anAccount "OpenAccount" USING CustomerObj AccountNum
 INVOKE anAccount "Deposit" USING DepositAmount
 RETURNING NewAccountBalance.
 END PROGRAM Client.
 ****************************






































































File-Streaming Classes in C++


Sidestepping cyclic dependencies




Kirit Saelensminde


Kirit works for Motion Graphics Limited in London. He can be contacted at
kgs@mgl.win-uk.net.


Smalltalk provides a mechanism that lets you stream simple objects, then
reload them later. Because Smalltalk has only one class hierarchy that manages
its own internal representation, the top-level class handles the streaming
automatically. However, you avoid extra coding only if your program has no
cyclic dependencies--a condition that is virtually impossible to meet in many
real-world applications.
The Microsoft Foundation Class (MFC) Library (and Borland's OWL, for that
matter) provides a similar mechanism for C++. However, the streaming mechanism
(which can handle cyclic structures) requires that you derive your streamable
classes from a common superclass. After using the MFC streaming system to
manage our object structures for a raytracing engine, we identified a few
problems with the MFC implementation:
It requires the use of the MFC hierarchy. Since we wanted our engine to be
platform independent, we had to wean it from MFC.
You can't use schema numbering to automatically load older object maps from
the stream.
All streamable classes must be derived from a single superclass. For classes
that are used in calculations (in our application matrices and vectors), the
overhead of the extra constructor and destructor code can be very high. 
It can only handle up to 32,000 objects in one file. This is not enough to
save complex 3-D data for rendering.
To address these shortcomings, we decided to implement our own file-streaming
system. Our main goal was the removal of the common superclass. Since the
raytracer needed to be as fast as possible, the extra constructor and
destructor code became unacceptable overhead. A secondary goal was to enable
portability of both the file-streaming code and the binary files it produced,
which limited us to C++ features available across current compilers. This
meant we could not use run-time class identification or rely on any exception
mechanism to return error conditions. Since we also wanted to load binary
files on different architectures, we needed a mechanism to allow for format
and endian-mode translation of the base data types (integers, doubles, and the
like). Listings One and Two implement the resulting design.
Still, the file-streaming class does not achieve all of our goals. For
example, the MFC CMapPtrToPtr class limits the number of objects it can write.
However, since this limitation is internal to our KSaver class (which appears
as KFiler in Listings One and Two), we can update the class without changing
its interface or protocol. The same can be said about the CFile dependency;
here, the KSaver constructor requires updating, but the change isn't major.
Figure 1 shows a binary tree built by a sample program called tree.cpp
(available electronically; see "Availability," page 3). This example provides
a basis for discussing issues surrounding file streaming, such as:
Saving the object attribute information as base data types or other classes;
for example, Value::m_Value.
Saving the links between the objects; see, for example, Tree::m_Parent. The
problem with this is that you must be able to deal with NULL links.
Identifying the saved classes. The code that implements streaming generally
will not know what class it is saving; see Tree::File in tree.cpp, which
treats all FormulaElement subclasses in the same way. The problem is that the
Tree::File method will be expected to work with classes (such as Minus) that
have not been defined by the time the compiler reaches it.
To implement our approach to streaming, we use the familiar, object-oriented
technique of having a third class arbitrate links between two other objects.
In this case, the object to be streamed talks to an instance of the KSaver
class, which, in turn, talks to the streaming object (here, an instance of
CFile). KSaver is responsible for the actual format of the data that goes onto
the stream. When saving, you simply save all the data members for the class
and the superclasses and let the KSaver instance sort out what to do with
them.
Only the KSaver class needs to be intelligent. To help KSaver identify what it
is working on, you need two keys--one for the object, the other for its class.
Individual objects will always be at distinct addresses, allowing you to
identify them uniquely. However, to identify the class of the object, you need
metaclasses.


Metaclasses


Metaclasses describe other classes, so for each streamable class, you have an
instance of KMetaClass. You use these instances to provide a unique key value
for each streamed class and to create an instance of the class when loading.
To do this, each KMetaClass instance gets a unique identifier that identifies
it in the stream. Because the class identifiers must persist between different
platforms and invocations of the application, you use the class name. (When
using namespaces or local nested classes, you may have to mangle the names to
keep each class identifier unique.) 
The metaclasses are created by the macros in Listing One, which hide the
process from the users of the streaming system. The way C++ handles typing of
classes makes it messy. It's easy to identify a particular KMetaClass
associated with the class being pulled from the stream (just walk a list
looking for it), but getting a new instance of that class requires a virtual
function whose return type must be known. This means that you have to subclass
KMetaClass for each class that is to be streamable. KMetaClass then becomes an
abstract base class. It is still responsible for generating an image of the
object hierarchy at run time, but object instances must be created by
subclasses.
Take a look at the BASE_CLASS_SAVE and CLASS_SAVE macros. (All the new
metaclasses are local classes.) The base-class macro creates Make, a new
virtual method used to create the actual instance. Each meta-subclass then
overrides this method to return an instance of its associated class. A useful
side effect of this is that you can use the Class method for the object to
identify its class at run time.
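Stripped of the macro plumbing, the pattern these macros implement is a self-registering class descriptor with a factory acting as a virtual constructor. The following standard-C++ sketch uses hypothetical names (MetaClass, Streamable, Tree) to show the shape of what BASE_CLASS_SAVE and CLASS_SAVE arrange; it is not the listing's actual expansion.

```cpp
#include <cstring>
#include <string>

struct Streamable;

// One descriptor per streamable class: its name, its place in an
// intrusive list (like k_class_list in Listing Two), and a factory
// function that plays the role of the virtual Make method.
struct MetaClass {
    const char *name;
    MetaClass  *next;
    Streamable *(*make)();
    static MetaClass *list;
    MetaClass(const char *n, Streamable *(*m)())
        : name(n), next(list), make(m) { list = this; }
    static MetaClass *Find(const char *n) {
        for (MetaClass *k = list; k; k = k->next)   // walk the class list
            if (std::strcmp(k->name, n) == 0) return k;
        return nullptr;
    }
};
MetaClass *MetaClass::list = nullptr;

struct Streamable {
    virtual ~Streamable() {}
    virtual const char *ClassName() const = 0;  // Class()-style query
};

// Roughly what the CLASS_SAVE machinery arranges for one class:
struct Tree : Streamable {
    static MetaClass meta;
    const char *ClassName() const override { return meta.name; }
};
MetaClass Tree::meta("Tree", []() -> Streamable * { return new Tree; });
```

Because the descriptor's constructor runs during static initialization, every streamable class appears in the list before main() executes.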


The Rest of the Story


A closer look at the CLASS_SAVE macros (Listing One) shows that all the Load
and Save methods are static. Static members are the closest C++ equivalent of
Smalltalk class members; that is, they are associated with a class, not with
instances of a class. Members must be static so they will work correctly with
NULL pointers. It is very unsafe to dereference any pointer that may be NULL
to get at a virtual function, but doing the check in a static member and then
moving on to a separate virtual function is always safe.
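The NULL-pointer argument can be made concrete with a small sketch. The names (Node, Save, File) and the one-byte null marker are illustrative assumptions, not the article's wire protocol:

```cpp
#include <cstddef>
#include <vector>

// Why Save must be static: a static member can test the pointer before
// any virtual dispatch happens. Dereferencing a possibly-NULL pointer
// to reach a virtual function is unsafe; checking in a static member
// and only then calling the virtual File method is always safe.
struct Node {
    long value = 0;
    virtual ~Node() {}
    virtual void File(std::vector<unsigned char> &out) {  // per-class code
        out.insert(out.end(),
                   reinterpret_cast<unsigned char *>(&value),
                   reinterpret_cast<unsigned char *>(&value) + sizeof value);
    }
    static bool Save(std::vector<unsigned char> &out, Node *p) {
        if (p == nullptr) {          // safe: no dereference has happened yet
            out.push_back(0);        // write a null marker instead
            return true;
        }
        out.push_back(1);            // marker: a real object follows
        p->File(out);                // only now do the virtual dispatch
        return true;
    }
};
```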
When it comes to loading, it is more obvious why the method must be static:
Until you have an object instance, you cannot send it a message. I could have
used an overloaded function with global scope, but I preferred to have the
class stated explicitly.
The main() function in the sample program tree.cpp illustrates saving and
loading a data structure. Once the data structure is created and the file
successfully opened, you only need to use the static Save method associated
with the class you wish to save. Because each Tree instance saves all its data
members, it doesn't matter which one you save--KSaver correctly deals with
instances that have already gone onto the stream. Upon loading, the structure
is created in the same order. This means that if you save tAdd, then the
pointer to the structure when loaded will also be through tAdd. When tree.cpp
is executed, it produces a file (available electronically) that shows the
class name storage, how the unique object identifiers work, and the implicit
nature of the file format.
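The bookkeeping that lets the streamer recognize instances that have already gone onto the stream can be sketched as a pointer-to-id map, along the lines of the WriteUID method in Listing Two (the class and method names here are simplified, and std::unordered_map stands in for CMapPtrToPtr):

```cpp
#include <unordered_map>

// The first time a pointer is filed out it gets a fresh unique id and
// its body follows on the stream; later references write only the id,
// which is how shared and cyclic structures are handled.
class Saver {
    std::unordered_map<const void *, unsigned long> map;
    unsigned long uid = 0;
public:
    // Returns true if the object's body must now be written as well.
    bool WriteUID(const void *p, unsigned long &idOut) {
        auto it = map.find(p);
        if (it != map.end()) {       // already on the stream: reference it
            idOut = it->second;
            return false;
        }
        idOut = ++uid;               // fresh object: assign the next id
        map[p] = idOut;
        return true;
    }
};
```

On load, a mirror-image table maps ids back to the freshly created instances, which is the role ReadUID plays in Listing Two.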


Using the Streamer


Using the file-streaming class is easier than figuring out the protocol
details. You simply include the correct CLASS_SAVE macro in your class
declaration and use the correct IMPLEMENT_CLASS_SAVE macro in the definition.
The only other thing that you need is the File method. This is declared
automatically in the header file by the CLASS_SAVE macro, so you must supply
an implementation for it. This forces you to send a schema number onto the
stream in case you need to add attributes to the class later.
Note the call to Attach at the beginning of main(). This ties the metaclass
hierarchy together so that the run-time type checking will work. This cannot
be done in the metaclass constructors because it is impossible to determine
the order in which they will execute when stored in different modules.


Conclusion


The most notable improvement to the file-streaming class would be to reduce
the amount of redundant data stored on the stream. For all the base items
(class-name lengths and unique identifiers) you need not write out schema
numbers. The unique identifier that introduces an object instance could also
be implicit.

Currently, the KSaver class does not support translation between the base data
types across platforms. The KSaver::Read and KSaver::Write methods would need
to be changed from the simple macro implementation to a more complex system
that could switch on the incoming schema number. This could even include
automatic type promotion and demotion for different-sized data types between
platforms. Because it uses MFC classes, the file-streaming class is not really
portable. Still, the CFile and CMapPtrToPtr classes are reasonably easy to
rewrite for any platform.
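One plausible shape for the missing translation layer is to pin the file format to a single byte order and swap on mismatched hosts. This sketch is an assumption about how the Read and Write methods might be extended, not code from the article:

```cpp
#include <cstdint>
#include <cstring>

// Detect the host's byte order at run time.
static bool HostIsLittleEndian() {
    std::uint16_t probe = 1;
    unsigned char first;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Reverse the bytes of a 32-bit value.
static std::uint32_t Swap32(std::uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u)
         | ((v << 8) & 0x00FF0000u) | (v << 24);
}

// If the file format is defined as little-endian (matching the Intel
// 80x86 assumption in Listing One), big-endian hosts swap on both read
// and write; little-endian hosts pass values through unchanged.
static std::uint32_t ToFileOrder(std::uint32_t v) {
    return HostIsLittleEndian() ? v : Swap32(v);
}
```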
Figure 1: An example binary tree for the formula 4x6+5.

Listing One
/* saver.h -- Handles meta classes and saving.
 Copyright (c) Motion Graphics Ltd, 1994-95. All rights reserved.
 Defines the macros used when creating streamable classes. It is assumed a
 byte is 8 bits, a word is 16 bits and a double word (DWORD) is 32 bits. A 
 float is 4 bytes and a double is 8 bytes. Endian mode and floating point 
 representations are based on those for Intel 80x86 processors.
*/
#define UBYTE unsigned char
#define SBYTE signed char
#define UWORD unsigned int
#define SWORD signed int
#define UDWORD unsigned long
#define SDWORD signed long
#define OP_IO( type, sn ) \
 BOOL Read( type &v ) { \
 if ( Schema() == sn ) {\
 return Read( &v, sizeof( type ) ); \
 } else { \
 TRACE( "KFiler::Read( " #type " ) - " \
 "Bad schema number.\n" ); \
 return FALSE; \
 } \
 } \
 BOOL Write( type v ) { \
 Schema( sn ); \
 return Write( &v, sizeof( type ) ); \
 }
class KFiler {
 public: // Methods.
 // Constructors/destructor.
 KFiler( CFile *file, BOOL save );
 ~KFiler( void );
 // Find status of filer.
 BOOL Saving( void ) { return m_save; };
 // Essential book keeping chores.
 BOOL Schema( SWORD number );
 SWORD Schema( void );
 void *ReadUID( void *loc );
 BOOL WriteUID( void * p );
 // Read/Write values.
 BOOL Write( void __huge *p, UDWORD n );
 BOOL Read( void __huge *p, UDWORD n );
 // Write a null pointer.
 BOOL Null( void );
 // Return the error status for the file.
 BOOL Error( void ) {
 return m_error;
 };
 // Base cases.
 OP_IO( double, -16 );
 OP_IO( UBYTE, -17 );
 OP_IO( SBYTE, -18 );
 OP_IO( UWORD, -19 );
 OP_IO( SWORD, -20 );
 OP_IO( UDWORD, -21 );

 OP_IO( SDWORD, -22 );
 OP_IO( float, -23 );
 
 protected: // Instance variables.
 
 BOOL m_error; // TRUE if there has been an error.
 BOOL m_save; // TRUE if the archive is saving.
 CFile *m_file; // The file that is to be used.
 UDWORD m_uid; // Unique object id.
 CMapPtrToPtr m_map; // The map containing already 
 // written/read objects.
};
#undef OP_IO
class KMetaClass {
 public: // Methods.
 // Constructors/destructor.
 KMetaClass( char *name );
 KMetaClass( char *name, char *super_class );
 virtual ~KMetaClass( void );
 // Run through all classes and attach them together.
 static BOOL Attach( void );
 // Allow it to save/load.
 BOOL Save( KFiler &f );
 BOOL CheckNextStrict( KFiler &f );
 static KMetaClass *LoadNext( KFiler &f );
 // Checks on the object hierarchy.
 BOOL IsSubClass( char *name );
 protected: // Methods.
 // Find a class.
 static KMetaClass *Find( char *name, BOOL search_alias = TRUE );
 // Join classes together.
 void AttachSubClass( KMetaClass *kc );
 void AttachSibling( KMetaClass *kc );
 
 protected: // Instance variables.
 
 char *m_name; // The class name.
 char *m_super_name; // The name of the super class.
 KMetaClass *m_next_class; // The next class.
 KMetaClass *m_super_class; // The super class.
 KMetaClass *m_sub_class; // The first sub class.
 KMetaClass *m_sibling_class; // The next sibling class.
};
class KMetaClassAlias
{
 public: // Constructors.
 KMetaClassAlias( char *class_name, char *alias_name );
 ~KMetaClassAlias( void );
 public: // Instance variables.
 char *m_alias_name; // The name of the alias.
 char *m_class_name; // The name of the class.
 KMetaClassAlias *m_next_alias; // The next alias in the list.
};
#define CLASS_NAME( c ) \
 c::n_##c
#define BASE_CLASS_SAVE( c ) \
 static BOOL Save( KFiler &f, c &Ob ); \
 static BOOL Save( KFiler &f, c *Ob ); \
 static BOOL Load( KFiler &f, c &Ob ); \

 static BOOL Load( KFiler &f, c *&pOb ); \
 static char *n_##c; \
 class KMeta##c : public KMetaClass { public: \
 KMeta##c( void ) : KMetaClass( n_##c ) {}; \
 KMeta##c( char *name, char *super_class ) \
 : KMetaClass( name, super_class ) {}; \
 virtual ~KMeta##c( void ) {}; \
 virtual c *Make( void ); \
 }; \
 static KMeta##c c_##c; \
 virtual BOOL File( KFiler &f ); \
 virtual KMeta##c *Class( void );
#define CLASS_SAVE( c, b, t ) \
 static BOOL Save( KFiler &f, c &Ob ); \
 static BOOL Save( KFiler &f, c *Ob ); \
 static BOOL Load( KFiler &f, c &Ob ); \
 static BOOL Load( KFiler &f, c *&Ob ); \
 static char *n_##c; \
 class KMeta##c : public b::KMeta##b { public: \
 KMeta##c( void ) \
 : KMeta##b( n_##c, CLASS_NAME( b ) ) {}; \
 KMeta##c( char *name, char *super_class ) \
 : KMeta##b( name, super_class ) {}; \
 virtual ~KMeta##c( void ) {}; \
 t *Make( void ); \
 }; \
 static KMeta##c c_##c; \
 BOOL File( KFiler &f ); \
 t::KMeta##t *Class( void );
#define SAVE_LOAD( c ) \
 BOOL __export c::Save( KFiler &f, c &Ob ) { \
 Ob.Class()->Save( f ); \
 if ( f.WriteUID( &Ob ) ) { \
 return Ob.File( f ); \
 } else { \
 TRACE( #c "::Save - Cannot save pointer to instance\n" ); \
 return FALSE; \
 } \
 } \
 BOOL __export c::Save( KFiler &f, c *pOb ) { \
 if ( pOb == NULL ) { \
 return f.Null(); \
 } else { \
 pOb->Class()->Save( f ); \
 if ( f.WriteUID( pOb ) ) return pOb->File( f ); \
 else return TRUE; \
 } \
 } \
 BOOL __export c::Load( KFiler &f, c &Ob ) { \
 if ( Ob.Class()->CheckNextStrict( f ) ) { \
 if ( f.ReadUID( &Ob ) == NULL ) { \
 return Ob.File( f ); \
 } else { \
 TRACE( #c "::Load - Cannot load instance to object " \
 "pointer already loaded\n" ); \
 return FALSE; \
 } \
 } else { \
 TRACE( #c "::Load - Input class not exactly the same.\n" ); \

 return FALSE; \
 } \
 } \
 BOOL __export c::Load( KFiler &f, c *&pOb ) { \
 KMetaClass *k = KMetaClass::LoadNext( f ); \
 void *pp; \
 if ( k != NULL ) { \
 if ( k->IsSubClass( n_##c ) ) { \
 pOb = (c *)((KMeta##c *)k)->Make(); \
 if ( (pp = f.ReadUID( pOb )) == NULL ) { \
 return pOb->File( f ); \
 } else { \
 delete pOb; \
 pOb = (c *)pp; \
 return !f.Error(); \
 } \
 } else { \
 return FALSE; \
 } \
 } else { \
 pOb = NULL; \
 return !f.Error(); \
 } \
 }
#define IMPLEMENT_BASE_CLASS_SAVE( c ) \
 SAVE_LOAD( c ); \
 c * __export c::KMeta##c::Make( void ) { return new c(); }; \
 c::KMeta##c * __export c::Class( void ) { return &c_##c; }; \
 char * __export c::n_##c = #c; \
 c::KMeta##c __export c::c_##c;
#define IMPLEMENT_CLASS_SAVE( c, b, t ) \
 SAVE_LOAD( c ); \
 t * __export c::KMeta##c::Make( void ) { return new c(); }; \
 t::KMeta##t * __export c::Class( void ) { return &c_##c; }; \
 char * __export c::n_##c = #c; \
 c::KMeta##c __export c::c_##c;
#define CLASS_ALIAS( c, a ) \
 extern KMetaClassAlias a_##a; \
 KMetaClassAlias a_##a( CLASS_NAME( c ), #a );

Listing Two
/* saver.cpp -- Handles meta classes and saving.
 Copyright (c) Motion Graphics Ltd, 1994-95. All rights reserved.
 This code has been tested using MS Visual C++ 1.5 with MFC 2.5 under both 
 MS-DOS and Windows 3.1. There are still MFC dependencies in the code: 
 CFile handles basic file i/o; CMapPtrToPtr handles the UID and pointer 
 mapping; ASSERT macro used for debug only error checks; TRY, CATCH, 
 END_CATCH are MFC exception handlers. CException is the only exception 
 class MSVC handles. When not used in a Windows DLL then remove __export 
 references. Assumes that both pointers and the UID type (UDWORD) are the 
 same size (32 bit) for storing in the map.
*/
#include <afx.h> // MFC core and standard components.
#include <afxcoll.h> // MFC collections.
#define __export
#include "saver.h"
__export KFiler::KFiler( CFile *file, BOOL save )
{
 m_error = FALSE;

 m_save = save;
 m_uid = 0L;
 m_file = file;
 
 if ( Saving() ) {
 Schema( 1 );
 } else {
 switch ( Schema() ) {
 case 1:
 // There is no additional information.
 break;
 default:
 m_error = TRUE;
 break;
 }
 }
}
__export KFiler::~KFiler( void )
{
 TRY {
 m_file->Close();
 delete m_file;
 } CATCH( CException, e ) {
 TRACE( "KFiler::~KFiler - File close failed\n" );
 } END_CATCH;
}
BOOL __export KFiler::Schema( SWORD number )
{
 return Write( &number, sizeof( SWORD ) );
}
SWORD __export KFiler::Schema( void )
{
 SWORD sw;
 BOOL s;
 s = Read( &sw, sizeof( SWORD ) );
 if ( !s ) {
 sw = -1;
 TRACE( "KFiler::Schema - returning -1 as failed to read schema"
 " number.\n" );
 }
 return sw;
}
BOOL __export KFiler::Write( void __huge *p, UDWORD n )
{
 ASSERT( m_save );
 if ( m_error ) {
 return FALSE;
 } else {
 TRY {
 m_file->WriteHuge( p, n );
 } CATCH( CException, e ) {
 TRACE( "KFiler::Write - Write failed\n" );
 m_error = TRUE;
 return FALSE;
 } END_CATCH;
 return TRUE;
 }
}
BOOL __export KFiler::Read( void __huge *p, UDWORD n )

{
 ASSERT( !m_save );
 if ( m_error ) {
 return FALSE;
 } else {
 TRY {
 m_file->ReadHuge( p, n );
 } CATCH( CException, e ) {
 TRACE( "KFiler::Read - Read failed\n" );
 m_error = TRUE;
 return FALSE;
 } END_CATCH;
 return TRUE;
 }
}
BOOL __export KFiler::Null( void )
{
 return Write( (SDWORD)0 );
}
void * __export KFiler::ReadUID( void *loc )
{
 DWORD dw;
 void *p;
 
 Read( dw );
 if ( m_map.Lookup( (void *)dw, p ) ) {
 // Filed in before.
 return p;
 } else {
 // Filed in for the first time.
 m_map.SetAt( (void *)dw, loc );
 return NULL;
 }
}
BOOL __export KFiler::WriteUID( void * p )
{
 void *id;
 
 if ( m_map.Lookup( p, id ) ) {
 // Has been filed out.
 Write( (DWORD)id );
 return FALSE;
 } else {
 // Filed out for the first time.
 m_uid++;
 m_map.SetAt( p, (void *)m_uid );
 Write( m_uid );
 
 return TRUE;
 }
}
/* KMetaClass. */
KMetaClass *k_class_list = NULL;
KMetaClassAlias *k_alias_list = NULL;
#ifdef _DEBUG
 BOOL k_attached = FALSE;
#endif
__export KMetaClass::KMetaClass( char *name )
{

 m_name = name;
 m_super_name = NULL;
 m_next_class = k_class_list;
 m_super_class = NULL;
 m_sub_class = NULL;
 m_sibling_class = NULL;
 k_class_list = this;
}
__export KMetaClass::KMetaClass( char *name, char *super_class )
{
 m_name = name;
 m_super_name = super_class;
 m_next_class = k_class_list;
 m_super_class = NULL;
 m_sub_class = NULL;
 m_sibling_class = NULL;
 k_class_list = this;
}
__export KMetaClass::~KMetaClass( void )
{
}
BOOL __export KMetaClass::Attach( void )
{
 KMetaClass *kc, *sc;
 if ( k_class_list != NULL ) {
 for( kc = k_class_list; kc != NULL; kc = kc->m_next_class ) {
 ASSERT( kc->m_name != NULL );
 ASSERT( kc->m_name != kc->m_super_name );
 ASSERT( strlen( kc->m_name ) > 0 );
 if ( kc->m_super_name != NULL ) {
 sc = Find( kc->m_super_name, FALSE );
 ASSERT( sc != NULL );
 sc->AttachSubClass( kc );
 }
 }
 #ifdef _DEBUG
 k_attached = TRUE;
 #endif
 return TRUE;
 } else {
 return FALSE;
 }
}
BOOL __export KMetaClass::Save( KFiler &f )
{
 SDWORD len;
 ASSERT( k_attached );
 len = strlen( m_name );
 return f.Write( len ) && f.Write( m_name, len + 1 );
}
KMetaClass * __export KMetaClass::LoadNext( KFiler &f )
{
 SDWORD len;
 char *p;
 KMetaClass *d;
 ASSERT( k_attached );
 if ( !f.Read( len ) ) len = 0;
 if ( len != 0 ) {
 p = (char *)malloc( (size_t)len + 1 );

 if ( f.Read( p, len + 1 ) ) {
 d = Find( p );
 free( p );
 return d;
 } else {
 return NULL;
 }
 } else {
 return NULL;
 }
}
BOOL __export KMetaClass::CheckNextStrict( KFiler &f )
{
 ASSERT( k_attached );
 return KMetaClass::LoadNext( f ) == this;
}
BOOL __export KMetaClass::IsSubClass( char *name )
{
 // Relationship is 'receiver IsSubClassOf name'
 if ( name == m_name ) {
 return TRUE;
 } else if ( m_super_class != NULL ) {
 return m_super_class->IsSubClass( name );
 } else {
 TRACE( "KMetaClass::IsSubClass - %s is not a sub-class of"
 " %s.\n", m_name, name );
 return FALSE;
 }
}
KMetaClass * __export KMetaClass::Find( char *name, BOOL search_alias )
{
 KMetaClass *k = k_class_list;
 KMetaClassAlias *a = k_alias_list;
 BOOL f = FALSE;
 //ASSERT( k_attached );
 while ( k != NULL && !f ) {
 if ( k->m_name == name || strcmp( k->m_name, name ) == 0 ) {
 f = TRUE;
 } else {
 k = k->m_next_class;
 }
 }
 if ( k == NULL && search_alias ) {
 // Search aliases.
 while ( a != NULL && !f ) {
 if ( strcmp( name, a->m_alias_name ) == 0 ) {
 k = Find( a->m_class_name, FALSE );
 f = TRUE;
 } else {
 a = a->m_next_alias;
 }
 }
 }
 #ifdef _DEBUG
 if ( k == NULL ) {
 if ( search_alias ) {
 TRACE( "KMetaClass::Find - Failed to find class:"
 " %s in class list or in alias list\n", name );
 } else {

 TRACE( "KMetaClass::Find - Failed to find class:"
 "%s in class list\n", name );
 }
 }
 #endif
 return k;
}
void __export KMetaClass::AttachSubClass( KMetaClass *kc )
{
 kc->AttachSibling( m_sub_class );
 kc->m_super_class = this;
 m_sub_class = kc;
}
void __export KMetaClass::AttachSibling( KMetaClass *kc )
{
 m_sibling_class = kc;
}
/* Meta class alias code. */
__export KMetaClassAlias::KMetaClassAlias( char *class_name, char *alias_name )
{
 m_alias_name = alias_name;
 m_class_name = class_name;
 m_next_alias = k_alias_list;
 k_alias_list = this;
}
__export KMetaClassAlias::~KMetaClassAlias( void )
{
}



































Inside MFC Serialization


Typesafe serialization that's fast and flexible




Jim Beveridge


Jim, a software developer at Turning Point Software, can be contacted at
jimb@turningpoint.com.


Having seen several commercial software packages through complete life cycles,
I am reluctant to rely on "black-box" solutions, which tend to break down as a
package evolves and becomes more complex. When I first saw the serialization
mechanism in the Microsoft Foundation Classes (MFC), I questioned whether it
was robust and flexible enough for a commercial application. I discovered
that, although it has limitations, the serialization mechanism in MFC is
strongly grounded in modern, object-oriented design theory. Furthermore, it is
typesafe and leaves room for your design to evolve.
Using MFC serialization is straightforward. By default, any class derived from
CObject can include a Serialize() member function that takes a CArchive as a
parameter. In this member function, you add your own code to save and load any
data associated with your class.
Data is serialized to and from a CArchive with operator<< and operator>>, much
like iostream. The big difference is that CArchive is strictly a binary data
format. As in iostream, there are default implementations that read and write
fundamental data types such as long and char. The absence of the data type int
facilitates portability between 16- and 32-bit implementations. The default
implementations also handle byte swapping for types that support it. (For more
information on portability between Little- and Big-endian architectures, see
"Endian-Neutral Software," by James R. Gillig, DDJ, October/November 1994.)
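The fixed-size-types-only discipline can be illustrated with a toy archive. This is a sketch of the CArchive style, not MFC's implementation; the Archive class and its explicit little-endian layout are assumptions for illustration:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// A toy binary archive: insertion and extraction are defined only for
// a fixed-size type (no plain int), and the byte layout is pinned down
// explicitly, so the format does not depend on the compiler's int size.
class Archive {
    std::vector<unsigned char> buf;
    std::size_t pos = 0;
public:
    Archive &operator<<(std::int32_t v) {
        for (int i = 0; i < 4; ++i)    // always 4 bytes, low byte first
            buf.push_back(static_cast<unsigned char>(v >> (8 * i)));
        return *this;
    }
    Archive &operator>>(std::int32_t &v) {
        std::uint32_t u = 0;
        for (int i = 0; i < 4; ++i)
            u |= static_cast<std::uint32_t>(buf[pos++]) << (8 * i);
        v = static_cast<std::int32_t>(u);
        return *this;
    }
};
```

A file written by a 16-bit build of such an archive reads back identically on a 32-bit build, which is the point of banning int from the interface.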
MFC's implementation seemed obvious and uninteresting until the day I created
multiple document types in the same application. At that point, I noticed that
whenever I loaded a file, MFC would correctly create the right kind of
document object and call the proper Serialize() member function. This happened
in spite of the fact that I had not written any code to help MFC create these
document types. Or so I thought....


Problems, Problems Everywhere


To create a document or any other kind of object on the fly, MFC needs to
solve three problems:
Problem 1. Arbitrary types must be created as needed, but the new operator can
only create an explicit type, so a form of "virtual constructors" for CObject
is necessary.
Problem 2. Developers need to be able to easily add new classes to be created.
Ideally, this would be done in the class definition and/or implementation.
Problem 3. A mapping scheme is needed to allow a particular type to be created
based on information read from a file. This mapping cannot be hardcoded in MFC
because developers add new types all the time.
As you'll see, MFC solves these problems elegantly with the use of a registry
with automatic type registration and an implementation of virtual constructors
based on these registered types. MFC's run-time type information is a building
block for this architecture.


The Type Registry


To handle run-time type information, MFC creates a registry of classes in the
application that are derived from CObject. This has nothing to do with the OLE
registry, but the concept is similar. The type registry is a linked list of
CRuntimeClass structures, in which each entry describes a CObject-derived
class in the application. Listing One shows the CRuntimeClass structure.
The real magic is that the types in this registry are not hardcoded in any
table. The first clue to how this trick is accomplished is at the top of
SCRIBDOC.H from the MFC "Scribble" sample application. The beginning of the
class declaration looks like Example 1(a). 
The online help says to use the DECLARE_DYNCREATE macro to enable objects of
CObject-derived classes to be created dynamically at run time. Although this
is a concise description of what DECLARE_DYNCREATE does, what really goes on
inside the macro is far more interesting. After preprocessing, the
DECLARE_DYNCREATE macro expands to several new class members; see Example
1(b). (Note that all examples are from MFC 3.1 and Visual C++ 2.1. I've
reformatted all pre-processor-generated code for readability.)
The GetRuntimeClass() virtual function is the basis for run-time types in MFC.
Run-time type information can be accessed for any CObject-derived object that
includes DECLARE_DYNAMIC, DECLARE_DYNCREATE, or DECLARE_SERIAL. This
information lets you determine if an object can legally be "downcast" to a
derived class or if one object is the same class as another. Although Visual
C++ does not support the new C++ operator dynamic_cast, using this run-time
type information will achieve the same effect.
The run-time type information is declared using a static member variable, in
this case classCScribDoc. The name (which has no space) is created in the
various DECLARE_xxx macros with the macro-concatenation operator. Both
_GetBaseClass() and GetRuntimeClass() are used to access this run-time class
information. GetRuntimeClass() is virtual, so the type of an object can be
found even with a pointer to a generic CObject.
Finally, the Construct() static member function forms the basis of MFC's use
of its class registry as a class factory that can create arbitrary types on
demand. To understand the workings of Construct(), some background is
necessary. 


Creating an Object


In Advanced C++: Programming Styles and Idioms (Addison-Wesley, 1992), James
O. Coplien describes the concept of a virtual constructor:
The virtual constructor is used when the type of an object needs to be
determined from the context in which the object is constructed.
In MFC, the context is based on information read from a serialized archive.
However, a virtual constructor is only a concept; no language construct
implements it directly. The new operator requires an explicit class as its
argument. Virtual constructors can be created by implementing in each class a
static function that calls new. This static member function can be called when
a particular type is needed. 
In MFC, this member function is called Construct(). It is created by the
IMPLEMENT_DYNCREATE or IMPLEMENT_SERIAL macros. One of these macros must
appear exactly once in a .cpp module for each class supporting dynamic
creation. In Scribble, the IMPLEMENT_DYNCREATE (CScribDoc, CDocument)
statement appears near the top of SCRIBDOC.CPP. The first argument is the
class, and the second is the class's parent class. Listing Two is the code
generated by the preprocessor.
When MFC needs a document or any other CObject-derived type, it calls the
CreateObject() member function for an instance of a CRuntimeClass.
CreateObject() allocates the memory using the size in the CRuntimeClass
structure, then calls ConstructObject(). ConstructObject() verifies that the
type supports dynamic construction, then calls the Construct() function.
Although no explanation is given in the sources, this arrangement cleanly
separates the construction of an object from the memory allocation. This seems
like a lot of extra work, but it is required under certain circumstances. For
example, if an array were created manually, the memory would have to be
allocated with a single malloc() in order to be contiguous. By using
ConstructObject(), you could manually initialize each array entry. This
mechanism allows decisions normally made in C++ at compile time to be made at
run time.
Example 2 shows the Construct() function. The syntax of the call to new is a
little unusual. The function actually called is CObject::operator new(size_t,
void*). Remember, the size of a structure is an implied argument when operator
new is called, but must be explicitly declared in the operator new definition.
This version of new in CObject does nothing, but calling new has the side
effect of calling the constructor for this object. Again, the memory was
already allocated by CreateObject using the size information in CRuntimeClass.
By using the registry of CRuntimeClasses and the Construct() member function,
MFC is able to look up and create new types on the fly, which solves Problem 1.
A potentially serious problem with this technique is that multiple inheritance
and virtual base classes are not supported (see MFC Technical Note #16).


Type Registration



Problem 2 is that users must be able to easily add new classes into the
registry. The idea of types registering themselves is core to object-oriented
design. If a type registers its own existence with a registry instead of
hardcoding the type into the registry, then the type can be freely added and
removed from the program without any code changes to the registry.
Although not immediately obvious, it is the IMPLEMENT_DYNCREATE macro that
enables users to easily add new classes into the registry. In the expansion of
IMPLEMENT_DYNCREATE in Listing Two, the static instance of CRuntimeClass in
CScribDoc is initialized as in Example 3.
Several of these entries have already been discussed. In particular, the
statement sizeof(CScribDoc) is used by CreateObject to allocate memory; then
the function pointed at by CScribDoc::Construct initializes the memory.
The next line places this information into the MFC type registry: static const
AFX_CLASSINIT _init_CScribDoc(&CScribDoc::classCScribDoc);. This declaration
constructs an object of type AFX_CLASSINIT using a constructor that takes a
CRuntimeClass as its argument. Because this object is at file scope, it will
be constructed before main(). AFX_CLASSINIT's constructor links this
CRuntimeClass into the MFC type registry. AFX_CLASSINIT itself has no member
data, so it will not take up any data space.
This mechanism allows on-the-fly registration of object types whenever they
are linked into the program, thus solving Problem 2.
A common question is, What is the difference between the various DECLARE and
IMPLEMENT macros? The DECLARE_DYNAMIC and IMPLEMENT_DYNAMIC macros define a
static instance of CRuntimeClass similar to the DYNCREATE version shown earlier,
except that the Construct field is NULL. DECLARE_DYNCREATE and
IMPLEMENT_DYNCREATE add the Construct() function for dynamic type creation.
DECLARE_SERIAL and IMPLEMENT_SERIAL build on the DYNCREATE macros, and replace
the 0xFFFF entry with the structure's schema number. 
The SERIAL macros also define operator>> for the class. The operator>>
requires special handling because it is called with a pointer to the class,
but no instance of the class will exist until the instance has been serialized
in from the file. Without an instance of the class, MFC cannot access the
run-time class information to ensure that the object being loaded is
equivalent to or is a derived class of the given pointer. By overloading
operator>>, MFC is able to pass in a pointer to the run-time type information
so that the serialization mechanism will be typesafe.


Creating Types from a File


The third problem is to create a mapping scheme to allow a type to be created
based on information read from a file. Given that the compiler requires class
names to be unique and that the class name is already embedded in the
CRuntimeClass structure, the actual class name is the ideal candidate to write
to a file in order to identify a class.
An object could be saved to an archive by writing the class's name and data.
MFC does this and a little bit more for each class it encounters. The class's
name is taken from its CRuntimeClass structure, which is obtained from a
virtual function in the object. Because the typing is dynamic and done at run
time, a structure of type Tiger will be properly written even if MFC is passed
a pointer to a base class of type Animal. This type safety is important. Any
function can safely save an object to an archive even if the object's exact
type is unknown.
The same benefit applies to restoring an object from an archive. Going back to
the example at the beginning of the article, MFC is able to successfully load
the correct document type from a file with the simple statement in Example 4.
In the implementation of operator>>, MFC loads the name of the class from the
file, then searches the list of types for that name. As long as the type
exists in the registry and was declared with either DECLARE_DYNCREATE or
DECLARE_SERIAL, MFC can construct the object. The actual loading of the data
appropriate to that kind of object is delegated to the object itself by
calling its Serialize() virtual member function, and Problem 3 is solved.
The type of object created is separated from the type of object requested. If
a derived class is loaded into a pointer to a base class, as in the example of
CDocument, then the correct derived class will still be created. This is the
only way that the correct vtbl pointer can be set up to point at the object's
virtual functions. If the object in the archive is not a "kind of" the
specified object, MFC will throw an exception.
There are two potential pitfalls here for developers. First, renaming a
serializable structure or class will corrupt any old save files or archives.
Second, MFC does not record the length of each object into the archive. If MFC
can't load an object, it will not be able to skip the object and load the rest
of the archive.


Optimizing Archives


When I first scanned this code, I had visions of massive bloat in the save
file and painful delays while the linked list was walked repeatedly. But MFC
keeps its archive size down and its execution speed up by using hashed
identifiers.
MFC keeps a hash table that tracks all classes and objects written to an
archive. Once they are written, MFC does not rewrite them; it writes an
identifier instead. Thus, when the archive is read back in, the linked list of
types is traversed only for a new class. For subsequent instances of the
class, the proper instance of CRuntimeClass will be found with a hashed
lookup.
This behavior also means that multiple references to the same object are
handled correctly. If objects A and B both point to a single instance of
object C when the archive is created, they will both point to a single
instance of object C when the file is read back in. MFC will also correctly
resolve circular references between objects.
The implementation is much faster than I expected. On a 486/66, MFC was able
to save and load an archive over a megabyte long with 10,000 separate
instances of CArray<DWORD,DWORD> in under two seconds.
An important limitation is that the hash table can have no more than 32,766
classes and objects per archive context. This count only includes classes
derived from CObject and serialized with operator<<, not fundamental types
such as short and long, CString, and CPoint. (See MFC Technical Note 2:
Persistent Object Data Format for more information on how archives are
constructed.)


Schema Versions


A poorly documented feature in 32-bit MFC 3.2 is support for versionable
schemas, where MFC allows the Serialize() routine to handle the various
versions of the class instead of throwing an exception. This feature is very
important in an evolving project. Although I will describe how to implement a
versionable schema, the implementation is broken in Visual C++ 2.x and fails
at run time. I recommend telling Microsoft how much you would like this
feature fixed.
In MFC, each structure that uses DECLARE_SERIAL and IMPLEMENT_SERIAL has an
associated version number. This number is normally set to 1, as shown in most
MFC sample code; for example, IMPLEMENT_SERIAL(CStroke, CObject, 1).
Each structure or class has its own version number, which can vary
independently of the others. MFC automatically writes the version number into
the archive after the class ID. Prior to 3.0, MFC did not have a mechanism to
completely support this version number, so older versions throw an exception
if the schema number of the object in the file does not match the current
schema number. This inhibits support for multiple schemas.
In MFC 3.0 and later, this behavior is only the default, and can be changed.
By ORing the third parameter of the IMPLEMENT_SERIAL macro with the constant
VERSIONABLE_SCHEMA, MFC will allow you to handle the schema version in your
Serialize() function. For example, to set the document version number to 3,
specify IMPLEMENT_SERIAL(CScribDoc, CDocument, VERSIONABLE_SCHEMA | 3).
To use this feature, a class should call GetObjectSchema() in its Serialize()
member function when the archive is loaded; see Listing Three.


Conclusion


By creating a foundation of a run-time type mechanism with a class factory and
building serialization on top of it, MFC implements a fast, flexible, typesafe
serialization mechanism. This mechanism should be powerful enough to satisfy
most design requirements.
Example 1: (a) The beginning of a class declaration; (b) after preprocessing,
the DECLARE_DYNCREATE macro expands to several new class members.
(a)
class CScribDoc : public CDocument
{
protected: // create from serialization only
    CScribDoc();
    DECLARE_DYNCREATE(CScribDoc)
    ...
};

(b)
protected:
    static CRuntimeClass* __stdcall _GetBaseClass();
public:
    static CRuntimeClass classCScribDoc;
    virtual CRuntimeClass* GetRuntimeClass() const;
    static void __stdcall Construct(void* p);
Example 2: The Construct() function.
void __stdcall CScribDoc::Construct(void* p)
{
    new(p) CScribDoc;
}
Example 3: Initializing the static instance of CRuntimeClass in CScribDoc.
CRuntimeClass CScribDoc::classCScribDoc = {
    "CScribDoc",
    sizeof(CScribDoc),
    0xFFFF,
    CScribDoc::Construct,
    &CScribDoc::_GetBaseClass,
    0 };
Example 4: Loading the correct document type from a file.
CDocument* pDoc;
CArchive& ar;
 ...
ar >> pDoc;

Listing One
struct CRuntimeClass
{
// Attributes
 LPCSTR m_lpszClassName;
 int m_nObjectSize;
 UINT m_wSchema; // schema number of the loaded class
 void (PASCAL* m_pfnConstruct)(void* p); // NULL => abstract class
 CRuntimeClass* m_pBaseClass;
// Operations
 CObject* CreateObject();
// Implementation
 BOOL ConstructObject(void* pThis);
 void Store(CArchive& ar);
 static CRuntimeClass* PASCAL Load(CArchive& ar, UINT* pwSchemaNum);
 // CRuntimeClass objects linked together in simple list
 CRuntimeClass* m_pNextClass;// linked list of registered classes
};

Listing Two
void __stdcall CScribDoc::Construct(void* p)
{
 new(p) CScribDoc;
}
CRuntimeClass* __stdcall CScribDoc::_GetBaseClass()
{
 return (&CDocument::classCDocument);
}
CRuntimeClass CScribDoc::classCScribDoc = {
 "CScribDoc",
 sizeof(CScribDoc),
 0xFFFF,
 CScribDoc::Construct,
 &CScribDoc::_GetBaseClass, 0 };
static const AFX_CLASSINIT _init_CScribDoc(&CScribDoc::classCScribDoc);
CRuntimeClass* CScribDoc::GetRuntimeClass() const
{
 return &CScribDoc::classCScribDoc;
}

Listing Three
class CSmallObject : public CObject {
    DECLARE_SERIAL(CSmallObject)
    DWORD m_value; // used to be unsigned short in version 1
};
IMPLEMENT_SERIAL(CSmallObject, CObject, VERSIONABLE_SCHEMA | 2);

void CSmallObject::Serialize(CArchive& ar)
{
    if (ar.IsStoring()) {
        ...
    }
    else {
        DWORD nVersion = ar.GetObjectSchema();
        switch (nVersion) {
        case (DWORD)-1:
            // -1 shows that the structure was created with DYNCREATE, not
            // with SERIAL. Seeing this value is probably an error.
            break;
        case 1:
            // This version used unsigned short
            unsigned short oldval;
            ar >> oldval;
            m_value = oldval;
            break;
        case 2:
            // Current version uses DWORD
            ar >> m_value;
            break;
        default:
            // Bogus value - probably corrupt data file.
            break;
        }
    }
}




































Inside Flash Memory


Direct execution increases performance, lowers cost




Brian L. Dipert


Brian, an applications manager for high-density flash memory components at
Intel, is co-author of Designing With Flash Memory and author of The PCI
Handbook (both published by Annabooks Press). Brian can be contacted at
brian_l_dipert@ccm.hf.intel.com or bdipert@aol.com. 


Computer-hardware architectures have traditionally been defined by the need to
accommodate several levels of cache, DRAM main memory, boot nonvolatile
memory, and magnetic mass storage. This hardware approach has driven the
design and implementation of software architectures, including UNIX, DOS,
Windows, OS/2, and System 7--each of which is stored on hard disk drives
(HDDs) or ROM and paged into DRAM for execution.
But what about embedded-systems applications that don't require (or can't
accommodate) gigabyte hard drives or 16/64-Mbit-based DRAM arrays? To meet
cost, performance, power, size, weight, reliability, and other requirements,
these systems typically execute operating systems and applications directly,
instead of downloading them from hard drives. Flash-memory component arrays,
cards, and drives make such direct execution possible. Moreover, the current
generation of Embedded Flash RAMs delivers read performance that matches or
exceeds that of DRAM, eliminating the redundancy of slow, nonvolatile storage
memory and fast, volatile execution RAM.


Direct-Execute Compilers


Although flash memory is optimized for updatable code storage/execution and
mass-storage applications, RAM is still the memory solution of choice for
very-often-updated temporary data (video memory, stack, interrupt vector
tables, and the like). This means that for code storage and execution, the
compiler must be able to partition code and data and direct them to different
areas of the system memory map.
A number of compilers can place code segments in nonvolatile flash memory; see
Table 1(a). This locating function is accomplished either via command-line
options or in a configuration file accessed by the linker/locater. In Example
1, for instance, a TARGET.LD locator configuration file for the GNU/i960
toolset interfaces the Intel i960JF processor to Intel 28F016XS Embedded Flash
RAM. This hardware architecture is common in laser printers, datacom hubs and
routers, RAID controllers, graphical X-terminals, and similar applications.
Figure 1 and Figure 2 present a system-block diagram and system memory map,
respectively. The upper 4 MBs of the system memory map (corresponding to the
CPU boot location), made up of flash memory, contain system code segments and
static (non-updated) data tables and constants. System RAM stores the stack
and temporary data tables. RAM also contains initialized data tables, stored
in flash memory but copied to RAM and updated during system operation.
Software provided by compiler vendors lets you copy these initialized data
tables from flash memory to RAM on system boot. Software with data and
code-structure addresses that have not been hardcoded can be dynamically
placed by the linker/locator for the specific target system. This flexibility
also enables porting of legacy code to new architectures.


Direct-Execute Operating Systems and Applications


A variety of direct-execute operating systems are optimized for applications
that include embedded PC systems, real-time environments, and
handheld-computing devices; see Table 1(b). These systems are not constrained
to the standardized computer disk/DRAM memory architecture.
Figure 3, for instance, shows the Intel 386EX CPU interface to an Intel
28F400BX BootBlock flash memory. This hardware architecture can be utilized in
embedded PC designs, such as industrial controllers, point-of-sale terminals,
and handheld computers. The techniques I'll discuss here are based on
Datalight's ROM-DOS but apply to other operating systems as well. The memory
maps in Figure 4 and Figure 5 indicate that the BIOS, ROM-DOS, and
direct-execute RXE applications in the ROMDISK all reside within and are
executed directly from the flash-memory device. The system thereby requires
only a small amount of system RAM for scratchpad memory, stack, and the
interrupt-vector table. Because the operating system and applications are
stored in flash memory, they can be easily updated.
If the system does not require video RAM, the 4-Mbit flash memory can directly
map from addresses 80000H-FFFFFH (Figure 4). Alternatively, half of the flash
memory can map to extended memory, freeing up the A0000H-BFFFFH memory segment
(Figure 5).
Normally, the system boots BIOS and ROM-DOS directly from the BootBlock
flash-memory main blocks. Parameter blocks store nonvolatile system data and
are useful for integrating EEPROM functionality in many applications.
If the system is reset or loses power during flash-memory update, main-block
contents may be left in an undetermined state. Hardware inversion of
flash-memory-block locations, as shown in Figure 4 and Figure 5 (recovery
operation), enables system boot and recovery from the kernel code stored in
the hardware-locked, flash-memory boot block. This inversion can be
implemented via motherboard jumper, back-panel switch, or special keyboard
sequence.


How Does ROM-DOS Work?


When a flash memory containing ROM-DOS is placed in a DOS-based computing
system that might otherwise contain a floppy or hard disk, ROM-DOS will boot
from the flash memory.
In a traditional, DOS-based computer boot sequence, the BIOS follows these
steps:
1. Initializes itself.
2. Performs a Power On Self Test (POST).
3. Initializes the interrupt vector table.
4. Initializes all devices.
5. Searches upper memory (segments C000-FFFF) for BIOS extensions.
6. Boots the system via an INT 19H.
The BIOS has already pointed INT 19H to an internal BIOS routine that will
load the beginning of the operating system from a floppy or a hard disk.
ROM-DOS, on the other hand, contains a BIOS extension (ROM scan), a small
routine that the BIOS detects and executes after running the POST. The ROM-DOS
BIOS extension routine (only about 15 lines of assembly-language code) changes
the interrupt 19H vector to point to the entry point for ROM-DOS, rather than
the internal BIOS routine. Then the ROM-DOS BIOS extension returns and the
BIOS again takes control. When the BIOS is ready to boot, it executes an
interrupt 19H, causing a direct jump to the ROM-DOS entry point and booting
ROM-DOS on the target system.
One of ROM-DOS's features, RXE, is an executable file format that minimizes
RAM usage by executing the code directly out of flash memory. An EXE file is a
single program block loaded sequentially into RAM to execute. An RXE program
is like an EXE, but has two distinct program blocks: the code block and the
data block. The code block is "fixed up" to reside and execute in an absolute
area of flash memory. The data block is relocatable and is loaded into RAM by
DOS before execution begins.
A DOS EXE file created using a high-level language such as Microsoft C,
Borland C, or QuickBasic can be directly converted to an RXE file using
utilities from Datalight. Assembly-language portions of a program may require
some modification to run as an RXE. The source code for the EXE must be
available, and in some cases, the language startup source-code files or
library source-code files may be necessary.
To convert code to an RXE file, the DATA segment must have at least one fixup
and there must not be any self-modifying code or overlays. In addition, the
program must not assume that the data segment immediately follows the code
segment, and all of the program code must be addressable in memory at all
times. This last restriction invalidates the use of RXE programs from a
paged-extended memory diskette.
During the conversion process, the user (or utility program) specifies the
absolute segment address where the RXE file will reside and the relative
offset of the data block within the RXE. The division between the code and
data blocks is typically specified as a segment name or segment class (from
the .MAP file), or as an absolute number.
The conversion is performed as follows:
1. All of the code-block fixups are resolved and then removed from the fixup
list. 
2. The code block is placed at the provided fixed address.
3. All data-block fixups in the code block are emulated or flagged as errors.
The data block's run-time address is unknown until DOS loads the data block
into RAM, so references to the data-block segments are unknown at conversion
time. If an instruction that loads a register with data-block segment values
is encountered in the code block, the converter emulates this load with an INT
18H call. INT 18H loads the correct register with the appropriate segment
value at run time and returns to the program. For instance, Example 2(a) is
changed to Example 2(b). If the code block references a data-block segment but
does not load a register with it, these references cannot be emulated and are
flagged as errors; see Example 2(c).

4. Data-block fixups are adjusted and left in the fixup list, completing the
conversion.
The loading and execution of an RXE follows the standard operation of a
direct-executable program; see Figure 6. The code block executes out of flash
memory and the program data resides in RAM. If any of the data is initialized,
it is copied from flash memory to RAM before execution. This offers
considerable RAM savings over an EXE file format, which must execute
completely out of RAM. Direct execution also preserves some additional
flash-memory space because the EXE header is considerably reduced when
converted to an RXE.
Datalight's RXE utility provides the ability to execute a program in place:
"XIP" (PCMCIA standards-group lingo for "eXecute In Place").


Flash Translation Layer (FTL)


For mass-storage applications, flash-memory file-system software comprehends
and effectively compensates for flash memory's large blocking structure
compared to a 512-byte sector size for hard drives. It also spreads file
writes across the entire flash-memory media to eliminate excessive erasing
(cycling) of a subset of the available flash-memory blocks. This technique,
which is called "wear leveling," extends flash-memory life.
Since flash memory does not behave like traditional, magnetic media, special
software is necessary for managing stored files. This flash-file-system
software comprehends flash memory's bit-level program (1-->0), block-level
erase (0-->1), and wear-leveling requirements. Table 1(c) lists a number of
flash-file-system options available for various operating-system alternatives,
including, but not limited to, DOS.
For example, Figure 7 interfaces 4 MB of resident flash memory to the system
DRAM controller and uses FTL software for disk-drive emulation. Applications
with low-density mass-storage requirements include data loggers, medical
instrumentation, fax machines and scanners, ruggedized terminals, and audio
recorders.
Intel's 28F016XD Embedded Flash RAM includes a DRAM-compatible hardware
interface for easy system integration. By being located on the DRAM bus,
28F016XD accesses can also be cached for highest-effective performance, just
like DRAM accesses. The system memory map is a variation of the ROM-DOS design
described earlier; see Figure 8. FTL is approximately 20-30 KB in size, plus
optional PCMCIA drivers. Note that the EMS window is optional; FTL disk drives
can be accessed either through a sliding window or linearly, in protected
mode; see Figure 9(c).
In the DOS world, the resident flash disk can interface to the operating
system in one of several implementations; see Figure 9. Card Services is the
PCMCIA software layer responsible for allocating system resources, such as the
memory space for removable flash-memory cards. The Card Services layer allows
any PCMCIA memory or I/O card to be supported. The implementation in Figure
9(a) might be used in systems that require both a Resident Flash Disk (RFD)
and PCMCIA cards. FTL can be used to communicate with both the removable
flash-memory cards and RFD. However, each would require its own Socket
Services or low-level driver, plus unique hardware-interface logic.
Socket Services, a PCMCIA software standard originally developed for PCMCIA
sockets, configures the window that accesses the RFD. There's no "socket" as
such for a RFD, but many FTL developers still refer to this driver as "Socket
Services." Either an 82365SL-compliant PCMCIA-card-interface controller or a
hardware-page register can be used to implement the RFD window. The
implementation in Figure 9(b) might be used in systems that require both RFD
and PCMCIA memory cards, but not PCMCIA I/O cards.


How Does FTL Work?


FTL uses the existing operating system for upper-level file-handling
capabilities, such as translating file-based operations to sector-based
versions. By translating received, sector-based requests, an FTL driver
appears to the upper-layer software as a sector-based, magnetic hard drive.
A typical hard drive has a sector size of 512 bytes. Upper-layer software
expects to be able to fully modify these sectors at any given time. It is a
requirement of flash-memory technology that the block containing the sector be
fully erased in order to change any stored 0s back to 1s.
When upper-layer software tries to modify a stored sector, FTL remaps the
request to an available, fully erased 512-byte area within the flash-memory
array; see Figure 10. These remapped sectors are called "virtual small blocks"
(VSBs). FTL subdivides each 64-KB flash-memory block into smaller VSBs. For
example, each 28F016XD Embedded Flash RAM contains thirty-two 64-KB blocks, or
4096 VSBs (each 512 bytes in size).
Depending on the types of files stored to the RFD, some natural wear leveling
may occur. For example, data files are often deleted and updated, and
therefore tend to migrate throughout the RFD. Configuration and executable
files, on the other hand, are read frequently but deleted/updated rarely (if
at all). For this reason, FTL software solutions supplement natural "passive"
wear leveling with "active" erase-cycling allocation schemes. These
vendor-proprietary techniques keep track of cycle counts for each RFD block
and relocate static files at vendor-defined block-to-block cycle differential
points. With effective wear leveling, flash memory's 10,000-100,000 block
erase-cycling specifications exceed the requirements of most system-lifetime
file-update profiles. RFD reliability far exceeds that of alternative magnetic
hard drives, given reasonable lifetime-cycling usage.
When a file is updated or deleted, the FTL file system marks the old file as
"dirty." As FTL updates files, more and more of the total RFD area transitions
from clean/available to dirty/unavailable.
At some vendor-specific point, FTL will determine that a block with many dirty
virtual blocks should be erased and converted back to clean space. The file
system first copies the remaining clean VSBs in the dirty block to another
free or "spare" block before erasing and reclaiming the block. The more spare
blocks reserved for garbage collection, the quicker the process, but the less
available flash-memory storage capacity for a given array size.
FTL provides an effective solution for interfacing legacy operating systems
(originally intended for magnetic disk drives) to flash memory with its unique
benefits and characteristics. FTL's efficient usage of flash memory eliminates
hard-drive-like block-size requirements, enabling use of FTL with a range of
flash-memory technologies optimized for lowest silicon cost.


Flash versus Other Memory Approaches


The redundancy of traditional memory hierarchies does little to optimize the
performance of today's fast microprocessors. Both system boot time and
application task-switching response bog down because of disk drive spin-up
time and nonvolatile-memory-to-RAM file-load delays. Furthermore, DRAMs must
be constantly refreshed by the memory controller to preserve their stored
data. The hard drive draws current with every motor rotation and may actually
draw more average current if it is "parked" too frequently, since spin-up
causes high current draw.
Magnetic media contain moving parts and have narrow operating-temperature
ranges. Though improving, these devices have unacceptably low tolerance to
shock, vibration, and movement during read/write in ruggedized and mobile
environments.
All in all, multiple levels of memory mean multiple sources of system-memory
cost and multiple levels of potential component failure. Excessive-heat
generation also impacts system lifetime, and the many board traces required
are a significant manufacturing challenge and reliability headache. DRAMs are
subject to single-bit errors caused by alpha-particle radiation, and
mission-critical systems subsequently add costly EDAC circuitry to circumvent
this problem.
But just as today's embedded-system designs are revolutionary, so too are
their memory architectures. Incremental improvements to the traditional
approach no longer measure up to the potential of a flash-memory-based
alternative. Intel's ETOX flash-memory technology, for example, enables fast
reads (as fast as an effective 30 ns) and writes (as fast as 30 MB/sec burst
and 500 KB/sec sustained per component). A range of densities, from 256 Kbit
to 32 Mbit, in x8 and x16 interface options and various blocking schemes,
address a variety of system configurations.
Flash memory provides both the high read performance of DRAM and SRAM, and the
nonvolatility of ROM and hard-drive-like write performance. Lengthy
software-load overhead and task-switching delays are eliminated. Code runs as
fast or faster than that in DRAM with less system-hardware complexity.
The price of a high-density flash-memory array is comparable to that of a
DRAM/ROM or DRAM/small hard drive combination (including interface and control
logic). Unlike ROM, flash memory is in-system updatable to keep system costs
low both initially and throughout system lifetime.
Being nonvolatile, flash memory requires no periodic refresh and no constant
application of power to retain stored information. The redundancy of multiple
memory technologies in the traditional memory architecture results in multiple
sources of power consumption. These multiple memories must be summed to
determine the true memory subsystem power draw. With flash memory, there is
only one very efficient memory technology, consuming little system power.
Compact flash memory TSOP packaging provides up to 18.4 MB/in2 density
capability, with components mounted on both sides of the system board.
Eliminating memory duplication saves substantial board space and yields higher
system ruggedness and reliability. Very-low power consumption reduces the size
and weight of system batteries and power supplies.
--B.L.D.
Figure 1: Intel i960 JF microprocessor interface to Intel 28F016XS Embedded
Flash RAM memory delivers zero-wait-state burst-read performance.
Figure 2: System memory map for Intel i960 JF/28F016XS Embedded Flash RAM
memory design.
Figure 3: Intel386EX interface to Intel 28F400BX BootBlock flash memory
enables high-performance, upgradable, direct-execute code.
Figure 4: System memory map for Intel 386EX/28F400BX BootBlock flash-memory
design.
Figure 5: Alternative system memory map for Intel 386EX/28F400BX BootBlock
flash-memory design.
Figure 6: Datalight's RXE utility and file format segments code and data
portions of DOS applications to enable direct execution out of flash memory.
Figure 7: The 28F016XD Embedded Flash RAM memory enables DRAM SIMM pinout
compatibility and a no-glue interface to system DRAM controllers.
Figure 8: System conventional memory map for Intel 28F016XS Flash RAM RFD
design (28F016XD array located at/above the 1-MB address).
Figure 9: Implementation options for resident flash-disk-interface software
(a) for systems that require both an RFD and PCMCIA memory and I/O cards; (b)
for those that require an RFD and PCMCIA memory card, but not PCMCIA I/O
cards; (c) for systems that require only an RFD.
Figure 10: FTL enables usage with a wide range of flash-memory block sizes and
minimizes per-block cycling by storing new/updated file versions in available
virtual small blocks.
Table 1: (a) Direct-execute software compiler/linker vendors; (b)
direct-execute operating-system vendors; (c) flash-memory file-system vendors.
Application Category              Vendor                        Application
(a)
Compiler/linker                   Cygnus
                                  Embedded Performance
                                  GreenHills
                                  MetaWare
                                  Microtec Research
                                  Software Development Systems
Linking locators and libraries    Phar Lap
  for x86 compilers               Systems & Software


(b)
Direct-execute DOS/DOS-based OSs  General Software              Embedded DOS
                                  Datalight                     ROM-DOS
Handheld computing OSs            GeoWorks                      GeoWorks
                                  General Magic                 MagicCap
                                  Apple Computer                Newton OS
MS-DOS ROM Executable,
  MS-Windows ROM Executable       Annabooks Press
Real-time, embedded OSs           Chorus Systems                Chorus Nucleus
                                  Lynx Real-Time Systems        LynxOS
                                  Microware Systems             OS/9
                                  Integrated Systems            pSOSystem
                                  JMI Software Systems          PSX
                                  QNX Software Systems          QNX
                                  Spectron Microsystems         SPOX
                                  VentureCom                    Venix
                                  Microtec Research             VRTX
                                  WindRiver Systems             VxWorks

(c)
Flash File System (FFS)           Datalight                     CardTrick FFS
                                  Annabooks                     Microsoft MS-FFS2
                                  SystemSoft                    Microsoft MS-FFS2
Flash Translation Layer (FTL)     Datalight                     CardTrick FTL
                                  SCM Microsystem               FTL-FFS
MS-FFS2 (Enhanced)                SystemSoft                    SystemSoft FTL
                                  M-Systems                     TrueFFS
Example 1: TARGET.LD file segments code (into flash memory) and data (into
RAM).
MEMORY
{
 FlashRAM: o=0xFFC00000, l=0x400000
 DataRAM: o=0x00000000, l=0x10000
}
SECTIONS
{
 .text: ;Code Segment
 {
 } >FlashRAM
 .data: ;Data Segment
 { ;Initialized Data Variables
 _ram = .;
 } >DataRAM
 .bss: ;Data Segment
 { ;Uninitialized Data Variables
 } >DataRAM
}
Example 2: Converting a DOS EXE file to an RXE file.
(a)

MOV AX,SEG DGROUP
(b)
INT 18H
DB 0

(c)
MOV AX,DS_var
. . .
DS_var: DW DGROUP

































































Environment Variables and Windows 3.1


Manipulating variables and running tasks 




John (Fritz) Lowrey


Fritz is a PC consultant and developer for the University of Southern
California University Computer Services department working in C, Visual Basic,
and Borland Delphi. He can be contacted at jlowrey@ucs.usc.edu.


What do you do if you have several hundred networked computers (like we do at
the University of Southern California student computer lab) running software
that demands a customized run-time environment? If you are using DOS, the
usual solution is to set aside enough space for the environment (in
CONFIG.SYS), then add the program's settings (such as FOOBAR=/a /s /d /f:2) to
an entry in the AUTOEXEC.BAT file. You can even create a separate batch file
that sets this variable before running the program, then clears it after the
program exits. 
At USC, several programs installed on a network need a pointer to themselves
in either the PATH statement or the Novell search path. This isn't simple, as
some PATHs are already approaching the maximum allowable length (127
characters in DOS 6.2), and we are mapping several servers to satisfy the
needs of the lab's users. Imagine trying to get the PATH to hold a number of
directory names such as F:\PROGRAMS\CLASS\BIO\BIGAPP\ or
F:\PROGRAMS\WP\WINWORD\WFW6.0\. You approach critical mass pretty fast!
Consequently, I needed to write a program that would alter a program's
environment values (PATH and TEMP mostly) on a "per run" basis, so that we
would need to set only a few universal values in AUTOEXEC.BAT. Essentially my
program would "wrap" the target program, changing the environment as needed,
then exit once the target program was run. Since this sort of thing is trivial
under DOS and UNIX, I figured it would also be easy under Windows. I was
wrong. 
Most operating systems have some concept of an "environment" for a particular
program. DOS and UNIX (and Windows, for that matter) use a set of strings in
an array (char *envp[] or char **environ in C/C++) that designates certain
program options, such as search path and temporary directory. These strings
are manipulated by the standard library calls: char *getenv(char *search),
which gets the environment string identified by search, and int putenv(char
*putstr), which changes or adds an environment variable.
A program uses and changes its variables as it sees fit. Child programs,
however, get copies of the parent's environment variables; see Figure 1. Thus,
the variables of a given child task can be altered without affecting those of
the parent.
Examples are the search path (PATH=c:\;c:\dos;c:\windows) and the temporary
directory (TEMP=c:\temp); see Figure 2.
Windows programs exhibit behavior based upon environment settings; for
instance, using the TEMP and PATH values to locate files and components. This
DOS-like behavior may lead you to assume that if an environment variable is
modified, a child would then inherit it, but this isn't the case.
By default, Windows programs get a pointer to--rather than a copy of--an
environment space; see Figure 3. To me, this meant that though getenv() should
work, putenv() must be either very dangerous or totally ineffectual. The
problem is that if all programs get a pointer to the same chunk of memory and
each of them is able to alter it, then any alteration to the environment space
will affect all tasks unpredictably. Books such as Undocumented Windows, by
Andrew Schulman et al. (Addison-Wesley, 1992), describe the Windows
application-startup process; however, no mention is made of
environment-related behavior. To make things more confusing, both Visual C++
1.5 and Borland C++ 4.5 include getenv() and putenv() calls that seem to work;
that is, getenv("PATH") returns the value of the PATH from your AUTOEXEC.BAT
file, and putenv("FOO=BAR") followed by getenv("FOO") returns BAR. But if you
then run a child program, it inherits the unchanged DOS environment. How can
this be?
The Borland and Microsoft Windows C/C++ run-time startup code appears to make
a copy of the default environment into an internal array that is then accessed
by envp[], getenv(), and putenv(); see Listing One. While useful within a
single program, this is pretty useless for making functional changes to the
environment used by child tasks; see Figure 4.
Working my way through assorted documents in the Windows 3.1 API reference, I
found HINSTANCE LoadModule(LPCSTR lpszModuleName, LPVOID lpvParameterBlock).
The lpvParameterBlock structure must be user defined; Example 1 identifies its
fields.
It seemed to me that the segEnv field would run a child with a modified
environment, but I had to make the changes without ruining the rest of the
system. I first attempted to pass the segment value of the char **_environ
array maintained by the Borland run-time code, but this induced a GPF. The
next logical step was to make a memory buffer the same size as the environment
and fill it with my own data. To determine how much space to allocate, I
created an integer value called "ENVSIZE." This value was determined by hand,
as there seems to be no straightforward way to get it from Windows. In the
demonstration code the size is passed to EnvInit() as an argument. (To find
your environment size, look at the "SHELL" line in CONFIG.SYS. If you are
using command.com, the /e: parameter sets the environment size; mine is set to
1024 bytes.) To ensure the child task access to this memory, I made it
sharable within the GlobalAlloc() API call, as shown in Example 2.
To get all of the DOS environment strings into this buffer, you can loop
through envp[] and copy all of the strings (and NULLs) or use the LPSTR
GetDOSEnvironment() API call, which returns a pointer to the first address of
either the default Windows environment or an environment space created for
(and passed to) this program. While LPSTR GetDOSEnvironment() is documented in
the help system, there is no mention or warning about the dangers of altering
such a globally accessible memory space. I made my copy this way:
memcpy(pNewEnv, GetDOSEnvironment(), ENVSIZE);. 
Once you have a copy of the environment, you can manipulate it as you see fit.
For instance, Example 3(a) gives you a print-out of your environment strings
in Windows. Under DOS or UNIX, the loop would look like Example 3(b).
At this point, you can write into your memory block without affecting the rest
of the system. I reimplemented getenv() and putenv() to use my memory buffer,
emulating the documented functionality of the original routines as closely as
possible; see Listing Two (winenv.c).
Next I wanted to run a program so that it got my new environment variables
rather than the Windows defaults. To run a child, most people use UINT
WinExec(LPCSTR lpszCmdLine, UINT fuCmdShow) since it is easy to construct a
command line with the program name and command-line parameters; for example,
WinExec("notepad c:\autoexec.bat", SW_SHOWNORMAL);. However, if you want the
child to inherit the new environment, this won't work because WinExec() enters
the kernel module and runs the child program using the kernel defaults,
including the environment pointer; see Figure 5.
To pass the new environment to the child task, call LoadModule(). Example 4 is
the code I use to run my child. The resulting new program gets a pointer to
the memory space filled with environment strings created by the parent program
when it calls GetDOSEnvironment(); see Figure 6. The Windows 3.1 API reference
alludes to this behavior but does not document it.
The result of all this is that I can now use my versions of getenv() and
putenv() and run a program that inherits a new set of environment values. The
only caveat is that the memory buffer must exist when the child needs it.
Because the startup code is executed before the LoadModule() call returns,
this is not a problem if the child is a C/C++ program (but not a DLL--these
don't get run-time environment pointers and may be unloaded, then loaded by a
different task). Once the startup code has run, the child has the environment
space buffered internally for use by envp[]/environ. If the child does not use
this startup-code mechanism (or the child was written in a language like
Borland Delphi that does not initialize an environment arena), it is critical
not to free the environment memory buffer before exiting the parent. If the
parent frees the new buffer, then, when the child tries to access its
environment, it will GPF due to an invalid pointer access.


Summing Up


Environment handling under Windows 3.1 is poorly documented, and using a
single environment space in the Windows system area for all programs is
dangerous. Although the Borland C++ and Microsoft C/C++ getenv() and putenv()
routines appear to work under Windows, they do not affect the behavior of
child tasks as might be expected, and this should be documented.
Solving this problem was a merry chase, and I enjoyed it (though I lost quite
a bit of hair in the process). I regret only that I have not found a way to
create a program that can build an altered environment buffer, run a child,
and then exit without worrying about whether or not the child will crash when
trying to find its PATH. Perhaps you can help out here.


Acknowledgments


Thanks to C.J. Zinngrabe, Chuck Hellier, Frank Callaham, Steve Bridges, and
Mike Beatrice.
Figure 1: The child gets a copy of the parent environment.
Figure 2: If Child 1 changes the environment, Child 2 will get a copy of the
changes.
Figure 3: Windows programs get a pointer to--not a copy of--the environment
variables.
Figure 4: Since envp[], getenv(), and putenv() work on a buffered environment,
the child program doesn't see any changes.
Figure 5: A WinExec()ed child will get the default Windows environment
pointer.
Figure 6: LoadModule() lets you point the child at a new environment space.
Example 1: lpvParameterBlock fields.
struct lpvParameterBlock {
 WORD segEnv; /* child environment */
 LPSTR lpszCmdLine; /* child command tail */
 UINT FAR* lpShow; /* how to show child */
 UINT FAR* lpReserved; /* must be NULL */
} LOADPARMS;
Example 2: Allocating sharable memory for environment information.
hNewEnv = GlobalAlloc(GPTR | GMEM_SHARE, ENVSIZE);/* get the memory */

pNewEnv = GlobalLock(hNewEnv); /* lock it down to get a pointer */
Example 3: (a) Printing out environment strings; (b) loop for DOS or UNIX.
(a)
char *tempstr;
tempstr = pNewEnv;
while (*tempstr != NULL) {
 printf("%s\n", tempstr);
 tempstr += strlen(tempstr) + 1; /* move to the next string */
}
(b)
int i;
for(i=0; envp[i] != NULL; i++)
 printf("%s\n", envp[i]);
Example 4: Code to run a child that can access its parent's environment.
struct LOADPARMS parms; /* needed for LoadModule */
char *progname = "envtst.exe"; /* see included source */
WORD show[2] = {2, SW_SHOWNORMAL};
parms.segEnv = FP_SEG(pNewEnv); /* BC++ macro to get seg address */
parms.lpszCmdLine = (LPSTR) ""; /* no command-line options */
parms.lpShow = &show[0]; /* address of show state array */
parms.lpReserved = NULL;
result = LoadModule(progname, &parms);

Listing One
/* envtst.c : This prints out the current envp[] set and the current Windows 
 default environment settings. I used this to determine if a change to the 
 environment settings had taken place. Build this as a Borland EasyWin 
 (Windows command line) executable. 5/95 Fritz Lowrey
*/
#include <windows.h> /* GetDOSEnvironment() */
#include <stdio.h>
#include <stdlib.h> /* exit() */
#include <string.h> /* strlen() */
int main(int argc, char *argv[], char *envp[]) {
 int i;
 char *denv; /* default environment */
 printf("Program environment array:\n");
 for (i=0; envp[i] != NULL; i++)
 printf("%s\n", envp[i]);
 printf("\nWindows default environment:\n");
 denv = GetDOSEnvironment();
 while (*denv != '\0') {
 printf("%s\n", denv);
 denv += strlen(denv) + 1; /* move to the next string */
 }
 exit(0);
}

Listing Two
/* winenv.c: Build using Borland C++ EasyWin environment to allow for stdio
 function calls. Copyright John "Fritz" Lowrey, 24 May, 1995. This code and
 the research that made it possible were done in conjunction with the
 University of Southern California University Computer Services Dept.
*/
#include <windows.h>
#include <stdio.h>
#include <string.h>
#include <dos.h>
#define DEMO
/* static variables for environment manipulation, not visible to other
 modules */
static char *lpNewEnv; /* pointer to environment space */
static HGLOBAL hNewEnv; /* handle to environment memory */
static int ENVSIZE; /* size of environment space */
/* LOADPARMS structure needed by LoadModule */
struct LOADPARMS{
 WORD segEnv; /* child environment */
 LPSTR lpszCmdLine; /* child command tail */
 UINT FAR* lpShow; /* how to show child */
 UINT FAR* lpReserved; /* must be NULL */

} ;
/* Initialize the environment space. ENVSIZE is the size of the environment
region (defined on the SHELL line of CONFIG.SYS). */
/* returns -1 on error or 0 on success */
int EnvInit(int esize) {
 ENVSIZE = esize;
 if((hNewEnv = GlobalAlloc(GPTR | GMEM_SHARE, ENVSIZE)) == NULL)
 return -1;
 if ((lpNewEnv = GlobalLock(hNewEnv)) == NULL)
 return -1;
 /* we now have a pointer to the memory, fill it from the env space */
 if (memcpy(lpNewEnv, GetDOSEnvironment(), ENVSIZE) == NULL)
 return -1;
 /* environment space is initialized, return 0 */
 return 0;
}
/* definitions for new getenv and putenv routines */
/* Simple new getenv() routine. search must be a label only */
LPSTR NewGetEnv(LPSTR search) {
 LPSTR tmpstr;
 /* point tmpstr at the environment space */
 tmpstr = lpNewEnv;
 /* scan through the space */
 while (tmpstr[0] != NULL) {
 /* if "search" is found at beginning of tmpstr, return tmpstr */
 if (strstr(tmpstr, search) == tmpstr)
 return tmpstr;
 tmpstr += strlen(tmpstr) + 1; /* move to next string */
 }
 /* if we fall through to here, return NULL */
 return NULL;
}
/* new putenv(): returns 0 on success, -1 on failure */
int NewPutEnv(LPSTR putstr) {
 LPSTR currentloc; /* current location in the buffer */
 LPSTR tmpstr; /* used to move through buffer */
 char label[30]; /* the label portion of putstr */
 HGLOBAL hHoldEnv;
 char *pHoldEnv; /* holding area for the environment */
 int deleting = 0;
 int found = 0;
 /* if there's nothing to do, return failure */
 if(putstr == NULL)
 return -1;
 /* if the '=' in the input is in the 1st position, or there is no '=', fail */
 if((strchr(putstr, '=') == NULL) || (strchr(putstr, '=') == putstr))
 return -1;
 /* create holding area for the new environment */
 if((hHoldEnv = GlobalAlloc(GPTR, ENVSIZE)) == NULL)
 return -1;
 if((pHoldEnv = GlobalLock(hHoldEnv)) == NULL)
 return -1;
 /* assume we're OK, get the label */
 memset(label, '\0', 30);
 memcpy(label, putstr, strchr(putstr, '=') - putstr);
 /* check to see if we're deleting */
 if(putstr[strlen(putstr)-1] == '=')
 /* '=' is last character, we're deleting */
 deleting = 1;
 /* now move through the input, trying to find the label */

 tmpstr = lpNewEnv;
 currentloc = pHoldEnv;
 while(tmpstr[0] != NULL) {
 if(strstr(tmpstr, label) == tmpstr) {
 found = 1;
 if(!deleting) {
 /* string found, copy in new string */
 /* be sure to get the NULL */
 memcpy(currentloc, putstr, strlen(putstr) + 1);
 currentloc += strlen(putstr) + 1;
 }
 }
 else {
 /* not found. copy current string to holding area */
 memcpy(currentloc, tmpstr, strlen(tmpstr) + 1);
 currentloc += strlen(tmpstr) + 1;
 }
 tmpstr += strlen(tmpstr) + 1; /* get next string */
 }
 if(!found && !deleting) {
 /* label was not present: append putstr as a new variable */
 memcpy(currentloc, putstr, strlen(putstr) + 1);
 currentloc += strlen(putstr) + 1;
 }
 currentloc[0] = NULL; /* ensure a trailing NULL */
 /* now copy all of this stuff back on top of the envspace */
 memcpy(lpNewEnv, pHoldEnv, ENVSIZE);
 /* free up the hold environment */
 GlobalUnlock(hHoldEnv);
 GlobalFree(hHoldEnv);
 return 0;
}
/* NewWinExec will run a program using the new environment values */
int NewWinExec(char *progname, char *cmdline, int showstate) {
 UINT shows[2];
 struct LOADPARMS parms;
 shows[0] = 2;
 shows[1] = showstate;
 parms.segEnv = FP_SEG(lpNewEnv);
 parms.lpszCmdLine = cmdline;
 parms.lpShow = &shows[0];
 parms.lpReserved = NULL;
 return LoadModule(progname, &parms);
}
/* free up the environment memory space */
void EnvClose(void) {
 GlobalUnlock(hNewEnv);
 GlobalFree(hNewEnv);
}
#ifdef DEMO
int main(int argc, char *argv[], char *envp[]){
 LPSTR tmpstr;
 int i;
 FILE *outfile;
 outfile = fopen("c:\\temp\\envfile.txt", "w");
 printf("Environment from envp:\n");
 for (i =0; envp[i] != NULL; i++) {
 printf("%s\n", envp[i]);
 fprintf(outfile, "%s\n", envp[i]);
 }
 /* initialize the holding environment */
 if(EnvInit(1024) == -1) {
 printf("Environment failure!\n");
 exit(-1);
 }

 printf("\nEnvironment from pNewEnv:\n");
 fprintf(outfile, "\nEnvironment from pNewEnv:\n");
 tmpstr = lpNewEnv;
 while(tmpstr[0] != NULL) {
 printf("%s\n", tmpstr);
 fprintf(outfile, "%s\n", tmpstr);
 tmpstr += strlen(tmpstr) + 1;
 }
 printf("\nPATH from NewGetEnv = %s\n", NewGetEnv("PATH"));
 fprintf(outfile, "\nPATH from NewGetEnv = %s\n", NewGetEnv("PATH"));
 printf("\nTEMP from NewGetEnv = %s\n", NewGetEnv("TEMP"));
 fprintf(outfile, "\nTEMP from NewGetEnv = %s\n", NewGetEnv("TEMP"));
 printf("\nSetting PATH and TEMP...\n");
 fprintf(outfile, "\nSetting PATH and TEMP...\n");
 if(NewPutEnv("TEMP=c:\\fritz") == -1) {
 printf("NewPutEnv error!\n");
 fprintf(outfile, "NewPutEnv error!\n");
 }
 if(NewPutEnv("PATH=c:\\;c:\\dos;c:\\windows") == -1) {
 printf("NewPutEnv error!\n");
 fprintf(outfile, "NewPutEnv error!\n");
 }
 printf("\nPATH from NewGetEnv = %s\n", NewGetEnv("PATH"));
 fprintf(outfile, "\nPATH from NewGetEnv = %s\n", NewGetEnv("PATH"));
 printf("\nTEMP from NewGetEnv = %s\n", NewGetEnv("TEMP"));
 fprintf(outfile, "\nTEMP from NewGetEnv = %s\n", NewGetEnv("TEMP"));
 fclose(outfile);
 EnvClose();
 exit(0);
}
#endif /* DEMO */
































Examining CA-Visual Objects


An object-oriented environment for Windows development




Rod da Silva


Rod is a Windows-developer consultant/trainer and principal of Software
Perspectives of Richmond Hill, Ontario. He can be reached on CompuServe at
73020,2311 or by phone at 905-771-6675.


CA-Visual Objects (VO) is an application-development environment that sports an
incremental, native-code compiler, visual painters and editors, and an
advanced, repository-based storage system that manages all aspects of a
project automatically. VO's underlying language, based on xBase, has
extensions that allow optional strong typing and full object orientation. 
VO's language, in fact, is the most recent incarnation of the Clipper
language, the tool of choice for many hardcore DOS business application
developers for over a decade. Clipper was originally designed by Nantucket to
be a dBase III compiler producing stand-alone executables. Computer Associates
(CA) bought Nantucket in 1992, thereby acquiring Clipper and a research
project known as "Aspen," Nantucket's probe into GUI development. Aspen, which
evolved into VO, was to be a completely object-oriented, repository-based,
cross-platform application-development environment targeting the GUI
environment and based on a strongly typed, object-oriented version of the
Clipper language. 
In this article, I'll examine the VO language, looking at characteristics such
as memory management and calling conventions, and finally focusing on its
object-oriented features.


Optional Strong Typing


CA-Visual Objects supports optional strong typing. It is hard for any
classically trained programmer to do without variable and/or function strong
typing. However, all xBase variables and functions are polymorphic in that
they can change their types at any time. Consequently, variables and functions
are not declared as a certain type in the original dBase language. Moreover,
the actual declaration of the variable itself is optional, allowing for
dynamically created variables; see Example 1(a). Millions of lines of xBase
code use undeclared, dynamically created, polymorphic variables like this.
However, this code is dated, and xBase developers now opt for lexically scoped
variables using Hungarian notation. In Example 1(b), for instance, nX and cY
are still polymorphic, in that their type can change. However, the adoption of
Hungarian notation provides a kind of "soft" typing, at least in terms of
readability.
VO also supports this kind of code, making it compatible with existing xBase
systems. VO is, however, largely geared toward strong typing of its variables
and functions, allowing variable references to be resolved by the compiler
(generating native code) instead of the slower, run-time symbol table.
Strongly typed code also lets the compiler perform standard type checking; see
Example 2. 
The VO language allows you to mix and match these coding styles. Thus, legacy
xBase code can be ported (often without change) to the VO environment, with
the option of later being incrementally strongly typed, module by module, when
speed is an issue. New code should be strongly typed, but the compiler does
not require you to strongly type variables or functions. The compiler also
supports an automatic "type-inferencing" switch that infers the type of an
untyped variable from its usage in the code and automatically generates native
code for it. Thus, the single reference to the untyped variable i in a For
loop would be inferred as type WORD or LONG, and the appropriate native
machine code would be generated. Since the compiler supports varying degrees
of optimization (loop, peep-hole, and the like), the code in Example 2 has the
same speed/size characteristics as comparable C/C++ code.


Automatic Memory Management


When it comes to memory management, VO inherits high-level base data types
from CA-Clipper. Consequently, the allocation and deallocation of memory used
by these variables is completely automatic. You declare and use your
variables, then return from the function you are in when done. All garbage
collection occurs automatically in the background, reclaiming all unreferenced
memory. Functions such as new, delete, malloc, and free aren't an issue for
regular VO development.
Additionally, dynamic data types (such as STRINGs and ARRAYs) can grow and
shrink as necessary; see Example 3. All requests for dynamic memory are
handled by VO's own virtual dynamic memory manager. 


Calling DLLs with VO


In addition to data types that use dynamic memory, VO supports the traditional
data types used to call any DLL function (including the Windows API). BYTE,
INT, SHORTINT, LONG, FLOAT, REAL4, REAL8, WORD, DWORD, PTR, PSZ, and C-style
STRUCTUREs can all be used to call any DLLed function when the appropriate
prototype is provided. (VO also supports statically dimensioned arrays of any
of these data types.) This means that you can go directly to the Windows API
or third-party DLL for its functionality, although the preferred way is
through the layer of abstraction afforded by a class wrapper. For example,
Listing One is the VO equivalent of the SDK version of the ubiquitous "Hello
World" program.
VO comes with a library of predefined prototypes for the Windows API. You
simply specify the "Windows API" library in your application's search path
(the equivalent of making it available to the linker), and you can call the
API functions using the appropriate manifest constants as if they were any
function available to the system. There are no header files of prototypes to
manage, as they are precompiled as separate libraries. An import DLL prototype
differs only syntactically from the equivalent C/C++ prototype. Example 4
shows typical function prototypes found in the supplied Windows API system
library.


Calling Conventions


VO supports the calling conventions STRICT (C-style), PASCAL, Windows
CALLBACK, and CLIPPER. The default calling convention is either STRICT or
CLIPPER, depending on whether or not you strongly type the parameters in the
function declaration. 
The first three are the same as their C/Windows counterparts. CLIPPER,
however, is a proprietary calling convention required to support
CA-Clipper-style functions. This convention allows for flexible parameter
passing for traditional, weakly typed CA-Clipper functions. For example, the
caller of a function with a CLIPPER calling convention can pass a variable
number of parameters or skip parameters altogether (using a NIL value passed
as a place holder). This allows for code such as that in Listing Two. Note
that declaring a function with the CLIPPER calling convention does not
preclude its use of strongly typed variables or the typing of its return
value. However, you cannot strongly type its parameters; all parameters passed
are by definition weakly typed, polymorphic data types, which VO calls
"USUALs" (or "ANYs"). Listing Three, a more efficient version of the
ShowMessage() function in Listing Two, uses strong typing where it can, yet
maintains flexible parameter passing. 
Traditional calling conventions and VO's proprietary CLIPPER calling
convention differ in two ways.
First, CLIPPER function calls are resolved at run time via a symbol-table lookup. While
slower than a lexically resolved function, this allows flexibility (especially
for "data-driven" systems) since the decision of which function to call can be
deferred until run time. Even so, symbol-table lookup is done not by function
name, but by a 2-byte VO symbol (a special data type that is a direct
representation of a Windows atom) derived from the function name. The
lookup itself is accomplished by a hashing scheme that determines the code to
execute. 
Second, parameters passed to a CLIPPER function are not passed on the machine stack
but on a separate eval stack. Thus, the prologue code of a CLIPPER function
must call another internal run-time function to retrieve the parameters passed
rather than obtain them from the stack directly. Unless the function is called
repeatedly, this overhead is negligible, and the flexibility that dynamic,
late-bound functions provide is worth the performance hit. 


Creating DLLs 


VO also supports the creation of DLLs for distribution. Thanks to its many
dynamic capabilities, VO can create both VO-only and foreign-hosted
(traditional) DLLs. There are no restrictions on the code that can be placed
in a VO-only DLL. Anything you can write with VO can be packaged with a click
of a mouse into DLL form, callable by any other VO application. VO-only DLLs
do not behave like typical Windows DLLs, however. Any application loading a
VO-only DLL gets a fresh copy of the DLL's data segment. This means VO-only
DLLs behave more like OS/2 DLLs, in that "static" data is not shared among the
DLL's clients.
VO's foreign-hosted DLLs, however, are typical Windows DLLs in that the data
segment is shared by all DLL users. But as you might expect, there are
restrictions on what code can be placed in these types of DLLs. Basically, you
cannot export functions with CLIPPER calling conventions. All export functions
must be traditional, strongly typed functions with STRICT, PASCAL, or CALLBACK
calling conventions because only another VO application can communicate with a
DLL export function with the proprietary CLIPPER calling convention.
Creating DLLs with VO is straightforward: There are no .DEF files to write,
nor do you explicitly declare functions as exports. With VO, everything is
exported unless you tell it otherwise by qualifying the function or data
declaration as "STATIC," effectively scoping it to its own module. Turning a
sample project from an executable into a DLL involves toggling a radio button
and specifying the type and name of the DLL you wish to create. The DLL
creation process creates the physical DLL file as well as an import library of
VO prototypes for the exported declarations in the DLL. This import library
can be used by other VO applications to statically link the DLL, if necessary.
Alternatively, you can convert these generated prototypes to the equivalent
syntax in C/C++, Visual Basic, PowerBuilder, and so on to use a VO
foreign-hosted DLL in their respective languages.



Object Orientation


Like C++, VO is a hybrid language in that you can program in it with or
without using classes, mixing object-oriented and procedural styles as
necessary. However, VO implements classes differently from C++. For instance,
class methods are not declared within the class declaration proper. Methods
are declared and implemented as separate lexical units, much like functions,
and VO's repository must associate them with the proper class. (See the
accompanying text box entitled "VO's Repository-Based Architecture.") The code
in Example 5(a), for instance, is stored as two separate (yet related)
"entities" in the system's repository. The build cycle automatically ties the
two together when required. The method declaration/implementation need not be
in the same library as the class declaration, although this is the norm. This
architecture lets you add a method to a class at any time without changing or
needing access to the source code for the class declaration. So, for example,
you could add the method in Example 5(b) to VO's abstract framework class
WINDOW in any module or library of your application; it would then automatically
be inherited by all WINDOW's descendant classes. 
To instantiate an object of a class, VO uses the object-instantiation
operators "{}". To invoke a method or access an instance variable of an
object, VO uses the send operator ":" as in Example 5(c).
VO classes do not have constructors in the traditional sense since the system
automatically allocates dynamic memory for each new object. However, class
designers can implement an Init() method to automatically call object
initialization whenever an object of the class is instantiated. Parameters can
be passed to the Init() method by including them between the {} operators; see
Example 5(c). Additionally, default values can be specified in the
instance-variable declarations for the class as compile-time constants; see
Example 5(a).
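To make the Init() mechanism concrete, here is a minimal sketch (the Point
class and its variables are hypothetical, not part of VO's framework):

CLASS Point
 EXPORT wX := 0 AS WORD // defaults are compile-time constants
 EXPORT wY := 0 AS WORD

METHOD Init( wNewX, wNewY ) CLASS Point
 // Called automatically when Point{...} is evaluated; the values
 // between the {} operators arrive here as parameters
 wX := wNewX
 wY := wNewY

FUNCTION Start
 LOCAL oPt AS Point
 oPt := Point{ 10, 20 } // instantiation triggers Init( 10, 20 )
 ? oPt:wX, oPt:wY

Because Init() uses the CLIPPER calling convention like any other method,
arguments may also be skipped or omitted entirely when instantiating.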
VO supports a hybrid message-binding model: Instance variables are usually
early bound, whereas methods are always late bound since they all use the
CLIPPER calling convention. This combination provides speed and flexibility
since the code within a method can run at native code speed, yet the decision
as to which method to call can be deferred until run time.
VO methods are implemented the same way as CLIPPER functions. Moreover, a
further optimization in the VO method-invocation implementation ensures that
once a method has been looked up, its code is not executed by a traditional
stack-based function call. Instead, it jumps directly to the code's entry
point with a jmp instruction, executes the code, and then jumps directly to
the original point of method invocation (not the launch point in the
message-lookup routine). This avoids the stack setup and cleanup of a
typical function call. The result is fast, late-bound execution of all
methods. (On a Pentium 90, I can issue one million method invocations on an
object in just over two seconds.)
Inheritance in VO is clean and consistent, especially compared with C++. To
call a method, you must send a message to an object, whether you are inside a
class (in one of its methods) or outside. Notice in Listing Four the
consistency of the method and function calling syntax regardless of where it
takes place. Outside the class (in the Start function), the method is invoked
by issuing the send operator against the object variable and the function is
called directly. Inside a class method, a method invocation is issued by
sending a message, this time to self (a method's equivalent to C++'s this), or
super if you want to call up the chain of inheritance, and the
function-calling syntax remains unchanged. Compare this with equivalent C++
code, which uses a different calling syntax for both functions and messages,
depending on whether or not the call takes place inside a method.
As with C++, classes declared in VO are effectively user-defined types.
Therefore you can use a declared class as a legitimate type declaration for
your object variables. In Example 6, for instance, the first two variable
declarations are essentially equivalent. They both define a variable and
assign it a SomeClass object. Both variables are of a polymorphic (xBase)
type, so their type can be changed to any other data type by simply assigning
a new value at any time. The compiler cannot perform any type checking for
these variables since the variable type can change at any moment. Any
erroneous usage of these variables will only be found at run time via the
run-time type-checking code necessarily generated for such variables. The only
real difference between the two declarations is that the second explicitly
documents this intention. 
The AS OBJECT declaration tightens up the rules and allows the compiler to
generate more-efficient code. It also provides compile-time type checking to
ensure that only variables of type object are ever assigned to this variable.
There's still plenty of opportunity to misuse this variable, however, since
the compiler will allow any object (of any class) to be assigned to this
variable.
The fourth variable declaration is more restrictive still: Any descendant of
the ParentOfSomeClass base class can be assigned to the variable, but no other
object type. The most restrictive of all is the last declaration, which allows
only objects of type SomeClass (or its descendants) to be legal assignment
values to this variable.
VO's typing flexibility treats all methods as the equivalent of C++ virtual
methods in that the correct method implementation is chosen by the actual type
of the object (not the declared type) while still allowing the flexibility of
a more relaxed declaration. This eliminates any need to cast pointers to base
classes into pointers to subclasses to ensure the correct implementation of an
object is invoked. For example, VO's compiler allows the code in Example 7(a).

At first glance the assignment of a Dog{} object to a variable typed as Animal
might look like a type conflict. However, the compiler correctly reasons that,
semantically speaking, Dog is "a-kind-of" Animal, and can respond to the same
messages as an Animal object. The compiler is smart enough, however, not to
allow the reverse situation, and will generate a compiler error for code such
as that in Example 7(b), which could result in an undefined situation. This
flexible object-typing capability (along with each class's compiler-determined
flat-message table) eliminates the equivalent of C++'s internal virtual
pointer tables, yet still allows fast, type-safe, intuitive, and syntactically
clean code.
C++ developers traditionally use Get/Set methods to access or assign to an
object's protected or hidden members. These methods detract from the
informational value of the class user's code: a method-only interface cannot
indicate when and where the "state" of the object is being queried or set as
clearly as a mixed instance-variable/method interface can. 
VO's ACCESS/ASSIGN methods maintain the distinction between instance variable
and method. Moreover, this distinction can be maintained without sacrificing
the class's encapsulation. ACCESS/ASSIGNs provide safe access to a class's
instance variables without resorting to presenting the class user with a
method-only interface to the class. 
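The ACCESS/ASSIGN idea can be sketched as follows (the Thermostat class and
its members are hypothetical, following the CLASS/METHOD conventions shown
earlier):

CLASS Thermostat
 HIDDEN wTemp := 68 AS WORD // real storage stays encapsulated

ACCESS Temperature CLASS Thermostat
 // Runs whenever code reads oTherm:Temperature
RETURN wTemp

ASSIGN Temperature( wNewTemp ) CLASS Thermostat
 // Runs whenever code writes oTherm:Temperature, so the class
 // can validate the value without exposing a Set...() method
 IF wNewTemp >= 40 .AND. wNewTemp <= 90
  wTemp := wNewTemp
 ENDIF

To the class user, oTherm:Temperature := 72 reads like a plain
instance-variable assignment, yet it routes through the ASSIGN method, so
encapsulation is preserved.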


So What's Missing?


VO is not without its shortcomings. The language is full featured, but some
C++ features currently have no VO equivalents; typed pointers, for instance
(VO's pointer data type PTR is always the equivalent of a C/C++ void*). C++ supports
function-pointer dereferencing, while VO only allows you to pass the pointer
off to C/C++ for dereferencing. C++ supports scoped methods, but all VO
methods are public. Nor does VO support multiple inheritance, unions, default
arguments, variable-length parameter passing for the C-style calling
conventions, compiler support for pure abstract classes (currently all classes
can be instantiated in VO), const declarations, or compiler pragmas.
But VO's greatest problems reside outside of the language: The environment
itself shows immaturity. The IDE, while functional, is not up to the standards
of more mature development environments, and the class-framework library (a
layer of VO classes interfacing with CommonView) can be awkward. In addition,
some of the tools you get with the environment don't measure up. For example,
VO's Window painter does not natively support VBXs (although third-party tools
allow VO to call any VBX and enable the VO Window painter to visually position
the controls on a form and generate the appropriate object-oriented code). The
tools also lack multiprogrammer support--the current version of VO only
supports single-user, local repositories. While you can import/export your
applications, this isn't a substitute for the ability of more than one
programmer to share a single repository of code (project). 


Conclusion


CA-Visual Objects is an alternative to C++ for large-scale development under
Windows. It boasts fast, native-code executables and an impressive language
well-suited to building both application- and system-level software. VO has
become my language of choice for general-purpose Windows development because
its power/productivity ratio is better than other languages I've worked with.
VO's Repository-Based Architecture
CA-Visual Objects addresses the complexities of GUI development via an active
repository known internally as "Adam." Adam represents a mass-storage system
that holds as independent pieces of data every aspect of all your current
projects--source code; object (compiled) code; menu, window, and icon
resources; database definitions; library functions and class prototypes;
define constants; and project settings.
The repository stores all the information necessary to describe the
relationships between the various components of your project. This "meta" data
is not required or used in your final application, but is necessary to
automatically manage the project in a way similar to the traditional make
utility. In this respect, VO offers the same kind of project management as
other development environments, letting you work with various components of a
project without dealing with the underlying files and dependency rules. But
since VO does not manage physical files the way make does, its
project-management capabilities exceed those of other systems, providing
faster build times.
Perhaps the single greatest difference between VO and other approaches is that
VO application components are not file based. With make utilities, your
dependencies are formed at the file level based on time and date stamps. In
VO, dependencies' granularity goes down to the "entity" level. In fact, VO has
no concept of a source file--everything is measured in lexical entities, much
smaller pieces of code that can stand alone as separate application
components. Since relationship information is tracked at the entity level
rather than at the source-file level, rebuilds are faster, leading to fast,
"incremental" compiles.
VO's source-code editor is linked to its parser such that source code is not
only color coded according to syntax, but also visually divided into separate
entities. Language constructs stored in the repository as separate entities
(segregated by the editor) include functions, procedures (functions without
return values), class declarations, methods, globals, and defines (compile-time
constants). This means that if an application contains a define statement
(such as DEFINE x := 10) on which three functions are dependent, the compiler
will automatically flag those functions for rebuilding if the define statement
is changed. The module can have dozens of entities, but only those three would
be flagged for recompilation.
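A minimal sketch of such a dependency (the entity names are hypothetical)
looks like this:

DEFINE MAX_ROWS := 10

// Any entity that references MAX_ROWS is flagged for recompile
// the moment the DEFINE above is edited.
FUNCTION FillGrid() AS VOID
 LOCAL i AS WORD
 FOR i := 1 UPTO MAX_ROWS
  // ...populate row i...
 NEXT i
RETURN

The repository records that FillGrid() references MAX_ROWS, so editing the
DEFINE marks only the referencing entities, not the whole module, for rebuild.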
These relationships are not retained just within a given project. For example,
assume the define statement is stored in a separate project of type library,
and the library is included in another project's search path. Since both
projects are stored in the same repository, the repository can track
relationships across the projects; then, if the define in the library is ever
modified, the three functions that reference the define in the dependent
application are flagged for recompile. This fast, incremental compile--coupled
with VO's ability to test all applications from within the IDE itself--gives
the VO environment an edit/build/test cycle on a par, performance-wise, with
interpretive environments.
Since the VO repository is "all seeing," it can provide some useful
functionality. For example, since all code is stored in the repository, if you
type a function name into the source-code editor and click the mouse to choose
"expand prototype," VO will provide a template of the parameters of the
function--complete with type information. This is great for those who can't
remember the number, type, or order of parameters to functions.
The repository also provides browsing. Since the repository catalogs and
categorizes all the different types of entities it stores, tools can be built
to browse the contents of the repository on a per project basis or even across
the entire repository. VO comes with two such browsers: the general-purpose
Entity browser, which displays all of the entities grouped alphabetically
within entity type; and the Class browser, which displays in a hierarchical
fashion all classes and their associated methods and instance variables,
within the context of their class-tree hierarchy.
--R.d.S.
For More Information
CA-Visual Objects
Computer Associates International
1 Computer Associates Plaza
Second Floor
Islandia, NY 11788-7000
516-225-5224
Example 1: (a) Declaration of variables is optional, allowing for dynamically
created polymorphic variables; (b) nX and cY are local but still polymorphic.
(a)
FUNCTION SomeUDF
 x := 4        // dynamically create and initialize x
 ? x           // print x
 x := "Hello"  // x can change type at any time
 ? x           // print x
RETURN NIL
(b)
FUNCTION SomeUDF
 LOCAL nX, cY  // lexically scoped variables
 nX := 4
 ? nX
 cY := "Hello"
 ? cY
RETURN NIL
Example 2: Strongly typed VO code.
FUNCTION SomeUDF AS VOID
 LOCAL wX AS WORD
 LOCAL cY := "Hello" AS STRING
 wX := 4
 ? wX
 ? cY

RETURN
Example 3: Dynamic data types.
FUNCTION ComputeSquares( wStart, wLimit AS WORD ) AS ARRAY
 LOCAL aSquares := {} AS ARRAY // empty array - we will grow it as necessary
 LOCAL i AS WORD
 // Loop to "grow" the array
 FOR i := wStart UPTO wLimit
 AADD( aSquares, i*i ) // add an element to the array
 NEXT i
 ? STR( i-1 ) + " squares have been computed!" // '+' concatenates strings
 WAIT
RETURN aSquares // array's memory will NOT be automatically
 // collected here (the normal case) since a
 // reference to it is being passed out to the caller
Example 4: Typical function prototypes in the VO-supplied Windows API system
library.
_DLL FUNCTION UpdateWindow( hWnd AS WORD ) AS VOID PASCAL:USER.124
_DLL FUNCTION UpdateWindow( hWnd AS WORD ) AS VOID PASCAL:USER.UPDATEWINDOW
Example 5: (a) Class declarations and methods are stored as two separate
"entities" in repository; (b) adding a method to VO's abstract framework class
WINDOW; (c) invoking a method.
(a)
CLASS SomeClass
 EXPORT wIVar1 := 0 AS WORD          // EXPORT same as C++ public
 PROTECT cIVar2 := "Hello" AS STRING // PROTECT same as C++ protected
 HIDDEN lIVar3 := TRUE AS LOGIC      // HIDDEN same as C++ private

METHOD SomeMethod CLASS SomeClass // Each method declares the class it belongs to
 ? wIVar1, cIVar2, lIVar3
(b)
METHOD NewWindowExtensionMethod CLASS Window
 // Do your extension stuff here
(c)
oWin := ShellWindow{ oOwner }   // Instantiate a descendant of WINDOW
oWin:NewWindowExtensionMethod() // Invoke our newly added method
Example 6: The first two variable declarations are equivalent. 
CLASS SomeClass INHERIT ParentOfSomeClass
FUNCTION Start
 LOCAL oObj1 := SomeClass{}
 LOCAL oObj2 := SomeClass{} AS USUAL
 LOCAL oObj3 := SomeClass{} AS OBJECT
 LOCAL oObj4 := SomeClass{} AS ParentOfSomeClass
 LOCAL oObj5 := SomeClass{} AS SomeClass
Example 7: (a) VO's compiler will allow this code; (b) this code will generate
a compiler error.
(a)
CLASS Animal
CLASS Mammal INHERIT Animal
CLASS Dog INHERIT Mammal

FUNCTION Start
 LOCAL oRexx AS Animal
 oRexx := Dog{} // legal: a Dog is "a-kind-of" Animal
 ...
(b)
FUNCTION Start
 LOCAL oRexx AS Dog
 oRexx := Animal{} // compiler error: an Animal is not necessarily a Dog
 ...

Listing One
FUNCTION Start
 LOCAL wc IS _WINWNDCLASS
 LOCAL msg IS _WINMSG
 LOCAL l AS LOGIC
 LOCAL hwnd AS WORD
 
 // Create a Windows window 'class'
 wc.style := _OR(CS_VREDRAW, CS_HREDRAW)
 wc.lpfnWndProc := @MainWndProc() // pointers to functions 
 wc.cbClsExtra := 0
 wc.cbWndExtra := 0
 wc.hInstance := _GetInst()
 wc.hIcon := LoadIcon( _GetInst(), PSZ(IDI_APPLICATION) )
 wc.hCursor := LoadCursor( 0, PSZ(IDC_ARROW) )
 wc.hbrBackground := GetStockObject(WHITE_BRUSH)
 wc.lpszMenuName := PSZ("")
 wc.lpszClassName := String2Psz( "HelloWorldClass" )
 // Register class with Windows
 IF !RegisterClass(@wc)
 RETURN FALSE
 ENDIF
 // Construct an instance of the window class just created

 hwnd := CreateWindow(;
 String2Psz( "HelloWorldClass" ),;
 String2Psz( "Hello World!" ),;
 WS_OVERLAPPEDWINDOW ,;
 100, 100, 500, 500,;
 0, 0, _GetInst(),;
 _MAKEPTR(0,0))
 // Bail out if something went wrong
 IF hwnd = 0
 RETURN FALSE
 ENDIF
 // Show and paint the window
 ShowWindow(hwnd, SW_SHOWNORMAL)
 UpdateWindow(hwnd)
 // Begin looping for messages
 WHILE GetMessage(@msg, 0, 0, 0)
 TranslateMessage(@msg)
 DispatchMessage(@msg)
 ENDDO
 // We are done so unregister class and exit
 UnregisterClass (String2Psz( "HelloWorldClass" ), _GetInst())
FUNCTION MainWndProc (hwnd AS WORD, message AS SHORTINT,;
 wParam AS WORD, lParam As LONG) AS LONG CALLBACK
 // Minimal Window Procedure Posts a Quit message on
 // receiving the WM_DESTROY message
 IF message = WM_DESTROY
 PostQuitMessage(0)
 RETURN 0L
 ENDIF
 // All other messages are defaulted to Windows base behavior
 RETURN DefWindowProc(hwnd, message, wParam, lParam)

Listing Two
ShowMessage()
ShowMessage(,,"A message at the default location")
ShowMessage(100,100) 
ShowMessage(,400)
...
FUNCTION ShowMessage( nXCoord, nYCoord, cMessage ) CLIPPER
 // Default any parameters that were not passed to sensible values
 IF IsNil( nXCoord )
 nXCoord := 0
 ENDIF
 IF IsNil( nYCoord )
 nYCoord := 620
 ENDIF
 IF IsNil( cMessage )
 cMessage := "One Moment Please..."
 ENDIF
 // Code to display message goes here...
 ...

Listing Three
FUNCTION ShowMessage( nXCoord, nYCoord, cMessage ) AS VOID CLIPPER
 LOCAL wXCoord, wYCoord AS WORD
 LOCAL cMsg AS STRING
 // Default any parameters that were not passed to sensible values
 // and store them in strongly typed variables
 IF IsNil( nXCoord )

 wXCoord := 0
 ELSE
 wXCoord := nXCoord 
 ENDIF
 IF IsNil( nYCoord )
 wYCoord := 620
 ELSE
 wYCoord := nYCoord
 ENDIF
 IF IsNil( cMessage )
 cMsg := "One Moment Please..."
 ELSE
 cMsg := cMessage
 ENDIF
 // Code to display message using strongly typed variables goes here...
 ...

Listing Four
CLASS BaseClass
METHOD SomeMethod CLASS BaseClass
 ...
CLASS DerivedClass INHERIT BaseClass
// Override SomeMethod of BaseClass
METHOD SomeMethod CLASS DerivedClass
 ...
 self:SomeOtherMethod() // methods always require a message send
 ...
 SomeFunction() // consistent function calling syntax 
 ...
 super:SomeMethod() // call base class method again with a message send
FUNCTION Start
 LOCAL oObj := DerivedClass{} AS DerivedClass
 oObj:SomeMethod() // methods always require a message send 
 SomeOtherFunction() // consistent function calling syntax 





























PowerBuilder NVOs


Useful tools for creating OO apps 




Mark Robinson


Mark is a PowerBuilder consultant with Toronto-based Data Management
Consultants (DMC), specializing in the delivery of custom Oracle solutions.
Mark can be contacted on CompuServe at 75462,422.


With so much emphasis placed on the visual aspects of PowerBuilder's native
objects, little attention has been given to Non-Visual User Objects (NVOs).
Although they may be PowerBuilder's most useful tool for creating truly
object-oriented applications, NVOs are rarely used effectively and seem poorly
understood. PowerBuilder itself is not inherently object oriented, but it
allows you to develop applications using procedural, object-oriented, or
cross-combined methodologies. 
Increasingly in client-server computing, companies are creating
database-level, entity-oriented data objects to provide consistent, uniform
interaction with data sets from within applications. NVOs can be used to
segment applications along functional or task-oriented boundaries as well as
along data boundaries. If properly implemented, data-oriented NVOs also
provide a logical interface on the application-development side. In this
article, I'll examine the effective use of NVOs and their role in application
development. 


Anatomy of an NVO


NVOs are composed of instance variables, functions, structures, and events.
PowerBuilder provides only two predefined events--the Constructor Event and
Destructor Event--but you can create additional user-defined events. However,
the calling arguments that can be supplied when an event is triggered are
limited in number and complexity, so the preferred method of communication
with NVOs is via function calls.
The Constructor Event is triggered as the object is being instantiated and can
be used to initialize attributes or any other necessary operation at startup.
Make sure that the application has initialized everything the user object
requires before the object is created. Similarly, the Destructor Event is
triggered as the object is being destroyed, allowing the user object to clean
up after itself if necessary. You should always explicitly destroy any objects
that you create at run time, if only to exert greater control over the
behavior of the instantiated objects.
Typically, the instance variables contained within the NVO will be declared as
private or protected variables. This means that their scope is limited to the
object that contains them and they cannot be accessed by scripts that are
external to the NVO. This restriction can be frustrating, but it is extremely
important to maintain the NVO's self-reliance. If the value of an instance
variable is required outside the object, then its contents should be available
through a function call. Generally, the more functions and variables available
to external objects and scripts, the harder the object and application are to
maintain. The internal activities of the NVO will likely change over time, so
the less the outside world knows about them, the better.
Access to functions and data within an NVO can be organized in two ways. The
first method is to directly access the NVO's functions and data stores; see
Listing One. However, this makes the NVO's internal workings available to
external scripts, thus defeating the NVO's long-term purpose.
The second method obscures the NVO's inner activity by providing a
single-entry dispatch service within the NVO, as demonstrated by Listing Two.
The NVO receives instructions through a single entry point, from which it
calls the functions necessary to perform the operation. This allows the NVO to
evolve without requiring that scripts be rewritten in every application that
uses it. The drawback of this approach is that many NVOs deal with several
different and complex data types, and it is difficult to move many different
types of data through a single pipeline-style interface. To deal with this,
you can either convert data into a standard data type (such as a string or
PowerObject) and parse the variable from within the NVO, or provide multiple
dispatch functions to handle the different data types. You must decide which
approach is most logical on a case-by-case basis.
Listing Two uses the function f_Method() to send instructions to the
customer-manager object, CustMgr, based on a predefined method identifier
called find and the qualifying string sCustKey. The number of qualifying
arguments varies according to the individual needs of the NVO; a single
qualifying argument satisfies most requirements.
In Listing Two, the script that calls the CustMgr NVO does not know how
CustMgr will resolve the request, how the data is stored internally, or where
the data is stored. The f_Query() function works on the same premise as
f_Method(). The calling script relies on blind faith, and assumes that its
request will be processed. Listing One forces the internal workings of the
CustMgr NVO to remain static, while Listing Two only requires that the
dispatch services remain static. The way in which the requests and queries are
satisfied is hidden from the calling script. Table entities and attributes
could be renamed or restructured, and the only maintenance required would be
to update the CustMgr NVO and regenerate the application.


Using NVOs in Class Libraries 


In developing a truly object-oriented class library, NVOs play an important
role by encapsulating functions and data into discrete, reusable objects.
Typical functional NVOs would be a menu security manager, message manager,
window manager, and perhaps an external-function API manager. While these NVOs
would be generic for any company, data-aware NVOs are company specific.
Typical data-aware NVOs might be a customer-profile manager or an inventory
manager.
A class library comprises several layers of objects, starting with the most
generic and progressively becoming more specialized; see Figure 1.
The NVO is not used as a base object from which application-specific objects
are created, but rather to extend the library's functionality. The NVO is
generally used in its original form, but its operation can be customized
within an application through inheritance.


NVOs as Functional Task Managers


A Functional Task Manager (FTM) is a discrete unit of processing logic that
contains all the knowledge necessary to perform a task. Typically, FTMs are
business-independent units that provide applications with value-added
capabilities such as access security, drag-and-drop, and serial
communications. The complexities of the particular task are hidden from
you--they are taken care of by the packaged routines of the FTM. A side
benefit of this application architecture is that you don't need to master
everything. With a little knowledge about serial communications, for example,
you can embed a serial-communications FTM into an application and quickly work
on the application-specific scripts, as opposed to delving into the mysteries
of serial communications under Windows. 
Typically, FTMs are layered. The lowest-level functions perform small and/or
implementation-specific tasks. Each new level has progressively more-generic
tasks, finally bundling the lower-level tasks into a form that is usable by
application developers. It is possible to accomplish this layered effect in
one of two ways: by coding the layering within a single NVO or by using
inheritance to progressively build more-specific FTMs from FTMs that are more
generic. But using inheritance for its own sake adds unnecessary complexity
and overhead. If several FTMs are based on a similar low-level function set,
then inheritance is the perfect choice to create different FTM classes. If the
FTM stands alone as a single class, however, multiple levels of inheritance
merely create excess baggage. 


NVOs as Data Object Managers


A Data Object Manager (DOM) is to Data Objects what FTMs are to tasks. In its
simplest form, a Data Object is a business-dependent entity or group of
entities--such as Customer, Supplier, or Component--with an attached set of
methods for manipulating it. These methods fully define all operations that
may be performed on the given data set. All applications must access the Data
Object via the methods associated with it and must fully comply with its
rigorous rules. These rules are designed to protect the integrity of the
underlying data and provide a consistent interface from any application.
Generally, most attention given to Data Objects applies only to the back-end
DBMS, where the methods are implemented as stored procedures and triggers. NVO
DOMs extend this concept to the front-end development tool. The back-end DBMS
enforcement is still required; NVO DOMs make the front end consistent with the
back end. The concepts illustrated in Listing Two apply equally to Data
Objects. 


The NVO in Action: Browsing PowerBuilder Libraries


The uo_ObjManager NVO is an FTM that allows you to browse library lists and
directories. It provides code for choosing objects from a PowerBuilder
library; their subsequent manipulation is up to you. Example 1 presents the
naming standards and conventions I've used.
The uo_ObjManager object contains several embedded functions; see Figure 2. It
reads the PB.INI file to determine the current application and library. It
then locates the current library list and produces a list of object names
stored in a given library.

Built around this NVO is a window that manages the user interface to the
library and object lists. (The code and events are not included in this
article.) Figure 3 is a simple UI that could be used to select a PowerBuilder
object.
Both FTMs and DOMs must be instantiated at run time, generally during the
Application Open event or the Open Event of an MDI Frame window (in the case
of an MDI application). They are instantiated using the CREATE statement and
usually assigned to a global variable of that type. The DESTROY statement
eliminates the object and invalidates any references to it. Example 2 details
the declaration, instantiation, and destruction of NVOs.
The uo_ObjManager NVO is instantiated into the global ObjMgr and initialized
during the Application Open Event; see Listing Three. In the Open Event of the
window, w_main, the library list DataWindow is populated by instructing ObjMgr
to create a library list and then requesting its value; see Listing Four. As
the user clicks on a particular library, the program reads the directory of
all objects contained in that library and displays them in the dw_ObjList data
window; see Listing Five.
The ObjMgr NVO reduces the UI to a generic list-of-values handler--no
intrinsic knowledge about PowerBuilder libraries or objects is embedded within
the window. Thus, this sample application is segregated along the functional
boundaries of producing the lists and managing the UI. 
As Example 3 shows, ObjMgr has several private instance variables for storing
results generated by the creation of library lists and object lists. ObjMgr
contains six functions: two public and four protected. Table 1 describes the
available function-access levels and their impact on design.
ObjMgr can process four methods:
Initialize, to retrieve and store information about the current application
and its library.
BuildLibraryList, to create a list of libraries associated with the current
application and store it in a format compatible with the ImportString
function.
SetLibDirType, to allow the calling scripts to alter the search criteria as
needed. (ObjMgr stores the current search-object type internally.)
ReadLibrary, to accept a library name and create a list of all objects
contained within it that match the object type specified by the SetLibDirType
method. The object list is stored internally.
The f_Method function, which dispatches the four methods accordingly (see
Listing Six), acts upon requests to perform some action and returns the
success of the operation. The f_Query function acts upon requests for
information stored within the NVO. It is set up as a dispatch function similar
to f_Method, but it returns the value of the instance variables directly and
does not rely on any supporting functions. The f_Query function, detailed in
Listing Seven, supports the following requests:
LibraryList, which returns the value of the current library list stored in the
instance variable isLibraryList.
ObjectList, which returns the internal directory listing of the previously
supplied library name and library-entry type. The object list is stored in
isObjectList in a form compatible with the ImportString function.
Once a request has been dispatched by f_Method, the NVO is free to solve it.
PowerBuilder stores the library lists for each defined application in its
PB.INI file. The default application is identified by two profile strings:
APPNAME, which identifies the application; and APPLIB, which identifies the
library that holds the application. The library lists are stored in PB.INI
under the key $applib(appname), with the value library1;...;libraryn. Once the
application name and
library name are determined, they are concatenated and the library list
profile string is read from the PB.INI file. The f_IdentifyApplication
function, which is associated to the Initialize method, appears in Listing
Eight.
Once the application is identified, the library list can be retrieved from the
PB.INI file by reading a profile string. (You can then format it into a
PowerBuilder list that can be imported into a data-window object.) The
libraries in the list are separated by semicolons that must be replaced with a
Tab and Line Feed combination to be compatible with ImportString; see Listing
Nine.
Once created, the library list can be displayed to the user. To retrieve a
list of library objects that match a specific type, the user simply selects a
library. The f_ReadLibrary function reads a specific library and creates a
list of objects. PowerBuilder's LibraryDirectory function returns each
directory entry as object name{tab}datetime{tab}comments{linefeed}. The
f_ReadLibrary function truncates the date, time, and comments from each entry
and stores the resulting string in isObjList; see Listing Ten.
A complication of the single-entry pipeline is dealing with many different
data types. One solution is to convert them all to a single data type. This is
not always possible, but it will resolve most conflicts. Listing Eleven
illustrates the conversion between an enumerated data type and a string.
PowerBuilder's LibraryDirectory function requires an enumerated LibDirType
variable indicating the type of object to be returned. The f_SetLibDirType
function in Listing Eleven translates string identifiers into enumerated
LibDirType values.


PowerBuilder Version Differences


With the introduction of PowerBuilder 4.0, PowerSoft has added more predefined
user-object classes. NVOs can be either Custom or Standard class objects. The
Standard class objects are based on predefined PowerBuilder object classes
such as Message, Pipeline, Error, and Transaction. The techniques I've
discussed here should be implemented using the Custom object class.


For More Information


PowerBuilder 4.0
Powersoft Corp.
561 Virginia Rd.
Concord, MA 01742-2732
508-287-1500
Figure 1: Components of a layered class library.
Figure 2: Embedded-function list for uo_ObjManager.
Figure 3: Object-browser UI.
Example 1: Naming conventions and standards: (a) scope; (b) type; (c)
VariableName.
(a)
g=Global variable; s=Shared variable; i=Instance variable. The local-variable
scope indicator is left blank.
(b)
s=String; l=Long; i=Integer; e=Enumerated.
(c)
isLibraryList is an instance string variable. eLibDir is a local enumerated
variable.
Example 2: Declaring, instantiating, and destroying an NVO. (a) Declaration of
ObjMgr in the global declarations window; (b) instantiation of ObjMgr in
Application Open event; (c) destruction of ObjMgr in Application Close event.
(a)
// declare place holder for instance of uo_objmanager
uo_objmanager ObjMgr

(b)
// create instance of uo_objmanager
ObjMgr = CREATE uo_objmanager

(c)
// remove instance of uo_objmanager
DESTROY ObjMgr
Example 3: ObjMgr private instance variables.
Private string isAppName // currently defined application
Private string isAppLib // currently defined library containing application
Private string isAppDir // directory path to isAppLib
Private string isPBiniFile // full path & name to PB.INI
Private string isLibraryList // current library list
Private string isObjectList // current object list

Private LibDirType ieLibDirType = DirAll! // default object type to browse
Table 1: Object functions can be declared with one of three access attributes.

Access Attribute   Description
Public Least restrictive level of function access; available
 from any script within the application. If the
 function is called from a script external to the
 object in which it was declared, it must be referenced
 in a fully qualified manner such as
 ObjMgr.f_Method().
Protected Medium level of function access; called from
 scripts within the object or descendant object in
 which the function was declared. If the object was
 inherited to create a new class, all scripts within
 the object can still reference the function in the
 ancestor.
Private Most restrictive level of function access; same as
 protected access except that the function cannot be
 called from descendant scripts. The function is
 hidden from all scripts outside the exact class in
 which it was declared.

Generally, only one or two functions in an NVO should be publicly accessible.
Use of protected versus private access depends on the intent of the object.
Private access limits all new object classes to using ancestor functionality
as originally intended. Protected access allows new object classes to
completely re-invent the behavior of the object and should be used with
caution.

Listing One
// Find customer profile identified by sCustKey
IF NOT CustMgr.f_FindCustomer(sCustKey) THEN
 // cannot find sCustKey
 RETURN
END IF
// Get customer occupation
sCustJob = CustMgr.CustProfile.sOccupation

Listing Two
// Find customer profile identified by sCustKey
IF NOT CustMgr.f_Method("find", sCustKey) THEN
 // cannot find sCustKey
 RETURN
END IF
// Get customer occupation
sCustJob = CustMgr.f_Query("profile", "occupation")

Listing Three
// create an instance of uo_ObjManager
ObjMgr = CREATE uo_objmanager
// instruct it to initialize itself
ObjMgr.f_Method('Initialize', '')
// open the browsing window
Open(w_main)

Listing Four
// instruct ObjMgr to create an internal library list
ObjMgr.f_Method('BuildLibraryList', '')
// request the library list and import it into dw_LibList
dw_LibList.ImportString(ObjMgr.f_Query('LibraryList', ''))


Listing Five
long lRow
// check for valid row selection
lRow = This.GetClickedRow()
IF lRow > 0 THEN
 // update library selection
 This.SelectRow(0, FALSE)
 This.SelectRow(lRow, TRUE)
 // create an internal list of objects contained in selected library
 ObjMgr.f_Method('ReadLibrary', This.GetItemString(lRow, 'selection'))
 // refresh destination datawindow, dw_ObjList
 dw_ObjList.Reset()
 // request the object list and import it into dw_ObjList
 dw_ObjList.ImportString(ObjMgr.f_Query('ObjectList', ''))
END IF

Listing Six
// Boolean f_Method(string sMethod, string sQualifier)
// Dispatch methods identified by sMethod. sQualifier provides
// additional information for individual methods.
CHOOSE CASE Lower(sMethod)
 CASE 'initialize'
 RETURN f_IdentifyApplication()
 CASE 'buildlibrarylist'
 RETURN f_BuildLibraryList()
 CASE 'readlibrary'
 RETURN f_ReadLibrary(sQualifier)
 CASE 'setlibdirtype'
 RETURN f_SetLibDirType(sQualifier)
 CASE ELSE
 RETURN False
END CHOOSE
RETURN True

Listing Seven
// String f_Query(string sQuery, string sQualifier)
// Provides feedback to requests for internal information.
CHOOSE CASE Lower(sQuery)
 CASE 'librarylist'
 RETURN isLibraryList
 CASE 'objectlist'
 RETURN isObjectList
 CASE ELSE
 RETURN ""
END CHOOSE

Listing Eight
// Boolean f_IdentifyApplication()
// Reads current application from PB.INI file.
long lPos1, lPos2
string sPBPath
// Get path for PB.INI 
sPBPath = ProfileString ("WIN.INI", "POWERBUILDER", "INITPATH","")
// Build full filename for PB.INI
isPBiniFile = "PB.INI"
IF sPBPath <> "" THEN
 IF Right(sPBPath, 1) <> "\" THEN
 sPBPath = sPBPath + "\"

 END IF
 isPBiniFile = sPBPath + isPBiniFile
END IF
// Get Application name and main library from INI file
isAppName = ProfileString(isPBiniFile, "APPLICATION", "APPNAME", "")
isAppLib = ProfileString(isPBiniFile, "APPLICATION", "APPLIB","")
// separate path prefixed to application library
isAppDir = ""
lpos1 = 0
lpos2 = Pos(isAppLib, "\")
DO WHILE lpos2 > 0
 lpos1 = lpos2
 lpos2 = Pos(isAppLib, "\", lpos1 + 1)
LOOP
IF lpos1 > 0 THEN isAppDir = Left(isAppLib, lpos1) 
RETURN True

Listing Nine
// Boolean f_BuildLibraryList(). Reads current library list and formats it 
// to be compatible with the ImportString() function.
long lPos1
// Get the library path from PB.INI for the specified application
isLibraryList = ProfileString(isPBiniFile, "Application","$" +&
 isAppLib + "(" + isAppName + ")", "")
lPos1 = Pos(isLibraryList, ";")
DO WHILE lPos1 > 0
 isLibraryList = Replace(isLibraryList, lPos1, 1, "~t~n")
 lPos1 = Pos(isLibraryList, ";", lPos1 + 2)
LOOP
RETURN True

Listing Ten
// Boolean f_ReadLibrary(sQualifier). Reads current library based on 
// preset object type and creates list of objects found.
string sObjList
Long lPos1, lPos2, lDirLen
isObjectList = ""
sObjList = LibraryDirectory (sQualifier, ieLibDirType)
// For each entry in a LibraryDirectory listing,
lDirLen = Len(sObjList)
lPos1 = 1
DO WHILE lPos1 < lDirLen
 // Locate first tab separator
 lPos2 = Pos (sObjList, "~t", lPos1)
 // Peel object name & append to object list
 isObjectList = isObjectList +&
 Mid(sObjList, lPos1, lPos2 - lPos1) + "~t~n"
 // Advance to start of next directory item
 lPos1 = Pos (sObjList, "~n", lPos2) + 1
LOOP
RETURN True

Listing Eleven
// Boolean f_SetLibDirType(string sQualifier). 
// Identifies library directory type.
CHOOSE CASE Lower(sQualifier)
 CASE 'dirall'
 ieLibDirType = DirAll!
 CASE 'dirapplication'

 ieLibDirType = DirApplication!
 CASE 'dirdatawindow'
 ieLibDirType = DirDataWindow!
 CASE 'dirfunction'
 ieLibDirType = DirFunction!
 CASE 'dirmenu'
 ieLibDirType = DirMenu!
 CASE 'dirstructure'
 ieLibDirType = DirStructure!
 CASE 'diruserobject'
 ieLibDirType = DirUserObject!
 CASE 'dirwindow'
 ieLibDirType = DirWindow!
 
END CHOOSE
RETURN True
DDJ














































PROGRAMMING PARADIGMS


Getting Wired on HotJava




Michael Swaine


In August, DDJ published the article "Java and Internet Programming," by
Arthur van Hoff. Java is a "simple, object-oriented, distributed, interpreted,
robust, secure, architecture neutral, portable, high-performance,
multithreaded, and dynamic language," as Sun's documentation succinctly
buzzifies it. In the same month, Dr. Dobb's Developer Update carried a stellar
article on Java and the Java-based Web browser, HotJava, entitled "Net Gets a
Java Buzz," by Ray Valdés. (Both articles, by the way, are available online at
http://www.ddj.com. The official Web site for info on Java and HotJava is
http://java.sun.com. But I'd also recommend Ray's Java Jive page:
http://www.dobbs.com/dddu/java.html.) 
Java and HotJava are getting a lot of deserved attention these days, and not
just in DDJ. Lately, I, too, have been studying Java--especially HotJava (for
an absurdly limited purpose). Thus this.


You Want Some Hot Java, Mister?


Right now, HTML, in one version or another, is the de facto standard for
Web-page development. Granted, opinions differ about which HTML ought to be
(or is already?) the standard. While an open standards committee (World Wide
Web Consortium) pounds out the details of different versions and levels of
HTML, one company has been unilaterally extending HTML in ways of which not
everyone approves. It is significant that this company's Web browser is used
by more users than all the others put together. And that the company was
cofounded by the lead developer of the first widely used Web browser, Mosaic.
And that most of the Mosaic developers now work for that company. 
The company in question is, of course, Netscape Communications, and its
extensions to HTML arouse strong feelings among certain excitable types. A
remarkable number of Web-page developers see fit to include unpaid
advertisements for Netscape in their Home pages, often pestering their
visitors to go out and get the Netscape browser so they can see the neat
effects included on these wonderful pages. Then there are those who take a
less favorable view of Netscape's extensions, like the maintainer of The
Enhanced for Netscape Hall of Shame at
http://www.europa.com/~yyz/netbin/netscape_hos.html.
But....
I subscribe to the HTML Author's Guild list, and there is discussion on that
list about changing the group's name to something less language dependent. The
Web Author's Guild, maybe? Within the Web-authoring community there is a clear
perception that HTML may be transitory. What might supplant HTML, in their
opinion? A virtual reality markup language, some think. Some
executable-content markup language, most think. They mean the markup language
of HotJava. The markup language supported by HotJava is a superset of HTML.
Its chances to become some sort of standard were markedly enhanced in May
when, you guessed it, Netscape announced that it would support HotJava's
extensions to HTML in its browser. This will be easy, since HotJava implements
nearly all of Netscape's extensions to HTML. It skips a couple of them and
adds exactly one of its own. Let's see: HotJava, the hot browser of the
moment, and Netscape, the runaway market leader, will support essentially the
same vocabulary. The World Wide Web Consortium is defining an HTML 3.0 that
will probably include more or less this same vocabulary. It's beginning to
look like the only real issue in dispute is whether or not text should be
centered on a Web page. I vote yes.


You Want a Hot Meal with That?


I'm incorporating support for the HotJava extensions in the HyperCard-based
Web-page editor "HoTMeaL" that I've been writing to keep myself on the cutting
edge of software development (thassa joke, son). It should be in beta by the
time you read this (see "Availability," page 3); drop me a line at
mswaine@cruzio.com with the subject "HoTMeaL" if for some odd reason you're
interested. (The editor may or may not keep the name HoTMeaL, which is a play
on HoTMetaL Pro, a popular HTML editor from SoftQuad, as well as on HotJava.
You want a hot meal with your hot java? I will not be offended if you do not
honor the weird capitalization in your e-mail.)
I warned you that my interest in HotJava was for an absurdly limited purpose,
and I'd better confess just how limited it is: Web browsers interpret HTML
tags, which are derived from SGML. Some tags define the format of the text,
like the boldface tag (<B> this copy bold </B>) or the code tag, which is used
to format code and is usually interpreted with a fixed-width font (for
example, <CODE> public final class Hashtable { </CODE>). Some tags signal
links to other Web pages or graphics files: <A
HREF="HTTP://WWW.HogFarm.swainesworld.com"> Mike's Home Page </A> <IMG
SRC="MikePict.JPEG">. To this, Java Web-page authoring adds one tag--the
APP tag. It specifies an applet, a Java application, and it looks a lot like
the HTML tags that embed graphics in Web pages:
<APP
CLASS="ClassName"
SRC="URL"
ALIGN=alignment
WIDTH=widthInPixels
HEIGHT=heightInPixels
AppletSpecificAttribute=aValue
...>
The things on the left side of the equal signs are attributes; those on the
right are their assigned values. ClassName is the name of the applet's
subclass and is the only required attribute. An applet can require zero or
more AppletSpecificAttributes. The value for the SRC attribute tells where to
find the applet, much as the SRC attribute for the IMG tag tells where to find
the GIF or JPEG file, or the HREF attribute for the A tag tells where to find
the linked page. In the case of the APP tag, though, the SRC attribute
specifies not a static image or page, but executable code, and this makes all
the difference. This one tag changes the whole experience of the Web. HTML has
its hot links, but the APP tag makes a Web site truly hot.
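Since an APP tag is just text built from attribute=value pairs, a page editor can generate one mechanically. A minimal sketch (Python, purely illustrative; only CLASS is required, per the syntax above, and the attribute names in the call are whatever the applet defines):

```python
def app_tag(class_name, **attrs):
    """Build an APP tag string; CLASS is the sole required attribute."""
    parts = ['CLASS="%s"' % class_name]
    for name, value in attrs.items():
        # optional attributes: SRC, ALIGN, WIDTH, HEIGHT, and any
        # applet-specific attributes
        parts.append('%s=%s' % (name.upper(), value))
    return "<APP " + " ".join(parts) + ">"
```

For example, app_tag("Blink", width=100) yields <APP CLASS="Blink" WIDTH=100>.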
With HoTMeaL, I've added the Java APP tag and its attributes to the lists of
tags you can drop into your documents. 
HTML tags in HoTMeaL reside in lists in a library, and you can select which
tag list you want to work with today. Each list corresponds to an HTML
vocabulary. I've implemented the Java tag vocabulary as one of these lists, so
you can decide to work with the Java vocabulary for one project, then click a
couple of buttons and be working with basic HTML 2.0 or with the Netscape
extended HTML vocabulary for a different project. And if a new version of Java
adds new tags, users should be able to upgrade HoTMeaL to use these just by
pasting some text into a field. 
That's it. That's all I've done. I don't offer any support for programming
Java apps (way beyond my poor powers), and although HoTMeaL includes a crude
browser or previewer, it doesn't know from APP tags. Clearly, I didn't need to
read all the Java and HotJava documentation just to do this. All I needed to
read was the page that defines the syntax of the APP tag. 
But be warned: Many editing tools will soon be announcing support for Java or
HotJava. Some of them may mean as little by this trendy claim as I do.


Sugar or Sweet-n-Low?


Another nifty aspect of the HotJava experience: Every HTML browser has a fixed
set of capabilities. HotJava doesn't. Support for new protocols and formats
can, in principle, be downloaded. Basic functionality can be extended by
grabbing executable content over an Internet connection. This is cool. 
HTML browsers come with a certain amount of sugar. You can choose the browser
with the sugar you want, but when, say, Sweet-n-Low-enhanced Web pages come
along, the old browsers won't be able to handle the new sweetener. HotJava, in
principle, will. (Actually, users of my editor can--in principle--download a
new HTML vocabulary, but that's only because HTML tags are just text. HotJava
lets the user actually change the program's code by downloading.)
All of this downloadable functionality raises some deep questions regarding
security. When you import code fragments across the net and execute
them--which is exactly what HotJava is all about--you are inviting security
problems. Dave Winer's "Aretha" package (formerly known as "Frontier"), which
contains a scripting language, and which is currently being Internetified,
took some heat this summer for just that. HotJava and Java were both designed
with this problem in mind, and have a lot of security features built in. But
not enough to prevent a sufficiently talented and motivated cracker from
breaching system security, you can be sure.
To be fair, the primary security issue here is probably inherent in the Web
itself; HotJava just makes it more obvious. I predict interesting times.


A Pretty Good(TM) Idea


Van Hoff wraps up his DDJ article with something slightly tricky about
security: "In the future, Java will also provide features for signing code
using public-key encryption. This will allow the secure exchange of Java code
with trusted partners over the Internet."

It's the "trusted partners" business that's a little tricky, as I recently had
pounded into my head. I just installed a public-key encryption system, in the
form of Phil Zimmermann's Pretty Good Privacy (PGP), on my main machine.
Anyone who wants to send me secure messages can e-mail me at
mswaine@cruzio.com using my public key. It follows.
User ID: Mike Swaine <mswaine@cruzio.com>
Key size: 512
Creation date: 1995/07/25
Key ID: 4CF1F515
Key fingerprint: F0 DA AB 42 77 93 F4 D1 AB E2 BB 3E A3 75 A6 22
Publishing your public key in a magazine has disadvantages, but it has one
major advantage: keyholder identification. Let me explain.
Somehow, the codesmiths of World War II got along with single-key encryption
systems. Such systems don't differ in principle from the letter-substitution
ciphers we learned as kids, or the code Sherlock Holmes cracked in The Valley
of Fear, in which numbers indicate the ordinal position of the target word in
a particular book. In such codes, there is a single key--the algorithm "shift
forward one letter," or the name of a book, or a large number used to
transform one string of bits into another. The last of these is the idea
behind DES, the U.S. Federal Data Encryption Standard. With a single-key
system, one key is used to both encrypt and decrypt the message, and the
sender and receiver must both have the key. 
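The single-key idea can be seen in miniature with the "shift forward" cipher mentioned above. A hypothetical Python sketch (not anything from PGP or DES):

```python
def shift(text, key):
    """Caesar shift: rotate each letter by key positions."""
    return "".join(
        chr((ord(c) - ord('A') + key) % 26 + ord('A')) if c.isalpha() else c
        for c in text.upper())

def encrypt(text, key):
    return shift(text, key)

def decrypt(text, key):
    # decryption requires the very same key the sender used
    return shift(text, -key)
```

Shifting "HAL" forward one letter gives "IBM"; the point is that both parties must already share the key.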
The catch-22 of single-key systems is, how do you ensure that the sender and
receiver both have the key? If a secure channel exists for the sender to send
the key to the receiver, then they don't need the key. If there is no secure
channel, then how do they communicate the key?


The Advantage of Going Public


Public-key systems seem to solve the problem. These systems use a pair of
keys, mathematically generated, with the following nice properties: Any
message encrypted with one can only be decrypted by the other, and neither key
can be derived from the other. This solves the catch-22 of single-key systems.
I can publish my public key to the world, keeping the private key secret. Now
anyone can send me a private message by encrypting it with my public key, and
only I will be able to decrypt it.
As an added benefit, the public-key system gives you digital signatures. I can
encrypt a message with my secret key and send it to you. This gains me nothing
in security: Anyone with my public key can decrypt the message. What it does
get me, though, is authentication: If you can decrypt the message with my
public key, then it has to have been encrypted with my secret key, which I
guard as jealously as my automatic-teller PIN number. Therefore, the message
must have come from me. Absolute identification. A digital signature. Better
than DNA.
Digital signatures and encryption can be combined, so that I can send you a
message that only you can read, and that you can be sure could only have been
sent by me. The codesmiths of WW II would have loved public-key encryption for
solving the catch-22 of single-key systems. Unfortunately, it has a catch of
its own.
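The key-pair properties just described can be demonstrated with textbook RSA. This is a toy sketch with tiny primes (Python 3.8+ for the modular inverse via pow); real systems such as PGP use enormous keys plus padding, and nothing this small is remotely secure:

```python
p, q = 61, 53
n = p * q                            # modulus, part of both keys
e = 17                               # public exponent
d = pow(e, -1, (p - 1) * (q - 1))    # private exponent (2753 here)

def encrypt(m):
    # anyone may do this with the public key (n, e)
    return pow(m, e, n)

def decrypt(c):
    # only the holder of the private exponent d can undo it
    return pow(c, d, n)

def sign(m):
    # a digital signature: "encrypt" with the private key
    return pow(m, d, n)

def verify(sig, m):
    # anyone can check the signature with the public key
    return pow(sig, e, n) == m
```

A message encrypted with one key of the pair is recoverable only with the other, and neither exponent is practically derivable from the other when the primes are large.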
Yes, I can send you messages and be sure of their privacy as long as I have
your public key. Yes, you can send me messages and I can be sure they're from
you as long as I have your public key. And yes, you can send me that public
key in a message over an unsecure channel. But how do I know the message is
from you?
If I had your public key, I could verify that it was from you. But until I
have your public key, how can I be sure that you're not somebody else
masquerading as you? In short, how do you send me your public key in a way
that I can be sure it's yours? Now, if we had a secure channel....
In fact, the best method that anybody has yet come up with for certifying a
public key as indeed belonging to person X is this: Have somebody who knows
X's voice call him or her up and verify the key. That's it. Oh, there are
more-sophisticated-sounding systems of certification, but they all amount to
the same thing: Somebody who can vouch for the person has to assure you that
the key you have is indeed that person's key.
Thus the phrase "trusted partners."
It all seems just a bit crude.
In our situation (yours and mine), you, the DDJ reader, wanting (assuming you
did want) to send me, the DDJ editor-at-large, a secure message, it's not so
clunky. Unless someone has gone to the trouble of publishing a bogus issue of
DDJ and replacing your copy with this counterfeit, PGP assures you that the
message you are reading right now is from me, and that the key I've supplied
is really mine.
So, to establish that your public key is really yours, all you have to do is
publish it in your column in a nationally or internationally distributed
magazine. As described, it's a solution of limited applicability, but it is a
solution. Generalizing it to the non-magazine-columnist population is left as
an exercise for the reader.


You Can Call Him "Zimmy" 


Perhaps you know the Phil Zimmermann story.
Zimmermann is a modest man who has written a modest account of his trials and
tribulations in developing a piece of software that goes by the modest name of
"Pretty Good Privacy." He's a peculiar sort of modest man, though: the sort
who would trademark the phrase Pretty Good. A modest man for the '90s. Where
he's written this modest account is in The Official PGP User's Guide (MIT
Press, 1995). The trials and tribulations he's had probably include the usual
ones that beset any nontrivial software-development project, but he doesn't
talk about those. Phil has, you see, the additional trials and tribulations
that come with being under investigation by the U.S. Federal government for
illegal arms exporting. Because, you see, PGP is encryption software, and to
the Federal government, encryption software is the same as armaments, and
domestic electronic publication is the same as export. And unless you're a
Marine colonel, you can go to prison in the U.S. for illegally exporting arms.
An equally engaging section of the PGP manual is the stirring foreword by John
Perry Barlow, who in four-and-a-half pages manages to touch on most of the
major political and social and cultural issues raised by the universal
availability of secure cryptographic software, including the potential to make
the paying of taxes optional.
By the way, the manual also explains how to use the product.





























C PROGRAMMING


MidiFitz and Windows 95




Al Stevens


This is the October column, which I am writing in July. Next month Windows 95
will be officially released. I can say that with certainty; I now have a copy
of the to-be released CD-ROM. It is installed and works okay. Bill Gates and
Shaq O'Neal launched it on TV.
For years I was devoted to command-line user interfaces. I begrudgingly used
Windows because of its superior applications for writing--word processors and
e-mail, mainly. But to manage the computing environment--copy files, delete
them, make subdirectories, format diskettes, make backups, keep track of
everything--I used MS-DOS, sometimes by itself and other times from a Windows
DOS box. MS-DOS is easier and more intuitive than the File Manager. I worked
that way with Windows 3.0, 3.1, Windows for Workgroups 3.11, and Windows NT
3.5. Other PC users concur. They mostly use Windows to load and run graphical
applications and MS-DOS to maintain their systems.
I've been a Windows 95 beta tester since it was called "Chicago." My forays
into MS-DOS became less frequent as I learned more about Windows 95. There is
little, if anything, that works better in MS-DOS. The only exceptions are
applications that do not run under Windows, and I don't use many of them
anymore. Most DOS game programs, which once required a dedicated machine--not
a DOS box--run nicely in a Windows 95 DOS box.
In a previous column, I mentioned the artistic quality of Windows 95 beta
CD-ROMs. The build-490 disc is the nicest one so far. It is a pleasant forest
green, but its distinguishing characteristic is not its color, but its
distinctive wintergreen aroma. Something like unused kitty litter. This was no
fluke, no chance contamination of one particular disc by a fir falling next to
a Redmond product packer's open window. Nosiree. I have two copies of
build-490 from two sources, and both CD-ROMs have that same alluring scent. As
I opened the jewel case, a sense of calm fell over me. I felt relaxed and had
visions of a bucolic scene--pastures, trees, wildflowers, fluffy white clouds,
a cooling breeze, deer grazing, and birds singing. Sort of the way that Edward
G. Robinson voluntarily bites the big one in Soylent Green. That should have
warned me. One should not trust scratch 'n' sniff software.
Throughout its documentation and Help displays, Microsoft refers to Windows 95
as, simply, Windows. Could Microsoft be trying to distance itself from what,
in hindsight, seems to be a shot in the foot--a major product with a name that
sports an unplanned early expiration date? Built-in obsolescence, like Brasil '66.
Like a credit card or a carton of milk. How will you feel in 1996 running an
old operating system? Better sniff it first. It might be spoiled.


MidiFitz


Last month I launched a new project, a Windows program that uses MIDI to
emulate a jazz piano player's rhythm section in real time. The code that I
discussed was an early version that simply read the keyboard's MIDI messages
and displayed them in a list box. The full program, which is now complete,
examines the notes being played, deduces a musical chord, and plays a bass
line through the MIDI system.
The program's name is "MidiFitz," for reasons that will become obvious. If
anyone has this name trademarked, don't go running to your lawyer. Just call,
and we'll cave in...er, accommodate like we always do.
I learned several things during this program's development. I learned more
about Visual C++ and its integration with the Microsoft Foundation Classes. I
had to dig into the multimedia API to figure out the Windows MIDI system. Then
there was the musical part--getting the program to play the bass violin
correctly.


Visual C++ 2.0 and MFC


Visual C++ 2.0 is not the latest version, but it's the one I have, and it was
sufficient to develop this program. Figure 1 shows the program's application
window, which I designed completely from within the visual-programming
environment.
There are several anomalies with Visual C++ 2.0 and Windows 95. First, the
size and configuration of windows differ significantly between the visual
development environment and the run-time environment. This makes design
difficult, because the final product does not look exactly like what you
designed. Under NT, the windows are the same; later versions of Visual C++
will no doubt catch up to the alien Windows 95 operating environment. Second,
MFC 3.0 supports only a small subset of the controls that
Windows applications use. For example, it does not support spin buttons, which
I needed for setting the tempo of the song in the application. I had to
contrive a spin button by associating a scroll bar with an edit box. Third, I
could find no way to add a menu to a dialog-based application. Microsoft's
documentation is huge--maybe even comprehensive--but try to find something
when you don't know how or where Microsoft filed it. Fourth, the DDV/DDX
associations fall way short of the mark. You can associate a control reference
or a variable to a control on a dialog box, but not both. The implementation
of DDV for radio buttons is cumbersome, at best. You can associate a BOOL
variable with only one of the buttons in a group, no matter how many there
are. What good is that? It's easier to forget DDV and do it yourself.
These criticisms are the musings of a novitiate in the hallowed halls of
Windows programming. Those of you in the know could laugh at my floundering,
and well you might. I'd rather you jumped on Microsoft and forced it to
provide better documentation. These impediments are typical of what the rest
of us can expect when we take the plunge.


Encapsulating the MIDI API


The MIDI API is a set of C functions that a Windows program can call. MFC
ignores this API, so a C++ programmer has to encapsulate it. Eventually, I'll
have the entire multimedia API nicely wrapped up. I'm sure others will do
likewise. For now I took the pragmatic approach and designed two classes that
encapsulate MIDI input and output specifically to support this program.
Listings One through Four are midiin.h, midiin.cpp, midiout.h, and
midiout.cpp, respectively.


Reading the MIDI Device


To figure out what chord the piano player is playing, the program reads the
MIDI input device, intercepts and interprets the notes, and passes them
through to the MIDI output device.
MIDI input is simple. A program calls the Windows midiInOpen API function
specifying the input device number, the address of a handle that identifies
the open device, and the handle of a window to which MIDI input messages are
to be sent. The device number is 0 in this program. I'm assuming only one
input device, the MIDI keyboard. The MidiIn class declared in midiin.h
(Listing One) and midiin.cpp (Listing Two) encapsulates this interface. A
program declares an instance of the MidiIn class passing a window handle as
the only parameter to the MidiIn constructor. When a MIDI input event occurs,
the designated window receives an MM_MIM_DATA message, which is a packet
containing an event code and, when appropriate, a note code and a velocity
code.


MIDI Input Events


There are many different MIDI event messages; MidiFitz is concerned with only
a few. The source-code file midifdlg.cpp, which is not published this month
but is available electronically (see "Availability," page 3), receives and
processes those event messages.
The controller change event signals that either the sustaining or dampening
pedal was pressed or released. This event has nothing to do with chords, but
the program passes it through to the MIDI output device. The Sound Blaster's
MIDI system ignores the event. The Turtle Beach sound card recognizes and
processes it appropriately.
The note on event arrives twice--once when you press a key and once when you
release it. (The keyboard in this context is the MIDI 88-note keyboard, not
the typewriter keyboard on your PC.) There is a note off event, but my
keyboard does not send it. The note on event includes a note code to identify
the note to play and a velocity parameter that specifies how hard the player
hit the key. When the player releases the key, the keyboard sends a note on
event with a velocity of 0. Other keyboards, particularly organ keyboards with
no dynamics and, therefore, no meaningful velocity change, might not work the
same way.
The input device can send three timing event messages: start-timing clock,
timing clock, and stop-timing clock. The first is sent when the system starts
playing a song, the second is sent at fixed intervals during the song, and the
last event is sent when the song is finished. Timing-clock events are sent at
a clock speed of 24 times per quarter note. A quarter note is typically the
time duration of one beat in a song. Not always, but often enough to serve as
the reference interval for the MIDI system. The faster the tempo, the higher
the rate of timing-clock event messages. One device in the system sends the
timing-clock event messages, and the other devices synchronize with them. The
Timing radio buttons on the MidiFitz application window tell the program
whether to generate and send the timing clock events or expect them from an
external source.


Playing by Ear



Jimmy Durante claimed to have found the lost chord by sitting on the piano
keys. "Dat's funny," he said, "I usually plays by ear."
A piano player doesn't think about how the bass player knows what note to
play. Unless there is a chart, the bass player usually knows the song and
plays a bass line that fits the song's chord progression as he hears it in his
head. He "plays by ear." (I'm using male gender here because it's awkward to
always say "his or her" and "he or she," and I don't believe political
correctness should compromise readability. Besides, in many years of
professional playing I've met only one female bass fiddle player.)
What about when I play something that the other guy never heard before? A
musician deduces some things through experience. Certain patterns of chord
sequences resolve in predictable ways. Not always, however, and I wondered how
the good players deal with that. Piano players have some leeway when
accompanying singers or horn players. We can wait until we hear the harmonic
properties of the song and plug a chord in behind the beat. Comping, they call
it, and if we do it right, it sounds like we know the tune. But bass players
have to be right on the beat with the right note. How do they handle a tune
they've never heard?
I consulted an expert. For several years I played in a duo with a truly great
bass player/vocalist named John Fitzgerald. Fitz stands on my left and watches
my left hand. If Fitz doesn't know the song, he reads my hand on the keyboard.
After a few gigs with any piano player, Fitz can read that left hand
unerringly. After playing one chorus of any tune, Fitz knows it for life.
Fitz provided the inspiration and the objectives (and the name) for this
program. Like him, the program watches the chords during the first chorus and
plays along. For subsequent choruses, the program plays a bass line based on
the chords it learned during the first chorus, which means that the piano
player can be less accurate in the voicings and timing of chords after the
first chorus.


Parsing a Chord


Every time the piano player presses a note, the program puts a True value into
an array of 88 bools. The value goes into the array element corresponding to
the note on the keyboard. When the player releases the note, the program
writes a False value to its element in the array. At any time during the
running of the program, that array represents the notes currently being
pressed on the MIDI keyboard. If you sit on the keyboard, you too might find
the lost chord in the array.
For each keypress, the program tries to determine if a known chord (known to
the program, that is) is being played. It takes at least two notes to
represent a chord. The more notes, the more information the program can deduce
about the chord.
There are 88 keys on the keyboard but only 12 basic tones. The groups of 12
are repeated at progressively higher pitches as you go up the keyboard. Each
octave produces 12 tones that are each twice the frequency of the
corresponding tones in the next-lower octave. The "A" in the center of the
keyboard produces a tone at 440 Hz. The one below it is 220. The one above it
is 880. The frequency of each next-adjacent tone can therefore be computed by
multiplying the current frequency by 1.059463094359, which, when repeated 12
times, doubles the original frequency. And that's all I have to say about
that.
Depending on the reach of your two hands (or your hindquarters, as the case
may be), nearly any combination from 1 to 10 of the 88 notes is possible.
Given the atonal nature of some contemporary music forms, any combination is
acceptable, depending on what you like. From this chaos, our minds form order
as we listen to the songs we like. From this same chaos, MidiFitz has to
interpret the notes into a chord. Here is what it does.
MidiFitz does not apply any knowledge beyond what you are currently playing to
interpret the chord. Unlike its namesake, MidiFitz does not know, for example,
that a form of the C chord can be expected following a progression of DMINOR,
G7TH. Although a bass player is almost always right to make that assumption,
the times that it does not work often result in dissonance at best and angry
singers at worst. Therefore, MidiFitz depends on the piano player to specify
the chords.
The program collects the four lowest distinct notes being played and places
them into an array. Notes repeated in higher groups of 12 are eliminated, so
that only one instance of each tone is in the sample. The four notes might be
close together (like Shearing) or spread out (like Brubeck) depending on how
you voice the chord. MidiFitz collapses the four notes, which might be spread
all across the keyboard, into the same grouping of 12. These close groupings
of two-to-four notes now constitute a likely representation of the chord being
played, assuming the piano player knows the right chord. 
The two-to-four note sample is run past an array of chord voicings. Here I
must confess: Those voicings reflect the way that I improvise. I've tried to
include most conventional voicings as well, but there are many ways to play
any chord, and I might have missed your favorite voicings. Every improvising
piano player's sound is distinctive. Within four measures, I can identify the
player if he or she is among my favorites. The sound is a product of the
shape, weight, and dimensions of the player's hands, the player's voicing of
chords, and the player's understanding of chord substitutions.
The array of voicings contains elements of the midiChord structure shown in
Example 1. Each voicing in the array consists of from two to four notes. The
first note, which represents the lowest note being played, is always
represented by the number 1. The other notes are half-note offsets from the
first. The root data member is the half-note offset from the lowest note that
represents the root note of the chord. This method allows the piano player to
express a chord with multiple inversions where the root is not necessarily the
lowest note played. This feature is what sets MidiFitz apart from MIDI
keyboards that require you to play root-position chords in order to build
accompaniments in real time.
The six bool members identify the chord form--major, minor, seventh,
diminished, augmented, and major seventh. When you play the chord, the program
finds the voicing in the array, determines the root note by applying the root
offset to the lowest note played, and determines the chord form based on the
setting of the bool members.


Playing a Line


Given a chord, the program has to play a respectable bass line. For those of
you who don't already know, a bass line is that bottom note that the guy with
the four-string guitar plays in rock bands. The note usually provides the
harmonic foundation of the music as well as keeping time. A bass line "walks"
the chord progression in time with the drummer.
If you still don't know what I'm talking about, stand on any street corner and
wait for an old car to go by with young occupants and huge speakers on the
back window shelf. You'll hear the bass line: BOOM! BOOM! BOOM! BOOM! See the
occupants mindlessly stare out the windows. Watch the car windows bulge, the
occupant's heads implode slightly, and their eyes bug out with each pulse.
Imagine their eardrums turning to leather. That's a bass line. MidiFitz, being
a mainstream jazz player, is quieter and does not threaten your senses.
MidiFitz observes the current chord and determines a bass line to play from an
array of bass lines. There are two forms: one for major and minor chords; and
one for the sevenths, augmented, diminished, and minor seventh chords. Each
line consists of measures and beats. Each beat consists of a numerical note
relative to the root note of the chord. The program selects a measure from the
eight possible measures based on the current measure within the song. The
program selects a note from that measure based on the beat within the measure.
If the chord is a minor or diminished and the note is a third, the note is
flatted (decremented). If the chord is diminished and the note is a fifth or
seventh, the note is flatted. If the chord is augmented and the note is a
fifth, the note is sharped (incremented). Then MidiFitz plays the note.
Playing the note consists of sending a note on message with a velocity based
on the current volume to an object of the MidiOut class. The class
encapsulates the MIDI output API. You can't just turn a note on and leave it
on; it might persist beyond its time. The program waits for a few clock ticks
and sends a note on message with a zero velocity.
The MIDI output device is selected by the user from among those available. The
Windows MIDI API includes functions that tell you how many MIDI output devices
are available and what their names are. The MidiOut class uses those functions
to load a drop-down list box with the names of the output devices. From that
list, the piano player selects an output device for the bass line and the
piano notes.
An output device has 16 channels, and you can assign instrument voices to
each. I use channel 0 for the piano voice and channel 1 for the bass. The
piano player selects from the standard piano and bass voices to assign to the
channels. Those assignments, in MIDI parlance, are "patches." The MidiOut
class includes a SetPatch member function to set the instrument voice patch
for each channel.


The Form


The first version of MidiFitz is based on the 32-bar song structure. A bar is
the same as a measure. Not all songs follow that structure, but it is a start.
For the first 16 bars of the first chorus, MidiFitz plays two bass notes per
measure: "In two." For the next eight bars, which is the bridge (or release)
in a typical 32-bar song, MidiFitz plays four notes to the measure: "In four."
Then back to two notes per measure for the last eight bars. After the first
chorus, everything is in four. That pattern is a common jazz idiom used to add
an element of swing to a song.


Remembering the Song


MidiFitz remembers the chord structure that you lay down in the first 32 bars
and repeats it automatically for all subsequent choruses. That way, the piano
player can play less diligently in the left hand after the first chorus, but
the chords will still be as correct as in the first chorus. You can even let
the bass play a solo after the first chorus.


Ending the Tune


During the course of a chorus, you can choose the Last Chorus button to tell
MidiFitz to wind up the tune with a standard ending when the chorus ends. You
can choose the Tag button to tell MidiFitz to put a standard tag on the tune.
If MidiFitz is using an external clock, you can end at any time by stopping
the MIDI device that supplies the external clock events. That practice is
known among musicians as "chorus interruptus."


What Else?


I've had a lot of fun with this project, and it's a program that I use a lot
to practice at home. It can't replace a live bass player and drummer, but its
time is perfect and it doesn't drink, come in late, or bother the waitress.
I'd like to add a menu that saves songs and loads saved songs. I'd like more
musical forms, such as the 12-bar blues or tunes with no bridge. Ask a
musician in Louisville, Kentucky, where the bridge to Indiana is, and he'll
answer, "There ain't no bridge to Indiana." Count the measures in "The Lady is
a Tramp" for a surprise.
If ever an application cried out for a touch-screen interface, this is it. I'd
add more features if I could energize them with a quick tap on a large command
button. A piano player's attention is on the other keyboard, and there are no
hands left over for mice and the like.



Source Code


The source-code files for the MidiFitz project are free. You can download them
from the DDJ Forum on CompuServe and on the Internet by anonymous ftp; see
"Availability," page 3.
If you cannot get to one of the online sources, send a 3.5-inch diskette and
an addressed, stamped mailer to me at Dr. Dobb's Journal, 411 Borel Avenue,
San Mateo, CA 94402, and I'll send you the source code. Be sure to include a
note that says which project you want. The code is free, but if you care to
support my Careware charity, include a dollar for the Brevard County Food
Bank.
Figure 1: The MidiFitz application.
Example 1: The midiChord structure.
struct midiChord {
 unsigned char note[4];
 unsigned char root;
 BOOL major;
 BOOL minor;
 BOOL seventh;
 BOOL diminished;
 BOOL augmented;
 BOOL major7;
 int operator!=(midiChord& crd);
};

Listing One
// ----- midiin.h
#ifndef MIDIIN_H
#define MIDIIN_H
#include "stdafx.h"
#include <mmsystem.h>
class MidiIn {
 HMIDIIN hMidiIn;
public:
 MidiIn(HWND hWnd);
 ~MidiIn();
};
#endif

Listing Two
// ---- midiin.cpp
#include "stdafx.h"
#include "midifitz.h"
#include "midifdlg.h"
MidiIn::MidiIn(HWND hWnd)
{
 hMidiIn = 0;
 // ---- test for MIDI input devices
 if (midiInGetNumDevs() == 0)
 throw("No MIDI input devices");
 // ---- assume one MIDI input device, open the keyboard
 if (midiInOpen(&hMidiIn, 0, (unsigned long) hWnd, 0, 
 CALLBACK_WINDOW) != 0)
 throw("Cannot open MIDI input device");
 midiInStart(hMidiIn);
}
MidiIn::~MidiIn()
{
 if (hMidiIn != 0) {
 midiInStop(hMidiIn);
 midiInClose(hMidiIn);
 }
}


Listing Three
// ----- midiout.h
#ifndef MIDIOUT_H
#define MIDIOUT_H
#include "stdafx.h"
#include <mmsystem.h>
class MidiOut {
 HMIDIOUT hMidiOut;
 short int numDevices;
 short int currDevice;
 BOOL midimapper;
public:
 MidiOut(short int outputdevice);
 ~MidiOut();
 void DeviceList(CComboBox* dlist);
 void SetPatch(short int channel, short int voice);
 void NoteOn(short int channel, short int note, short int velocity);
 void NoteOff(short int channel, short int note, short int velocity);
 void Pedal(short int channel, short int pedal, short int velocity);
 void ChangeDevice(short int device);
 void StartMessage();
 void TimingMessage();
 void StopMessage();
};
#endif

Listing Four
// ---- midiout.cpp
#include "stdafx.h"
#include "midifitz.h"
#include "midifdlg.h"
MidiOut::MidiOut(short int outputdevice)
{
 hMidiOut = 0;
 numDevices = 0;
 // ---- test for MIDI output devices
 if ((numDevices = midiOutGetNumDevs()) == 0)
 throw("No MIDI output devices");
 MIDIOUTCAPS ocaps;
 midimapper =
 !midiOutGetDevCaps((unsigned)MIDIMAPPER, &ocaps, sizeof(ocaps));
 if (midimapper)
 --outputdevice;
 currDevice = outputdevice;
 if (midiOutOpen(&hMidiOut, currDevice, 0, 0L, 0L))
 throw ("Cannot open MIDI output device");
}
MidiOut::~MidiOut()
{
 if (hMidiOut != 0)
 midiOutClose(hMidiOut);
}
void MidiOut::ChangeDevice(short int device)
{
 if (midimapper)
 --device;
 ASSERT(device < numDevices);
 if (device != currDevice) {
 midiOutClose(hMidiOut);

 currDevice = device;
 if (midiOutOpen(&hMidiOut, currDevice, 0, 0L, 0L))
 throw("Cannot open MIDI output device");
 }
}
void MidiOut::SetPatch(short int channel, short int voice)
{
 DWORD mmsg = 0xc0 | channel | (voice << 8);
 midiOutShortMsg(hMidiOut, mmsg);
}
void MidiOut::NoteOn(short int channel, short int note, short int velocity)
{
 DWORD mmsg = 0x90 | channel | (note << 8) | (velocity << 16);
 midiOutShortMsg(hMidiOut, mmsg);
}
void MidiOut::NoteOff(short int channel, short int note, short int velocity)
{
 DWORD mmsg = 0x80 | channel | (note << 8) | (velocity << 16);
 midiOutShortMsg(hMidiOut, mmsg);
}
void MidiOut::Pedal(short int channel, short int pedal, short int velocity)
{
 DWORD mmsg = 0xB0 | channel | (pedal << 8) | (velocity << 16);
 midiOutShortMsg(hMidiOut, mmsg);
}
void MidiOut::StartMessage()
{
 DWORD mmsg = 0xfa;
 midiOutShortMsg(hMidiOut, mmsg);
}
void MidiOut::TimingMessage()
{
 DWORD mmsg = 0xf8;
 midiOutShortMsg(hMidiOut, mmsg);
}
void MidiOut::StopMessage()
{
 DWORD mmsg = 0xfc;
 midiOutShortMsg(hMidiOut, mmsg);
}
void MidiOut::DeviceList(CComboBox* dlist)
{
 ASSERT(dlist != 0);
 MIDIOUTCAPS ocaps;
 if (midimapper)
 dlist->AddString("MIDI Mapper");
 for (short int i = 0; i < numDevices; i++) {
 midiOutGetDevCaps(i, &ocaps, sizeof(ocaps));
 dlist->AddString(ocaps.szPname);
 }
}






ALGORITHM ALLEY


Common-Fraction Approximation of Real Numbers 




Louis J. Plebani


Louis is a member of the Department of Industrial and Manufacturing Systems
Engineering at Lehigh University. He can be contacted at ljp2@lehigh.edu.


In integer arithmetic, a common practice for calculating a fractional part of
a value is to multiply by the numerator of some fraction and divide by the
denominator. This is such a common operation for Forth programmers (who
frequently work on small, integer-only machines) that one of the most basic
Forth words is */, which does a multiplication followed by a division with a
double-precision intermediate result. But even in the current age of
sophisticated, single-chip computers with floating-point instructions, the
need to apply the old chestnut can arise. Thus, to calculate 0.6 of some
quantity, you would multiply by 3 and divide by 5. In this case, the choice of
a common fraction is trivial, but it isn't always: Accuracy and magnitude may
be of concern.
Suppose you need to multiply by the fractional part of sqrt(3)=1.7320508+. A
simplistic approach would be to multiply by 732 and divide by 1000 for an
accuracy of 5E-5, but this might not be the best solution. Unless a
double-precision intermediate result is provided, overflow can occur. For
example, if a multiplication routine with an unsigned 16-bit product were
used, overflow would occur for any number greater than 2^16/732=89.
Slightly better accuracy with less overflow complication can be achieved by
multiplying by 112/153=0.732026+, which has an approximation error of 2.5E-5.
To restrict the numerator to two or even one decimal digit, the best common
fractions would be 71/97=0.7319587+ for an error of 9.2E-5 or 8/11=0.72727+
for an error of 4.8E-3. If the denominator were restricted to an 8-bit binary
number so that an 8-bit divisor routine could be used, then the best common
fraction would be 153/209=0.7320574+ for an error of 6.61E-6.
Even if multiple-precision multiplication and division are available, it is
often necessary to obtain a better approximation. For instance, to multiply by
the decimal part of pi=3.141592653+ with an accuracy of 1E-6, the simple
approach would be to multiply by 141,593 and divide by 1,000,000, which would
give an accuracy of 3.46E-7. However, the fraction 16/113 approximates the
decimal part of pi to an error of 2.67E-7: the same order of magnitude of error
without the need for multiple-precision arithmetic. If you don't mind using
16-bit divisors, 4703/33215=0.1415926539+ is accurate to within 3.32E-10. Most
programmers would prefer the latter two approximations.
The question, "What is the best common fraction to use for a particular
application?" is most likely to take one of three slightly different forms:
1. What is the common fraction with the smallest numerator and denominator
that will approximate a given real number to within a specified tolerance?
2. What is the common fraction with a numerator less than N that best
approximates a given real number?
3. What is the common fraction with a denominator less than N that best
approximates a given real number?
In this article, I'll describe some simple search procedures to determine the
fractions that answer these questions. But first I'll briefly turn to
elementary number theory, specifically, Farey Sequences.
Farey was a mineralogist who, in 1816, made observations about common
fractions and published them without proof. The mathematician Cauchy published
the proofs for Farey's observations shortly thereafter. Farey wrote down the
sequence of all reduced fractions between 0 and 1 whose denominators are
limited by a number N, the "order" of the Farey Sequence. Figure 1 shows the
Farey Sequences of orders 1, 2, 3, 4, and 5.
Farey observed (and Cauchy proved) that for two consecutive fractions a/b<c/d
of the Farey Sequence of order N, ad-bc=-1. This theorem is proved by
induction. By inspection, it can be seen that it is true for orders 1, 2, 3,
4, and 5. Assuming that it is true for a series of order N, it can be proved
that it is true for a series of order N+1, thus completing the inductive
proof. 
The "mediants" of a Farey Sequence are also interesting. The fraction
(a+c)/(b+d) is called the mediant of two consecutive terms a/b and c/d of a
sequence. The mediant always lies between its parents, a/b<(a+c)/(b+d)<c/d:
a(b+d)<b(a+c) reduces to ad<bc, which holds because a/b<c/d, and likewise
(a+c)d<(b+d)c reduces to ad<bc. Assume that a/b and c/d are consecutive
terms of a Farey Sequence of order N. Now consider a Farey Sequence of order
N+1 with a subsequence {a/b,h/k,c/d} contained in the sequence of order N+1
but not of order N. If this is true, then ak-bh=-1 and hd-kc=-1. Solving these
equations for h and k yields h=(a+c)/(bc-ad) and k=(b+d)/(bc-ad). But since
a/b and c/d are consecutive terms of a Farey Sequence of order N, bc-ad=1,
which reduces the expressions for h and k to h=a+c and k=b+d. This proves
a second theorem associated with Farey Sequences: Fractions that belong to the
Farey Sequence of order N+1 but not of order N are mediants of the Farey
Sequence of order N.
This important theorem means that starting from the Farey Sequence [0/1,1/1],
sequences of a higher order can be constructed by successively inserting
mediants with appropriate denominators. This enables efficient search
procedures to be defined for answering the three questions posed earlier. I'll
describe the answer to Question 1; the procedures for the other two questions
are simple modifications to the first.
Question 1 seeks a common fraction that will approximate a real number g to
within a given tolerance e. We need an efficient way to search all possible
Farey Sequences. Suppose you know that g is contained in some interval
[a/b,c/d] of a Farey Sequence. If neither endpoint approximates g to within
the desired tolerance e, another approximation to g can be made by calculating
the mediant of this Farey interval. This will divide the interval into the two
intervals [a/b,h/k] and [h/k,c/d], where h/k=(a+c)/(b+d). If |g-h/k|<=e, h/k
yields the desired common fraction, and you are finished. Otherwise, you need
only determine which interval contains g, replace the appropriate interval
limit with h/k, and repeat the procedure. 
This search procedure examines those intervals containing g of successively
higher-order Farey Sequences until an interval limit is found that
approximates g to the desired tolerance. Assuming that neither 0 nor 1 is a
satisfactory approximation to g, the steps of the search procedure are:
1. Initialize the endpoints of the first Farey interval [a/b,c/d] to a=0, b=c=d=1.
2. Calculate an approximation h/k, where h=a+c,k=b+d.
3. Calculate error=(h/k)-g. If |error|<=e, stop; h/k is the desired common
fraction.
4. If error<0, change the lower limits to a=h,b=k. Go to step 2.
5. If error>0, change the upper limits to c=h,d=k. Go to step 2.
Listing One is a C function to perform the search. On a 486-based PC, the
function executes instantly for any reasonable values of g and e. When
programmed on a programmable calculator, the execution time ranges from a
second to a minute or more.
Answering the maximum-numerator question is nearly as straightforward.
Intervals on successively higher Farey Sequences are searched until either the
numerator of the new interval boundary in the next-higher sequence exceeds the
limit N or the desired tolerance is achieved. If the absolute best
approximation is desired with the magnitude constraint, a tolerance of 0 can
be specified. The major modification is that if the search loop is exited due
to the numerator magnitude constraint, both endpoints of the interval
[a/b,c/d] must be checked for the best approximation. Listing Two is a C
function that implements this procedure.
The procedure for the maximum-denominator problem is similar to that of the
maximum numerator. Intervals on successively higher Farey Sequences are
searched until either the denominator of the new interval boundary in the
next-higher sequence exceeds the limit max_denominator or the desired
tolerance is achieved; see Listing Three.


Reference


Rademacher, Hans. Lectures on Elementary Number Theory. New York, NY:
Blaisdell Publishing Company, 1964.
Figure 1: Farey Sequences of orders 1, 2, 3, 4, and 5
 Order 1: 0/1, 1/1
 Order 2: 0/1, 1/2, 1/1
 Order 3: 0/1, 1/3, 1/2, 2/3, 1/1
 Order 4: 0/1, 1/4, 1/3, 1/2, 2/3, 3/4, 1/1
 Order 5: 0/1, 1/5, 1/4, 1/3, 2/5, 1/2, 3/5, 2/3, 3/4, 4/5, 1/1

Listing One
#include <stdio.h>
#include <math.h>
void Fraction( double realnum, double maxerror, double* num, double* denom) {
 double a,b,c,d,h,k,error;
 a = 0;
 b = c = d = 1;
 for( ;; ) {
 h = a + c;
 k = b + d;
 error = h/k - realnum;
 if ( fabs(error) <= maxerror )
 break;
 if ( error < 0 ) {
 a = h;
 b = k;
 }else{
 c = h;
 d = k;
 }
 }
 *num = h;
 *denom = k;
}
void main() {
 double realnum, maxerror, num, denom;
 printf( "\nEnter real number to be approximated > ");
 scanf("%lf", &realnum);
 printf( "\nEnter maximum allowable error > ");
 scanf("%lf",&maxerror);
 Fraction( realnum, maxerror, &num, &denom);
 printf( "\n real number = %e", realnum);
 printf( "\n numerator = %.0f", num);
 printf( "\n denominator = %.0f", denom);
 printf( "\nnumerator / denominator = %e", num/denom);
 printf( "\n error in approx = %e", num/denom - realnum);
}

Listing Two
#include <stdio.h>
#include <math.h>
/* Like Fraction() in Listing One, but gives up when the next mediant's
   numerator would exceed maxnum, returning whichever bracketing bound
   is then closer to realnum. */
void FractionLimitNumerator( double realnum, double maxerror,
                             double* num, double* denom, double maxnum ) {
    double a, b, c, d, h, k, error;
    a = 0;
    b = c = d = 1;
    for ( ;; ) {
        h = a + c;
        k = b + d;
        if ( h > maxnum ) { /* numerator limit reached: keep closer bound */
            if ( fabs(a/b - realnum) < fabs(c/d - realnum) ) {
                h = a;
                k = b;
            } else {
                h = c;
                k = d;
            }
            break;
        }
        error = h/k - realnum;
        if ( fabs(error) <= maxerror )
            break;
        if ( error < 0 ) {
            a = h;
            b = k;
        } else {
            c = h;
            d = k;
        }
    }
    *num = h;
    *denom = k;
}
int main( void ) {
    double realnum, maxerror, num, denom, maxnum;
    printf( "\nEnter real number to be approximated > " );
    scanf( "%lf", &realnum );
    printf( "\nEnter maximum allowable error > " );
    scanf( "%lf", &maxerror );
    printf( "\nEnter maximum allowable numerator > " );
    scanf( "%lf", &maxnum );
    FractionLimitNumerator( realnum, maxerror, &num, &denom, maxnum );
    printf( "\n            real number = %e", realnum );
    printf( "\n              numerator = %.0f", num );
    printf( "\n            denominator = %.0f", denom );
    printf( "\nnumerator / denominator = %e", num/denom );
    printf( "\n        error in approx = %e\n", num/denom - realnum );
    return 0;
}

Listing Three
#include <stdio.h>
#include <math.h>
/* Like Fraction() in Listing One, but gives up when the next mediant's
   denominator would exceed maxdenom, returning whichever bracketing
   bound is then closer to realnum. */
void FractionLimitDenominator( double realnum, double maxerror,
                               double* num, double* denom, double maxdenom ) {
    double a, b, c, d, h, k, error;
    a = 0;
    b = c = d = 1;
    for ( ;; ) {
        h = a + c;
        k = b + d;
        if ( k > maxdenom ) { /* denominator limit reached: keep closer bound */
            if ( fabs(a/b - realnum) < fabs(c/d - realnum) ) {
                h = a;
                k = b;
            } else {
                h = c;
                k = d;
            }
            break;
        }
        error = h/k - realnum;
        if ( fabs(error) <= maxerror )
            break;
        if ( error < 0 ) {
            a = h;
            b = k;
        } else {
            c = h;
            d = k;
        }
    }
    *num = h;
    *denom = k;
}
int main( void ) {
    double realnum, maxerror, num, denom, maxdenom;
    printf( "\nEnter real number to be approximated > " );
    scanf( "%lf", &realnum );
    printf( "\nEnter maximum allowable error > " );
    scanf( "%lf", &maxerror );
    printf( "\nEnter maximum allowable denominator > " );
    scanf( "%lf", &maxdenom );
    FractionLimitDenominator( realnum, maxerror, &num, &denom, maxdenom );
    printf( "\n            real number = %e", realnum );
    printf( "\n              numerator = %.0f", num );
    printf( "\n            denominator = %.0f", denom );
    printf( "\nnumerator / denominator = %e", num/denom );
    printf( "\n        error in approx = %e\n", num/denom - realnum );
    return 0;
}
DDJ


PROGRAMMER'S BOOKSHELF


A Sample of C++ Books




Ray Valdés


When OOP and C++ were still new to the mainstream, there emerged a tremendous
need on the part of programmers for books on C++. Five years or so ago, a
familiar question in online forums was, "I want to learn more about C++;
where do I start?" It's a sign of the times (as well as of the decade we
live in) that today, the question often is, "I'll be laid off next month
unless I learn C++; where do I start?"
In response to this perceived need, book publishers unleashed a mountain of
books on C++. A search through the Computer Literacy bookstore Web site
(http://www.clbooks.com) now returns 376 books with "C++" in the title.
Leafing through a few of these books, you'll notice that some bear the
earmarks of haste: superficial coverage of the topic, example code that will
not compile, sundry misconceptions, and nuggets of misinformation. On
occasion, it seems that the author is only a few weeks ahead of the reader in
grasping the subject.
Some time ago, Andrew Schulman suggested a useful rule of thumb in weeding out
the Johnny-come-lately novices from dyed-in-the-wool mavens: Consider only
titles whose authors worked at or were once affiliated with Bell Labs. This
filter would include worthwhile books by Bjarne Stroustrup, Margaret Ellis,
Stan Lippman, Rob Murray, Jim Coplien, Stephen Dewhurst, Kathy Stark, Andrew
Koenig, Tom Cargill, Jonathan Shapiro, and Martin Carroll, which should be
plenty. The premise behind this rule is that C++ is a complex enough subject
that only someone with years of hands-on experience can contribute useful
advice, and for a long time, the only place one could get this experience was
at Bell Labs. Obviously, this premise is less valid as C++ becomes more
entrenched in the mainstream; even in times past, the stringent rule may have
slighted some illuminating introductory treatments by outsiders. Nevertheless,
I've found Schulman's suggestion useful when approaching a bookstore's sagging
shelves.
More recently, I've been considering a different rule of thumb: Never read
anything you can't lift. Okay, I've had to break this rule, but let me explain
its rationale. 
There's been an arms race among book publishers to make titles as physically
wide as possible, to garner every last inch of shelf space and edge out
competitors. This has resulted in overweight tomes of 800 pages or more,
usually specific to a particular compiler or operating system, often released
within a few weeks of the compiler's initial shipment, yet claiming to be the
last word on the subject. Too often, you find inside a rehash of the vendor
documentation, some toy programs, a few homilies, and well-worn bits of
advice. The tips and suggestions therein take everything at face value, with
little attempt to go beyond the manufacturer's feature list; there are few of
the hard-won lessons that only extensive experience can provide.
As a result, I'm drawn to slim books that carry more than their weight in
knowledge and insight--classics such as Scott Meyers's Effective C++
(Addison-Wesley, 1992), or James Coplien's Advanced C++ Programming Styles and
Idioms (Addison-Wesley, 1992). These books cover material less likely to
become obsolete with a new version of a particular compiler. C++ beginners
have for a long time been well served by Stan Lippman's C++ Primer
(Addison-Wesley, 1991) and Stephen Dewhurst and Kathy Stark's Programming in
C++, Second Edition (Prentice-Hall, 1995). These four books together take up
about the same shelf space as one overweight, underpowered title.
However, C++ continues to advance, and more programmers are involved in
multiperson projects whose codebase evolves over time. Once you've learned the
syntax and some language tips, object-oriented design becomes more of a
concern. One recently released book that satisfies a range of needs is
Mastering Object-Oriented Design in C++, by Cay Horstmann. The book lives up
to its billing on the jacket: "Everything you need in one manageable package!"
In this small-format volume, Horstmann provides both an introduction to the
C++ language and a "minimethodology" for object-oriented design that
emphasizes tractable, real-world use instead of the ivory-tower overkill that
some methodologies promote.
Horstmann is a man of few words, and he has packed just about every page in
this book with useful information. The brevity is refreshing, and the clean
and concise approach contrasts favorably with more-weighty tomes. For example,
one popular C++ book has a 60-page chapter on pointers that begins around page
400. Horstmann dispenses with the subject in all of two pages. There are no
long-winded explanations, drawn-out examples, padding of the book with page
after page of repetitive listings, or half-page diagrams fluffed up with the
currently fashionable 3-D, graduated-tint screens. The graphic design is as
efficient and precise as the prose. There are many code fragments, usually two
or three per page, woven naturally into the fabric of the discussion.
Horstmann's book covers features recently added to C++, including exceptions,
templates, namespaces, and run-time type identification. This coverage is not tacked on at
the end, but is clearly part of the author's coding repertoire.
Horstmann makes his points in a no-nonsense fashion that is sometimes
opinionated but whose opinions carry the weight of experience. For example,
when talking about adding new operations to an existing class implementation:
"Public data and cluttered interfaces must be avoided. You have no good choice
but to stop coding your immediate task and to reexamine the overall class
design."
Every chapter ends with a list of design hints: "Don't pollute the global
namespace with constants," "A class that has a destructor needs both a copy
constructor and an assignment operator," "Beware of constructors with a single
integer argument," and "Class types are the norm; challenge basic types."
These epigrams are followed by a one- or two-paragraph explanation.
Horstmann meticulously adheres to a coding style that "stays away from a good
number of C++ features I consider of marginal value (such as private
inheritance or pointers to member functions)." He acknowledges that coding
preferences can be a matter of intense debate, and provides a 20-page appendix
that lays out the guidelines formally so that you can modify them for personal
or local taste.
If he sounds like a stern schoolmaster, perhaps that's because he is.
Horstmann teaches computer science at San Jose State University in California,
and this book has been designed for classroom teaching as well as personal
use. Every chapter has exercises that go beyond the rote repetition you may
have encountered in school. For example, Exercise 13.1.1 is: "Try corrupting
the free list on purpose to see what happens on your system. Delete the same
pointer twice. Then allocate pointers and check for duplicates or just for
crashes."
The writer is more than a detached academic. Not mentioned in the book is the
fact that Horstmann is author of a commercial word-processing package for
scientific documents called "ChiWriter." This complex program has served as
proving ground for the author's design hints and coding tips.
Recently, I've encountered programmers who have already completed the slow
march up the learning curve of C++ and are ready to embark on a project
involving Microsoft Visual C++ (MSVC) and Microsoft Foundation Class (MFC),
only to find themselves at sea in a new ocean of information: the 80,000 lines
of dense C++ code that make up MFC, and the way this codebase interacts with
the underlying Windows API. Having been in the same boat, I empathize
strongly. 
In contrast with the general topic of C++, there are very few independent
sources of information on MSVC and MFC. One possible explanation, again, has
to do with the complexity of the subject. Unless you work at Microsoft, you
probably have not had the opportunity to work extensively with the latest
compiler versions, given the rapid release rate of the languages division
there.
The bulk of published material on this topic originates from Microsoft--the
documentation, tutorials, tech notes and sample code packaged with its
compiler and also found on the Microsoft Developer Network CD-ROM (MSDN).
There is also the Microsoft Press book, Inside Visual C++, Second Edition, by
David Kruglinski (covering MSVC 1.5 and MFC 2.5). Fortunately, all these
official works are of uniformly high quality and go a long way towards
providing the "tribal knowledge"--the accumulated wisdom about tips, traps,
and techniques--necessary for being productive in a complex environment. The
Books Online information packaged with MSVC, plus the additional information
in the MSDN CD, is essential for any developer working on the Windows
platform, even if you use another vendor's compiler. Having the material in
electronic form facilitates browsing and searching, although it does hinder
extended reading. If you have a laser printer, however, it's easy enough to
print out even entire chapters for offline perusal.
But what happens if your programming problem is not mentioned in this
material? For example, I recently wanted to use the CPropertySheet class in
MFC to implement a "tabbed dialog" app in 16-bit Windows. This class is
supported in MSVC 2.1 for Windows 95 and Windows NT, but the 16-bit compiler
(MSVC 1.5) isn't mentioned anywhere in the class reference docs. It turns out
that CPropertySheet works fine with V1.52 or later; you just use it in the
same manner as in V2.1. To find this out, I had to break my small-is-beautiful
rule and consult some wide and heavy books.
I was pleasantly surprised by the quality of the Kruglinski book and of three
other recent books on MSVC and MFC: Win32 Programming using Visual C++, by
Mike Blaszczak; Visual C++ How-To, by Scott Stanfield et al.; and Visual C++
2, by Marshall Brain and Lance Lovette. All are safe choices and provide good
hard-copy coverage of MSVC and MFC, although none contain information that
goes substantially beyond the product documentation. The Kruglinski,
Blaszczak, and Brain/Lovette books all follow the traditional approach of
covering the MSVC package on a component-by-component basis, with a chapter
on, say, AppWizard, another on dialogs, one on ODBC, and so on. 
The Visual C++ How-To is distinguished by its question-oriented
problem-solving approach. There is no overview or introductory material;
instead, the book presents a list of very specific questions, such as "How do
I write a customized DDX/DDV routine?" or "How do I display a progress meter
in the status bar?" Each question is answered in a half-dozen or so pages,
which contain step-by-step instructions for adding lines of code to
compiler-generated source files. (As you perhaps know, much of MFC programming
involves interacting with ClassWizard and related code generators.) Because
of this approach, Visual C++ How-To complements the other three books--think
of it as additional Tech Notes that Microsoft did not write.
If I had to pick only one of the other three, I'd choose Win32 Programming
using Visual C++. Perhaps because Blaszczak was part of the MFC development
team, his book seems to have that extra modicum of insight into the design
motives behind some MFC constructs. He does first restate the party line,
which is that you don't need to know how MFC (or any other app framework)
works in order to use it. But then he admits that when things go wrong,
everyone feels the need to understand the internals behind mechanisms such as
message maps or DDX/DDV tables. Although the book has a more conventional
purpose than delving under the hood of MFC, often you can read between the
lines of his narrative and get a sense of the underlying design goals. In the
case of message maps, he devotes ten full pages to unraveling the twisted set
of macros that implement this peculiar mechanism.
In summary, when you need the answer to a specific programming problem, it's
fairly easy to sacrifice principles of elegance and conciseness. The
Kruglinski, Blaszczak, and Brain/Lovette books contain plenty of answers (the
Brain/Lovette book comes with an online index, not just a listings diskette).
Any one of them would be a worthwhile addition to your library. Ironically, my
particular question turned out not to be found in any of these books, and I had
to rely on the "tribal knowledge" of a friend to solve the CPropertySheet
problem.
Mastering Object-Oriented Design in C++ 
Cay S. Horstmann
John Wiley, 1995, 454 pp., $43.95
ISBN 0-471-59484-9
Inside Visual C++, Second Edition
David Kruglinski
Microsoft Press, 1994, 768 pp., $39.95
ISBN 1-55615-661-8
Visual C++ How-To
Scott Stanfield, Mickey Williams, Alan Light, and Ralph Arvesen
Waite Group Press, 1995
570 pp., $39.95
ISBN 1-878739-82-4
Win32 Programming using Visual C++
Mike Blaszczak
Wrox Press, 1995, 733 pp., $44.95
ISBN 1-874416-47-8
Visual C++ 2
Marshall Brain and Lance Lovette
Prentice Hall, 1995 
834 pp., $42.95
ISBN 0-13-305145-5
Development Library CD-ROM
Microsoft Developer Network
$195 for 4-issue subscription
http://www.microsoft.com/msdn/



SWAINE'S FLAMES


The Invasion from Redmond


Biff: Welcome to the WOC-TV Evening News from Oconomowoc, Wisconsin. I'm Biff
Deltoid...
Connie: ...and I'm Connie Coiffure. Well, the heat wave continues unabated and
a huge, flaming object, believed to be a meteorite, fell on a farm near
Grovers Mill, New Jersey. But the big news today happened in computer stores
everywhere, right, Biff?
Biff: Right, Connie. Microsoft released Windows 95 today, and all Oconomowoc
is abuzz.
Connie: We're going to go to one of those stores now, where Natalie Stankie
has a live report. Natalie?
Natalie: Thanks, Connie. We're here at Verne's Computers and Live Bait, where
chaos reigns. I'm going to try to get through the screaming throng to speak to
one of the customers just emerging from the store with what appears to be a
tattered copy of Windows 95. Sir? Sir?
Lucky Customer: Let me through! Let me through!
Natalie: Let go of my collar; I'm the media.
Lucky Customer: Oh. Sorry.
Natalie: Forget it. Can you tell us what you're feeling right now?
Lucky Customer: I'm shaken. They ran out a few minutes ago. I got one of the
last copies.
Natalie: But there must be hundreds of people still trying to get in.
Lucky Customer: Must be. I'm afraid it's gonna get ugly in there.
Natalie: Well, that's the story from Verne's. Back to you, Connie.
Connie: Thanks, Natalie. So how big is this phenomenon, Biff?
Biff: Connie, let's go to media analyst Myron Clummer for his view on the size
of this thing.
Myron: Biff, according to a chart in today's USA Today, Microsoft has already
spent more money on the Windows 95 launch than it spent on the entire Windows
3 launch, the previous record.
Biff: How does the impact compare with, say, the O.J. Simpson trial, Myron?
Myron: Well, now you're talking about the most important event of our time,
Biff. Windows 95 will change Oconomowocians in ways we can't even imagine
today, but remember, the Simpson trial went on for about a year. That said,
this first day of the Windows 95 media blitz is bigger than any single day of
the O.J. Simpson trial.
Connie: Biff, I've just been informed that roving reporter Guglielmo Weezer is
at the outlet mall. Guglielmo?
Guglielmo: Thanks, Connie. I'm talking with Jack Scupper, a marine biologist
currently living in a refrigerator carton behind Safeway.
Connie: Not a lot of call for marine biologists in Oconomowoc, eh, Guglielmo?
Guglielmo: Right you are, Connie. Mr. Scupper, can you share with the WOC-TV
viewers your unique homeless perspective on this revolutionary development?
Jack: Frankly, I think it's a testament to the genius of Bill Gates. He's my
hero.
Guglielmo: I hear that a lot. Is Bill old enough to run for President in 1996,
Connie?
Connie: Interesting question, Guglielmo. But how about the other end of the
spectrum, Biff?
Biff: Well, we have Wall Street analyst Larry Bullfeather on the phone. Larry,
we just heard the story on the street; what's the story on the Street?
Larry: I've never seen anything like it, Biff. Microsoft stock has just gone
through the roof.
Biff: Would you call this a bullish vote of confidence in Microsoft's
technology, Larry?
Larry: That, Biff, and a reaction to today's vote on Capitol Hill to eliminate
funding for the Antitrust Division of the Department of Justice.
Biff: This just in: Huge, tentacled creatures with black eyes and V-shaped
mouths dripping saliva were seen crawling out of that meteorite that fell near
Grovers Mill, New Jersey, earlier today.
Connie: Do you suppose they use Windows 95, Biff?
Biff: If they don't now, I'll bet they will soon, Connie.
Connie: From WOC-TV, Oconomowoc, Wisconsin, this is Connie Coiffure...
Biff: ...and Biff Deltoid saying, Good night and sweet dreams.
Michael Swaine
editor-at-large
MikeSwaine@eworld.com


OF INTEREST
Pacific HiTech has introduced its Windows Interactive Archive (WIA), a
compilation of thousands of Windows programs from the Internet. The WIA is a
two-CD-ROM set containing over 1100 MB of files from a variety of sites. A GUI
lets you search by program name, author, description, or category. Program
categories listed on WIA include: drivers, utilities, writing, business,
education, and games. The WIA CD-ROM set retails for $29.95. 
Pacific HiTech 
3855 South 500 West, Suite M
Salt Lake City, UT 84115
801-261-1024
http://www.pht.com/
Imagix 4D (from Imagix) is a tool designed to help developers better
understand C/C++ code by providing a graphical, 3-D view of application code.
The Imagix data-collection system enables Imagix 4D to analyze any source code
written in K&R, ANSI, Microsoft, or GNU C.
The tool collects data from different sources, ranging from the software
itself (source code and makefiles) to static- and dynamic-analysis data files
generated by the compiler. Once collected, the data is integrated into an
entity/relationship/attribute database. This information is then used to
create a series of views--from high-level architecture snapshots, to the
nitty-gritty of class and function dependencies. 
Imagix 4D is currently available for Sun workstations, under both Solaris and
SunOS. It sells for $795.00 per seat.
Imagix Corp.
3800 SW Cedar Hills Blvd., Suite 227
Beaverton, OR 97005-2035
503-644-4905
Computer Associates has released its CA-Visual Objects SDK. The SDK, which is
available on CD-ROM, includes CA-Visual Objects code for class interfaces,
online documentation, and sample programs.
The SDK includes APIs for: Integrated Development Environment (IDE) Subsystem,
which provides the functions required to interface with the CA-Visual Objects
IDE; Replaceable Database Drivers (RDD) Subsystem, to enable database drivers
for different file formats using a common language interface; Repository
Subsystem, which lets you store data and code used or generated by add-on and
replacement tools in the CA-Visual Objects repository; Item Subsystem, which
provides the interface for C, C++, and assembly-language programmers for
CA-Visual Objects polymorphic variables; Error Subsystem, which enables the
implementation of customizable error-handling capabilities to support
enhancements to the CA-Visual Objects language. The CA-Visual Objects SDK
sells for $195.00. 
Computer Associates International
One Computer Associates Plaza
Islandia, NY 11788-7000
516-342-5224
http://www.cai.com
NetManage has released Version 4.5 of its TCP/IP software-development kit,
NEWT-SDK. The NEWT-SDK suite provides developers with tools for creating
custom TCP/IP applications. The SDK lets developers work at a higher level
than the WinSock API by packaging all of the protocol intelligence into DLLs
and VBXs.
The SDK includes five new VBXs and six new APIs. The new VBX support includes
custom controls for FTP, SMTP, and SNMP, allowing you to create network
applications in Visual Basic without knowing the underlying application
protocols or the TCP/IP protocols. All of the VBXs are based on the Windows
Sockets TCP/IP API, so any application developed with these VBXs can
communicate across the Internet. The new APIs include a Win-CGI interface to
NetManage's Personal Web Server.
NEWT-SDK 4.5 also includes support for the WinSNMP standard that allows SNMP
applications to be portable across different TCP/IP protocol stacks. NEWT-SDK
4.5 sells for $500.00 per single copy.
NetManage Inc. 
10725 De Anza Blvd. 
Cupertino, CA 95014 
408-973-7171 
http://www.netmanage.com
Template Graphics has announced Open Inventor for Win32, a version of its Open
Inventor 3D C++ toolkit for Microsoft Win32 environments. Open Inventor, a C++
toolkit used for the rapid development of 3-D graphics applications, is based
on the OpenGL interface for rendering. 
Open Inventor for Win32 provides tight integration with Microsoft Visual C++,
API choices between Win32, MFC, and UNIX (libSoXt), complete Open Inventor
look-and-feel, Open Inventor 2.0.1 code set, Inventor Wizard with walk, fly,
examine, and scene viewers, WebSpace viewer for the Internet, SceneViewer for
CAD/CAM, VRML 1.0 node support, MDI support, OpenGL/3D-DDI acceleration,
Windows 95 build and run time, OLE in-place activation and automation, and
more. The toolkit lists for $1495.00. 
Template Graphics Software 
9920 Pacific Heights Blvd., Suite 200
San Diego, CA 92121
619-457-5359 ext. 233 
http://www.sd.tgs.com/template
The SimbaEngine 3.0 SDK from PageAhead Software is a toolkit that allows you
to build a Windows-based ODBC driver for any data source. SimbaEngine 3.0
includes a variety of enhancements, including subqueries, unions, and ODBC
Version 2, Level 2 API functions. SimbaEngine 3.0 is available for Windows
3.x, Windows NT, and Windows 95.
PageAhead Software Corp. 
2125 Western Ave., Suite 301
Seattle, WA 98121
206-441-0340
http://www.pageahead.com 
SoftQuad is shipping Version 2 of its HoTMetaL Pro HTML editor, along with
Panorama Pro, an SGML browser for the Internet. Among other features, HoTMetaL
Pro 2.0 provides filters that convert Word, WordPerfect, and Ami Pro documents
into HTML. Panorama Pro works in conjunction with Web browsers such as Mosaic
and Netscape to allow access to SGML documents. HoTMetaL Pro 2.0 sells for
$195.00; Panorama Pro retails for $139.00. 
SoftQuad Inc.
56 Aberfoyle Crescent
Toronto, ON
Canada M8X 2W4 
416-239-4801
http://www.sq.com
Watcom has announced Version 10.5 of its C/C++ development system. Version
10.5 includes Visual Programmer from Blue Sky Software, as well as Windows 95
support. Visual Programmer is a visual development tool and MFC code generator
for developing 16- and 32-bit applications. Watcom C/C++ 10.5 sells for
$350.00. Current users can upgrade for $129.00.
Watcom International
415 Phillip Street
Waterloo, ON
Canada N2L 3X2
519-886-3700
Visual Numerics has announced ObjectSuite, a family of object-oriented
development tools. The first member of the family to be released is the IMSL
Math Module for C++. This library is organized into classes of mathematical
and statistical algorithms. More specifically, the six sections include
complex arithmetic, vectors, matrices, matrix factorization, splines, and
pseudo-random-number generation. 
The Math Module will initially support Windows and Windows NT (using Visual
C++ and Borland C++), as well as UNIX workstations (Sun and HP). The Math
Module sells for $695.00 for both the UNIX and Windows implementations.
Visual Numerics
9990 Richmond Ave., Suite 400
Houston, TX 77042-4548
713-784-3131
Subtle Software has announced the release of its Subtleware Class Generator
(CGen), an interactive Windows tool that creates C++ class definitions from a
relational-database schema.
CGen lets you connect to a database, select a schema (table) in the database,
and generate C++ class definitions that model the selected database schema.
Once a C++ class has been generated, you can refine that class by adding
methods, data members, and superclasses. CGen runs under both Windows 3 and
Windows NT and works with virtually all databases using ODBC. CGen sells for
$89.00.
Subtle Software 
7 Wells Ave., Suite 27-28
Newton, MA 02159
617-558-4100
Forefront has announced its Forefront Help Buttons, a toolset for creating
buttons and toolbars for Windows Help files. The toolset adds Up, Down, and
Inactive button states to WinHelp, with different graphics or text for each
state. Buttons can be linked to an action (jump, pop-up, macro, and the like).
The Help Buttons also let you create buttons in secondary windows. The toolset
includes a menu-driven button editor that operates with the ForeHelp 2
authoring system. ForeFront Help Buttons sells for $69.00.
Forefront Inc.
5171 Eldorado Springs Drive
Boulder, CO 80303
303-499-9181
TokenKeeper is a network-based license manager from Tower Concepts. The
software consists of four modules: a license manager, license generator, API,
and graphical/statistical viewer.
The license manager lets you incorporate management of software licensing into
end-user applications. End users can monitor the number of users and various
uses of the application. Communication is established with the clients on a
TCP/IP network. The license generator supports several licensing schemes:
time-expiration licenses, demo licenses, and option-specific licenses. The API
includes object libraries that allow interaction between the application and
the license manager. Finally, the viewer presents statistical information on
end-user license usage in a graphical format.
TokenKeeper, which is available for Sun, HP, SGI, and RS/6000 systems, sells
for $2995.00.
Tower Concepts
103 Sylvan Way
New Hartford, NY 13413
315-724-3540
http://www.tower.com
DXE, an API for reading, writing, and displaying AutoCAD DWG and DXF files,
has been released by Tailor Made Software. The DXE API is divided into five
sections: DWG read, DWG write, DXF read, DXF write, and drawing display. The
software, which is available for an annual $5000.00 developer-support fee,
supports DOS, Windows, Windows 95, Windows NT, and UNIX AutoCAD applications. 
Tailor Made Software
28006 122nd Place SE
Kent, WA 98031
206-631-1513
75017.1764@compuserve.com
Addison-Wesley recently published Btrieve Complete, by noted computer author
Jim Kyle. After an overview of how Btrieve organizes and manages data, the
book goes on to provide details on Btrieve's file structures and internal
operations. The accompanying diskette includes Btrieve tools and utilities.
The 528-page Btrieve Complete sells for $39.95. ISBN 0-201-48326-2.
Addison-Wesley Publishing
1 Jacob Way
Reading, MA 01867
617-944-3700
http://www.aw.com
PERTS (short for "Prototyping Environment for Real-Time Systems"), a modeling
tool for real-time systems, has been announced by Tri-Pacific. PERTS is based
on the Software Engineering Institute's Rate Monotonic Analysis methodology,
which allows you to "guarantee" performance and schedulability for real-time
system architectures. PERTS provides worst-case scheduling models so that you
can determine where bottlenecks are and whether or not the application will
meet performance criteria. PERTS currently runs on Windows NT, Sun, HP, and
RS/6000 systems. Licensing fees for the software start at $3500.00 for single
NT users and $5000.00 for single UNIX users.
Tri-Pacific
1070 Marina Village Parkway, Suite 202
Alameda, CA 94501
510-814-1775
76434.117@compuserve.com
Intel has announced Version 2 of its 3DR high-performance 3-D graphics library
for real-time animation and the display of photo-realistic images on
486/Pentium-based PCs running Windows. 3DR Version 2 adds a Geometry Pipeline
(3DR/GP) to the original technology's Rasterization Engine (3DR/RE) core to
support enhancements such as a fully general lighting model, texture
processing and UI toolkits, and a comprehensive geometric-math library.
Version 2 offers a general-purpose programming model, plus optimized
capabilities designed for games.
3DR/RE software provides Windows-compatible 3-D rasterization functions. With
its high-performance, software-only rendering providing baseline
capabilities, the 3DR Rasterization Engine provides scalable performance
through the Intel/Microsoft Display Control Interface (DCI) standard, 3D-DDI,
and custom 3DR drivers. With the addition of a Geometry Pipeline, 3DR Version
2 provides a geometry-level solution for games, multimedia, CAD,
visualization, and scientific applications. Both the RE and GP interfaces are
exposed, and application developers can write to either separately or to both
layers at the same time in the software stack. Version 2 includes the
following features: a complete set of 3-D primitives for rendering triangles,
polygons, polylines, lines, and points; raster images and bitmaps for fonts
and sprites; multiple texture-mapping algorithms with true perspective
correction, filtering, and mipmapping; true-color programming model at all
pixel depths; optional Z buffering with a complete set of logical Z
operations; antialiasing; transparency and alpha blending; and message-passing
architecture.
The 3DR Version 2 SDK, available free of charge from Intel's Architecture
Labs, includes 3DR development libraries, a run-time system, documentation,
and example programs. The run time can be distributed royalty free.
Intel Architecture Labs
800-253-3696 
3Dgraphics@intel.com 
http://www.intel.com


EDITORIAL


Bellying Up to the Public Trough


If politicians and bureaucrats hadn't been inflicted upon us naturally, we'd
probably invent them anyway--for entertainment purposes, if nothing else.
After all, who but an elected official, Louisiana state legislator John Travis
in this case, would say "I can't believe that we are going to let a majority
of the people decide what's best for this state." 
Deserving honorable mention is the Patent and Trademark Office's Richard
Maulsby. In response to concerns that Patent and Trademark Office (PTO)
records would no longer be available free-of-charge on the Internet, Maulsby
responded, "We're not in a position to offer all of this information [to
taxpayers] for free." Although Maulsby claims that the PTO relies entirely on
fees paid by companies and applicants, he seems to have forgotten that
taxpayers pay for the desk he sits at, the phone he uses, and the information
he's hoarding.
What prompted Maulsby's comment was the demise of government-database
dissemination by the Internet Multicasting Service (IMS), an organization run
by Internet pioneer Carl Malamud. Since January 1994, IMS has made available
to Internet users (at http://www.town.hall.org) a variety of government
databases, including the Congressional Record, U.S. Patent Database, and the
Securities and Exchange Commission's "Electronic Data Gathering, Analysis, and
Retrieval" (EDGAR) database. EDGAR contains corporate records ranging from
shareholder proxy statements to notices of corporate takeovers. Access to
government databases via IMS has been amazingly popular. IMS has sent out 3.1
million SEC filings since January 1994, averaging over 16,000 documents per
day. Additionally, the nonprofit organization has distributed about 1.6
million patent documents during the same period.
One reason IMS access has been in demand is that it can cost as much as $600
to obtain online EDGAR data from commercial services like Mead Data,
Lexis/Nexis, and Disclosure. Although these businesses provide added value,
you still get the same raw data via IMS. Likewise, Moody's and others sell
CD-ROMs containing EDGAR data for up to $1000 each. All in all, commercial
repackaging of EDGAR data is a $250 million-a-year business that is bound to
grow, as corporate electronic document filing becomes mandatory in 1996.
Yet, in an interesting turnabout, SEC Chairman Arthur Levitt, Jr. declared
that Internet distribution of SEC filings is of the "highest priority."
Likewise, SEC Commissioner Steven Wallman said that he'd "like to get back to
the original concept that the public has immediate access to this information
for free." Neither Levitt nor Wallman has made any concrete proposals as to
how this should happen, however. Instead, an SEC spokesperson said that the
database might be put on a government BBS or the White House Web site.
Obviously, Levitt couldn't comment on other government databases, such as the
PTO's.
Malamud claims that, when it comes to electronic distribution of government
documents, the SEC and PTO really don't have any choice. The recently passed
Paperwork Reduction Act, a key part of the GOP's Contract with America,
requires that government agencies go online with records and related
documents. In the meantime, Malamud has been lining up private assistance from
the likes of MIT, NYU, Sun, MCI, Time, R.R. Donnelley, and others. "We've done
two years of public service, thank you," Malamud said, noting that he has
personally financed parts of the project. 
In short, it's time for IMS to move forward in fulfilling its charter of
developing new uses of the Internet--and for the government to live up to its
obligations to its citizens. 
But back to patents.... In a blow to the rapid development of a truly
interactive Internet, it appears there's a patent pending on the concept of
executable content--like that provided by the Java programming language and
supported to a lesser extent by leading Web browsers. According to recent
accounts, the patent, held by the University of California and licensed
exclusively to Eolas Technologies, covers the use of embedded program objects
("applets") within Web documents. The patent also covers the use of any
algorithm which implements dynamic, bidirectional communications between Web
browsers and external applications. 
As we're used to seeing with university technology-licensing programs,
"Weblet" technology was developed by a faculty member who then gained
exclusive rights through an "outside" company. Once again, publicly funded R&D
ends up benefiting private interests.
In this case, Dr. Michael Doyle, a University of California professor--and
coincidentally the CEO of Eolas--developed the technology in 1993. The first
application was an interactive, 3-D medical visualization used in conjunction
with the Mosaic-based Eolas browser. Interestingly, Doyle received his PhD
from the University of Illinois at Urbana-Champaign, where he was tangentially
associated with the National Center for Supercomputing Applications, which
developed Mosaic.
I guess luck really is what you make it.
Jonathan Erickson, editor-in-chief











































LETTERS


C/C++ Compiler Comparison


Dear DDJ,
In his article "Comparing C/C++ Compilers" (DDJ, September 1995), Tim Parker
states: "Performance-wise, the GNU C compiler is unremarkable. It was about
average in speed tests during our first trials." Fair enough, I suppose. But
was he referring to compilation speed, or speed of the executed code?
He continues: "It has no optimizing capability, so we spent a couple of days
playing with flags and options. Eventually, we managed to find several useful
tweaks that improved performance a little, but the compiler still didn't win
any prizes."
Bzzzt. The only thing I can imagine is that Mr. Parker completely neglected to
read either the man page or the (very thorough) TeXinfo documentation for gcc.
Here's an excerpt from the gcc man page:
Optimization Options
-fcaller-saves -fcse-follow-jumps -fcse-skip-blocks
-fdelayed-branch -felide-constructors
-fexpensive-optimizations -ffast-math -ffloat-store
-fforce-addr -fforce-mem -finline-functions
-fkeep-inline-functions -fmemoize-lookups
-fno-default-inline -fno-defer-pop
-fno-function-cse -fno-inline -fno-peephole
-fomit-frame-pointer -frerun-cse-after-loop
-fschedule-insns -fschedule-insns2
-fstrength-reduce -fthread-jumps -funroll-all-loops
-funroll-loops -O -O2
And those are just the vanilla optimizations. There are also a few language
extensions that can improve efficiency. For example, first-class labels, which
can allow direct threading in interpreters (similar to continuation-passing
style) or computed jumps. Also, specification of return-value destination,
which is vaguely like the placement option in C++'s new operator.
I can't comment much on the quality of generated code, but I do know that, for
speed of compilation, GCC 2.5.8 under Linux is noticeably faster than Watcom
C++ 10.0 under DOS (the same 486/66 for both). And the Watcom compiler is
fast.
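The first-class labels Todd mentions are GCC's "labels as values" extension (&&label and goto *expr), which lets an interpreter use direct threading: each opcode handler jumps straight to the next handler's address instead of returning to a central switch. The toy three-opcode machine below is invented for illustration and requires GCC or Clang:

```cpp
#include <cassert>

// A direct-threaded mini-interpreter using GCC's computed-goto
// extension. Opcodes: 0 = halt, 1 = increment, 2 = decrement.
// The dispatch table maps each opcode to its handler's address.
int run(const int* code) {
    static void* dispatch[] = { &&op_halt, &&op_inc, &&op_dec };
    int acc = 0;
    const int* pc = code;
    goto *dispatch[*pc++];   // jump to the first opcode's handler
op_inc:
    ++acc;
    goto *dispatch[*pc++];   // fall through directly to the next handler
op_dec:
    --acc;
    goto *dispatch[*pc++];
op_halt:
    return acc;
}
```

Each handler ends with its own dispatch, so there is no per-instruction loop overhead, which is the efficiency gain the letter alludes to.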
Todd Eigenschink 
eigenstr@CS.Rose-Hulman.Edu
Tim responds: Thanks for your note, Todd. Indeed, I did read the GCC man pages
and all other accompanying documentation you mention. My point in the article
was that, although you can set optimization through careful use of options and
flags, there is no "optimize" mode as with almost every other compiler. I
didn't offer my opinion on whether this is good or bad--just that it
was missing. As for speed, I double-checked my benchmarks several times, and
the results are applicable for both compilation and execution.


MFC versus OWL


Dear DDJ,
In his September 1995 "C Programming" column, Al Stevens wrote that "MFC is
not only the de facto standard Windows framework, it's the best one
available." Ugh!
I've been using Borland C++ and OWL for a few years, so I'll make comparisons
directly between MFC and OWL. I will not say that OWL is the best Windows
framework. I think it's very good, but I suspect that some of the commercial
frameworks (zApp, Zinc, and the like) are even better. 
How much does the framework cover? MFC totals about 45 classes; OWL, about 150
classes. MFC doesn't even cover all the basic Windows window types; for
example, it has no class to encapsulate radio buttons, nor one for check
boxes.
How much does the framework take advantage of object-oriented design? OWL:
Pretty well. OWL uses inheritance (including multiple inheritance); the class
hierarchy is fairly clean, and makes a lot of sense to anyone familiar with
Windows. For example, TRadioButton is a descendant of TCheckBox, which is a
descendant of TButton; TComboBox is a descendant of TListBox; and TEdit is a
descendant of TStatic. In contrast, MFC is a very "flat" class hierarchy.
CStatic, CEdit, CListBox, CButton, and CComboBox are all at the same level in
the class hierarchy.
How well does the framework allow the programmer to use object-oriented
design? Again, OWL does a pretty good job, and again, MFC does not. In
particular, I've found that writing object-oriented code for dialog boxes is
(in some ways) impossible with MFC. One of the things that I find myself doing
a lot in OWL is customizing dialog controls: If I need some new behavior from
a list box, I write a descendant class of TListBox, and in the dialog's
constructor I attach an instance of my list-box class to the list-box control
defined in the dialog's resource template. In MFC, a class instance can only
be attached to a dialog control defined in the dialog resource template
through the use of a temporary pointer. Since the attachment is temporary, the
class you've defined has no opportunity to modify the control's behavior. Of
course, there are ways around this, but they're somewhat ugly and, of course,
completely unnecessary in a well-designed framework.
Furthermore, there are issues related to 16-bit versus 32-bit platforms.
Borland uses the same compiler and same framework to generate both 16-bit and
32-bit applications; Microsoft has different compilers and (slightly)
different versions of MFC for the two different platforms. The separate
compiler issue is especially irritating for me--I have to support 16-bit
platforms for another two or three years; Visual C++ 1.52 doesn't even support
all the C++ language features I like to use (in particular, templates and
structured exception handling), and Microsoft has stated that they are never
going to add new C++ language features to Visual C++ 1.5x.
On the other hand, there are a lot of things to like about Visual C++ 2.x. I
suspect that when combined with a good third-party class framework, it would
be hard to beat as a development platform.
Jim King 
jim.king@mail.sstar.com
Dear DDJ,
I take exception to Al Stevens' opinion of OWL versus MFC ("C Programming"
September 1995). Not only do I feel that OWL is superior to MFC in every way,
shape, and form, but I also don't think that de facto standards are determined
by fiat or by the massive amounts of propaganda for MFC that Microsoft has
created to promote its own Windows application framework. I have found that
there is hardly a single area of Windows programming that is not easier to
implement using OWL than MFC, and I do not want to use an interface that makes
my job harder and programming less fun. Why don't we let programmers decide
what they like rather than attempt to coerce them into using a supposedly more
popular platform? Just because it is Microsoft does not make it the best, and
in this case, OWL does a magnificent job of creating an application framework
while MFC is only a slightly better than mediocre implementation. If you like
being as close as possible to the Windows SDK, you might want to stick with
MFC, but if you want to do real object-oriented programming with tremendous
flexibility and extensibility, OWL is the clear choice.
Edward Diener 
70304.2632@compuserve.com
Al responds: I agree with Edward when he asserts that the de facto standards
should not and cannot be created by decree. They occur naturally as the result
of the wide acceptance and usage of a convention by the practitioners in an
industry. MFC passes that test. Most compiler vendors have licensed MFC, most
programmers prefer it, and most Windows-programming employment opportunities
require it.
Edward and I have different opinions about what we prefer in a framework class
library. Both libraries have their technical strengths and weaknesses. I
prefer MFC's close-to-the-bones approach over OWL's would-be object-oriented
shroud. A translucent veil rather than a blackout curtain. My opinion is, I
think, more mainstream than his. I did not, however, mean to suggest that he
and other OWL users should be forced to change. But I am flattered that he
thinks that my influence could coerce programmers to do anything they don't
want to do. 


PNG Patents?


Dear DDJ,
Jonathan Erickson's "Editorial" about PNG (DDJ, September 1995) claims that
"PNG is free and open, and available for use without fear of patent
infringement." How is it possible to write a new and useful program of
significance without fear of patent infringement? That is, how can you know
"there are no patents associated with PNG"? Has someone reviewed all existing
patents to ensure that PNG does not infringe? Even if this complete review of
all patents could be done, isn't this process subject to interpretation? What
about patent applications that are being reviewed at PTO and may be issued in
the future?
While it may be true that the inventors of PNG have not filed any patents,
isn't it possible that PNG could still infringe on one or more existing or
future patents? Isn't it possible that several years from now, PNG could be in
a state similar to that of GIF today?
Christopher Glaeser

cdg@nullstone.com


Satan Revisited 


Dear DDJ,
I have concerns over Jonathan Erickson's "Editorial" on the Satan program
(DDJ, June 1995). 
I installed and used Satan at my last job (at a university) and think that
Satan has gotten a bad rap for no real reason at all. Satan has come under
fire by the press, users, and system administrators alike. I just do not see
what the big problem is.
Contrary to (popular?) belief, Satan does not use any kind of magic to test a
system. In fact, to my knowledge, all of the so-called holes that it scans for
are documented, and can be found by anyone looking in the right places. Holes
that are discovered are posted to the Internet on a regular basis by CERT (if
memory serves correctly).
Anyone can find out this information by doing some footwork. I find that most
of the people who are having problems with Satan are the system administrators
that are not doing their job (IMHO) and are bitching about it. 
Use of Satan is not a concern if you do not have the holes that it looks for
(all the more reason to have Satan test your system before someone else does).
If people take the time to research the program, and the holes that it looks
for, they will find that it is not as bad as it seems. I believe that this is
one case where word-of-mouth just got outta hand.
James R. Twine
SJMR66B@prodigy.com


Thinking about Thunks


Dear DDJ,
Having written extensive thunking libraries for the x86 under OS/2 (refer to
CompuServe's OS2DF1 library 9), I wish to warn other developers about some
pitfalls which exist in the segmented x86 architecture when going from 32-bit
to 16-bit code. Because many of the 32-bit API calls in Windows 95 use thunks
back to underlying 16-bit code, these x86 architectural problems may cause
even flawlessly written application code which calls the 32-bit APIs to
encounter problems.
Thunking, for those not familiar with it, is the process of changing an x86
processor between the flat model of 32-bit protected mode and the 16:16 model
of 16-bit protected mode. In the 32-bit flat model, stack, code, and data are
addressed as simple linear 32-bit offsets from a starting position. In the
16-bit protected mode, all of these entities are addressed as a 16-bit segment
selector and a 16-bit offset from the start of the selector. In 16-bit mode,
the x86 segment can be no more than 64K in size. When thunking, you convert
between these two addressing schemes. Under both OS/2 and Windows 95, the
16-bit selectors are tiled; that is, where one 16-bit selector ends, the next
one starts. In other words, the 16-bit selectors are not overlapped. Not only
does this maximize the memory space available to the 16-bit mode, it prevents
a nasty block-move overlap bug from occurring. This scheme allows 16-bit code
access to any individual byte of data available to the 32-bit code.
Notice the careful wording of the last sentence: I talked about "individual"
bytes, saying nothing about "arrays" of such bytes. If an array in 32-bit mode
crosses one of the 64K boundaries between the tiled 16-bit selectors, it is
not accessible to underlying 16-bit code as a contiguous array. The physical
address of an array in 32-bit memory depends not only on what variables
precede it in your program, but also on what other programs are running. The
probability of a thunked API call failing randomly is (size of passed
array - 1)/65,536. If the array is one byte in length, the probability of
failure is zero; if the array is 65,536 bytes long, only the one correct
alignment can succeed. Any array larger than 65,536 bytes in length is
guaranteed to fail. This problem is particularly insidious for any passed
array existing in application-program heap space--which may have a variable
location, depending on the options selected during the operation history of
the program. The result of this thunking problem is that even an application
program which contains no bugs can fail in a nonrepeatable, intermittent
fashion.
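Both the tiling arithmetic and the failure probability can be sketched in a few lines. The selector encoding below assumes OS/2-style tiling, with the 64K tile index in the selector's upper 13 bits and the low three bits set to 7; that convention and all function names are assumptions for illustration, not anything stated in the letter:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// A tiled 16:16 far pointer: selector i covers linear bytes
// [i * 65536, i * 65536 + 65535].
struct FarPtr16 {
    uint16_t sel;  // segment selector
    uint16_t off;  // offset within the 64K segment
};

// Flat 32-bit offset -> tiled selector:offset. The "| 7" sets the
// table-indicator and ring-3 bits (an assumed OS/2-style convention).
FarPtr16 flat_to_16_16(uint32_t flat) {
    FarPtr16 p;
    p.sel = static_cast<uint16_t>(((flat >> 16) << 3) | 7);
    p.off = static_cast<uint16_t>(flat & 0xFFFFu);
    return p;
}

// Inverse mapping: recover the flat offset from a tiled far pointer.
uint32_t far16_to_flat(FarPtr16 p) {
    return (static_cast<uint32_t>(p.sel >> 3) << 16) | p.off;
}

// The letter's failure probability: an n-byte array at an effectively
// random flat address straddles a 64K tile boundary with probability
// (n - 1)/65,536, clamped to 1 for n > 65,536.
double thunk_failure_probability(unsigned n) {
    if (n == 0) return 0.0;
    return std::min(1.0, (n - 1) / 65536.0);
}
```

An array that runs from offset 0xFFFF of one tile into the next is exactly the case the letter describes: no single selector:offset pair can address it contiguously in 16-bit mode.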
The stability of OS/2 jumped dramatically between revision 2.1 and 3.0 when
the entire graphics system was rewritten in 32-bit code. Thunked APIs are a
classic example of something which almost works. It has become difficult
enough to write stable, usable applications without the operating system
introducing intermittent problems. Unless--and until--Microsoft demonstrates
to the developer community that it has a stable, reentrant solution to this
array-boundary condition problem built into its thunking routines, the release
of Windows 95 is, in my opinion, inherently unstable, premature, and
unacceptable.
Bob Canup
73513.216@compuserve.com


Silicon Fixes


Dear DDJ,
In his column "Apple Talks the Talk and Walks the Dog at WWDC" (DDJ,
"Programming Paradigms," August 1995), Michael Swaine attributes SuperCard to
Silicon Graphics. The company that developed and distributes SuperCard is
Silicon Beach Software, makers of SuperPaint. It's my understanding that
Silicon Beach has no connection with Silicon Graphics.
Jack Herrington
jackh@axonet.com.au
DDJ responds: Right you are, Jack. Thanks.



























Networking Objects with CORBA


Component objects meet client/server




Mark Betz


Mark is director of object technologies at Block Financial's technology
center, where he works on distributed multimedia information systems. He can
be contacted at mbetz@conductor.com.


The evolution of object and network technologies is bearing fruit in the form
of broader system interconnectivity and more-modular application
architectures. Central to this evolutionary process is distributed object
computing (DOC), a model for application development that promises to
revolutionize how systems are conceived and implemented. Much has been written
about DOC over the last couple of years (see, for instance, Dr. Dobb's Special
Report on Interoperable Objects, Winter 1994/95), but only recently have
developers made use of DOC-based tools such as NeXT's Portable Distributed
Objects (PDO), IBM's System Object Model (SOM), and Iona's Orbix. (The latter
two are implementations of the Object Management Group's Common Object Request
Broker Architecture--CORBA--specification.) Before long, we're told, Microsoft
will make available distributed versions of its Component Object Model (COM)
technology. Still, the benefits of distributed-object computing are not
clearly understood. Consequently, in this article I'll present a CORBA-based
architecture designed to support a typical business--a virtual bookshop. This
project not only demonstrates the technologies underlying distributed objects,
but also the Internet technologies with which DOC must work to reach a
reasonable number of users. 
Figure 1 is a model of how a business like a virtual bookshop might operate.
The shop is "located" in a cluster of host machines attached to the Internet
backbone through a dedicated line to a local service provider. By virtue of
the Internet's IP packet-switching protocol, each of these hosts is uniquely
addressable among all other connected machines worldwide. Customers "arrive"
at the shop by connecting to the Internet and executing an interface
application. One benefit of using DOC as the underlying architecture of the
business is that your services can have various interfaces. Some clients might
use an HTML browser, while others might run an application that uses
distributed binary objects directly. You might even have to provide a shell
interface suitable for a terminal connection. The architecture accommodates
all these modes of access, as well as those that may be required in the
future. How will customers pay for their purchases? There are currently a
number of proposals for secure, electronic financial transactions, including
"DigiCash," "CyberCash," and "NetCash." In this discussion, I'll assume that
security technology exists to allow credit-card numbers to be transmitted in
privacy. 


A DOC Backgrounder


Before diving into the bookshop's architecture and connection model, it's
important to consider just what objects are being distributed and how they
support the business. Figure 2 illustrates a simple object model. Obviously,
to track what you have and what you've sold, you need a Book class. This would
be best derived from a more generalized inventory item class, but I'll ignore
good object modeling in order to focus on the system architecture. (Modeling a
distributed system in objects is a discipline that certainly deserves its own
article.) In addition to Book, the domain layer of the application contains
Customer, CreditAuth, Order, Service, and Session objects. The Customer and
Order classes are self-descriptive. CreditAuth encapsulates a connection to a
credit-authorization network to which credit-card purchases are submitted for
approval. Service is a class of objects that lets clients send you e-mail,
browse a company directory, and perform other customer-service-related tasks.
Session is attached to a Customer object on entry to the system; it captures
the state of the system for one user over one connection and can be used to
allow restart in cases of disconnection, for example.
Clearly, a number of databases are also necessary, at least for inventory,
customer, and order information. There will probably be an e-mail database and
a short-term-storage database for session information. Whether you think of
the distributed objects as interfaces to the databases or the databases as
persistence for the distributed objects doesn't much matter. The important
point is that the technology allows you a great deal of freedom in choosing
these components. When you're done, clients won't depend on the database
architecture, the networking model, the platforms used to support the
implementation, or even the interfaces of the distributed objects themselves.
This is the key benefit of DOC technology--it takes the core object-oriented
concept of encapsulation to its logical conclusion. Where object-oriented
languages allow encapsulation of state and behavior at the syntax level, DOC
allows physical encapsulation. When an object is linked into an application at
the binary level, it brings with it a host of dependencies that make it
difficult to reuse. The monolithic applications created on a binary-linkage
model become littered with these dependencies and grow more difficult to
maintain and extend. 
By contrast, a distributed object (the Customer class, for example) is
implemented once in a running process that is developed and maintained
locally, but available globally. It is available via a call-level synchronous
interface in which calls to remote resources look exactly like calls to local
objects and procedures. The actual implementation details are not complex from
a system-programming perspective, and the end result is well worth it:
Application developers can be presented with neatly packaged components that
export simple interfaces to distributed services. This approaches the
self-supporting community of component developers and users that has been
the promise of object technology since its beginning. The immediate benefit is
simpler, more-robust applications. Rather than encompassing all of the
business rules within its own source, an application evolves into a controller
and coordinator of distributed objects. To achieve the benefits of this model
and apply it to the sample business, you need two key facilities--networking
and object distribution. 


TCP/IP Networking


It would be difficult to overstate the importance of TCP/IP in making the
vision I've described a matter of routine system development. Originally the
language of interconnection between the nation's large civilian and military
research and engineering institutions, the TCP/IP protocol has exploded in
usage along with the Internet. The catalyst behind this expansion has been the
development of efficient serial-line protocols that carry TCP/IP traffic,
enabling low-cost dial-up connections to the net. Principal among these is
Point-to-Point Protocol (PPP). Major operating-system vendors (okay,
Microsoft) now include a TCP/IP/PPP stack as a standard part of their system
software: Every copy of Windows 95, Windows NT, and OS/2 ships with TCP/IP. UNIX
vendors have been including TCP/IP with their systems since the University of
California, Berkeley added it to BSD (along with a raft of nifty utilities and
the idea of a "socket") in 1983 or so. I won't go into details of TCP/IP
design or operation beyond what affects the application model; suffice it to
say that it is an efficient protocol for wide-area interconnectivity of a
variety of small systems connected to a variety of networks.
Perhaps the most significant recent development involving TCP/IP is the
stampede of major commercial online services to implement it over their
backbones. Nearly every major service now operates PPP servers and allows
TCP/IP traffic for their customers with dial-up accounts. CompuServe's WinCIM
access software now runs on top of WinSock, the Windows sockets specification,
and can be used to access CompuServe over an Internet PPP connection.
Conversely, users can access Internet addresses over CompuServe dial-up nodes,
all of which can now access the provider's PPP servers. Recently, CompuServe
announced it was ditching the venerable "77777,7777" address form (based on an
old octal format) for a user alias of the form user@compuserve.com. Other
services such as Prodigy and America Online have reacted similarly; see Figure
3. We'll likely see a day when every online system in the world will be
addressable from every other online system. Fortunately, in addition to being
the lingua franca of modern networking, TCP/IP is also the preferred language
of most Object Request Brokers, the central component of a distributed-object
computing technology.
IP packet switching allows you to uniquely address and send packets to
literally millions of machines. TCP provides a safe, stream-oriented interface
which packetizes a flow of data, delivers it to IP for transport to the remote
system, and there reassembles the stream, guaranteeing packet order and
integrity. Pretty impressive technology, but the devil is in the
details--what's in all those little packets of data? TCP/IP is a network-level
protocol, actually a set of protocols called a "stack." When an application
uses TCP/IP to deliver data to a remote machine, it has to decide what to put
in on one end, and what it all means on the other. This constitutes a
requirement for an application-level protocol. Early Internet applications
were e-mail and file-copy utilities, each supported by its own protocol.
E-mail and file transfer remain the Internet's primary uses, but other
application protocols have emerged. A recent addition is the HyperText
Transport Protocol (HTTP), the river that carries the World Wide Web. This
protocol uses TCP/IP to transmit hypertext HTML documents rendered by a
browser.
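To make "what's in all those little packets" concrete: an application-level protocol is just an agreed-upon format for the bytes handed to TCP. HTTP is a plain-text example; the sketch below builds an HTTP/1.0 GET request (the function name, host, and path are illustrative placeholders, and no network code is involved):

```cpp
#include <cassert>
#include <string>

// Build the bytes of a minimal HTTP/1.0 GET request. TCP delivers the
// stream; the *meaning* of the stream is defined by HTTP's text format.
std::string make_http_get(const std::string& host, const std::string& path) {
    return "GET " + path + " HTTP/1.0\r\n"
           "Host: " + host + "\r\n"
           "\r\n";  // a blank line terminates the request headers
}
```

Writing such requests to a TCP socket and parsing the reply is all a minimal Web client does; the browser's real work is rendering the HTML that comes back.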


RPC and IDL


Application-level protocols are a bother to develop: The task is time
consuming and difficult to standardize from one application to another. Early
developers of network apps already used a standard protocol for communication
between the various parts of every application--an API. When one part of a
program needs a service, it calls a procedure located in another part of the
program. In a sense, the formal argument list and return value of the
procedure specify a microprotocol that the compiler and processor collaborate
to implement for us. To achieve a more fully distributed application model,
you simply move some of those procedures to other machines on the net, where
they can be maintained by people who understand them and used by those who
need them. Why not have a call-level interface that spans a network
connection? This is just the effect that developers of early Remote Procedure
Call (RPC) models achieved. RPC models implement another layer of abstraction
on top of the network interface. When a client executes a procedure, the call
is directed across the net to a server on a host, and the results are
returned. As Figure 4 shows, an RPC model allows you to view the distributed
call stack exactly as you would a local call stack.
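As a toy illustration of that idea, here is a stub/skeleton pair for one remote procedure. All names are invented, and the "network" is a direct function call standing in for real transport; a real RPC runtime would generate the stub and move the bytes across a socket:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Marshal a 32-bit integer into big-endian "network order" bytes.
static void put_i32(std::vector<uint8_t>& buf, int32_t v) {
    for (int i = 0; i < 4; ++i)
        buf.push_back(static_cast<uint8_t>(static_cast<uint32_t>(v) >> (24 - 8 * i)));
}

// Unmarshal a 32-bit integer from the buffer at the given position.
static int32_t get_i32(const std::vector<uint8_t>& buf, size_t pos) {
    uint32_t v = 0;
    for (int i = 0; i < 4; ++i)
        v = (v << 8) | buf[pos + i];
    return static_cast<int32_t>(v);
}

// The actual procedure, living "on the server".
static int32_t add_impl(int32_t a, int32_t b) { return a + b; }

// Server skeleton: unmarshal the request, call the implementation,
// marshal the reply.
static std::vector<uint8_t> server_dispatch(const std::vector<uint8_t>& req) {
    std::vector<uint8_t> reply;
    put_i32(reply, add_impl(get_i32(req, 0), get_i32(req, 4)));
    return reply;
}

// Client stub: looks exactly like a local call and hides the marshaling.
int32_t rpc_add(int32_t a, int32_t b) {
    std::vector<uint8_t> req;
    put_i32(req, a);
    put_i32(req, b);
    return get_i32(server_dispatch(req), 0);  // the "network" round trip
}
```

From the caller's point of view, rpc_add(2, 40) is indistinguishable from a local function; everything between the stub and the skeleton is the microprotocol the formal argument list implies.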
Developers of RPC models also strove for platform and language independence.
To be truly useful as a development paradigm, RPC mechanisms had to function
in heterogeneous environments where different languages were used to implement
systems on various platforms. This required two key facilities: a
language-independent means of describing a procedure and a system-level
facility for handling differences in data representation. The first
requirement is satisfied by Interface Definition Language (IDL), a declarative
syntax used to describe procedures, their formal arguments, and their return
types. A compiler translates the IDL declarations into client and server
source in a particular language, which might be different for each. IDL is
implementation neutral and can be translated into any language that includes
the concept of a call or method invocation. The second requirement is
satisfied by "marshaling," a system-software process that converts arguments
and results between data representations that differ in word width,
endianness, and the like. 
What object orientation did for procedural models of application development,
distributed objects do for RPC mechanisms. Instead of describing procedures,
CORBA IDL describes objects in terms of "interfaces"--named sets of attributes
and methods similar in appearance and syntax to C++ classes. Unlike C++, IDL
has no syntax that implies implementation--no pointers, flow-control
constructs, or anything that reserves storage in any sense. An interface is
exactly that--a set of method names with their formal argument lists and
return types. Even IDL attributes are simply shorthand for Get/Set
methods on a named value. If you're switching back and forth between C++ and
IDL concepts, consider interfaces and classes roughly synonymous within each
context. In addition to interfaces, IDL allows the declaration of supporting
types, including structs, enums, exceptions, strings, and a type of container
called a sequence. IDL is translated into an implementation language by an IDL
compiler, which has a C-compatible preprocessing phase, so C-style comments
and # directives (such as #define) are also supported. Example 1 is an IDL
interface for the Book class that illustrates
some of these concepts. 
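To make these constructs concrete, here is a hypothetical IDL fragment in the spirit of the Book interface; the names and members are invented for illustration and are not the article's actual Example 1:

```
// Hypothetical CORBA IDL sketch; illustrative only.

struct BookInfo {
    string title;
    string author;
    string isbn;
    float  price;
};

typedef sequence<BookInfo> BookList;   // an IDL "sequence" container

exception OutOfStock {
    string isbn;
};

interface Book {
    // An attribute implies generated Get/Set methods for the value.
    attribute float price;

    BookInfo describe();
    BookList related(in string keyword);
    long     reserve(in long quantity) raises (OutOfStock);
};
```

Note how nothing here reserves storage or implies an implementation; the interface is purely a contract that the IDL compiler translates into client- and server-side language bindings.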
Once run through an IDL compiler, the interface and type declarations are
translated into corresponding language elements. For the remainder of this
article, I'll use C++ as an implementation language, though mappings for C and
Smalltalk are also in use. C++ interfaces are translated into several classes,
while enums, structs, unions, arrays, and strings are translated into their
C++ counterparts. In the case of strings, the translation is to char*, though
with the acceptance of a string class as part of the ANSI C++ standard, it's
reasonable to expect its use in the future. In addition to the translation of
the IDL classes, a fair amount of source code is generated to support various
parts of the CORBA architecture. For a given interface compilation, the result
is a common header file, containing the classes and supporting types, and two
source files, one each for the client and server executables. By convention,
the header file name has the form interface.hh. The two source files, which
contain implementations of the interface classes and supporting code, are
named interfaceC.cpp and interfaceS.cpp for the client and server,
respectively.
The IDL class generated for use on the client side is known as a "proxy." It
declares virtual functions that correspond to the methods in the IDL interface
from which it was compiled. The client-side source code implements these
functions as stubs that make calls to the remote server, in a model very
similar to RPC mechanisms. The client proxy classes are all behavior and have
no size. Clients can use these classes directly, although it is better to
layer a wrapper class on top of them for several reasons: First, while the
C++ IDL mapping is now an accepted part of the CORBA standard, no available
compilers conform to it 100 percent. Consequently, it is valuable to isolate
client code from the specifics of a given mapping. More importantly, the life
cycle of a client proxy is not exactly that of a local C++ class. Clients
declare a proxy class pointer, then assign it the return value of a function
called _bind, a static method of the proxy class. The binding associates the
proxy with a remote-server process running on some host. Once _bind
successfully returns, the client can make calls on the proxy as if it were
local. In short, the wrapper class hides _bind, server and host selection, and
CORBA exception handling from the client code. I've found it useful to create
Windows DLLs that export a wrapper-class API. Other interfaces based on this
architecture (VBX and OCX interfaces, for instance) are also possible. 
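The wrapper layering just described can be sketched with a hypothetical proxy standing in for the IDL-generated class; a real proxy, with its static _bind, comes from the IDL compiler and the ORB run time, and the names and return values here are purely illustrative:

```cpp
#include <cassert>

// Hypothetical stand-in for the IDL-generated proxy class; a real
// proxy (and its static _bind) is produced by the IDL compiler and
// backed by the ORB run time, not written by application code.
class BookProxy {
public:
    static BookProxy* _bind(const char* /*server*/, const char* /*host*/) {
        return new BookProxy;        // the real _bind contacts the ORB
    }
    float Price() { return 29.95f; } // stub; a real call goes over the wire
};

// The wrapper hides _bind, host selection, and CORBA exception
// handling from client code, as described above.
class BookWrapper {
public:
    BookWrapper(const char* host)
        : proxy_(BookProxy::_bind("BookServer", host)) {}
    ~BookWrapper() { delete proxy_; }
    float Price() { return proxy_->Price(); }
private:
    BookProxy* proxy_;
};
```

Client code then deals only with BookWrapper; if the mapping or the binding protocol changes, only the wrapper is touched.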


Implementing Component Objects


The implementation side is more complicated. 
Along with the client proxy, the IDL compilation process produces a class
whose form is derived from a section of the CORBA specification that describes
the Basic Object Adapter (BOA)--a set of services that connect an
implementation to the ORB and help it handle requests. The BOA's most
prominent feature is the implementation parent class (the BOAImpl class in
Orbix, for instance), which is similar to the client proxy from which it is
derived. However, each virtual function in this class is pure virtual, so the
class is not instantiable. Rather, it is meant to be used as a derivation
point for the implementation class. To provide an implementation of an IDL
interface, you derive a class from BOAImpl and implement each of the virtual
functions. These functions might simply return a value, or they might call
down to an SQL-database access layer such as DBLibrary or connect to another
remote resource. Because the object server is a process, it needs a main() (or
WinMain(), as the case may be) in which calls can be made to initialize the
server and the ORB run-time libraries linked into it.
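The derivation pattern can be sketched as follows; BookBOAImpl here is a hand-written stand-in for the generated parent class, and the constant return values are purely illustrative:

```cpp
#include <cassert>

// Hypothetical stand-in for the generated BOAImpl parent class: every
// method is pure virtual, so the class cannot be instantiated directly.
class BookBOAImpl {
public:
    virtual ~BookBOAImpl() {}
    virtual float Price() = 0;
    virtual float Discount() = 0;
};

// The implementation class derives from the BOAImpl class and fills in
// each virtual function; a real body might call down to DBLibrary or
// some other remote resource instead of returning a constant.
class BookImpl : public BookBOAImpl {
public:
    float Price() { return 42.00f; }   // illustrative value
    float Discount() { return 0.10f; }
};
```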
The run-time libraries, which are different for the client and server, provide
support for the proxy and implementation classes. On the client side, they
marshal the parameters to a call, establish a TCP/IP connection to a server,
and transmit the call and parameters. The server library receives the call,
unmarshals the arguments, and uses a dispatch class to match the name of the
method to an entry point in the implementation class. This entry is then
called; when it returns, any return value follows a similar route back to the
client. 
Two other components of the system deserve a closer look. The first is a set of
services--including server startup and termination--that form the ORB core
within the CORBA specification. How they are provided is up to the
implementor. Iona's Orbix provides a daemon process that runs on server and
name-lookup hosts, but is not needed on client machines. 
The second component is the locator facility. CORBA defines interfaces by
name, just as C++ defines classes by name. Interfaces are implemented in
servers that run on hosts. Hosts have unique names within a network, and
servers are uniquely named within a host. The locator service takes the name
of a server and returns a list of hosts that run it. It is up to the client to
decide which host to request a server binding from. Clients can be hardwired
to a specific host, but use of the locator facility grants clients complete
independence from the back-end host/server structure.
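A minimal sketch of the locator lookup, with illustrative names that are not part of any real ORB API:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Server names map to the hosts that run them; the client chooses a
// host from the returned list before requesting a server binding.
typedef std::map<std::string, std::vector<std::string> > LocatorDB;

std::vector<std::string> Locate(const LocatorDB& db,
                                const std::string& server) {
    LocatorDB::const_iterator it = db.find(server);
    if (it == db.end())
        return std::vector<std::string>(); // unknown server: empty list
    return it->second;
}
```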


Designing the Virtual Bookshop 



You can approach the architectural design of the virtual bookshop at three
levels: the distributed-object model, the composition of the objects into
server processes and the client/server architecture, and the host architecture
on which the servers run. 
A CORBA object server may implement any number of interfaces. In this model,
I'll use one server process per object type. In reality, you might group
related objects into one server for efficiency. The architecture of the
servers is more or less set by the CORBA implementation: a main entry point,
the interface implementation classes, the ORB run-time libraries, the socket
DLL, the TCP/IP stack, and potentially connection layers for other resources
such as databases. On the client side, I'll use proxy wrapper layering--one
proxy and one wrapper for each object in the model. The client will also
contain user-interface code and controlling logic. Figure 5 shows the
client/server architecture for this example. The HTTP server for the Web
interface is simply a special-case client of the ORB. 
Realizing this architecture would be a matter of implementing each of the
servers, as well as the client libraries. There is a lot of potential and
attendant flexibility for the design of supporting services that rely on the
ORB but are outside the scope of its specification. Servers will benefit from
object garbage collection, remote-event reporting and server control,
centralized security, and so on. These services can be provided at the ORB
interface level, without resorting to specialized network programming. The
clients offer perhaps the greatest potential and challenge for achieving
object independence. The layering concept can be expanded to allow clients to
load object definitions dynamically, and objects can supply their own
user-interface constructs, which helps tremendously when dealing with
versioning issues. Once the clients and servers have been designed, it remains
only to outline a strategy for structuring the physical resources on the back
end.
That strategy involves answering many questions: 
What platform should the servers run on? CORBA implementations are available
for all major platforms, from Windows NT and Windows 95 to flavors of UNIX,
the AS/400, and the like. 
How many servers are needed for a given object type? The answer depends on
the TCP/IP stack's connection limits, efficiency and performance concerns, and
the number of clients exposed if a given server goes down. 
How many servers should run on a given host, and of what types? You may choose
to run one type of server on a host, or to roll out a suite of servers
duplicated on every host. 
What database systems will be used, on what platforms, and how will the object
servers communicate with the databases? I've built systems involving TCP/IP
connectivity between clients and object servers, and NETBEUI connectivity
between object servers and Microsoft SQL Server databases. Many other models
are possible. 
Each question may be answered differently for a given app, but you can
determine which back-end components are necessary and propose an organization
for this example.
The bookshop will need one object server for each type of object in the model.
Each server will run on a separate host with a daemon to start it, stop it,
manage it, and provide information about it to clients. Initially (until the
customers roll in) you'll have one host per type. If business booms and the
Order server host is overwhelmed, you can roll out another by simply adding a
machine to the net, installing the ORB and server files, and adding a line to
the locator database. You'll also need at least one fat database server. My
guess is that you could get a very good start using SQL Server on a fast
Pentium and grow from there. The other components needed are a name host and a
Web server. The name host is a specialized host that runs the ORB daemon and
maintains the locator database. It is the only host that clients need to know
the name of. At run time, a client will specify this host as the lookup server
for bind requests. It will see a lot of traffic, but each connection will last
just a few milliseconds under normal circumstances. The Web server is a
specialized client of the ORB, which queries distributed objects and formats
the results into HTML documents for return to a browser. I won't go into
detail about the HTML component. HTTP is sufficiently flexible to accept input
from the user and to query objects. Figure 6 shows the completed system
architecture for the bookshop.


Conclusion


If you build a virtual business, will customers come? Well, that's not my area
of expertise, but a lot of money is being bet on it. More to the point, if
customers do arrive, the architecture presented here will scale nearly
infinitely through the addition of more hosts and servers, additional
lookup-server capability, faster database servers, and the like. As your
business grows, you can change the platforms on which the servers are
implemented, replace database systems, radically redesign data schemas, or
make any other changes to the back end that you wish--and all without
affecting a single client. If the World Wide Web is ultimately replaced with a
different interface, you can accommodate it, as long as it provides a means of
calling a library on whatever it uses for a server. 
What I've described here carries distributed-object computing to its
extreme--objects implemented on many platforms, for many operating systems,
collaborating on a worldwide basis over the global Internet. But these
technologies are just as useful when objects are implemented only one
departmental workgroup away. Either way, the technologies of
distributed-object computing will provide simple solutions to complex
client/server development tasks. 
Figure 1: A virtual bookshop on the global TCP/IP Internet.
Figure 2: Simple object model for the virtual-bookshop application.
Figure 3: Convergence of major TCP/IP networks with commercial and ad hoc
providers.
Figure 4: Local versus distributed call stack.
Figure 5: Partial client/server architecture for the virtual-bookshop
application.
Figure 6: Physical architecture of the virtual bookshop.
Example 1: Sample IDL Book interface and supporting types.
/* IDL declarations for the Book interface and supporting types */
typedef sequence<string> Authors;
enum MediaCode
{
 mcPaper,
 mcHard,
 mcCassette,
 mcQuality
};
struct PubInfo
{
 string House;
 string Date;
};
interface Book
{
 Authors GetAuthors();
 string Title();
 float Price();
 float Discount();
 PubInfo GetPubInfo();
 string ISBNCode();
 MediaCode Media();
};




Your Own Endian Engine


Solving memory-order and addressing challenges in C




John Rogers


John, a programmer in the Seattle area, can be contacted on CompuServe at
72634,2402.


Back in the days before heterogeneous networks and client/server
architectures, life was relatively simple for programmers. IBM-supplied
mainframes ruled the roost, a byte was eight bits, and numbers were always in
Big-endian order. Now, however, there are C compilers in which char takes nine
bits. There are even rumors of 10-bit byte machines. It wouldn't surprise me
to see 16-bit char implementations (for Unicode) in a couple of years.
Furthermore, today's computers can switch between Big- and Little-endian at
boot time, and even juggle different byte orders between data and program
space. Moreover, Big- and Little-endian aren't the only byte-ordering
schemes--there are also Middle-endian systems, such as the DEC VAX.
Most computers address memory by byte, although word-addressable computers
have been around for decades. For C programmers, the term "byte" has become
shorthand for "the smallest chunk of addressable memory." The best alternative
term I've seen for this is "minimum addressable unit" (MAU), from the IEEE
standard for a portable object file format (MUFOM). Consequently, I now use
the term "MAU order" in place of "byte order." 
To alleviate the confusion caused by differing orders and sizes, I've
developed an "endian engine" that handles every byte order (including every
possible Middle-endian order) and every byte size, native or not. I've even
used this engine to simulate a 36-bit machine with 9-bit bytes on a 32-bit
machine with 8-bit bytes. In this article, I'll show how the engine can be
used to implement big-integer routines. The engine and the routines are
building blocks for a simulator I hope to write that would emulate Knuth's
upcoming 64-bit computer (MMIX) with just about any ANSI/ISO C compiler.


Two Things Not To Do


There are a couple of familiar workarounds for the byte-ordering problems;
neither is recommended, however. The first is the swab (swap bytes) function
provided in many versions of UNIX, as far back as Version 7 (1979) and as
recently as 4.4BSD-Lite (1994). The swab function only exchanges pairs of
adjacent char values--useful in the days of the 16-bit PDP-11, but it's a
relic now.
The other method is to use a union in C. However, the ANSI C standard says,
"if a member of a union object is accessed after a value has been stored in a
different member of the object, the behavior is implementation-defined."
(Section 3.3.2.3, which also lists one exception that doesn't help this
particular problem.) Elsewhere, ANSI C says that strictly conforming programs
shall not depend on any implementation-defined behavior.
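For completeness, one widely portable alternative to both workarounds is to assemble and disassemble multibyte values with shifts and masks; this sketch assumes 8-bit bytes in the external format but works regardless of the host's byte order:

```cpp
#include <cassert>

// Build and take apart multibyte values with shifts and masks, so the
// code never depends on how the host happens to lay out memory.
unsigned long FromBigEndianBytes(const unsigned char b[4]) {
    return ((unsigned long) b[0] << 24) | ((unsigned long) b[1] << 16)
         | ((unsigned long) b[2] << 8)  |  (unsigned long) b[3];
}

void ToBigEndianBytes(unsigned long v, unsigned char b[4]) {
    b[0] = (unsigned char) (v >> 24);  // most significant byte first
    b[1] = (unsigned char) (v >> 16);
    b[2] = (unsigned char) (v >> 8);
    b[3] = (unsigned char) v;
}
```

Unlike swab or union punning, this round-trips correctly on Big-, Little-, and Middle-endian hosts alike.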


BSD Notation


Several schemes exist for byte-ordering notation. Both Bolsky and Plauger
notations, for instance, can have a leading zero (0123 is a 4-byte,
Little-endian value in Plauger notation). This can cause subtle problems in C,
where a leading 0 indicates an octal value.
"BSD notation" is my name for the byte orders returned by the 4.4BSD sysctl
library function. In BSD notation, 4321 is a 4-byte, Big-endian value. Leading
zeros are not possible, so octal confusion is avoided. The 4.4BSD-Lite source
CD-ROM (available from O'Reilly & Associates) includes various endian.h header
files that define equates in this notation. (The files are for kernel use
only.)
Unfortunately, since the three notations I've mentioned are all in decimal,
none work well with 16-byte entities. BSD notation is handy for simple things
and is available from a 4.4BSD library routine (sysctl), but it isn't powerful
enough to completely describe the format of something in memory. For
convenience, I've provided a way to use BSD notation with the endian engine,
via a routine that accepts BSD notation.
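The digit-peeling involved can be sketched as follows; the function name is mine, not part of the published engine:

```cpp
#include <cassert>

// Turn a BSD-notation byte order (such as 4321) into an array of MAU
// references ({4,3,2,1}), in the spirit of EndBSDByteOrderToFormat.
// Decimal digits limit this scheme to 9 MAUs.
int BSDOrderToRefs(int order, unsigned char refs[], int max_refs) {
    unsigned char tmp[9];
    int count = 0;
    while (order > 0 && count < max_refs && count < 9) {
        tmp[count++] = (unsigned char) (order % 10); // peel low digit
        order /= 10;
    }
    for (int i = 0; i < count; i++)   // digits came out reversed
        refs[i] = tmp[count - 1 - i];
    return count;                     // number of MAUs described
}
```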
Table 1 is a list of the byte orders for a number of systems. The list uses
BSD notation where necessary. Please e-mail me any additions or corrections to
the list.


Describing Memory Format


Listing Two, end.h, contains Memory_Format_T, a structure that describes the
format of the memory for a given entity. Memory_Format_T has four members:
The size of a minimum addressable unit (MAU) in bits. 
MAU_Order_T, an enum type that is shorthand for the MAU order. MAU_Order_T is
declared in the rep.h header, and its values are BigEndian, LittleEndian, and
MiddleEndian; see Listing One. 
The number of MAUs in the entity.
A pointer to an array of references indicating exactly which MAU goes where.
This is necessary only if MAU_Order_T is MiddleEndian.
In BSD notation, the byte order for a 4-byte, Big-endian value would be 4321.
The equivalent array of MAU references is {4,3,2,1}. The DEC PDP-11 (a
Middle-endian machine) would have an array of MAU references of {3,4,1,2} for
a 32-bit integer on that machine. (The pointer to this array must be NULL for
the BigEndian and LittleEndian MAU_Order_T values.)
Example 1 is C code that initializes Memory_Format_T structures for four
different systems, including the Middle-endian PDP-11 and the 36-bit Honeywell
6000 (which has 9-bit char values).


Swapping "Bytes": EndSwap


The main routine in the endian engine (EndSwap) copies data and swaps MAUs as
it goes. EndSwap requires two memory formats: one for the destination and
another for the source. All the routines I declared in end.h can be used
outside the endian engine, but EndSwap is the only one an application must
call. Listing Two is the end.h source. EndSwap is in Listing Three. A call to
it appears in Listing Four (the BigSwap routine).


Other Ways to Build Memory Formats


Besides building them directly in the application, I've provided two other
means to create Memory_Format_T values. Given a byte order in BSD notation,
EndBSDByteOrderToFormat will allocate and fill in a Memory_Format_T structure.
EndLearnNativeUnsignedLongFormat will build a Memory_Format_T structure for
the native unsigned long type in data space; this order may not apply in
program space or to other data types. Both routines return NULL if out of
memory or an error occurs. Callers should deallocate the memory allocated by
those routines via the standard C free function. The source code for these
functions is available electronically; see "Availability," page 3.



Big-Integer Routines


Eventually, I want the assembler and simulator that I write for the 64-bit
Big-endian MMIX computer to run on my 32-bit Little-endian computer. In the
middle, I envision a set of big-integer routines that would perform (at least)
64-bit integer math on any ANSI/ISO C system. The big-integer routines would
be useful for compilers, debuggers, encryption, and the like. (Contact me at
72634.2402@compuserve.com if you'd like to examine the "draft standard" for
these routines.) My implementation of the big-integer library uses the endian
engine; other implementations may vary.
Sample big-integer routines are available electronically. The BigAdd routine
adds two big integers using an optional layout. BigAdd calls BigSwap to
convert each number to native layout. BigSwap uses the endian engine (EndSwap)
to do the actual swap. Then BigAdd does the addition and calls BigSwap to
convert the sum from native layout back to the caller's layout. The current
version of BigAdd only supports one of the four usual representations for
integers. In the future, I'll add support for two's complement, one's
complement, and signed magnitude.
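The native-layout addition at the heart of this sequence can be sketched as follows, assuming 8-bit little-endian MAUs; the real routines are parameterized by Int_Rep_And_Format_T, and this function is illustrative rather than the published code:

```cpp
#include <cassert>

// Add two big integers MAU by MAU, propagating the carry. BigAdd does
// the equivalent of this after BigSwap has converted both operands to
// native layout.
void BigAddNative(unsigned char* sum, const unsigned char* a,
                  const unsigned char* b, int maus) {
    unsigned carry = 0;
    for (int i = 0; i < maus; i++) {  // low-order MAU first
        unsigned t = (unsigned) a[i] + b[i] + carry;
        sum[i] = (unsigned char) (t & 0xFF);
        carry = t >> 8;
    }
}
```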
The rep.h header (Listing One) is shared between the big-integer routines and
the endian engine. bigimpl.h (available electronically) and anything beginning
with the prefix Big_ are not intended for use outside my big-integer code.


Endian-Engine Limitations


I faced a dilemma when I wrote the endian engine. What should it do if asked
to process an entity that isn't an exact multiple of the number of bits in a
char in that implementation of C? I settled on the following policy: if, when
swapping data, a destination is not an integral number of char values, then: 
Destination size is rounded up to an integral number of char values.
Additional bits of the destination have unspecified values after the swap.
Additional bits are located as if they were simply more high-order bits in
their usual positions.
Also, the total number of bits (MAU size times MAU count) for the destination and
source must be identical.


Design Decisions


This version of the endian engine includes optimizations for the most common
combinations of formats. EndSwap uses EndSameFormat (available electronically)
to determine if a straight memory copy would be correct; if so, EndSwap calls
EndCopy (also available electronically). The next optimization EndSwap tries
is for simply reversed char order; it uses routines in EndRev.c (see
"Availability") to check for and handle that case.
If it can't use either of those optimizations for a given case, EndSwap calls
EndSmallestMoveSize to determine the maximum number of bits that it can move
together in a single chunk. EndSmallestMoveSize bases this computation on the
number of bits in a destination MAU, the source MAU size, and the native char
size in bits. The largest chunk size that evenly divides all three is their
greatest common divisor (GCD). EndSwap loops
for each chunk of this number of bits. For the actual bit processing (such as
moving groups of bits in the EndSwap routine), I use the bit-operation macros
I described in my article "Bit Operations with C Macros" (Dr. Dobb's
Sourcebook of PowerPC Programming, September/October 1995). The MVBITS (move
bits) bitops.h macro is one used in the endian engine.
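The chunk-size computation can be sketched with Euclid's algorithm; the names here follow the article's conventions, but the code is a sketch rather than the published implementation:

```cpp
#include <cassert>

// The largest chunk that never straddles a destination MAU, a source
// MAU, or a native char is the GCD of the three bit sizes.
unsigned EndGCD(unsigned a, unsigned b) {  // Euclid's algorithm
    while (b != 0) {
        unsigned r = a % b;
        a = b;
        b = r;
    }
    return a;
}

unsigned SmallestMoveSize(unsigned dest_mau_bits, unsigned src_mau_bits,
                          unsigned char_bits) {
    return EndGCD(EndGCD(dest_mau_bits, src_mau_bits), char_bits);
}
```

For 9-bit source MAUs on an 8-bit-char host, for instance, the engine falls back to moving a single bit at a time.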
Another possible optimization would be to build a table of "moves" for any
given pair of formats. Each "move" would describe a set of contiguous bits in
the source and how and where that set of bits would end up in the destination.
Finally, the resulting table could be ordered so that only one pass over the
destination would be necessary. (Perhaps this use of fewer writes would speed
up some hardware caches.) Having built the moves table for a given pair of
formats, I would cache it for later use. All this may seem like a lot of work,
but it would probably only need to be done once.


Picking Up the Pieces


The only missing piece of the endian engine is a routine to compute the GCD of
two unsigned numbers. The book C: A Reference Manual, Third Edition, by Samuel
P. Harbison and Guy L. Steele, Jr. (Prentice-Hall, 1991), provides listings of
GCD functions in C. The endimpl.h header (available electronically) declares
EndGCD as a function. I suspect you could simply #define EndGCD gcd (or
whatever your GCD routine is named).
You'll also need the bitops.h header (described in my previous article),
which contains the MVBITS (move bits), BTEST (bit test), and IBSET (bit set)
bit-operation macros.


The Future: Prevention


If you design a computer architecture, file formats, I/O interface, protocol,
or the like, I strongly recommend you specify Big-endian order. In his classic
paper "On Holy Wars And A Plea For Peace," Danny Cohen said:
To the best of my knowledge only the Big-Endians...have built systems with a
consistent order which works across chunk-boundaries, registers, instructions
and memories. I failed to find a Little-Endians' system which is totally
consistent.


References


ANSI X3.159-1989. American National Standard for Information
Systems--Programming Language--C.
4.4BSD-Lite Berkeley Software Distribution CD-ROM. Sebastopol, CA: O'Reilly &
Associates, 1994. 
Bolsky, M.I. The C Programmer's Handbook. Englewood Cliffs, NJ: Prentice-Hall,
1985. 
Cohen, Danny. "On Holy Wars And A Plea For Peace." USC Information Sciences
Institute (ISI) IEN 137. (April 1, 1980).
Cullens, Chane. "Serialization and MFC." Dr. Dobb's Journal (April 1995). 
Erdelsky, Philip J. "Portable Byte Ordering in C++." C/C++ Users Journal
(January 1995).
Fairman, William and Randal Hoff. "Cross-Platform Database Programming." Dr.
Dobb's Journal (March 1995).
Gillig, James R. "Endian-Neutral Software." Dr. Dobb's Journal (October and
November 1994).
Harbison, Samuel P. and Guy L. Steele, Jr. C: A Reference Manual, Third
Edition. Englewood Cliffs, NJ: Prentice-Hall, 1991. 
IEEE Std 695-1990. IEEE Standard for Microprocessor Universal Format for
Object Modules. 
Plauger, P.J. "You Must be Joking." Computer Language (April 1987). 
Rogers, John. Draft Standard for Big Integer Routines for the C Programming
Language. Draft 0.1 (December, 1994). E-mail at 72634.2402@compuserve.com.
--------. "Bit Operations with C Macros." Dr. Dobb's Sourcebook of PowerPC
Programming (September/October 1995).
Table 1: Byte-order master list. Notation: 4321 is Big-endian, and so on.

 Processor OS Order References
AT&T 3B Any Big-endian Bolsky
AT&T 3B2 Any * Plauger
DEC PDP-11 Any ** Bolsky, 4.4BSD-Lite source, Cohen, Plauger
DECsystem-10 Any Big-endian Cohen
DEC VAX Any *** Bolsky, Cohen, DEC VAX-11 ProgrammingCard, Plauger
Honeywell 6000 All **** Bolsky
HP-PA 7100 NT Little-endian Cullens
HP-PA 7100 UNIX Big-endian Cullens
IBM AS/400 Any Big-endian Gillig
IBM System/360 Any Big-endian Cohen
IBM System/370 Any Big-endian Cohen, Gillig, Plauger
Intel 80x86 Any Little-endian Bolsky, Cullens, Erdelsky, Fairman, Gillig
MIPS NT Little-endian Cullens
MIPS UNIX Big-endian Cullens
MMIX Any Big-endian Knuth (1992 draft)
Motorola 680x0 Any Big-endian Cohen, Cullens, Erdelsky, Fairman, Gillig,
Plauger
NSC16000 Any Little-endian? Bolsky
NSC32016 Any ***** Plauger
PowerPC NT Little-endian Cullens, Gillig
PowerPC Any Big-endian Cullens, Gillig
RS/6000 Any Big-endian Cullens, Fairman
SPARC UNIX Big-endian Cullens
Zilog 8000 Any Big-endian Bolsky
*AT&T 3B2: Depends on where data is stored: 32-bit integer (data space) 4321
 32-bit integer (program space) 1234
**DEC PDP-11: Middle-endian: 16-bit integer 12
 32-bit integer 3412
 32-bit float (F) 3412
 64-bit float (D?) 78563412
***DEC VAX: Middle-endian: 16-bit integer 12
 32-bit integer 1234
 64-bit integer 12345678
 128-bit integer 123456789(10)(11)(12)(13)(14)(15)(16)
 32-bit float (F) 3412
 64-bit float (D) 78563412
 64-bit float (G) ?
 128-bit float (H) ?
 Packed decimal 78563412
****Honeywell: Big-endian, 36-bit system, with 9-bit characters!
*****NSC32016: Depends on where data is stored: 32-bit integer (data space)
1234
 32-bit integer (program space) 4321
Example 1: Initializing Memory_Format_T structures for four different systems.

Memory_Format_T BE_36_Format; /* Big-endian 36-bit format. */
Memory_Format_T Intel_32_Bit_Format;
Memory_Format_T Motorola_32_Bit_Format;
Memory_Format_T PDP11_Format;
MAU_Number_T PDP11_Order[4] = {3,4,1,2};
/* Fill in big-endian 36-bit format, with 9-bit MAUs.
 * The Honeywell 6000 uses this format.
 */
BE_36_Format.MAU_Size_In_Bits = (Bit_Number_T) 9;
BE_36_Format.MAU_Order = BigEndian;
BE_36_Format.MAU_Count = (MAU_Number_T) 4;
BE_36_Format.MAU_References_Array = NULL;
/* Fill in Intel 32-bit (little-endian) format. */
Intel_32_Bit_Format.MAU_Size_In_Bits = (Bit_Number_T) 8;
Intel_32_Bit_Format.MAU_Order = LittleEndian;
Intel_32_Bit_Format.MAU_Count = (MAU_Number_T) 4;

Intel_32_Bit_Format.MAU_References_Array = NULL;
/* Fill in Motorola 32-bit (big-endian) format. */
Motorola_32_Bit_Format.MAU_Size_In_Bits = (Bit_Number_T) 8;
Motorola_32_Bit_Format.MAU_Order = BigEndian;
Motorola_32_Bit_Format.MAU_Count = (MAU_Number_T) 4;
Motorola_32_Bit_Format.MAU_References_Array = NULL;
/* Fill in PDP-11 middle-endian format. */
PDP11_Format.MAU_Size_In_Bits = (Bit_Number_T) 8;
PDP11_Format.MAU_Order = MiddleEndian;
PDP11_Format.MAU_Count = (MAU_Number_T) 4;
PDP11_Format.MAU_References_Array = PDP11_Order;

Listing One
/* Rep.h -- Copyright (c) 1995 by JR (John Rogers). All rights reserved.
 * AUTHOR - JR (John Rogers), 72634.2402@CompuServe.com */
/* Gracefully allow multiple includes of this file. */
#ifndef REP_H
#define REP_H
typedef enum {
 NoSign, OnesComplement, SignedMagnitude, TwosComplement
} Int_Rep_T;
typedef enum {
 BigEndian, LittleEndian, MiddleEndian
} MAU_Order_T;
/* This must have a range of at least 0..64 */
typedef unsigned char Bit_Number_T;
/* This must have a range of at least 0..64 */
typedef unsigned char MAU_Number_T;
typedef struct {
 MAU_Order_T MauOrder;
 Int_Rep_T IntRep;
 Bit_Number_T BitsPerMau;
 /* This next field must be NULL if MauOrder is
 * BigEndian or LittleEndian. */
 MAU_Number_T * MAU_References_Array;
} Int_Rep_And_Format_T;
#endif

Listing Two
/* End.h -- Copyright (c) 1995 by JR (John Rogers). All rights reserved.
 * AUTHOR - JR (John Rogers) 72634.2402@CompuServe.com */
/* Gracefully allow multiple includes of this file. */
#ifndef END_H
#define END_H
/******************* I N C L U D E S *****************/
#include <boolean.h> /* Boolean_T. */
#include <rep.h> /* MAU_Number_T, etc. */
/************************ T Y P E S ******************/
typedef struct {
 MAU_Order_T MAU_Order;
 Bit_Number_T MAU_Size_In_Bits;
 MAU_Number_T MAU_Count;
 /* This next field must be NULL if MAU_Order is BigEndian or LittleEndian.
 * A big-endian value would be treated as equivalent to an array of
 * {4,3,2,1} values for this, if MAU_Count happens to be 4. */
 MAU_Number_T * MAU_References_Array;
} Memory_Format_T;
/******************* R O U T I N E S ****************/
Memory_Format_T * /* Return NULL on error. Use free() when done. */

EndBSDByteOrderToFormat(
 int Byte_Order_Digits); /* 4321 = big-endian. */
Memory_Format_T * /* Return NULL on error. Use free() when done. */
EndLearnNativeUnsignedLongFormat(
 void);
/* Boolean_T
 * EndMAUReferenceValid(
 * MAU_Number_T ref,
 * MAU_Number_T number_of_MAUs);
 */
#define EndMAUReferenceValid( ref, number_of_MAUs ) \
 ( \
 ( (ref) > (MAU_Number_T) 0 ) \
 && ( (ref) <= (number_of_MAUs) ) \
 )
Boolean_T
EndSameFormat(
 const Memory_Format_T * One,
 const Memory_Format_T * Another);
void
EndSwap(
 void * Dest,
 const void * Src,
 const Memory_Format_T * Dest_Format,
 const Memory_Format_T * Src_Format);
/* Bit_Number_T
 * EndTotalBits(
 * Memory_Format_T * Format);
 */
#define EndTotalBits(Format) \
 ( (Bit_Number_T) ( \
 ((Format)->MAU_Count) \
 * ((Format)->MAU_Size_In_Bits) ) )
Boolean_T
EndValidFormat(
 const Memory_Format_T * Format);
#endif

Listing Three
/* EndSwap.c -- Copyright (c) 1995 by JR (John Rogers). All rights reserved.
 * POLICIES - This version of the endian engine implements the
 * following policies when swapping data: If a destination is not an 
 * integral number of parts (chars), then:
 * a) the destination size is rounded up to an integral number of parts.
 * b) the additional bits have unspecified values after the swap.
 * The total number of bits (MAU size times MAU count) for the destination 
 * and source must be identical.
 * AUTHOR - JR (John Rogers) 72634.2402@CompuServe.com
 */
#include <assert.h> /* assert */
#include "bitops.h" /* MVBITS(). */
#include <end.h> /* EndSameFormat(). */
#include "endimpl.h" /* Moves_T, etc. */
#include <stddef.h> /* NULL */
void
EndSwap(
 void * Dest,
 const void * Src,
 const Memory_Format_T * Dest_Format,

 const Memory_Format_T * Src_Format)
{
 Bit_Number_T Bits_Left;
 Part_T * Dest_Parts = (Part_T *) Dest;
 Bit_Number_T Dest_Total_Bits;
 Bit_Number_T Low_Logical_Bit_Of_Chunk;
 Bit_Number_T Smallest_Move;
 const Part_T * Src_Parts = (Part_T *) Src;
 Bit_Number_T Src_Total_Bits;
 /* Validate caller's arguments. */
 assert( EndValidFormat( Dest_Format ) );
 Dest_Total_Bits = EndTotalBits(Dest_Format);
 assert( Dest_Total_Bits > (Bit_Number_T) 0 );
 assert( EndValidFormat( Src_Format ) );
 Src_Total_Bits = EndTotalBits(Src_Format);
 assert( Src_Total_Bits > (Bit_Number_T) 0 );
 assert( Src_Total_Bits == Dest_Total_Bits );
 assert( Src != Dest );
 /* Handle easiest case: same format. */
 if (EndSameFormat( Dest_Format, Src_Format)) {
 EndCopy( Dest, Src, Dest_Format );
 return;
 }
 /* Handle next most easy case: one format is reverse of the other. */
 if (EndIsReversePartsFormat(Dest_Format, Src_Format)) {
 EndReverseParts(
 Dest, Src, EndNumberOfParts(Dest_Format) );
 return;
 }
 /* Compute smallest chunk size we can move and not overlap two MAUs 
 * or two parts. */
 Smallest_Move = EndSmallestMoveSize(
 Dest_Format, Src_Format);
 assert( Smallest_Move > (Bit_Number_T) 0 );
 /* Loop for each same-size chunk. */
 Low_Logical_Bit_Of_Chunk = (Bit_Number_T) 0;
 for (Bits_Left = Dest_Total_Bits;
 Bits_Left > (Bit_Number_T) 0;
 Bits_Left = Bits_Left - Smallest_Move) {
 Bit_Number_T Dest_Raw_Bit_Number_In_Part;
 MAU_Number_T Dest_Raw_Part_Index;
 Bit_Number_T Src_Raw_Bit_Number_In_Part;
 MAU_Number_T Src_Raw_Part_Index;
 assert( Low_Logical_Bit_Of_Chunk < Src_Total_Bits );
 /* Find this dest bit. */
 EndFindRawBit(
 Low_Logical_Bit_Of_Chunk, & Dest_Raw_Part_Index,
 & Dest_Raw_Bit_Number_In_Part, Dest_Format);
 /* Find this source bit. */
 EndFindRawBit(
 Low_Logical_Bit_Of_Chunk, & Src_Raw_Part_Index,
 & Src_Raw_Bit_Number_In_Part, Src_Format);
 /* At last, move a chunk! */
 MVBITS(
 Src_Parts[Src_Raw_Part_Index], /* src */
 Src_Raw_Bit_Number_In_Part, /* srcindex */
 Smallest_Move, /* len */
 & Dest_Parts[Dest_Raw_Part_Index], /* destptr */
 Dest_Raw_Bit_Number_In_Part, /* destindex */
 Part_T); /* type */
 /* Bump to next chunk. */
 Low_Logical_Bit_Of_Chunk += Smallest_Move;
 }
}

Listing Four 
/* BigSwap.c -- Copyright (c) 1995 by JR (John Rogers). All rights reserved.
 * POLICIES - This version of the endian engine implements the
 * following policies when swapping data: If a destination is not an 
 * integral number of parts (chars), then:
 * a) the destination size is rounded up to an integral number of parts.
 * b) the additional bits have unspecified values after the swap.
 * The total number of bits (MAU size times MAU count) for the 
 * destination and source must be identical.
 * AUTHOR - JR (John Rogers) 72634.2402@CompuServe.com */
#include <assert.h> /* assert */
#include "bigimpl.h" /* Big_LayoutToFormat(). */
#include <bigint.h> /* Big_Int_T, my prototype, etc. */
#include <end.h> /* EndSameFormat(). */
#include <stddef.h> /* NULL */
void BigSwap(
 Big_Int_T * Dest, const Big_Int_T * Src,
 const Int_Rep_And_Format_T * Dest_Layout,
 const Int_Rep_And_Format_T * Src_Layout)
{
 Memory_Format_T Dest_Format;
 Memory_Format_T Src_Format;
 /* Set global native format, just in case. */
 if ( !Big_Done_Global_Setup ) {
 Big_GlobalSetup();
 }
 assert( Big_Native_Layout != NULL ); /* out of memory? */
 assert( Big_End_Native_Format != NULL );
 /* Need format (for dest) that endian engine can handle. */
 if (Dest_Layout != NULL) {
 Big_LayoutToFormat( &Dest_Format, Dest_Layout );
 } else {
 Dest_Format = *Big_End_Native_Format;
 }
 assert( EndValidFormat( &Dest_Format ) );
 /* Repeat that for the source. */
 if (Src_Layout != NULL) {
 Big_LayoutToFormat( &Src_Format, Src_Layout );
 } else {
 Src_Format = *Big_End_Native_Format;
 }
 assert( EndValidFormat( &Src_Format ) );
 /* Call endian engine to swap the bits. */
 EndSwap(
 Dest, Src, &Dest_Format, &Src_Format);
}
ORACLE CALL INTERFACE

Listing One
//////////////////////////////////////////////////////////////////////
// DBConnection Implementation
//////////////////////////////////////////////////////////////////////
#include <dbobject.h>

// Default Constructor
DBConnection::DBConnection()
{ 
 m_pLoginCursor = 0;
 Reset();
}
// Constructor with access string
DBConnection::DBConnection(const char *AccessString)
{
 m_pLoginCursor = 0;
 Connect(AccessString);
} 
DBConnection::~DBConnection()
{
 if (IsConnected()) {
 Disconnect();
 }
 delete m_pLoginCursor;
}
short DBConnection::Disconnect()
{
 if (IsConnected()) {
 if (ologof(m_pLoginCursor)) {
 SetDBError(m_pLoginCursor->csrarc,DBErrorMsg(m_pLoginCursor->csrrc));
 SetError(-210, m_pUserName); 
 return 0;
 }
 else {
 Reset();
 m_nIsConnected = 0;
 return 1;
 }
 }
 else {
 SetError(-203, NOLOGIN_CURSOR );
 return 0;
 }
}
short DBConnection::Connect(const char *pAccess)
{
 if (! m_pLoginCursor) { 
 m_pLoginCursor = new DBOBJLOGIN;
 if (!m_pLoginCursor) {
 SetError(-100, ALLOCATION_ERROR); 
 return 0;
 }
 }
 if (!*pAccess) {
 SetError(-200, NULLACCESSS_STRING);
 return 0;
 }
 char *pPassWrd = strchr(pAccess, '/');
 if (!pPassWrd) {
 SetError(-201, NOPASSWORD);
 return 0;
 }
 pPassWrd++; // past the '/'
 int UserLen = strlen(pAccess) - strlen(pPassWrd);
 char User[30];
 if (UserLen > 30) UserLen = 30; // truncate
 strncpy(User, pAccess, UserLen-1 );
 User[UserLen-1] = 0; 
 
 // Copy to member 
 strcpy(m_pUserName, User);
 if (orlon(m_pLoginCursor, hda, (char*)pAccess)) {
 SetDBError(m_pLoginCursor->csrarc, DBErrorMsg(m_pLoginCursor->csrrc) );
 SetError(-205, m_pUserName);
 return 0;
 }
 Reset();
 m_nIsConnected = 1;
 return 1;
}
short DBConnection::Commit()
{
 if (m_pLoginCursor) {
 if (ocom(m_pLoginCursor)) {
 SetDBError(m_pLoginCursor->csrarc,DBErrorMsg(m_pLoginCursor->csrrc));
 SetError(-206, m_pUserName); 
 return 0;
 }
 Reset();
 return 1; 
 }
 else {
 // No login cursor: don't dereference it; report the error and fail.
 SetError(-202, NOLOGIN_CURSOR);
 return 0;
 }
}
short DBConnection::SetAutoCommit(short OnOrOff)
{
 if (m_pLoginCursor) {
 if (OnOrOff == 1) {
 if (ocon(m_pLoginCursor)) {
 SetDBError(m_pLoginCursor->csrarc,DBErrorMsg(m_pLoginCursor->csrrc));
 SetError(-206, m_pUserName); 
 return 0;
 }
 else {
 Reset();
 return 1;
 } 
 }
 else {
 if (ocof(m_pLoginCursor)) {
 SetDBError(m_pLoginCursor->csrarc,DBErrorMsg(m_pLoginCursor->csrrc));
 SetError(-206, m_pUserName); 
 return 0;
 }
 else {
 Reset();
 return 1;
 }
 }
 }

 else {
 // No login cursor: don't dereference it; report the error and fail.
 SetError(-203, NOLOGIN_CURSOR);
 return 0;
 }
}
short DBConnection::Rollback()
{
 if (m_pLoginCursor) {
 if (orol(m_pLoginCursor)) {
 SetDBError(m_pLoginCursor->csrarc,DBErrorMsg(m_pLoginCursor->csrrc));
 SetError(-206, m_pUserName); 
 return 0;
 }
 else {
 Reset();
 return 1;
 }
 }
 else {
 // No login cursor: don't dereference it; report the error and fail.
 SetError(-203, NOLOGIN_CURSOR);
 return 0;
 }
}
const char* DBConnection::DBErrorMsg(int ErrorNo)
{
 if (m_pLoginCursor) {
 static char ExceptionBuffer[MAX_DBMSG];
 memset(ExceptionBuffer,0, MAX_DBMSG);
 oerhms(m_pLoginCursor, ErrorNo, ExceptionBuffer, MAX_DBMSG-1);
 return ExceptionBuffer;
 }
 else {
 SetError(-203, NOLOGIN_CURSOR);
 return "";
 } 
}



The Oracle Call Interface and C++


High-level abstractions for database developers




Jeremy Woo-Sam and Tony Murphy


Jeremy is the project leader of SQL*C++, and Tony is the project leader for
SQR at MITI. They can be reached at jeremyw@miti.com and tonym@miti.com,
respectively. 


The Oracle Call Interface (OCI) allows C applications to access and manipulate
data in an Oracle database. Unfortunately, using the C API functions requires
extensive knowledge of your tables and their data types. You must also know
how to access columns inside a table and efficiently retrieve database rows
into an application. By creating abstractions for database access and
manipulation, you can wrap the C API into C++ classes. This provides database
access in an easy-to-use and easy-to-maintain form, promotes code reuse, and
bridges RDBMS concepts to an object-oriented paradigm.
In this article, we will present DBObject, a C++ class library that provides
database connectivity and query access to an Oracle RDBMS. In doing so, we
will examine the interface classes that are used within a C++ application, and
the implementation classes that support array fetching within the interface
classes. We will use an example that supports some primary data types:
variable character strings, fixed-length character strings, numerics, and date
types. Our abstraction of a database connection allows both local and remote
connectivity, while our abstraction of database columns provides the mechanism
for accessing data in the query.
The object-oriented approach offers database developers several benefits. For
instance, users not familiar with cursors, array fetching, and other OCI
concepts often find retrieving data from Oracle a time-consuming task. Object
orientation lets users focus on solving their domain problems rather than on
retrieving data.
Oracle data types are substantially different from those supported in C. This
causes problems when converting between the database and C data types. For
example, the Oracle NUMBER data type can have a value of NULL or 0. In the
Oracle world, these values are different. This means no primitive C data type
can entirely represent an Oracle NUMBER, because float, double, and int cannot
represent both NULL and 0. This problem is commonly referred to as the
"impedance mismatch" between RDBMS and C data types.
The common solution is to map the Oracle NUMBER data type to a C struct.
However, this creates a new problem because you cannot use the C arithmetic
operators with user-defined structs. (Note that C++ classes do allow structs
to have operators.)
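The C++ remedy the authors allude to can be sketched directly. The class below is a minimal, hypothetical stand-in (not the library's actual DBColNumber): it keeps a NULL flag alongside the value, so NULL and 0 stay distinct, and a conversion operator lets instances participate in ordinary arithmetic.

```cpp
#include <cassert>

// Hypothetical sketch of a nullable numeric type. Unlike a bare
// double, it can distinguish SQL NULL from 0.
class NullableNumber {
public:
    NullableNumber() : m_isNull(1), m_value(0.0) {}            // SQL NULL
    explicit NullableNumber(double v) : m_isNull(0), m_value(v) {}
    int IsNull() const { return m_isNull; }
    // During arithmetic, NULL is treated as 0 (the policy the
    // article describes for DBColNumber).
    operator double() const { return m_isNull ? 0.0 : m_value; }
private:
    int    m_isNull;
    double m_value;
};
```

With the conversion operator in place, an expression like `Total = A + B` compiles whether A and B are NullableNumber objects or plain doubles, while `IsNull()` still reports the distinction the database cares about.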
Finally, using high-level database abstractions lets you extend a class
library's functionality without impacting existing applications. For example,
the first version of our class did not implement array fetching; this was
added later to improve efficiency. Likewise, obsolete OCI calls can be
replaced with their newer counterparts without changing any application code.
And, since our abstractions are valid on most SQL databases, our class library
can easily be ported to other relational databases. 


The DBObject Classes


The DBObjects database class library has two layers: the interface layer and
the implementation layer. Interface-layer classes are routinely used to
retrieve information from Oracle; implementation-layer classes are used
internally by the interface layer. Figure 1 shows the class hierarchy for
DBObjects, while Figure 2 shows the implementation and interface layers.
The DBObject class is the base class. It handles local and database errors.
The DBConnection class is an abstraction of a database connection; see Listing
One. DBConnection encapsulates the OCI functions that let you log into an
Oracle RDBMS; see Table 1. Multiple DBConnection objects can be created to
access different local or remote data sources within a single application.
Each time a DBConnection object is instantiated, it in turn creates and
maintains an Oracle Login Cursor. The Oracle Login Cursor is deleted when the
DBConnection object is destroyed.
Class DBSQLEngine encapsulates the database cursor and the OCI functions for
parsing SQL statements. The class also describes column information in SELECT
statements, defines the output buffers, and fetches rows from the database.
The DBQuery class, which inherits from DBSQLEngine, takes a valid SELECT
statement and retrieves rows for the SELECT statement. The member function
MoveNext() provides the means of navigating through the rows of the query.
The abstract DBCol class provides the interface to the data in the DBQuery
object. The derived type-specific classes format the data within DBQuery so
that it can be manipulated or printed.
The DBColumnBuffer classes are implementation classes that support array
fetching, which allows multiple rows to be fetched with a single database
access. This technique greatly reduces network traffic and improves access to
database rows. The DBColumnBuffer classes also contain information about the
column, such as its position, name, width, data type, scale, and precision. 
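The payoff of array fetching is easy to quantify. The helper below is a hypothetical illustration, not library code: fetching a batch of rows per database access divides the number of network round trips by the batch size.

```cpp
#include <cstddef>

// Rows fetched per database access (an assumed, illustrative value).
const size_t ARRAY_SIZE = 100;

// Round trips needed to fetch totalRows in batches of `batch` rows
// (ceiling division).
size_t RoundTrips(size_t totalRows, size_t batch) {
    return (totalRows + batch - 1) / batch;
}
```

For a 1000-row result set, single-row fetching costs 1000 exchanges with the server; a 100-row array fetch costs 10.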


The Interface Layer


The interface layer comprises DBConnection, DBQuery, DBCol, DBColNumber,
DBColDate, and DBColString. Each class is a well-defined abstraction of some
aspect of the Oracle interface. 
DBConnection manages the database connection. You simply instantiate an
instance of the class by passing an Oracle connect string to the constructor
to establish a connection with the database. When the object is destroyed
(either explicitly or when it goes out of scope), the destructor severs the
connection with the database. Behind the scenes, the OCI functions ORLON and
OLOGOF are used to log on to and off of the Oracle server.
DBQuery abstracts the concept of an SQL query. Its constructor accepts an SQL
string as a parameter. To execute the query, invoke the Execute() member
function. The rows of data returned by the query can be sequenced using the
MoveNext() member function. 
Internally, the DBQuery object uses an instance of DBSQLEngine to parse the
SQL statement. It allocates DBColumnBuffers to hold the columns retrieved from
the database by the SQL string. DBQuery uses the OCI function ODESCR to decide
which type of column buffer to use: DBNumberBuff, DBDateBuff, or DBBufferInfo.
DBCol classes are used to interface and manipulate the actual data returned
from the database. DBColNumber, DBColDate, and DBColString implement the C++
equivalents to the Oracle NUMBER, DATE, and CHAR data types, respectively.
DBCol is an abstract base class for the column classes that implements
services required by DBColNumber, DBColDate, and DBColString. Such services
include maintaining a pointer to the DBQuery object, identifying the
corresponding DBColumnBuffer that contains the physical data, and implementing
an IsNull() member function to tell if a column's value is set to NULL. 


Interoperability of Column Objects


Implementing a cast to double in DBColNumber lets you use instances of this
type with C arithmetic operators. For example, Total=A+B is legal in C++,
where A and B are of type DBColNumber or double, and Total is of type double.
During arithmetic operations, a NULL column is assumed to be of value 0.
However, if you prefer a value other than 0, simply inherit from DBColNumber
and redefine the cast.
DBColString implements a cast to char*, which allows its instances to
interoperate with functions that require null-terminated strings. For example,
strcmp(A,B) is acceptable in C++ if A and B are either of type DBColString or
are pointers to null-terminated strings. During string function calls, a NULL
value is converted to a null-terminated string of length 0. Users can
implement their own representation of NULL by overloading the
DBColString::GetString() member function.
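A minimal sketch of that idea, using a simplified stand-in for DBColString (the class name and fixed buffer size are assumptions): the conversion operator routes through a virtual GetString(), so a derived class can substitute its own representation of NULL.

```cpp
#include <cstring>

// Hypothetical string-column sketch. A NULL column presents itself
// as a zero-length string, so C string functions work unmodified.
class StringColumn {
public:
    StringColumn() : m_isNull(1) { m_buf[0] = '\0'; }
    explicit StringColumn(const char *s) : m_isNull(0) {
        strncpy(m_buf, s, sizeof(m_buf) - 1);
        m_buf[sizeof(m_buf) - 1] = '\0';
    }
    // Override this (cf. DBColString::GetString) to choose a
    // different representation of NULL.
    virtual const char *GetString() const {
        return m_isNull ? "" : m_buf;
    }
    operator const char *() const { return GetString(); }
    virtual ~StringColumn() {}
private:
    int  m_isNull;
    char m_buf[64];
};
```

Because of the cast, `strcmp(A, B)` accepts StringColumn objects and `char*` pointers interchangeably.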


The Implementation Layer


Implementation classes include DBSQLEngine, DBColumnBuffer, DBStringBuffer,
DBNumberBuffer, and DBDateBuffer. They represent the internal structure and
implementation of handling the rows retrieved from the database.
The DBSQLEngine class encapsulates the OCI functions that handle database
cursors. These OCI functions handle fetching rows from the database, parsing
the SQL statement, describing columns from a SELECT statement, and defining
the memory area to hold rows returned from a database fetch. DBSQLEngine can
be easily extended to support DML and DDL statements.
DBColumnBuffer is an abstract base class for DBStringBuffer, DBNumberBuffer,
and DBDateBuffer. It represents the static and dynamic information of a
database column and is responsible for gathering and maintaining information
about it, including column type, external type, width, position, name, scale,
and precision. DBColumnBuffer also contains information about the return
length and NULL status for each row of the column. The pure virtual member
functions GetData() and GetBufferAddress() are defined in the derived classes.
This allows the derived classes to manage, maintain, and index their rows.



The Big Picture


The DBConnection, DBQuery, and DBCol classes work together to access data from
the database. Separating these classes into complete, distinct abstractions
allows greater flexibility than folding connectivity, query capabilities, and
column handling into one class. Once a successful connection to the database
is established, the DBConnection object can be used by multiple DBQuery
objects. The DBQuery constructor takes the SELECT string and a reference to a
DBConnection object as its arguments. When a DBQuery object has been
successfully created, DBCol objects can be associated to the DBQuery internal
columns. The DBCol objects take a reference to a DBQuery object and the column
number that it wants to reference. Once "attached" to the DBQuery internal
column, you can print and manipulate the DBCol object. 
The DBQuery object constructs and maintains the DBColumnBuffer objects. It
parses the SELECT statement and creates a linked list of DBBufferInfo objects
for each column in the SELECT statement by calling the DBSQLEngine::Describe()
function. A linked list is preferable to an array of DBColumnBuffer objects
because Oracle does not provide any functions to determine the number of
columns in a SELECT statement. To determine the number of columns in the
query, the ODESCR function is called until it returns the end-of-select error
status. While calling DBSQLEngine::Describe(), we gather information about the
column to populate the DBQuery object's DBColumnBuffer array. 
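The describe loop above can be sketched as follows. ODESCR itself is replaced by a hypothetical DescribeColumn() stand-in so the control flow is self-contained: columns are described one position at a time until an end-of-select status comes back, and a list node is appended for each (cleanup of the list is omitted for brevity).

```cpp
#include <cstddef>

// Simplified stand-in for the DBBufferInfo node the article describes.
struct BufferInfo {
    int         position;
    BufferInfo *next;
};

const int END_OF_SELECT = -1;

// Hypothetical stand-in for odescr(): succeeds for the first
// `numCols` positions, then reports end-of-select.
int DescribeColumn(int position, int numCols) {
    return (position <= numCols) ? 0 : END_OF_SELECT;
}

// Describe columns until end-of-select, building a linked list;
// returns the column count, which Oracle does not report directly.
int CountColumns(int numCols, BufferInfo **head) {
    int position = 1;
    *head = NULL;
    BufferInfo *tail = NULL;
    while (DescribeColumn(position, numCols) != END_OF_SELECT) {
        BufferInfo *info = new BufferInfo;
        info->position = position;
        info->next = NULL;
        if (tail) tail->next = info; else *head = info;
        tail = info;
        ++position;
    }
    return position - 1;
}
```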
The DBQuery object now allocates an array of DBColumnBuffer pointers that
reference a data-type-specific DBColumnBuffer object such as a DBStringBuffer
or a DBDateBuffer. The DBQuery object walks the linked list, allocates a
data-type-specific DBColumnBuffer object, copies the information from the
linked list, and sets the proper width and external type.
After the array has been populated, the DBQuery object passes each
DBColumnBuffer object in the array to the DBSQLEngine member function
Define(). This is done to determine the destination address for the rows
retrieved. 
Example 1 is a typical C++ application that establishes a connection and
performs a SELECT on a table called "EMP" (see Example 2). When Example 1 is
executed, the output shown in Example 3 is generated. The source code for
DBConnection, DBQuery, and other classes is available electronically; see
"Availability," page 3.


Possible Extensions


To extend the DBObject class library, you could, for example, add enhanced
data types. The DBObjects database class library demonstrates how to abstract
a limited number of Oracle data types in C++. The same techniques can be
applied to simulate other Oracle data types such as LONG and LONG RAW.
The current implementation utilizes double-precision floating-point numbers to
represent the numeric values of NUMBER columns. This is sufficient for
applications that don't require more than 15 digits of precision. Oracle
itself supports a maximum precision of 38 digits. For more than 15 digits of
precision, a high-precision column class could be developed.
Client-application writers could then manage the efficiency-versus-precision
trade-offs using the column class that most closely corresponds with their
needs.
Additional member functions could be added to DBColString, DBColNumber, and
DBColDate, resulting in more complete data types. This might include more
operators and perhaps more advanced date manipulation. 
Since most member functions are declared virtual, new column classes can be
inherited from existing implementations. This technique can be used to add or
modify behavior. For example, to change the way a column type prints its
output, override the PrintValue() member function.
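A minimal sketch of that extension point, with simplified stand-in classes (the real library's signatures may differ): only PrintValue() is overridden, and callers holding a base-class pointer pick up the new behavior automatically.

```cpp
#include <string>

// Simplified base column with a virtual print hook.
class Column {
public:
    virtual std::string PrintValue() const { return "42"; }
    virtual ~Column() {}
};

// A derived column that changes only how its value prints.
class BracketedColumn : public Column {
public:
    virtual std::string PrintValue() const {
        return "[" + Column::PrintValue() + "]";
    }
};
```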
The DBObject class currently contains member functions to let derived classes
set and get errors. A possible extension would be to throw an exception after
the error message is copied into the buffer. Another extension could have the
SetError() function terminate the application if the error were severe.
Because all errors are processed in the DBObject class, your error-handling
strategy can be implemented in this class.


Conclusion


Class libraries like DBObjects will play a substantial role in reducing
complexity and enhancing interoperability between the C++ language and
relational-database technology. The benefits don't end there, however.
By-products of our approach are portability, efficient database access at low
cost, and reduced maintenance. Who knows? This type of technology may even aid
a migration from RDBMS to the great nirvana of OODBMS.
Figure 1: DBObjects library architecture.
Figure 2: The DBObjects class library.
Table 1: OCI functions.
Function Description
orlon Establishes concurrent communication between an
 OCI program and an Oracle database.
ologof Disconnects a login data area from the Oracle program
 global area and frees all Oracle resources owned by the Oracle
 user process.
ocom Commits the current transaction.
ocon Enables autocommit.
ocof Disables autocommit.
orol Rolls back current transaction.
oopen Opens specified cursor.
oparse Parses an SQL statement or PL/SQL block and associates it
 with a cursor.
ocan Cancels a query after the desired number of rows
 have been fetched.
oclose Disconnects a cursor from the data area in the Oracle
 server with which it is associated.
ofen Fetches multiple rows into arrays of variables, taking
 advantage of the Oracle array interface.
oexec Executes the SQL statement associated with a cursor.
odescr Describes select-list items for SQL queries.
odefin Defines an output variable for a specified select-list
 item of an SQL query.
oerhms Returns the text of an Oracle error message, given
 the error code.
Example 1: Typical C++ app.
#include <iostream.h>
#include <dbobject.h>
#include <dbcol.h>
int main()
{

 DBConnection TestConn("scott/tiger@t:cire-ss2:oracle7");
 DBQuery TestQuery("SELECT ename,empno,job,hiredate from EMP",&TestConn);
 TestQuery.Execute();
 DBColString colEname(&TestQuery, 1);
 DBColNumber colEmpno(&TestQuery, 2);
 DBColString colJob(&TestQuery, 3);
 DBColDate colHiredate(&TestQuery, 4);
 while(TestQuery.MoveNext())
 {
 cout << "EName: " << colEname << endl
 << "Empno: " << colEmpno << endl
 << "Job: " << colJob << endl
 << "Hired " << colHiredate << endl << endl;
 }
}
Example 2: Contents of EMP table.
ENAME EMPNO JOB HIREDATE
------ ----- ---- ---------
SMITH 7369 CLERK 17-DEC-80
ALLEN 7499 SALESMAN 20-FEB-81
WARD 7521 SALESMAN 22-FEB-81
JONES 7566 MANAGER 02-APR-81
MARTIN 7654 SALESMAN 28-SEP-81
BLAKE 7698 MANAGER 01-MAY-81
CLARK 7782 MANAGER 09-JUN-81
SCOTT 7788 ANALYST 09-DEC-82
KING 7839 PRESIDENT 17-NOV-81
TURNER 7844 SALESMAN 08-SEP-81
ADAMS 7876 CLERK 12-JAN-83
JAMES 7900 CLERK 03-DEC-81
Example 3: Output generated by Example 1.
EName: SMITH
Empno: 7369
Job: CLERK
Hired 17-DEC-1980 00:00:00
EName: ALLEN
Empno: 7499
Job: SALESMAN
Hired 20-FEB-1981 00:00:00
EName: WARD
Empno: 7521
Job: SALESMAN
Hired 22-FEB-1981 00:00:00
EName: JONES
Empno: 7566
Job: MANAGER
Hired 02-APR-1981 00:00:00
EName: MARTIN
Empno: 7654
Job: SALESMAN
Hired 28-SEP-1981 00:00:00
EName: BLAKE
Empno: 7698
Job: MANAGER
Hired 01-MAY-1981 00:00:00
EName: CLARK
Empno: 7782
Job: MANAGER
Hired 09-JUN-1981 00:00:00

EName: SCOTT
Empno: 7788
Job: ANALYST
Hired 09-DEC-1982 00:00:00
EName: KING
Empno: 7839
Job: PRESIDENT
Hired 17-NOV-1981 00:00:00
EName: TURNER
Empno: 7844
Job: SALESMAN
Hired 08-SEP-1981 00:00:00
EName: ADAMS
Empno: 7876
Job: CLERK
Hired 12-JAN-1983 00:00:00
EName: JAMES
Empno: 7900
Job: CLERK
Hired 03-DEC-1981 00:00:00




Programming TI's Multimedia Video Processor


Client/server programs for real-time video




William May


Bill is the principal software engineer for Minerva Systems, a developer of
high-end MPEG encoders. He can be reached at bmay@minervasys.com.


Originally dubbed the MVP ("multimedia video processor"), the Texas
Instruments TMS320C80 processor affords the ability to program video
algorithms in software--in other words, video DSP. The MVP is a radical
departure from TI's traditional approach to digital-signal processing, so
knowing how to program TI's fixed- or floating-point DSPs won't help you much.
From top to bottom, the MVP architecture is designed to achieve performance
orders of magnitude greater than traditional DSPs. 
Likewise, you'll usually need to extensively rework algorithms you are
currently familiar with, although you'll occasionally find that the MVP gives
life to algorithms long since discarded or forgotten. In short, to take full
advantage of the MVP's power when programming real-time video algorithms,
you'll usually need to develop new approaches.
In this article, I'll examine what it means to write software for real-time
video. If you're familiar with programming in C on Intel and Motorola
processors, the MVP will give you a glimpse of a strange, performance-driven
world. For instance, you must be in complete control of how data is organized
in memory and how it moves across various buses to be processed by the MVP.
There is a premium on making every CPU cycle count. Clearly, high-level
abstractions are very difficult. Still, the payoff is worth it, especially
when you see what real-time video is really like.


Architecture Overview


CCIR-601 is the international standard for digital video. It is a single
format that encompasses both NTSC video (the video standard for the U.S. and
Japan, among others) and PAL (the European standard). In both cases, the total
data rate for video is 27 MB/sec. One half of the data (13.5 MB/sec)
represents luma, or gray-scale information; the other half represents chroma
for two color channels (each of which is 6.75 MB/sec). However, not all the
information in the video signal must be processed. In the NTSC variant of
CCIR-601, the active video area in each frame is 720x486 pixels, at 29.97
frames/sec. Thus, you must process about 21 MB/sec to handle NTSC in real
time. The rate for PAL is slightly lower. By comparison, processing
high-quality digital audio requires just 88,200 samples/sec, less than .5
percent of the data rate for video.
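The 21-MB/sec figure follows directly from the numbers above. A small sketch, assuming 2 bytes per active pixel for 4:2:2 sampling (1 byte of luma plus 1 byte of chroma on average):

```cpp
// NTSC active-area data rate per CCIR-601: 720x486 pixels per frame
// at 29.97 frames/sec, 2 bytes per pixel.
const double PIXELS_PER_FRAME  = 720.0 * 486.0;
const double FRAMES_PER_SECOND = 29.97;
const double BYTES_PER_PIXEL   = 2.0;

const double ACTIVE_RATE =
    PIXELS_PER_FRAME * FRAMES_PER_SECOND * BYTES_PER_PIXEL;  // ~21 MB/sec
```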
From these simple calculations two things are evident. First, a video
processor and its surrounding hardware must be able to read and write data
very fast--just reading in and writing out captured video requires a rate of
42 MB/sec. If multiple passes are needed or intermediate results need to be
stored, the requirements are that much greater.
An example of a simple image-processing algorithm is a 3x3 filter for edge
detection. In a naive implementation, such a filter might require nine
multiplications and eight additions per sample (usually the number is reduced
due to symmetry in the filter). This comes out to about 360 million
calculations per second, just for this very simple filter. In real
applications, such a filter would be only one component of a much larger
calculation.
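A naive 3x3 filter of the kind described might look like the sketch below (borders are skipped for brevity). At nine multiplications and eight additions per sample, processing roughly 21 million samples per second works out to the article's figure of about 360 million operations per second.

```cpp
// Naive 3x3 convolution: for each interior output sample, multiply
// the nine neighboring input samples by the kernel and sum.
void Filter3x3(const unsigned char *in, int *out,
               int width, int height, const int kernel[9]) {
    for (int y = 1; y < height - 1; y++) {
        for (int x = 1; x < width - 1; x++) {
            int sum = 0;
            for (int ky = -1; ky <= 1; ky++)
                for (int kx = -1; kx <= 1; kx++)
                    sum += kernel[(ky + 1) * 3 + (kx + 1)]
                         * in[(y + ky) * width + (x + kx)];
            out[y * width + x] = sum;
        }
    }
}
```

In practice the multiply count drops when the kernel is symmetric, but the order of magnitude is what drives the architecture discussion that follows.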
This level of performance can be achieved by making a processor either run
extremely fast or do a lot of work in each cycle. In general, I think the
second approach is preferable. Making a processor faster often introduces new
problems, such as the need for faster RAMs, faster buses, and the like to keep
the processor busy. This gets expensive, and it can be very difficult to debug
the hardware. However, if the processor is highly parallel, it can run at more
"leisurely" clock rates, allowing the overall system to use standard
components and connections that are more easily obtained, more easily
debugged, and more reliable.
The MVP takes the second approach: parallelism throughout. The MVP actually
contains six processors: a fairly standard RISC processor, an extremely
sophisticated DMA engine, and four parallel processors (PPs). Figure 1
illustrates the chip architecture. There are also 50 KB of on-chip static RAM.
The first production release of MVP supports a 50-MHz clock. However, because
of the six processors, the MVP can perform a massive amount of work in each
cycle, including the following tasks:
The master processor can perform a single RISC instruction in parallel with a
floating-point operation.
The transfer controller can do a 64-bit (8-byte) read or write. If the read or
write completes an I/O request, it can also update source and destination
pointers.
The parallel processors can do up to eight multiplies, 16
additions/subtractions (with shifting and masking), and eight reads or writes
to on-chip memory (including effective address calculations and updates of
address registers).
Of course, in real algorithms, it is almost never possible or useful to do all
these operations in each cycle. However, the challenge of programming the MVP
is to keep as many of these operations going as possible, for a given
algorithm.


The Master Processor


The master processor (MP) is a straightforward RISC processor that is easily
programmed in C; see Figure 2. The MP has a floating point unit (FPU) that
runs in parallel with the RISC core. In many applications, the MP operates as
a traffic cop. It responds to interrupts from I/O devices, handles
communications with a host, allocates memory, and assigns work to the parallel
processors. Many system configurations are possible, but in most, the MVP is
used as a compute server, with a host telling the MP what to do and when. The
MP in turn schedules the parallel processors to perform the specific tasks.
The FPU can be useful, even though it is not as fast as the parallel
processors. For example, in many applications audio is handled by the FPU,
while video is handled by the parallel processors.
TI supplies a multitasking kernel that runs on the MP. The kernel provides
facilities for memory management, running multiple tasks, communications
between tasks, interrupt handling, and so on. High-level control of an
application usually resides in tasks running on the MP. These tasks then
allocate memory or processing resources (the PPs) as needed, and communicate
with other tasks or interrupt handlers. The kernel is fairly small, but
provides the basis for easily developing many application- or
hardware-specific features.


The Transfer Controller


Although it is not obvious at first, one of the keys to the MVP is the
transfer controller (TC). The TC (see Figure 3) is an intelligent DMA
controller, so virtually all I/O can be preprogrammed. Once I/O is set up at
the beginning of an algorithm, the processors typically spend all their time
computing, with data being read and written in the background by the transfer
controller. 
Besides intelligence, the TC also has enormous bandwidth, assuming that memory
is set up correctly. There is a 64-bit data path on both sides of the transfer
controller, so on a 50-MHz MVP, it is possible to move 400 MB of data per
second. In practice, there is usually at least one wait state, making typical
performance half the maximum--200 MB/sec. Still, this is sufficient for many
video-processing algorithms. In my experience, I/O is usually not the
bottleneck when working with the MVP.
The I/O tasks that can be specified are flexible, ranging from simple requests
like reading/writing single or multiple lines of data to reading/writing
blocks of data such as the 8x8 blocks in JPEG compression, to reading/writing
patches of data (for drawing text and lines). Addressing can be absolute
addresses, offsets from a starting point, or ping-ponging between double
buffers. Many graphics operations (fills and rectangular bit blits, for
example) can be performed using only the TC.


The Parallel Processors


The PPs are the compute engines of the MVP. In the first-generation MVP, there
are four PPs. The hardware design also has support for one, two, or eight PPs.
The PPs are purposely not called DSPs, as their architecture is quite
different. PPs are more geared to performing graphics and image-processing
operations than a DSP (which is more targeted for processing one-dimensional
data).
As Figure 4 and Figure 5 illustrate, each PP internally has four separate
computation units--a multiplier, an ALU, and two addressing units. Each unit
operates in parallel with the others, so that in each cycle (all instructions
execute in a single cycle) a single PP can do multiplication, ALU operations,
and two reads/writes (with accompanying effective address calculations and
address-register updates).
The multiplier and the ALU can be split, so the multiplier can perform a
single 16x16-->32-bit multiply in each cycle, or two 8x8-->16-bit multiplies.
The multiplier also has a mode where it can calculate a rounded result
(16x16-->16-bit rounded), scale the result, and swap one of the coefficients
so that another rounded multiply can be performed on the next cycle.
The ALU does all the usual logical and arithmetic operations on data in
registers. However, this ALU can operate in full 32-bit mode or be split to
perform two 16-bit ("halfword split") operations or four 8-bit ("byte split")
operations. In byte-split mode, the ALU can perform four operations (add,
subtract, And, Or, and so on) in each cycle. Special hardware
and muxes in the ALU also allow rotating one of the inputs, generating masks,
expanding bits, and storing status bits for each ALU split.

These possibilities present a problem--it is difficult to design an
instruction set that can specify such an array of operations and modes. TI
resolved this by defining a basic set of operations that could be specified in
a 64-bit instruction word. This basic 64-bit instruction word can specify a
multiplication, a wide array of ALU operations, and two I/O operations for
each cycle.
However, the PP hardware is too flexible to be expressed in even a 64-bit
instruction. To provide access to the full functionality, there's the ealu
("extended ALU") instruction, which requires that the d0 register be used as
an additional 32 bits of the instruction, bringing the total to 96 bits. This
costs a potentially useful register, but opens up enormous flexibility. With
ealu instructions, some additional multiplier modes and several additional ALU
paths become available. The ealu instructions are often seen in tight loops
(see the image-crossfade example developed shortly).
In addition to the multiplier and ALU, each PP supports fairly typical address
generation calculations (pre- and post-increment, indexed addressing) and
hardware-loop controllers, which are nice because they can be nested for
multi-dimensional processing; and since they automatically handle instruction
latency issues, they're also easy to use. One of the interesting uses of the
loop controllers is for hardware branching on exceptions (such as numeric
overflows or underflows). This allows a tight loop to operate very
efficiently, without regard for exception handling. Execution of the loop
branches out to a special handler only when a problem occurs.


Programming the MVP


Developing an application on the MVP typically involves the following steps:
1. Create client and server tasks to invoke algorithm execution; see Listings
One and Two, respectively. These tasks execute on the MP, communicate with a
host, and allocate compute resources (the PPs) when needed. This is easy,
since it is all written in C.
2. Lay out memory usage on the PPs and program them to issue I/O requests to
the transfer controller. This is a complex balancing act, weighing the effects
of limited memory against the possibility of crossbar contention and complex
addressing in inner loops. The geometry of an algorithm must be
carefully analyzed. Sometimes, this is straightforward. Many image-processing
algorithms can be processed along scan lines. A JPEG codec naturally works on
8x8 blocks. In other cases, there are multiple I/O options, and the option
selected can dramatically affect overall performance.
 One class of algorithms that is difficult to program includes image rotations
and warping. In the case of general warping (mesh warps) a small patch of the
image may explode into a large patch, making memory and I/O management quite
complex.
3. Code the algorithm on the PP. Here again, there are many options and
decisions. The first is the topology of the algorithm execution. In most
cases, the image is subdivided in some way, and each PP executes the algorithm
on its portion of the image. In this case, each PP is executing the same code.
This is the simplest topology to code, since the PPs do not need to
communicate with each other and the computational load is automatically
balanced among them.
 This is the most common means of subdividing a task, but the MVP supports
numerous others. A task can be pipelined, with each PP performing part of the
task and moving the result to the next PP. Or one PP can start performing a
task and send the result to the other three for completion, and so on. Of
course, algorithms may also use the MP to perform part of the processing.
4. Finally, code the algorithm itself. PP code is usually written in assembly
language. Although TI supplies a C compiler for the PPs, it is not up to the
task of using the PPs' resources efficiently.


An Image-Crossfade Example


To illustrate how some of these pieces fit together in a real application,
I'll develop an application that performs an image-crossfade operation. Two
images are supplied as inputs, and a third is generated as the weighted
average of the two inputs. By gradually changing the weight over time, a
crossfade is performed. Example 1 implements this algorithm in C, while
Listing Three presents the PP code, which shows the typical situation where
I/O requests are set up based on the request passed down from the MP. The
assembly language is what TI calls an "algebraic assembly language" and is
much more expressive than an opcode-based assembly language. It also uses many
C-like expressions. The PP assembler is responsible for translating this huge
assortment of expressions into PP machine language.
In this case (refer to the section of Listing Three beginning with ; setup our
loop, once through for each line), you see three packet-transfer requests
being set up (two input, one output). Once in motion, the TC uses these
packets to continually bring in data and write out processed data, without any
further effort by the PP (except for making sure that new data has arrived
when it is ready to process more data). Although it is not evident from the
listing, the packet transfers support 3-D transfers--rows and columns as well
as separate counters, pitches, and offsets for each row and column. It is
common to use the three dimensions to bring in blocks of data from memory,
using the third dimension to ping-pong between on-chip buffers. Proper use of
this mode eliminates contention between the TC and the PPs. 
To set up the packet transfer, the PP needs to know where to find its data and
put the result. The process is done via shared memory. The MP has presumably
received a command from the host to crossfade two images (with a particular
weight) and display the result. The MP calculates the addresses and offsets so
that the source images are processed in four sections, and the results are
written to a display buffer. Thus, each PP gets an argument buffer containing
such information as pointers to the source images and result image, and the
weight to use in the calculation. The MP puts this argument buffer in each
PP's local memory.
Finally, the inner loop performs the actual calculations (refer in Listing
Three to the code beginning with width = width >> 1;), where each of the four
PPs executes the crossfade operation. The "||" symbol means that instructions
are executed in parallel: the first instruction (written without the "||") and
the following instructions (each prefixed with "||") all execute in the same
machine cycle.
C-like syntax is used in many places. For example, &* calculates an effective
address. Instructions such as zero =&*(La_Image0 = dba + fBuff0); load an
effective address into an address register. There are also special registers
(which the assembler recognizes by the name "zero") that are read as zero and
throw-away writes.
As with most algorithms on the PP, there is little or no data management inside
the inner loop. The address registers are loaded with pointers to the data,
and loop execution begins. When this loop is invoked, the TC has already
loaded the source data. While the loop is running, the TC is simultaneously
writing out the results of the last call to the loop and bringing in source
data for the next call. This allows the PP to concentrate completely on the
computation. 
Note that the PP initializes d0 to perform an ealu instruction. It then
initializes the hardware loop-control registers (ls0, le0, lr0, and lctl) by
loading the start and end instruction addresses in the loop, loading the loop
counter, and setting a control register that enables looping ("|" is a logical
OR, just like in C).
Next, I prime the loop by preloading some registers and starting some of the
calculations. Once again, C-like syntax is used to represent pointer
references and incrementing. The =uh tells the PP to do unsigned halfword
loads (don't sign extend the data).
The first products are calculated. Fixed-point values are used for the weight
and the data. Each multiply does two 8x8--> 16-bit multiplies (=um means the
multiplier does an unsigned split multiply). The results of two multiplies are
later added using a halfword split ALU. Using split multiplies requires ealu
instructions, but ealu is also used to align the data the way I want for later
processing. ("\\" is a register rotate operator; positive rotates go left.)
The inner loop itself is three instructions long. Each iteration calculates
two pixel results. All reads and writes from on-chip memory, four
multiplications, accumulation, and data alignment are done in these three
cycles. Of course, all these operations must occur in the right sequence in
order to achieve the right results. Condensing the operations into a tight
loop while taking advantage of parallelism is the main task of programming
such an inner loop. 
The inner loop is pipelined, so more than two sets of pixels are actually
processed in each loop. This is typical of efficient code on the MVP.
When all is done, the PP branches to the return address in the special
register iprs. Due to instruction pipelining in the parallel processors, the
two instructions following a branch are always executed. These two delay slots
are often used for cleanup (as in this case). 


Conclusion


So what is the MVP really capable of? TI claims the MVP can perform two
billion operations per second. This raw statistic, while accurate in the
literal sense, obscures more than it illuminates. Table 1 (developed by TI)
lists benchmarks of some real algorithms, most of which are part of the tools
distribution from TI (except for the JPEG codec). 
The bottom line is that the MVP really is a video DSP, but it is first
generation, so it has its limitations--cycles, memory, and bandwidth. I look
forward to seeing where TI (and its competitors) go with this technology over
the next several years.
Figure 1: MVP architecture overview.
Figure 2: Master processor.
Figure 3: Transfer controller.
Figure 4: Parallel-processor overview. Lds=local destination/source bus,
Gsrc=global source bus, Gdst=global destination bus, Repl=replicate hardware,
A/S=align/sign-extend hardware.
Figure 5: Parallel-processor data unit.
Example 1: Implementing the image-crossfade algorithm in C
/* perform a step in a crossfade. w is the current "weight" */
for (i = 0; i < height; i++) {
 for (j = 0; j < width; j++) {
 *dst++ = w * *src1++ + (1-w) * *src2++;
 }
}
Table 1: MVP benchmarks (generated by TI, presumably on a 50-MHz MVP).

Benchmark                                        Processor   Result
Dhrystones                                       MP only     140,000
3-D graphics transforms                          MP only     2.6 MB/sec
800x600 image (4:1:1 YCrCb) JPEG encode/decode   4 PPs       42-59 ms
8x8 forward DCT (H.261 accuracy)                 4 PPs       800,000/sec
3x3 median filter                                4 PPs       25 MB/sec
3x3 convolution (16-bit precision)               4 PPs       22 MB/sec
2x3 convolution (8-bit precision)                4 PPs       40 MB/sec

Listing One
/* client.c -- C source code for simulating client task */
#include <stddef.h>
#include <task.h>
#include "app.h"
#include "hwparams.h"
#include "main.h"
extern unsigned char image1[];
extern unsigned char image2[];
extern unsigned char image_out[];
/* Simulate client task that sends request messages to server tasks
 * that run on 340I MP, and receives reply messages from same. */
void DummyClient(void *arg)
{
 MSG_BODY *msgBody;
 long *pI;
 long i;
 
 TaskOpenPort(PORTID_RECLAMATION);
 TaskOpenPort(PORTID_CLIENTREPLY);
 for (i = 0; i < 8; i++) {
 TaskReclaimMsg(TaskAllocMsg(40, PORTID_RECLAMATION));
 }
 /* keep sending messages to the server */
 for (;;) {
 msgBody = TaskReceiveMsg(PORTID_RECLAMATION);
 msgBody->opCode = REQUEST_CROSSFADE_IMAGE;
 pI = (long *)msgBody;
 pI[1] = 640;
 pI[2] = 240;
 pI[3] = 0x8080; /* weight 50:50 crossfade */
 pI[4] = (long)image1;
 pI[5] = (long)image2;
 pI[6] = (long)image_out;
 TaskSendMsg(msgBody, PORTID_SERVER);
 /* Wait for next reply message to arrive from MVP. */
 if (!TaskWaitEvents(1L << EVENTNUM_AUXMSG)) {
 TaskYield(-1);
 }
 msgBody = TaskReceiveMsg(PORTID_CLIENTREPLY); /* get msg */
 }
 TaskClosePort(PORTID_RECLAMATION);
}

Listing Two
/* Crossfade.c -- C source code for crossfade server task */
#include <stdlib.h>
#include <task.h>
#include "app.h"
#include "hwparams.h"
#include "main.h"
#include "mpppcmd.h"
#include "MemoryMapMP.h"
#define PPS_to_go 4
void setup_pps(SRVARG *arg, CrossfadeParams *sp, PPCMDBUF *cmdBufs[]);
/* A simple server task. The single argument is a structure containing
 * all the persistent data needed to represent the state of the task from
 * one activation to the next. (Between activations, a task maintains no
 * state information on the single, system stack.) The value returned by
 * this function to the task scheduler is a long word containing up to 32
 * flags that indicate the set of events the task has selected to wait
 * on. The task will be activated again when one of these events occurs.
 */
void CrossfadeServer(void *argument)
{
 SRVARG *arg = (SRVARG *)argument;
 void *msgBody; /* save current request from client */
 PPCMDBUF *cmdBufs[4]; /* current PP command buffer */
 long opCode, portId, i, j;
 PPCMDBUF *cmdBuf;
 PPINFO *pp;
 CrossfadeParams *params;
 portId = TaskOpenPort(arg->portId);
 for (i = 0; i < PPS_to_go; i++) {
 pp = &(arg->pp[i]);
 /* Initialize the PPs that belong to this task. */
 cmdBuf = PpCmdBufInit(pp->ppNum, pp->program, 2);
 cmdBufs[i] = cmdBuf;
 PpCmdBufSetArgs(cmdBuf, (void *)(0x1000260 + (i << 12)));
 cmdBuf = PpCmdBufNext(cmdBuf);
 PpCmdBufSetArgs(cmdBuf, (void *)(0x10002C0 + (i << 12)));
 
 pp->semaId = TaskOpenSema(pp->semaId, 0);
 PpMsgIntSetSignal(pp->ppNum, 1, pp->semaId);
 } 
 for (i = 0; i < PPS_to_go; i++) {
 cmdBuf = cmdBufs[i];
 PpCmdBufSetFunc(cmdBuf, gPPCmd[PPCMDNUM_SETUP_FOR_CROSSFADE]);
 PpCmdBufIssue(cmdBuf);
 cmdBufs[i] = PpCmdBufNext(cmdBuf);
 }
 /* This task and its server PPs have now been initialized. Await the
 * arrival of the first request message from a client. Repeat the
 * loop below for each new request received from a client. */
 while(1) {
 msgBody = TaskReceiveMsg(portId);
 /* Begin processing new request message from client */
 switch (((MSG_BODY *)msgBody)->opCode) {
 case REQUEST_CROSSFADE_IMAGE:
 params = (CrossfadeParams *)&(((MSG_BODY *)msgBody)->filler[0]);
 setup_pps(arg, params, cmdBufs);
 /* Then return the request message as a reply message
 * indicating completion of the request. */
 ((MSG_BODY *)msgBody)->opCode = REPLY_CROSSFADE_IMAGE;
 portId = TaskGetReplyPort(msgBody);
 TaskSendMsg(msgBody, portId);
 break;
 default:
 /* No error handling yet. Just discard bad request and
 * continue waiting for a valid request to arrive. */
 TaskReclaimMsg(msgBody);
 break;
 }
 }
}

/* setup the PPs and let them go */
void setup_pps(SRVARG *arg, CrossfadeParams *sp, PPCMDBUF *cmdBufs[])
{
 long width, height, ratio;
 unsigned char *src_1, *src_2, *dst;
 long used_pps = PPS_to_go; 
 long partial_height;
 long partial_offset;
 long i;
 PPCMDBUF *cmdBuf;
 CrossfadeParams *argBuf;
 PPINFO *pp;
 /* Load operands into PP's argument buffer. */
 width = sp->Width;
 height = sp->Height;
 ratio = sp->Ratio;
 src_1 = sp->Src1Address;
 src_2 = sp->Src2Address;
 dst = sp->DstAddress;
 partial_height = sp->Height/PPS_to_go;
 partial_offset = width * partial_height; /* offset partial images */
 for (i = 0; i < PPS_to_go; i++) {
 cmdBuf = cmdBufs[i];
 argBuf = (CrossfadeParams *)PpCmdBufGetArgs(cmdBuf);
 while (PpCmdBufBusy(cmdBuf)) {
 TaskWaitSema(arg->pp[i].semaId);
 }
 cmdBuf = cmdBufs[i];
 argBuf = (CrossfadeParams *)PpCmdBufGetArgs(cmdBuf);
 argBuf->Width = width;
 argBuf->Height = partial_height;
 argBuf->Ratio = ratio;
 argBuf->Src1Address = (unsigned char *)(src_1 + i * partial_offset);
 argBuf->Src2Address = (unsigned char *)(src_2 + i * partial_offset);
 argBuf->DstAddress = (unsigned char *)(dst + i * partial_offset);
 PpCmdBufSetFunc(cmdBuf, gPPCmd[PPCMDNUM_CROSSFADE]);
 PpCmdBufIssue(cmdBuf);
 }
 /* The task has been partitioned among PP's. Now wait
 * until the busy ones finish, and update command buffer pointers. */
 for (i = 0; i < used_pps; i++) {
 cmdBuf = cmdBufs[i];
 if (PpCmdBufBusy(cmdBuf)) {
 TaskWaitSema(arg->pp[i].semaId);
 }
 cmdBufs[i] = PpCmdBufNext(cmdBuf);
 }
}

Listing Three
;------------------------------------------------------------------------
; Does a crossfade between two images, writing data out to a third
; image. The image pointers, sizes, and a crossfade multiplier
; in 1.15 format are specified.
;
 .include "MemoryMapPP.h"
 .include "PacketPP.h"
 .ptext
 .global _SetupForCrossfade

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; setup the PR stuff
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
_SetupForCrossfade:
Ga_Packet0 .set a10
Ga_Packet1 .set a11
Ga_Packet2 .set a12
 x0 = 0x800; delta between ping pong buffers
 x8 = 0x80000103; Dimensioned PR, stop, src update, ping pong dst
 x9 = 0x00000301; Dimensioned PR, no stop, dst update, ping pong src
 x10= 0x00000103; Dimensioned PR, no stop, src update, ping pong dst
 ; initialize the input PRs
 zero = &*(Ga_Packet0 = dba + fPktReqAddressBase0);
 zero = &*(Ga_Packet1 = dba + fPktReqAddressBase1);
 zero = &*(Ga_Packet2 = dba + fPktReqAddressBase2);
 ; first input packet transfer
 *Ga_Packet0.tPR_Options = x10; dimensioned PR, no stop
 *Ga_Packet0.tPR_Next = Ga_Packet1; 
 *Ga_Packet0.fPR_SrcBBPitch = zero;
 *Ga_Packet0.fPR_DstBBPitch = zero;
 *Ga_Packet0.tPR_SrcCCount = zero;
 *Ga_Packet0.tPR_DstCCount = zero;
 *Ga_Packet0.fPR_DstCCPitch = x0;
 *Ga_Packet0.fPR_SrcCCPitch = zero;
 ; second input packet transfer
 *Ga_Packet1.tPR_Options = x8; dimensioned PR, stop
 *Ga_Packet1.tPR_Next = zero; no next
 *Ga_Packet1.fPR_SrcBBPitch = zero;
 *Ga_Packet1.fPR_DstBBPitch = zero;
 *Ga_Packet1.tPR_SrcCCount = zero;
 *Ga_Packet1.tPR_DstCCount = zero;
 *Ga_Packet1.fPR_DstCCPitch = x0;
 *Ga_Packet1.fPR_SrcCCPitch = zero;
 ; write data back out to the destination
 *Ga_Packet2.tPR_Options = x9; dimensioned PR, no stop
 *Ga_Packet2.tPR_Next = Ga_Packet0; 
 *Ga_Packet2.fPR_SrcBBPitch = zero;
 *Ga_Packet2.fPR_DstBBPitch = zero;
 *Ga_Packet2.tPR_SrcCCount = zero;
 *Ga_Packet2.tPR_DstCCount = zero;
 *Ga_Packet2.fPR_SrcCCPitch = x0;
 *Ga_Packet2.fPR_DstCCPitch = zero;
 br = iprs; return to calling function
 nop;
 nop;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; do the actual crossfade; this part of the code handles I/O for the crossfade
;; on entry: a9 points to our command packet
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 .align 512
 .global _CrossfadeImage
_CrossfadeImage:
ratio0 .set d1
srcPtr1 .set d2
srcPtr2 .set d3
dst .set d4
width .set d5
height .set d6
La_Image0 .set a2

Ga_Image1 .set a8
La_Image2 .set a3
Lx_Offset .set x0
Gx_Offset .set x10
Ga_Command .set a9; initialized by caller
 zero = &*(a0 = pba + fPP_reg_save0);
 *a0++ = a8;
 *a0++ = a9;
 sr = 0x2D; split ALU, halfword operations
 ; setup our loop, once through for each line
 height = *Ga_Command.tHeight;
 height = height - 1; loop height + 1 times
 le0 = ipe + (end - $);
 ls0 = ipe + (start - $);
 lr0 = height; also sets loop counter
 lctl = 0x9; enable le0, associate with lc0
 ; initialize some registers
 zero = &*(La_Image0 = dba + fBuff0);
 zero = &*(Ga_Image1 = dba + fBuff1);
 zero = &*(La_Image2 = dba + fBuff2);
 ; read in the command arguments
 width = *Ga_Command.tWidth; get the width
 ratio0 = *Ga_Command.tRatio; 
 srcPtr1 = *Ga_Command.tSrc1Address;
 srcPtr2 = *Ga_Command.tSrc2Address;
 dst = *Ga_Command.tDstAddress;
 ; read in the first two lines of source images 
 zero = &*(Ga_Packet0 = dba + fPktReqAddressBase0);
 zero = &*(Ga_Packet1 = dba + fPktReqAddressBase1);
 zero = &*(Ga_Packet2 = dba + fPktReqAddressBase2);
 *Ga_Packet0.tPR_SrcStartAddress = srcPtr1;
 *Ga_Packet0.tPR_DstStartAddress = La_Image0;
 *Ga_Packet0.tPR_SrcBACount = width; a count is width:b is 0
 *Ga_Packet0.tPR_DstBACount = width; a count is width:b is 0
 *Ga_Packet0.fPR_SrcBBPitch = width;
 *Ga_Packet0.fPR_DstBBPitch = width;
 *Ga_Packet1.tPR_SrcStartAddress = srcPtr2;
 *Ga_Packet1.tPR_DstStartAddress = Ga_Image1;
 *Ga_Packet1.tPR_SrcBACount = width; a count is width:b is 0
 *Ga_Packet1.tPR_DstBACount = width; a count is width:b is 0
 *Ga_Packet1.fPR_SrcBBPitch = width;
 *Ga_Packet1.fPR_DstBBPitch = width;
 ; start reading in the first lines, auto increment source address
 *(pba + fPR_LinkedListStart) = Ga_Packet0;
 comm = comm | 1\\28; issue a packet request
 ; meanwhile start setting up output PR 
 *Ga_Packet2.tPR_SrcStartAddress = La_Image2;
 *Ga_Packet2.tPR_DstStartAddress = dst;
 *Ga_Packet2.tPR_SrcBACount = width; a count is width:b is 0
 *Ga_Packet2.tPR_DstBACount = width; a count is width:b is 0
 *Ga_Packet2.fPR_SrcBBPitch = width;
 *Ga_Packet2.fPR_DstBBPitch = width;
 ; wait until the first PR has completed
 zero = comm & 1\\29; keep testing the PR queued bit
poll0: br = [nz] ipe + (poll0 - $);
 nop;
 zero = comm & 1\\29; keep testing the PR queued bit
 ; start the second
 *(pba + fPR_LinkedListStart) = Ga_Packet0;

 comm = comm | 1\\28; issue a packet request
 
start:
 ; now we can process a line of data 
 *--sp = iprs; save iprs
 Lx_Offset = 0x0; default offset 
 call = ipe + (DoProcessing - $);
 zero = lc0 & 0x1; odd or even line?
 Lx_Offset =[eq] 0x800; other DRAM for the even lines
 iprs = *sp++; restore iprs
 ; wait until the PR has completed
 zero = comm & 1\\29; keep testing the PR queued bit
poll1: br = [nz] ipe + (poll1 - $);
 nop;
 zero = comm & 1\\29; keep testing the PR queued bit
 ; write data to the destination and read in two new lines of data 
 zero = &*(Ga_Packet2 = dba + fPktReqAddressBase2);
 *(pba + fPR_LinkedListStart) = Ga_Packet2;
end: comm = comm | 1\\28; issue a packet request
 sr = 0x36; no more split ALU
 zero = &*(a0 = pba + fPP_reg_save0);
 br = iprs; return to caller
 a8 = *a0++; restore a8 and a9
 a9 = *a0++;
 
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; actually do the crossfade on a line. the crossfade operation is: a*x+(1-a)*y
; The following registers are initialized on entry:
; ratio0
; Lx_Offset: offset to the image buffers
; The ALU must be setup for halfword split operations
; the ratio0 register should be saved
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
DoProcessing:
src0 .set d2
src1 .set d3
ratio1 .set d4
prod1 .set d5
prod0 .set d6
sum .set d7
 width = width >> 1; we do two samples each iteration
 *--sp = width; I need to reuse this reg later on!
 ratio1 =u 0xFFFF - ratio0; for second source
 ; setup pointer into my buffers
 Gx_Offset = Lx_Offset; need a global offset and a local offset
 zero = &*(La_Image0 = dba + fBuff0);
 zero = &*(La_Image0 += Lx_Offset);
 zero = &*(Ga_Image1 = dba+ fBuff1);
 zero = &*(Ga_Image1 += Gx_Offset);
 zero = &*(La_Image2 = dba + fBuff2);
 zero = &*(La_Image2 += Lx_Offset);
 d0 = ROT_8; prepare for ealu
 le1 = ipe + (loop_e - $); loop end
 lr1 = width - 2; # iterations (- 1)
 ls1 = loop_s;
 lctl = lctl | 0xA0; enable looping, associate with lc 1
 ; init loop, prime the data buffers
 src1 =uh *Ga_Image1++; -> s1 
 src0 =uh *La_Image0++; -> s0

 prod1 =um ratio1 * src1; -> s1xxs1xx
 sum = ealu(ROT_8: sum\\8); s0xxs1xx -> xxs1xxs0
 prod0 =um ratio0 * src0; -> s0xxs0xx
 src1 = ealu(ROT_8: sum\\8); dummy operation for split multiply
 sum =m prod0 + prod1; sum = s0 + s1 (16 bits each)
 src1 =uh *Ga_Image1++; -> s1 
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ; the inner loop for the crossfade
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
loop_s: prod1 =um ratio1 * src1; -> s1xxs1xx
 sum = ealu(ROT_8: sum\\8); s0xxs1xx -> xxs1xxs0
 src0 =uh *La_Image0++; -> s0
 prod0 =um ratio0 * src0; -> s0xxs0xx
 src1 = ealu(ROT_8: sum\\8); dummy operation for split multiply
 sum =ub2 sum; sum -> s1
 *La_Image2++ =b sum; s0 -> image
loop_e: sum =m prod0 + prod1; sum = s0 + s1 (16 bits each)
 src1 =uh *Ga_Image1++; -> s1
 *La_Image2++ =b sum; s1 -> image
 br = iprs; return to caller
 width = *sp++;
 nop;









































Indexed Text Retrieval 


A fast, flexible technique for accessing information




Robert Krten


Robert is a principal with PARSE Software Devices. He can be contacted at
rk@parse.com.


With the glut of inexpensive storage devices (disk storage now costs as little
as $0.25/MB) and the availability of large bodies of free (or inexpensive)
data (telephone directories, Internet documents, FAQs, and the like), getting
data onto local storage and indexing it for fast and convenient access has
become vitally important. In this article, I'll present an approach to
developing a fast, flexible text-retrieval system. While the example I'll
develop focuses on a telephone-number database, the design and implementation
can easily be adapted to other types of text databases and formats.
My requirements for a telephone-number database were that it:
Support fairly large amounts of data (549 MB are required for all of Canada,
for example).
Index by phone number in less than one second.
Index by text keys in several seconds. 
I also wanted the search criteria for the database to be as flexible as
possible; for instance, it had to be able to "find all hospitals in Toronto"
or "find all the Smiths on Sunset Boulevard." Disk space was not a
consideration, because of the low price per megabyte.
I wanted the database to index in less than one second because I subscribe to
the phone company's caller-ID service, which delivers the phone numbers of
incoming calls between the first and second ring. I wanted to get associated
information off my hard disk quickly--before the second ring. This turned out
to be simple. In a classic database, the telephone number would serve as the
key into the database and would perhaps be hashed, and then looked up. In my
database, the telephone number represents a pathname. I can open the file
specified by the pathname and do a binary search on it. UNIX file systems are
not terribly efficient at searching through thousands of filenames to find the
file that you want to open, so I implemented a "digit-tree" file structure.


Digit Tree


In a digit-tree structure, a root directory contains the first digit of the
phone number ("6" for "613", for example). Each directory has subdirectories
named after the subsequent digits in the phone number. For example,
"6135141212" would be stored in a data file called /databaseRoot/6/1/3/5/1/4,
where the last 4 is not a directory, but a file containing all of the 613514
numbers, sorted by the last four digits of the phone number (referred to as
the "station code"). Example 1 illustrates what the 4 file might contain.
The advantages of this organization are numerous. First, the file system has
to open only a few directory levels (a reasonably quick operation) and search
through just one small file. My experience with the telephone directory for
Ontario, for instance, indicates that office codes (the "514" in the previous
example) are typically about half full--approximately 5000 records. Secondly,
a fair amount of data compression is built into this system. Since the
individual files are stored in a hierarchical structure, the area code and
office-code information are implied by the file pathname itself, and do not
have to be repeated as data within the file.
The other advantage is that the 613514 entry is still a small, flat ASCII text
file--I can edit it without a specialized (and substantially more limited)
database editing utility. 
Of course, if you wanted to squeeze some data space out of this scheme, the
station code--instead of being represented in 4-character ASCII followed by a
space--could be squeezed into just 14 bits (say, two bytes for ease, saving
three bytes).
So far, this isn't rocket science. Converting your existing telephone-number
database into that particular organization is left as an exercise to the
reader--I used a small C program that took a few hours to write and several to
run. In fact, strictly speaking, you can use most of the other ideas in this
article without reorganizing the database.


Search by Text


With the scheme described so far, the database is split by area code and
office code over a large number of subdirectories and files (I have 8221 files
in my database at the time of writing!). To index text keys in several
seconds, it obviously isn't practical to open every file looking for a record
that matches a bunch of keys.
Furthermore, not all telephone records have the same data in the same order.
As I accumulated telephone-directory information, I merged from many different
databases (Canadian White Pages CD-ROM, Ottawa/Hull Pay Phone database,
private databases of acquaintances, and the like), each with its own format.
(The example for 613514 is typical.) This precluded using the traditional
database-design trick of creating last-name, first-name, and street-name
indexes.
In fact, I had only two pieces of data that made any sense across all of the
different formats: a telephone directory number (DN) and some descriptive
text. Since the descriptive text is not "tagged" (as firstname, lastname,
address, and so on), it all has to be considered during a keyword search. This
means that every word in the database has to be indexed--the premise for the
"indexed text retrieval" (ITR) program I present in this article.


Indexed Text Retrieval 


By indexing every word, I can search on the logical AND/OR combination of any
words I choose. For example, to find all of the hospitals in Ottawa, I just
type itr hospital ottawa, where itr is the name of the ITR program, and
hospital and ottawa are text keywords. The ITR program will print out all
phone-number records that match the two keywords. (Note that this version of
ITR is case insensitive.)
Implementation of ITR requires two separate programs--an indexing program and
a lookup program. Listing One is the ITR.H file for the ITR program, while
Listing Two is the main ITR retrieval program, ITR.C. Other relevant files
(the ITR Indexer, a common library, and a makefile) are available
electronically; see "Availability," page 3.
Indexing every word involves opening each input file, reading each line
(station code and text), and parsing out the words. In my implementation, I
skip anything that is not "A" through "Z". Therefore, for each word, I have
the following information: filename (in this case, the area code and office
code), position (in this case, the station code), and the word itself. I don't
care about the word's relative position on the line--I just want it associated
with a given DN. Then when I specify the search keys, I don't have to specify
them in any particular order; "ottawa hospital" is equivalent to "hospital
ottawa".
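The word-parsing rule just described can be sketched as follows (a simplified version under my own naming, not the actual indexer code, which is available electronically): anything that is not a letter separates words, and words are folded to lower case so that searches are case insensitive.

```c
#include <ctype.h>

/* Split a database line into lowercase words, skipping anything that is
 * not "A" through "Z".  Returns the number of words found, up to
 * maxwords; each word is truncated to 63 characters. */
int parse_words (const char *line, char words [][64], int maxwords)
{
    int n = 0, len = 0;
    for (;; line++) {
        if (isalpha ((unsigned char) *line) && len < 63) {
            words [n][len++] = tolower ((unsigned char) *line);
        } else if (len > 0) {
            words [n][len] = 0;     /* terminate the current word */
            len = 0;
            if (++n == maxwords) {
                break;
            }
        }
        if (*line == 0) {
            break;
        }
    }
    return (n);
}
```

Running this over the sample record "1212 Weather Number, 24 hours / day" yields "weather", "number", "hours", and "day"--the digits and punctuation disappear, which is exactly what you want for keyword indexing.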
Managing this amount of data is in itself a challenge since this index is
quite large. 


The Alpha Tree 


After the digit-tree approach succeeded, I decided to experiment with an alpha
tree. Using this scheme to find the keys for "ottawa," I simply open the file
"o/t/t/a/w/a.db" and find all of the keys. The keys are directly translatable
into the telephone-database record. To preserve disk space, I had to make some
decisions about the extent of the alpha tree. I first chose a maximum depth of
five levels. Depending upon the data that you wish to index, this restriction
may be drastically altered to suit your requirements.
To generate the index, I implemented a "bucket-sort" algorithm that works like
this: During analysis, I take each DN and word and generate a 10-byte entry
consisting of six bytes from the word (null padded) and four bytes
representing the DN. Then, I append this entry to the file with the same name
as the first character of the word. For example, the word "weather" is
stripped to just five characters ("weath"), and, along with the key
(6135141212), appended to a file called "w.1". The next word, "Number," is
also stripped to five characters ("numbe"), and along with the same key, is
appended to a file called "n.1".
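The 10-byte bucket entry can be sketched like this (names are mine; the key shown is the compact table-index form discussed later in the article--an area-code/office-code pair index merged with the station code--since a full ten-digit number does not fit in four bytes):

```c
#include <stdint.h>
#include <string.h>

/* Build the 10-byte bucket-sort entry described above: six bytes of the
 * word (truncated to five characters, null padded) followed by a 4-byte
 * key, stored here in host byte order for simplicity. */
void make_entry (const char *word, uint32_t key, unsigned char entry [10])
{
    memset (entry, 0, 10);                  /* null pad the whole entry */
    strncpy ((char *) entry, word, 5);      /* "weather" -> "weath\0" */
    memcpy (entry + 6, &key, 4);            /* 4-byte key after the word */
}
```

Sorting a file of such fixed-size entries, first by word prefix and then by key, is what makes the later splitting and key-stripping passes straightforward.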
Once all words in all files are processed, each ".1" file is opened and
analyzed. If the number of records in the file warrants splitting it up (this
is a tunable parameter in the source), a subdirectory is created and the
process repeats, using the next character of the word as the basis for the
filename. For example, when the "w.1" file is analyzed, the word "weath" is
written to "w/e.1". This process continues until either five levels of
directory structure have been generated, or the ".1" file is small enough not
to split. Finally, when all of the files are split, the partial word is
stripped out of the files since it is no longer required, making each record
contain only the 4-byte (telephone number) key.
The final result is an alpha-tree structure on disk with files consisting of
indexes representing phone numbers. This is similar to a classical hash
function, except that the length of the hash is flexible, depending upon the
number of entries. Table 1, which lists the files I have on my disk, shows
that not as many words start with "WEK" as with "WELC". Interestingly, as the
number of levels increases, so does the theoretical number of files. Table 2
summarizes the theoretical maximum number of files and the number actually
contained in my database.



Retrieval Software


To access the stored information, the retriever takes the keywords passed on
the command line and finds matching database records. If only one keyword is
specified, the retriever opens the associated keyword file; for example, if
the keyword is "welcome," the retriever opens "w/e/l/c.db". Then, for each
entry in the keyword file, the retriever opens the corresponding database
file, performing secondary matching on the database records as they go by. Any
records that fully match are printed.
The reason for a secondary match is that ".db" files are unique only as far as
their filenames go. For example, the w/e/l/c.db file contains keys for all
words that start with "welc": "welcome," "welch," and the like.
In the case of two or more keywords, the retriever opens all specified keyword
files and finds keys that are the same in all of them. (Since the key files
are sorted by key, this is relatively simple.) Once keys are found that are
the same in all files, the telephone-database file is opened and a secondary
match is performed to check that the keywords are all present in the database
entry.
The current version of the retriever (see Listing Two) provides only an AND
function--all keywords listed on the command line must match for the record to
be printed. It is simple to add an OR capability and partial-key matching.
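Because the key files are sorted, the AND of two or more keywords reduces to a sorted-list intersection. Here is that merge as a sketch over in-memory arrays (the retriever in Listing Two does the same thing over open files and a read cache):

```c
/* Given two key arrays already sorted in ascending order, emit only the
 * keys present in both.  Returns the number of keys written to out. */
int intersect (const long *a, int na, const long *b, int nb, long *out)
{
    int i = 0, j = 0, n = 0;
    while (i < na && j < nb) {
        if (a [i] < b [j]) {
            i++;                        /* advance the lower key */
        } else if (a [i] > b [j]) {
            j++;
        } else {
            out [n++] = a [i];          /* key present in both files */
            i++;
            j++;
        }
    }
    return (n);
}
```

Each file is scanned at most once, so the cost is linear in the size of the keyword files regardless of how many keys actually match.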


Extending the ITR Concept


How can you extend this system to usefully index arbitrary bodies of text?
Start by revisiting the indexer, which generates the alpha-tree entries that
contain the index. The index consists of a complete telephone number. You
interpret the telephone number in two parts; together, the area code and
office code make up the "filename," and the station code (the last four
digits) is the "record position" within the specified filename.
By revising the indexer to generate 8-byte indexes--four bytes for the
filename (as a file ID number, DOCNUM) and four for the word position within
that file--arbitrary text can be indexed. In fact, the sample
telephone-database indexer is a subset of this approach.
As an aside, you can't really store a complete telephone number in four
bytes. I "cheated" by creating a table of area-code and office-code pairs,
then stored a six-digit index in this table. For instance, my database has
8221 area-code/office-code pairs, which means that the index ranges from
0000010000 through to 0082219999--well within the range of four bytes. Some
string manipulation is then performed on the decimal ASCII representation of
the four bytes, and the area-code/office-code pair and station ID are split
out or merged in, depending upon the operation being performed.
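The merge and split operations amount to simple arithmetic on the table index and station code; a sketch (function names are mine):

```c
/* Merge a table index (position of the area-code/office-code pair in the
 * translation table) with a 4-digit station code into one compact key,
 * and split it back apart.  For 8221 pairs, keys run from 10000 up to
 * 82219999--well within the range of four bytes. */
long key_merge (long pair_index, int station)
{
    return (pair_index * 10000L + station);
}

void key_split (long key, long *pair_index, int *station)
{
    *pair_index = key / 10000L;
    *station = (int) (key % 10000L);
}
```

The decimal radix wastes a little key space (10000 station slots per pair instead of a power of two), but it keeps the ASCII string manipulation described above trivial.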
The interpretation of the 4-byte file position is left up to you. For certain
types of database searches, the four bytes might represent the word's byte
offset from the start of the file. For other types, it may represent the
relative word number in the file. In the telephone-database example, it
represents the record number. 
Figure 1 illustrates a flexible approach to implementing an ITR program. In
this scheme, any documents being added to the ITR database are assigned a
document number in the DOCNUM database. This establishes an association
between the filename (which can be arbitrarily long and complex--a hypertext
link, for example) and a 4-byte document number.
Next, another database is established that allows the association between the
document number and a document type (DOCTYPE). The document type is used by
the front-end analysis scheme to choose an appropriate analyzer, and by the
back-end matching scheme to choose a corresponding matcher.
To index a file once the DOCNUM and DOCTYPE database entries have been
created, every word within the file is processed by the appropriate front-end
analyzer; the document number, word position, and word itself are sent to the
indexer. The indexer creates/updates the alpha-tree based upon the word,
adding the document number and word position at the appropriate place.
To retrieve a document, a generic matcher is passed the list of keywords being
searched for, and finds them in the alpha tree. Every time the generic matcher
finds a match (based upon the alpha-tree information), it calls the specific
matcher for that file type (from the DOCTYPE database) to verify the match. If
the match is verified, the appropriate information is printed. Again,
depending upon the exact requirements, there may be no need for a back-end
matcher; for example, if the alpha-tree information is complete, rather than
partial.
Much of the system's flexibility comes into play in the DOCTYPE database and
analyzer and matcher. The analyzer and matcher operate as a pair, so it is
entirely up to them to interpret the word position returned from the generic
matcher. As an example, the DOCTYPE could indicate a PostScript, spreadsheet,
word processor, or other type of arbitrary file or document.
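One way to sketch the DOCTYPE dispatch is a table pairing each document type with its analyzer and matcher; the type names and trivial handler functions here are illustrative, not taken from any actual DOCTYPE database:

```c
#include <string.h>
#include <stddef.h>

/* Each document type pairs a front-end analyzer (feeds words to the
 * indexer) with a back-end matcher (verifies a raw alpha-tree match). */
typedef struct {
    const char *doctype;                /* e.g., "text", "postscript" */
    int (*analyze) (const char *doc);   /* front end */
    int (*verify) (const char *doc);    /* back end */
} TypeHandler;

static int text_analyze (const char *doc) { return (doc != NULL); }
static int text_verify (const char *doc) { return (doc != NULL); }

static TypeHandler handlers [] = {
    { "text", text_analyze, text_verify },
};

/* Look up the handler pair for a given document type, or NULL. */
TypeHandler *find_handler (const char *doctype)
{
    int i;
    for (i = 0; i < (int) (sizeof (handlers) / sizeof (handlers [0])); i++) {
        if (strcmp (handlers [i].doctype, doctype) == 0) {
            return (&handlers [i]);
        }
    }
    return (NULL);
}
```

Adding support for a new file format then means registering one more row in the table, without touching the generic indexer or matcher.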
Table 1: Sample list of files.
Filename Size (Bytes) Records
w/e/k.db 36 9
w/e/l/m.db 28 7
w/e/l/e.db 32 8
w/e/l/b.db 1060 265
w/e/l/c.db 2580 645
w/e/l/d/a.db 36 9
Table 2: Theoretical maximum number of files versus number actually observed
in the sample database. The index consumes 141.2 MB for the 549-MB database.
Level Maximum Actual Percentage
1 26 26 100.00
2 676 462 63.02
3 17 576 5 730 32.60
4 456 976 23 949 5.24
5 11 881 376 32 314 <0.01
Example 1: Contents of a typical digit-tree file.
1212 Weather Number, 24 hours / day, Ottawa Valley
1213 Liz and Brian's Pizza House, 2666 Temp Street, Ottawa
1443 Pay Phone, 55 Queensway Offramp
Figure 1: Implementing an ITR program. (a) INDEXER; (b) RETRIEVER.

Listing One
/* itr.h 
 * QNX 4
 * (C) Copyright 1993 by Robert Krten, all rights reserved.
 * This module contains the manifest constants and associated declarations
 * for the "ITR" program.
*/
/* manifest constants */
#define MaxDBLineLen 256 /* maximum length of database line */
#define NumKeys 10
#define KeyCacheSize 1024
#define MaxKeyWordLength 64
#define MaxFilenameLength 256
/* structure definitions */
typedef struct

{
 char keyword [MaxKeyWordLength];
 char fname [MaxFilenameLength];
 FILE *fp;
 long val;
 long *cache;
 int cachePtr;
 int maxcache;
} Key;
/* prototypes */
void init (void);
void itrSearch (void);
void findNextKey (int);
void printKey (int);
void printMatch (int);
void initKeys (void);
int databaseMatch (char *, char *);

Listing Two
/* main.c -- QNX 4, main.c shell version 0.004
 * (C) Copyright 1995 by Robert Krten, all rights reserved.
 * This module represents the main module for the itr retriever.
 * This program will retrieve text based on a full-text retrieval database.
*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <time.h>
#include <ctype.h>
#include "itr.h"
#include "lib.h"
char *progname = "itr";
char phoneDB [MaxFilenameLength]; /* database filename roots */
char root [MaxFilenameLength]; /* base of itr database */
char npanxx [MaxFilenameLength]; /* base of document # xlat */
char *translation; /* allocated by read_translation */
int numtrans; /* number of above */
int longerThanIndex [NumKeys]; /* 1 if strlen (searchWord) > 5 */
long topKey; /* highest of any */
long maxKey; /* largest valid key */
int numSearch;
Key keys [NumKeys]; 
/* the main key structure */
main (argc, argv)
int argc;
char **argv;
{
 optproc (argc, argv); /* get keys from command line */
 init (); /* initialize search engine */
 itrSearch (); /* search engine */
}
void
optproc (argc, argv)
int argc;
char **argv;
{
 int i;
 numSearch = 0;

 for (i = 1; i < argc; i++) {
 if (numSearch >= NumKeys) {
 fprintf (stderr, "%s: Too many search keys, limit is %d\n",
 progname, NumKeys);
 exit (1);
 }
 strcpy (keys [numSearch++].keyword, argv [i]);
 }
}
void
init ()
{
 char *p;
 if ((p = getenv ("PHONEDB")) != NULL) {
 strcpy (phoneDB, p);
 } else {
 strcpy (phoneDB, "/data/telecom/phoneDB");
 }
 if (phoneDB [strlen (phoneDB) - 1] != '/') {
 strcat (phoneDB, "/");
 }
 sprintf (root, "%s1", phoneDB);
 sprintf (npanxx, "%s/.npanxx", root);
 read_translation (npanxx); /* fetch the translation table */
 maxKey = numtrans * 10000 + 9999; /* maximum key value */
}
/* prepareSearchFnames -- generate alpha-tree pathnames corresponding to each 
 * keyword specified.
*/
void
prepareSearchFnames ()
{
 int i;
 int l;
 FILE *fp;
 char shortFnames [NumKeys][11]; /* /a/b/c/d/e\0 */
 for (i = 0; i < numSearch; i++) {
 /* generate flag array */
 longerThanIndex [i] = strlen (keys [i].keyword) > 5;
 /* trim search key */
 for (l = 0; l < 5; l++) {
 shortFnames [i][l * 2 + 0] = '/';
 shortFnames [i][l * 2 + 1] = tolower (keys [i].keyword [l]);
 }
 shortFnames [i][10] = 0; /* terminate preventively */
 /* for short strings, the \0 that got stuck into shortFnames is
 * almost doing its job--there will be a trailing '/' which we
 * kill off here
 */
 if (shortFnames [i][strlen (shortFnames [i]) - 1] == '/') {
 shortFnames [i][strlen (shortFnames [i]) - 1] = 0;
 }
 /* now, see which ones are valid. If not valid, trim off two characters
 * at the end and try again. For short filenames, we (uselessly) trim 
 * off characters after the \0 -- this all comes out in the wash.
 */
 for (l = 4; l >= 0; l--) {
 sprintf (keys [i].fname, "%s%s.db", root, shortFnames [i]);
 if ((fp = fopen (keys [i].fname, "r")) != NULL) {

 break;
 }
 shortFnames [i][l * 2] = 0; /* kill off part of name */
 }
 if (fp != NULL) {
 fclose (fp);
 } else {
 fprintf (stderr, "%s: couldn't find database for keyword %s!!!\n",
 progname, keys [i].keyword);
 exit (1);
 }
 }
}
/* itrSearch -- The search engine itself.
 * We change the keywords that we are searching for into actual alpha-tree
 * filenames (via prepareSearchFnames). Then, we open all searchkey files,
 * and try to find keys that match.
*/
void
itrSearch ()
{
 int i;
 int change;
 int allEqual;
 /* translate search strings into valid filenames */
 if (isdigit (keys [0].keyword [0])) { /* kludge -- if it's a number */
 if (strlen (keys [0].keyword) != 10) {
 fprintf (stderr, "%s: reverse number lookup requires a 10-digit DN!\n",
 progname);
 exit (1);
 }
 reverseLookup (keys [0].keyword);
 return; /* done */
 }
 prepareSearchFnames ();
 initKeys ();
 /* initialize variables for search */
 for (i = 0; i < numSearch; i++) {
 keys [i].val = 0;
 }
 topKey = 0; /* init, findNextKey uses it */
 /* now the real search algorithm begins. Initialize by searching first 
 * file, and assigning to 'topKey'
 */
 findNextKey (0); /* find next (ie: first) key for string zero */
 topKey = keys [0].val;
 while (topKey != -1) { /* until EOF */
 /* advance all keys that are below topKey, until they are either at
 * topKey, or above. If a key goes above topKey, reassign topKey and 
 * continue. If a key is equal, skip until all keys are equal, or 
 * topKey has been adjusted. If all keys are equal, emit a match, and
 * advance any key. Ensure that key has in fact advanced!
 */
 change = 1;
 while (change) {
 change = 0;
 for (i = 0; i < numSearch; i++) {
 while (keys [i].val < topKey && keys [i].val != -1) {
 findNextKey (i);

 if (keys [i].val > topKey) {
 topKey = keys [i].val;
 change = 1;
 }
 }
 if (keys [i].val == -1) {
 return;
 }
 }
 }
 /* since there is no more change, ensure all keys are equal */
 allEqual = 1;
 for (i = 0; i < numSearch - 1; i++) {
 if (keys [i].val != keys [i + 1].val) {
 allEqual = 0;
 }
 }
 if (!allEqual) {
 return; /* Couldn't find a match */
 }
 printMatch (keys [0].val); /* validate and then show matches */
 while (keys [0].val == topKey) {
 findNextKey (0); /* advance to next search */
 }
 topKey = keys [0].val;
 }
}
/* reverseLookup -- prints database entry associated with the passed DN */
void
reverseLookup (dn)
char *dn;
{
 char obuf [256];
 if (databaseMatch (dn, obuf)) {
 printf ("%s: %s\n", dn, obuf);
 } else {
 printf ("No match\n");
 }
}
void
initKeys ()
{
 int i;
 for (i = 0; i < numSearch; i++) {
 if ((keys [i].fp = fopen (keys [i].fname, "r")) == NULL) {
 fprintf (stderr, "%s: couldn't open %s for r\n",
 progname, keys [i].fname);
 exit (1);
 }
 if ((keys [i].cache = malloc (sizeof (long) * 
 KeyCacheSize)) == NULL) {
 fprintf (stderr, "%s: couldn't allocate a cache of %d bytes!\n",
 progname, (int) (sizeof (long) * KeyCacheSize));
 exit (1);
 }
 keys [i].maxcache = fread (keys [i].cache, sizeof (long),
 KeyCacheSize, keys [i].fp);
 if (keys [i].maxcache < KeyCacheSize) {
 fclose (keys [i].fp); /* close it, we've read it all */

 keys [i].fp = NULL; /* indicate closed */
 }
 keys [i].cachePtr = 0; /* current position for read */
 }
}
/* findNextKey -- fetches next key that is greater than "topKey" for the given
 * keyword number, and handles the cache aspects too.
*/
void
findNextKey (k)
int k; /* search string number */
{
 /* could be replaced with binary search for even greater speed */
 do {
 if (keys [k].cachePtr < keys [k].maxcache) {
 keys [k].val = keys [k].cache [keys [k].cachePtr++]; 
 /* "read" value */
 } else { /* exceeded cache */
 if (keys [k].fp == NULL) { /* no more to read, done */
 keys [k].val = -1; /* set EOF */
 } else {
 keys [k].maxcache = fread (keys [k].cache, sizeof (long), 
 KeyCacheSize, keys [k].fp);
 if (keys [k].maxcache < KeyCacheSize) {
 fclose (keys [k].fp);
 keys [k].fp = NULL;
 }
 keys [k].val = keys [k].cache [0];
 keys [k].cachePtr = 1;
 }
 }
 } while ((keys [k].val < topKey) && (keys [k].val != -1));
}
/* printMatch -- fetches matching record, and performs secondary matching
 * validation on it. Due to the algorithm used in the ITR program,
 * it is possible that a "false match" will be generated (see article).
 * The procedure ensures that no false matches are printed.
*/
void
printMatch (key)
int key;
{
 char number [256]; /* translated key -> number */
 char result [256]; /* database record fetched */
 char buf [16]; /* temp, key -> ASCII xlat */
 int npanxx; /* temp, NPA/NXX from key */
 int station; /* temp, station code from key */
 int matching [NumKeys]; /* matching key matrix 1==match */
 char word [256]; /* word buffer for 2ndary matches */
 char *ptr; /* buffer pointer for fetching words */
 int i;
 if (key == -1) {
 printf ("<EOF>\n");
 } else if (key > maxKey) {
 printf ("<INVALID %010d>\n", key);
 } else {
 sprintf (buf, "%010d", key);
 station = atoi (buf + 6);
 buf [6] = 0;

 npanxx = atoi (buf);
 sprintf (number, "%6.6s%04d", &translation [npanxx * 7], station);
 /* fetch complete database record into 'result' */
 if (databaseMatch (number, result)) {
 /* now check 2ndary matches on all keys */
 for (i = 0; i < numSearch; i++) {
 matching [i] = 0; /* initially, no matches */
 }
 ptr = result;
 while (getWordLC (&ptr, word)) {
 for (i = 0; i < numSearch; i++) {
 if (!strcmpi (word, keys [i].keyword)) {
 matching [i] = 1;
 }
 }
 }
 /* see if all matched 2ndary search */
 for (i = 0; i < numSearch; i++) {
 if (!matching [i]) {
 break;
 }
 }
 if (i == numSearch) {
 printf ("%s %s\n", number, result); /* all matched, print */
 } /* else didn't match *all* keys! */
 } else {
 /* internal bug, couldn't find key in database */
 printf ("Can't match %s in database!!!\n", number);
 }
 }
}
static char tmpbuf [MaxDBLineLen];
/* databaseMatch -- uses binary search with variable length lines to find 
 * the number in the text database.
*/
int
databaseMatch (number, result)
char *number; /* international DN */
char *result; /* looked-up name, or \0 */
{
 FILE *fpr; /* db open for read */
 char *ptr;
 int len;
 long top, bot; /* binary search values */
 long mid;
 int sts;
 createDigitTreeName (number, tmpbuf);
 if ((fpr = fopen (tmpbuf, "r")) == NULL) {
 return (0);
 }
 len = strlen (number); /* how big is the number? */
 bot = 0; /* start of the file */
 fseek (fpr, 0L, 2); /* get to end */
 top = ftell (fpr); /* limit the search */
 while (top - bot > 512) { /* while it's worthwhile */
 mid = (top + bot) / 2;
 fseek (fpr, mid, 0);
 fgets (tmpbuf, MaxDBLineLen, fpr); /* skip to end of partial record */
 mid = ftell (fpr);

 fgets (tmpbuf, MaxDBLineLen, fpr);
 tmpbuf [strlen (tmpbuf) - 1] = 0;
 if (!(sts = strncmp (number + len - 4, tmpbuf, 4))) {
 ptr = tmpbuf + 4;
 while (*ptr && isspace (*ptr)) {
 ptr++;
 }
 strcpy (result, ptr);
 fclose (fpr);
 return (1);
 } else {
 if (sts < 0) {
 top = mid;
 } else {
 bot = mid;
 }
 }
 }
 fseek (fpr, bot, 0); /* rewind to last "bot" position */
 while (!feof (fpr) && fgets (tmpbuf, MaxDBLineLen, fpr) != NULL) {
 tmpbuf [strlen (tmpbuf) - 1] = 0;
 if (!(sts = strncmp (number + len - 4, tmpbuf, 4))) {
 ptr = tmpbuf + 4;
 while (*ptr && isspace (*ptr)) {
 ptr++;
 }
 strcpy (result, ptr);
 fclose (fpr);
 return (1);
 }
 if (sts < 0) { /* exceeded number, must not be there */
 fclose (fpr);
 return (0);
 }
 }
 fclose (fpr);
 return (0);
}
/* converts "num" to a digit-tree "path" */
createDigitTreeName (num, path)
char *num;
char *path;
{
 int len;
 char *ptr;
 *path = 0;
 sprintf (path, "%s/", root);
 ptr = &path [strlen (path)];
 len = strlen (num);
 while (len > 4) {
 *ptr++ = *num++;
 *ptr++ = '/';
 *ptr = 0;
 len--;
 }
 --ptr;
 *ptr = 0;
}




Porting VxDs from Windows 3.1 to Windows 95


The devil is in the details




Don Matthews


Don is president of Nexus Technologies, a company specializing in SCSI
networking software for Windows. He can be reached at nexus@usa.net.


Do you have a 16-bit Windows 3.1 application that communicates with a VxD of
your own design? It would certainly be nice if you could use the same VxD and
port the application to Windows 95 by simply recompiling it with a 32-bit
compiler. Unfortunately, there's a little more to it. The methods for
obtaining the entry point to the VxD and for calling its services have
changed, requiring modifications to both the calling application and the VxD
itself.
In this article, I'll present a VxD that works with both Windows 3.1 and
Windows 95 applications and describe the different techniques for calling its
services. I'll also describe an alternate approach to one of the most common
VxD programming problems: posting a message to an application. The application
and driver presented here support multiple cards, shared-memory allocation,
and interrupt processing.
Take a look at PORTAGE, a simple, 16-bit Windows 3.1 application that drives
the VxD and get acquainted with the various project files and directories; see
Table 1. (The complete PORTAGE system, including executables, source, and the
files in Table 1, is available electronically; see "Availability," page 3.)
All these projects work well in the Visual C++ environment, even though they
use external makefiles. Using an external makefile makes it easier to
understand the build process and the tools being used, but it also makes it
more difficult to manage large projects. If you're adapting this work to a
project with many files, I recommend letting Visual C++ create and maintain
the build environment for you.


The Test Harness


PORTAGE.EXE is a simple Windows application, modeled after the GENERIC sample
in the Windows 3.1 Programmer's Guide. It contains a standard window procedure
that processes messages and calls other procedures in response to user input,
and it illustrates the concepts required for servicing multiple hardware
devices with a single VxD. To service multiple devices, instance data--data
that is unique to each instance of the hardware--must be maintained down in
the VxD. Here, that data is allocated by the calling application (or DLL) and
then passed down to the VxD during a registration process. This makes it
possible to share data between an application or DLL running in Ring 3 and a
VxD running in Ring 0 so that multiple instances of the application may be
executed. Each time a new instance is run, the application allocates a data
structure that holds all the data of interest to both the VxD and the
application. When an instance is shut down, the VxD is notified and the memory
is freed.
The user interface for PORTAGE.EXE is easily extensible. There are two
pull-down menus: "VxD" and "Simulate." The VxD menu lets the developer call
various routines in the VxD; the Simulate menu artificially generates
conditions for testing purposes, without actually calling into the VxD. For
example, the VxD described in this article (PORTAGE.386) provides an API
routine that posts a message. Selecting "Post" from the VxD menu calls the
associated API routine in the VxD. But selecting Post from the Simulate menu
causes the application itself to call PostMessage() without invoking the VxD
services. When using a debugger, this allows you to isolate a problem to a
particular VxD or application-level routine. 
The VxD provides the application with an API that consists of three routines:
one to register the instance of the application with the VxD, another to post
a message, and a third to shut down the instance. The application-level
interface to the VxD routines consists of three functions in Listing One. The
first function, portage_vxd_register(), is called when the application
processes the WM_CREATE message. This function first allocates the memory to
hold the shared data structure. This is accomplished via a call to
GlobalAlloc(). This seemingly innocuous bit of programming is actually
somewhat tricky due to undocumented and/or misunderstood behavior of Windows
3.1 memory-management functions. For example, the call to GlobalAlloc() uses
the GPTR attribute, which is the same as using both the GMEM_FIXED and the
GMEM_ZEROINIT attributes. You might, therefore, presume that the memory is
fixed in place, but it is fixed only if the call is made from a DLL and not
from an application. GlobalLock() does not lock the memory in place either--it
only translates a memory handle into a pointer. The memory is "locked" only in
the sense that its <selector:offset> value does not change. Only a call to
GlobalFix() will keep the memory from moving around. In a DLL, this call would
be redundant when used with the GMEM_FIXED attribute. 
But why should we even worry about whether or not the memory is fixed? Since
the memory is being shared with a VxD, the VxD will map the Ring 3 address,
which is of the <selector:offset> form, into a Ring 0 address, which is
linear. If Windows moves the data by changing the selector's linear base
address, the VxD will no longer point to the same data as the application,
resulting in major problems. And that will definitely happen if the data is
not fixed in place. Furthermore, if the data will be touched at interrupt
time, it must be page-locked. In our case, there is no interrupt processing
down in the VxD, but there likely would be if the VxD dealt with actual
hardware devices--which is why the page-locking call is shown here.
The VxD entry point is obtained through a call to INT 2Fh with AX=1684h. That
call provides the entry point to the VxD's Protected Mode API, which the
application then uses to call into the VxD and register this instance of the
application. The address of the shared-data structure is passed in as shown,
and the VxD returns a status code indicating success or failure. If
successful, the VxD will have written an index value, or instance-data handle,
into the shared-data structure. This instance-data handle is used in
subsequent calls to the VxD so that it can locate the correct instance data,
and it is not to be confused with the application-instance handle passed into
WinMain() when an application is started.
The portage_vxd_post() function instructs the VxD to post a message. Only the
function number and the instance-data handle must be sent to the VxD. The
application registers all other parameters for the call to PostMessage() in
the shared data structure in response to the Post selection in the VxD menu.
Finally, portage_vxd_shutdown() notifies the VxD that a particular instance is
being closed, or shut down. This routine is called in response to the
WM_DESTROY message and must also provide the VxD with only the function number
and instance-data handle. The VxD is told that the memory for this instance is
going away shortly, so after the call into the VxD, the application frees the
memory.


A Schizoid Driver


The source code for the VxD is contained in the file PORTAGE.ASM. This VxD
contains a Protected Mode (PM) API and a Virtual-86 Mode (V86) API, although
the V86 API only provides the GetVersion routine. Listing Two lists the three
PM API routines described earlier.
The first VxD routine invoked by the application is PORTAGE_PM_Register(). The
VxD maintains an array containing all the instance-data pointers for all
active application instances. It searches this array for the first entry equal
to zero. The resulting value is the instance-data handle, which is then
returned to the calling application. If no open slots are available, a status
value indicating failure is returned.
After finding an open slot, the PM address of the instance data is converted
to a linear address with the VMM service Map_Flat. The instance-data handle is
then written into the shared data structure, and the linear address is stored
in the table. The VxD can access members of the shared data structure simply
by placing the structure's linear address in EDI and using an offset. The
offsets are defined in an assembly-language data structure in the include file
PORTAGE.INC. It is extremely important to keep the assembly-language structure
definition in sync with the C-language structure definition. If a structure
member is added or removed from one but not the other, the communication
mechanism breaks down. When two different code sets attempt to access members
of the shared data structure using different offsets, results are
unpredictable. If you have many shared structures, or structures changed by
many different people, you should use a tool that addresses this problem, like
H2INC.
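Short of a tool like H2INC, the C side can at least verify its own layout against the offsets the assembly side assumes. A sketch (the structure and offset constants below are illustrative, not the actual shared structure from the PORTAGE sources):

```c
#include <stddef.h>
#include <stdint.h>

/* A stand-in for the shared data structure; PORTAGE.INC would define the
 * matching offsets as assembler equates. */
typedef struct {
    uint32_t instance_handle;   /* written back by the VxD at registration */
    uint32_t hwnd;              /* window to post messages to */
    uint32_t post_msg_addr;     /* address of PostMessage() */
} SharedData;

/* Mirror of the offsets the assembly side would use. */
#define ASM_INSTANCE_HANDLE 0
#define ASM_HWND            4
#define ASM_POST_MSG_ADDR   8

/* Returns nonzero only if the C structure layout matches the assembly
 * offsets, catching the drift described above at startup rather than at
 * debugging time. */
int offsets_in_sync (void)
{
    return (offsetof (SharedData, instance_handle) == ASM_INSTANCE_HANDLE
         && offsetof (SharedData, hwnd) == ASM_HWND
         && offsetof (SharedData, post_msg_addr) == ASM_POST_MSG_ADDR);
}
```

A check like this costs nothing at run time and turns a silent data-corruption bug into an immediate, diagnosable failure.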
When an application closes, it relinquishes its instance-data handle by
calling PORTAGE_PM_Shutdown(). This routine simply places a zero in the array
of instance-data pointers maintained by the VxD, making that slot available
for a subsequent instance of the application.
Typically, VxDs interface with some hardware device, detecting certain
conditions or transferring data. Events of interest to the application layer
are usually communicated via a Windows message sent to a window registered
with the VxD. One common technique is to register the address of a DLL that
calls PostMessage(), but this requires that the DLL routine be in a fixed code
segment. What if your design doesn't include a DLL? Windows does not allow you
to place a routine in a fixed code segment in your application or statically
linked library, so why not eliminate the "helper" function altogether and make
the call to PostMessage() directly from the VxD? The code excerpt in Listing
Three illustrates this technique.
The code in PORTAGE_Schedule_System_VM() shows how to set up a callback to
another routine that performs a nested execution block for the purpose of
calling Ring 3 code. One neat trick is to pass the instance-data
pointer--initially in EDI--to the callback routine as reference data stored in
EDX. The callback occurs when the system is running in the context of the
system VM and the other specified conditions are satisfied. This routine,
PORTAGE_Schedule_System_VM(), may safely be called as-is from within an
interrupt service routine (ISR) executing in Ring 0.
The callback routine, PORTAGE_Do_Ring3_Callback(), sets up and executes a
nested execution block. But first it retrieves the reference data passed in
EDX and stuffs it into EDI. Now this routine has access to the shared data
structure. The application includes the address of PostMessage() in the shared
data structure, along with other information. If this VxD were an interface to
real hardware, some of this data (such as the particular Windows message or
other parameters) could be determined on the fly down in the VxD in response
to various hardware conditions.


The Harness Revisited


We'll now port the test application by building it with 32-bit Visual C++ 2.x
rather than 16-bit Visual C++ 1.5. The problems of recompiling your
application in the 32-bit world are usually fairly minor. The electronic
listings give details on setting compiler and linker command-line switches. Be
aware that the format for the link command has changed with the new set of
tools and that integer size has changed from 16 to 32 bits. This may not be a
problem in some places in your code, but if you use an integer in a data
structure that communicates with a VxD or some other process running on a
hardware device, be very careful. I recommend never using integers in these
cases; instead, use a data type that precisely specifies the length of the
data, such as WORD or DWORD.
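The hazard fits in a few lines. The structure below is invented for illustration (it is not PORTAGE's real layout), using today's exact-width types to play the role of WORD and DWORD: a member declared plain int would be 2 bytes under Visual C++ 1.5 and 4 bytes under Visual C++ 2.x, shifting every field after it, while exact-width members keep the shared layout stable across compilers.

```cpp
#include <cstdint>

// A shared structure declared with exact-width types has the same
// layout under a 16-bit compiler, a 32-bit compiler, or anything newer.
#pragma pack(push, 1)
struct VxdMessage {
    std::uint16_t instanceIndex;  // always 2 bytes, like a Windows WORD
    std::uint32_t statusFlags;    // always 4 bytes, like a Windows DWORD
};
#pragma pack(pop)

// Had instanceIndex been declared int, its size -- and the offset of
// statusFlags behind it -- would change with the compiler's int width.
static_assert(sizeof(VxdMessage) == 6, "shared layout must not drift");
```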
Listing Four shows the new method for obtaining the entry point to the VxD
(see portage_vxd_register()). Instead of an explicitly called entry point, the
application obtains a handle to the VxD that is used in subsequent calls to
DeviceIoControl(). With the advent of the Win32 API, you can no longer use
inline assembly and INT 2Fh to obtain the entry point of a VxD; only C code
will do. Parameters that need to be passed into the VxD are specified as
parameters to the DeviceIoControl() routine. Those parameters are then placed
in a data structure, a pointer to which is passed into the VxD. The VxD can
then access the parameters as offsets from the pointer that is passed in.
The first two parameters to DeviceIoControl() are the device handle and the
function number. The documentation states that the rest of the parameters are
free-format; that is, you can place anything in those parameters, and they
will not be checked. However, this is only partially true. If you pass
pointers, you must place them in the positions in the parameter list where
casting to a pointer type is done; otherwise, the call to DeviceIoControl()
will fail. In fact, if you have many parameters, you are better off putting
them in a structure and simply passing a pointer to that structure, as shown
in the SDK. Also, it is impossible to call function zero in your VxD. That
function is called during the processing of the CreateFile() routine, and
subsequent attempts to call it with DeviceIoControl() will fail. If a call
does fail, useful error information can be obtained by calling
GetLastError(). The Win32 SDK documents how to interpret that error
information.
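The SDK's advice to bundle parameters into one structure can be sketched portably. The names below are invented for illustration; the VxD side sees only an untyped pointer (via lpvInBuffer in DIOCParams) and recovers the fields at fixed offsets, which a cast models here.

```cpp
#include <cstdint>

// All input parameters travel in one structure; the caller passes only a
// pointer, so the "free-format" parameter slots never need to hold data.
struct PostParams {
    std::uint32_t instanceIndex;
    std::uint32_t message;
    std::uint32_t wParam;
    std::uint32_t lParam;
};

// Stand-in for the VxD side: it receives an untyped pointer and reads
// the fields through it, just as the VxD uses offsets from the pointer.
std::uint32_t handle_post(const void *inBuffer) {
    const PostParams *p = static_cast<const PostParams *>(inBuffer);
    return p->message + p->wParam;   // placeholder for real work
}
```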


Completing the Port


To use your VxD in Windows 95, you must use the new VMM include files in the
DDK and the new assembler, MASM 6.11. (These files will not assemble using the
old assembler, MASM 5.10B.) See the electronic listings for the exact
command-line switches.
Calling API routines in a Windows 95 VxD has changed significantly from
Windows 3.1. Rather than calling directly into an API routine whose entry
point was furnished to the application, a W32_DEVICEIOCONTROL message is now
sent to the VxD control procedure when the application calls
DeviceIoControl(). The handler for that message, as specified in the control
procedure, is the new W32 API handler down in the VxD. When
W32_DEVICEIOCONTROL is sent to the VxD, ESI contains a pointer to a data
structure, called DIOCParams, that contains all the parameters specified in
the call to DeviceIoControl(). This new W32 API handler is very much like the
old PM API handler, except that it must first extract the function number from
DIOCParams before dispatching control to the desired API routine. It must also
return successful completion when given -1 as the function number, which the
system sends automatically when the application exits (see CloseFile()).
The code for PORTAGE_W32_Register() is very similar to its PM counterpart; see
Listing Five. One major difference is that the address of the shared data
structure passed in from the application need not be converted to a linear
address. It's already a 32-bit address that can be used by the VxD directly.
This address is extracted from the DIOCParams structure and stored for future
use, just as before.

Finally, the code for posting a message has changed; see Listing Six. There is
now a SHELL service for posting a message, so you needn't explicitly schedule
the system VM and perform a nested execution block. The service
_SHELL_PostMessage() takes care of those details, providing a much simpler
interface for the VxD programmer.


Conclusion


A VxD written for Windows 3.1 can be modified so that it will work with both
16-bit Windows 3.1 applications and 32-bit Windows 95 applications. This VxD
also incorporates key concepts necessary for the support of multiple hardware
devices.
The porting process isn't so daunting once you realize that much of the
current VxD can be reused. There are just some additional messages that the
control procedure can respond to; some new services; and some new features,
like dynamically loading a VxD.
Table 1: PORTAGE projects and directories.
Project           Description
\portage\w31\app  16-bit Ring-3 Windows 3.1 application.
\portage\w31\vxd  32-bit Ring-0 Windows 3.1 VxD.
\portage\w95\app  32-bit Ring-3 Windows 95 application.
\portage\w95\vxd  32-bit Ring-0 Windows 95 VxD.

Listing One
/**********************************************************************
 * PORTAGE application API Routines (Win 3.1 version), by Don Matthews.
 **********************************************************************/
HGLOBAL hMem; /* global memory handle */
SharedDataFormat *SharedData; /* data shared between the app and VxD */
DWORD VxD_Entry; /* VxD entry point */
BOOL portage_vxd_register( void )
{
 BOOL Result = FAILURE;
 hMem = NULL;
 SharedData = NULL;
 VxD_Entry = NULL;
 hMem = GlobalAlloc( GPTR, sizeof(SharedDataFormat) );
 if( !hMem ) return( Result );
 SharedData = (SharedDataFormat *)GlobalLock( hMem );
 if( !SharedData ) return( Result );
 GlobalFix( hMem );
 GlobalPageLock( hMem );
 _asm
 { mov ax, 1684h
 mov bx, PORTAGE_Device_ID
 int 2Fh
 mov word ptr [VxD_Entry + 2], es
 mov word ptr [VxD_Entry + 0], di
 }
 if( VxD_Entry )
 { _asm
 { mov ax, API_PORTAGE_PM_Register
 mov bx, word ptr [SharedData + 2]
 mov cx, word ptr [SharedData + 0]
 call dword ptr [VxD_Entry]
 mov Result, ax
 }
 }
 return( Result );
}
/****************************************************************/
BOOL portage_vxd_post( void )
{
 BOOL Result = FAILURE;
 WORD InstanceIndex;
 if( VxD_Entry )
 {

 InstanceIndex = (WORD)SharedData->InstanceIndex;
 _asm
 { mov ax, API_PORTAGE_PM_Post
 mov bx, word ptr [InstanceIndex]
 call dword ptr [VxD_Entry]
 mov Result, ax
 }
 }
 return( Result );
}
/****************************************************************/
BOOL portage_vxd_shutdown( void )
{
 BOOL Result = FAILURE;
 WORD InstanceIndex;
 if( VxD_Entry &&
 (SharedData->InstanceIndex != PORTAGE_Max_Instances) )
 {
 InstanceIndex = (WORD)SharedData->InstanceIndex;
 _asm
 { mov ax, API_PORTAGE_PM_Shutdown
 mov bx, word ptr [InstanceIndex]
 call dword ptr [VxD_Entry]
 mov Result, ax
 }
 }
 if( hMem )
 { GlobalPageUnlock( hMem );
 GlobalUnfix( hMem );
 GlobalUnlock( hMem );
 GlobalFree( hMem ); 
 }
 return( Result );
}
/****************************************************************/

Listing Two
;-------------------------------------------------------------------
; PORTAGE VxD API Routines (Windows 3.1 version), by Don Matthews
;-------------------------------------------------------------------
BeginProc PORTAGE_PM_Register
 Trace_Out "PORTAGE_PM_Register"
 ;
 ; Determine the next available instance index.
 mov eax, 0
RegisterLoop:
 mov edi, PORTAGE_Instance_Table[ eax * 4 ]
 or edi, edi
 jz short RegisterContinue
 inc eax
 cmp eax, PORTAGE_Max_Instances
 jne short RegisterLoop
RegisterContinue:
 ; Convert the SharedData pointer to a linear address.
 push eax
 mov ah, Client_BX
 mov al, Client_CX
 VMMcall Map_Flat
 mov edi, eax

 pop eax
 cmp edi, -1
 je short RegisterError1
 Trace_Out "Map_Flat - SUCCESS!"
 ; 
 ; Save the instance index in the SharedData structure.
 mov dword ptr [edi.InstanceIndex], eax
 cmp eax, PORTAGE_Max_Instances
 je short RegisterError2
 Trace_Out "Found Instance Index #EAX"
 ;
 ; Save the SharedData pointer.
 mov PORTAGE_Instance_Table[ eax * 4 ], edi
RegisterExit:
 jmp PORTAGE_PM_API_Success
RegisterError1:
 Trace_Out "Map_Flat - FAILURE!"
 jmp PORTAGE_PM_API_Failure
RegisterError2:
 Trace_Out "ERROR - No available instance index!"
 jmp PORTAGE_PM_API_Failure
EndProc PORTAGE_PM_Register
;----------------------------------------------------------
BeginProc PORTAGE_PM_Post
 Trace_Out "PORTAGE_PM_Post"
 ;
 ; Load EDI with a pointer to the SharedData structure.
 ; Subsequent accesses to the members of the structure will
 ; be done using offsets into the structure from EDI, where
 ; EDI is an offset from DS.
 movzx eax, [ebp.Client_BX]
 mov edi, PORTAGE_Instance_Table[ eax * 4 ]
 call PORTAGE_Schedule_System_VM
 jmp PORTAGE_PM_API_Success
EndProc PORTAGE_PM_Post
;----------------------------------------------------------
BeginProc PORTAGE_PM_Shutdown
 Trace_Out "PORTAGE_PM_Shutdown"
 ;
 ; Load EDI with a pointer to the SharedData structure. Subsequent
 ; accesses to the members of the structure will be done using offsets 
 ; into the structure from EDI, where EDI is an offset from DS.
 movzx eax, [ebp.Client_BX]
 mov edi, PORTAGE_Instance_Table[ eax * 4 ]
 ; Zero out the instance data pointer
 mov edi, 0
 mov PORTAGE_Instance_Table[ eax * 4 ], edi
 jmp PORTAGE_PM_API_Success
EndProc PORTAGE_PM_Shutdown

Listing Three
;-------------------------------------------------------------------
; Calling PostMessage() from a VxD (Win 3.1 version), by Don Matthews
;-------------------------------------------------------------------
BeginProc PORTAGE_Schedule_System_VM
 mov eax, Cur_Run_VM_Boost
 VMMcall Get_Sys_VM_Handle
 mov ecx, PEF_Wait_For_STI or PEF_Wait_Not_Crit
 mov edx, edi

 mov esi, offset32 PORTAGE_Do_Ring3_Callback
 xor edi, edi
 VMMcall Call_Priority_VM_Event
 clc
 ret
EndProc PORTAGE_Schedule_System_VM
;------------------------------------------------------------
BeginProc PORTAGE_Do_Ring3_Callback
 Push_Client_State
 VMMcall Begin_Nest_Exec
 ; Retrieve the reference data
 mov edi, edx
 ; Check the window handle
 cmp dword ptr [edi.hWnd], 0
 je short Do_Ring3_Callback_Exit
 ; Push the PostMessage parameters for the Ring 3 callback
 mov eax, dword ptr [edi.hWnd]
 VMMcall Simulate_Push
 mov ax, word ptr [edi.msg]
 VMMcall Simulate_Push
 mov ax, word ptr [edi.wParam]
 VMMcall Simulate_Push
 mov ax, word ptr [edi.lParam + 2]
 VMMcall Simulate_Push
 mov ax, word ptr [edi.lParam + 0]
 VMMcall Simulate_Push
 mov cx, word ptr [edi.PostMessageAddr + 2]
 movzx edx, word ptr [edi.PostMessageAddr + 0]
 VMMcall Simulate_Far_Call
 VMMcall Resume_Exec
Do_Ring3_Callback_Exit:
 VMMcall End_Nest_Exec
 Pop_Client_State
 clc
 ret
EndProc PORTAGE_Do_Ring3_Callback

Listing Four
/**********************************************************************
 * PORTAGE application-level routines (Win 95 version) by Don Matthews
 **********************************************************************/
HGLOBAL hMem; /* global memory handle */
SharedDataFormat *SharedData; /* data shared between the app and VxD */
HANDLE hVxD; /* VxD handle */
BOOL portage_vxd_register( void )
{ BOOL Result = FAILURE;
 BOOL DeviceIOResult;
 DWORD Error;
 hMem = NULL;
 SharedData = NULL;
 hVxD = NULL;
 hMem = GlobalAlloc( GPTR, sizeof(SharedDataFormat) );
 if( !hMem ) return( Result );
 SharedData = (SharedDataFormat *)GlobalLock( hMem );
 if( !SharedData ) return( Result );
 GlobalFix( hMem );
 hVxD = CreateFile( "\\\\.\\PORTAGE",
 (DWORD)NULL, (DWORD)NULL, NULL,
 (DWORD)NULL, (DWORD)NULL, NULL );

 if( hVxD == INVALID_HANDLE_VALUE ) return( Result );
 if( hVxD )
 { SetLastError( 0 );
 DeviceIOResult = DeviceIoControl(
 hVxD,
 API_PORTAGE_PM_Register,
 (LPVOID) SharedData,
 (DWORD) NULL,
 (LPVOID) NULL,
 (DWORD) NULL,
 (LPDWORD) NULL,
 (LPOVERLAPPED) NULL );
 Error = GetLastError();
 if( DeviceIOResult ) Result = SUCCESS;
 else Result = FAILURE;
 }
 return( Result );
} 
/****************************************************************/
BOOL portage_vxd_post( void )
{ BOOL Result = FAILURE;
 DWORD InstanceIndex;
 BOOL DeviceIOResult;
 DWORD Error;
 if( hVxD )
 { InstanceIndex = SharedData->InstanceIndex;
 SetLastError( 0 );
 DeviceIOResult = DeviceIoControl(
 hVxD,
 API_PORTAGE_PM_Post,
 (LPVOID) InstanceIndex,
 (DWORD) NULL,
 (LPVOID) NULL,
 (DWORD) NULL,
 (LPDWORD) NULL,
 (LPOVERLAPPED) NULL );
 Error = GetLastError();
 if( DeviceIOResult ) Result = SUCCESS;
 else Result = FAILURE;
 }
 return( Result );
}
/****************************************************************/
BOOL portage_vxd_shutdown( void )
{ BOOL Result = FAILURE;
 DWORD InstanceIndex;
 BOOL DeviceIOResult;
 DWORD Error;
 if( hVxD &&
 (SharedData->InstanceIndex != PORTAGE_Max_Instances) )
 { InstanceIndex = SharedData->InstanceIndex;
 SetLastError( 0 );
 DeviceIOResult = DeviceIoControl(
 hVxD,
 API_PORTAGE_PM_Shutdown,
 (LPVOID) InstanceIndex,
 (DWORD) NULL,
 (LPVOID) NULL,
 (DWORD) NULL,

 (LPDWORD) NULL,
 (LPOVERLAPPED) NULL );
 Error = GetLastError();
 if( DeviceIOResult ) Result = SUCCESS;
 else Result = FAILURE;
 }
 if( hMem )
 { GlobalUnfix( hMem );
 GlobalUnlock( hMem );
 GlobalFree( hMem ); 
 }
 return( Result );
}
/****************************************************************/

Listing Five
;---------------------------------------------------------------
; Portage VxD API Routines (Win 95 version), by Don Matthews.
;---------------------------------------------------------------
BeginProc PORTAGE_W32_Register
 Trace_Out "PORTAGE_W32_Register"
 ; Determine the next available instance index.
 mov eax, 0
W32_RegisterLoop:
 mov edi, PORTAGE_Instance_Table[ eax * 4 ]
 or edi, edi
 jz short W32_RegisterContinue
 inc eax
 cmp eax, PORTAGE_Max_Instances
 jne short W32_RegisterLoop
 ;
W32_RegisterContinue:
 ; Retrieve the pointer to the SharedData structure.
 mov edi, [esi.lpvInBuffer] ; SharedDataFormat *SharedData
 ; Save the instance index in the SharedData structure.
 mov [edi.InstanceIndex], eax
 cmp eax, PORTAGE_Max_Instances
 je short W32_RegisterError
 Trace_Out "Found Instance Index #EAX"
 ; Save the SharedData pointer.
 mov PORTAGE_Instance_Table[ eax * 4 ], edi
W32_RegisterExit:
 jmp PORTAGE_W32_API_Success
W32_RegisterError:
 Trace_Out "ERROR - No available instance index!"
 jmp PORTAGE_W32_API_Failure
EndProc PORTAGE_W32_Register
;---------------------------------------------------------------
BeginProc PORTAGE_W32_Post
 Trace_Out "PORTAGE_W32_Post"
 ; Load EDI with a pointer to the SharedData structure.
 ; Subsequent accesses to the members of the structure will
 ; be done using offsets into the structure from EDI, where
 ; EDI is an offset from DS.
 mov eax, [esi.lpvInBuffer] ; DWORD InstanceIndex
 mov edi, PORTAGE_Instance_Table[ eax * 4 ]
 call PORTAGE_W32_Do_Ring3_Callback
 jmp PORTAGE_W32_API_Success
EndProc PORTAGE_W32_Post

;---------------------------------------------------------------
BeginProc PORTAGE_W32_Shutdown
 Trace_Out "PORTAGE_W32_Shutdown"
 ; Load EDI with a pointer to the SharedData structure.
 ; Subsequent accesses to the members of the structure will
 ; be done using offsets into the structure from EDI, where
 ; EDI is an offset from DS.
 mov eax, [esi.lpvInBuffer] ; DWORD InstanceIndex
 mov edi, PORTAGE_Instance_Table[ eax * 4 ]
 ; Zero out the instance data pointer.
 mov edi, 0
 mov PORTAGE_Instance_Table[ eax * 4 ], edi
 jmp PORTAGE_W32_API_Success
EndProc PORTAGE_W32_Shutdown

Listing Six
;--------------------------------------------------------------------
; PORTAGE VxD Posting Routines (Windows 95 version), by Don Matthews.
;--------------------------------------------------------------------
BeginProc PORTAGE_W32_Do_Ring3_Callback
 ;
 ; Check the window handle
 cmp dword ptr [edi.hWnd], 0
 je short W32_Do_Ring3_Callback_Exit
 ;
 ; Load the PostMessage parameters for the Ring 3 callback
 mov eax, dword ptr [edi.hWnd]
 movzx ebx, word ptr [edi.msg]
 movzx ecx, word ptr [edi.wParam]
 mov edx, dword ptr [edi.lParam]
 ;
 VxDcall _SHELL_PostMessage, < eax, ebx, ecx, edx, 0, 0 >
 ;
W32_Do_Ring3_Callback_Exit:
 clc
 ret
EndProc PORTAGE_W32_Do_Ring3_Callback



Portable Multitasking in C++


A portable multitasking kernel with semaphores




Stig Kofoed


Stig holds an MSc from the Technical University of Denmark and specializes in
real-time software and data communication. He can be contacted at fed@bcp.dk.


Many applications (embedded and otherwise) need to handle multiple independent
activities at the same time. It is convenient for this logical behavior to be
reflected directly in the structure of the program. Since C++ has no direct
support for multiprogramming, however, the services of the operating system
(or of a multitasking kernel) are necessary to separate the program into
several processes or threads that execute concurrently. Facilities exist for
synchronization and communication between processes, using shared memory,
semaphores, message queues, and the like.
The C++ standard library does, however, define functions that provide a
limited form of context switching. In this article, I'll present a small
kernel that
implements non-preemptive multitasking using the library functions setjmp()
and longjmp(), and provides semaphores as a means of synchronizing processes
(called "tasks" in this article). The implementation does not use any
platform-dependent features and will run unmodified on most platforms (a C
version of the kernel with the same functionality is available electronically;
see "Availability," page 3).


The Task Switch


When the operating system or loader begins executing a C++ program, it
allocates a stack to be used by the program. The stack contains function-call
parameters, return addresses, and local variables, indicating the state of all
active function calls. This, together with the current point of execution and
the values of the static and global variables, constitutes the overall state
of the program.
When implementing multiple tasks in the same program, each task is given its
own stack and current point of execution, but all share the static and global
variables. Since a single-processor machine can execute only a single task at
any time, the multitasking behavior is obtained by continuously switching from
one task to another.
This kernel uses non-preemptive multitasking (that is, task switches only
occur at known points in the code). Preemptive multitasking would require
switching tasks at random times, typically using interrupts, and this is
inherently nonportable. Besides, non-preemptive multitasking creates fewer
problems with mutual exclusion and reentrancy of library functions.
In general, a task switch consists of saving the state of the currently
running task and setting the state of the next task, using setjmp() and
longjmp(), respectively. setjmp() saves the current state in a buffer and
returns 0. longjmp() returns the processor to a previously saved state, as
though setjmp() had returned a value other than 0. Example 1 shows the
function f() saving its state in buffer a and jumping to a previously saved
state in buffer b.
The setjmp() and longjmp() functions are typically used to escape out of a
deeply nested function call in case of an error or exception. In this case,
the program always jumps back to a previous state of the stack. The problem
when implementing real multitasking is that setjmp() and longjmp() don't
provide a way to allocate a stack for a new task, only to restore the stack to
a previous state. One way around this is to copy the stack of the task in and
out of the main stack area, as described in "Lightweight Tasks in C" by
Jonathan Finger (DDJ, May 1995), and The C++ Answer Book, by Tony L. Hansen
(Addison-Wesley, 1990). However, this approach also has disadvantages:
- On many platforms this excessive copying is inefficient and results in poor
performance, especially since lightweight tasks are often tightly coupled,
making task switches frequent.
- Objects in a stack cannot be shared between tasks. For instance, if a task
passes a pointer to an object in its own stack to another task, that object
will not be referenced correctly because the stack was moved when the other
task was running. This effectively limits the use of shared memory to static
and global variables only.
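Example 1 (near the end of this article) is only a fragment. For readers who want to experiment, here is a complete, runnable version of the classic escape-from-nested-calls use of setjmp()/longjmp() described above; the function names are invented for illustration.

```cpp
#include <csetjmp>

static std::jmp_buf checkpoint;

// Deeply nested work that bails out by jumping back to the checkpoint.
static void deep_work(int depth) {
    if (depth == 0)
        std::longjmp(checkpoint, 42);   // setjmp() below "returns" 42
    deep_work(depth - 1);
}

// setjmp() first returns 0, so deep_work() is called; it never returns
// normally, and the longjmp() resumes execution at setjmp() with 42.
int run() {
    int code = setjmp(checkpoint);
    if (code == 0)
        deep_work(3);
    return code;   // 42 after the longjmp
}
```

Note that the jump targets a stack frame (run()'s) that is still active; jumping into a frame that has already returned is undefined behavior, which is exactly the limitation the kernel's stack-allocation scheme must work around.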


Allocating Stacks


An alternative is to let all stacks reside in the main stack area
simultaneously and use recursive function calls to wind down the stack and
mark off the allocated areas. This requires that the main stack area be large
enough to accommodate all the stacks in the program. The size of the main
stack can usually be set with a compiler or linker option or in some other way
(Borland C++ for DOS uses a global variable; link in a module containing the
declaration unsigned _stklen=16384; to set the stack to 16 KB. In Borland C++
for Windows, use a DEF file with the statement STACKSIZE 16384). 
This method also requires that the implementation of longjmp() be
nondestructive; that is, that it not destroy the stack when performing the
jump, and that the stack not be used in any other special way.
Before the kernel is initialized, the run-time system and main() have already
used some of the original main stack area; see Figure 1(a). There is no
portable way of knowing exactly how much has been used, so the program must
specify how much stack it assumes is left, and how much to reserve for the
main task.
During initialization, the kernel saves the current state using setjmp() and
calls a function that calls itself recursively until it has used enough stack
to be reserved for the main task. It then saves its state in a local control
block together with the size of the remaining free area, marks the area as
free, and jumps to the previously saved state using longjmp(); see Figure
1(b). The new control block now serves as a potential starting point for
allocating new stacks. All control blocks are linked together with pointers in
a singly linked list.
When a new stack is to be allocated, the linked list is searched using a
first-fit algorithm for a free area large enough. If the stack does not occupy
the entire area and the remaining area is large enough to contain another
stack of a predefined minimum size, the area is split using the recursive
function mentioned previously, creating a new free area; see Figure 1(c). The
original control block is then marked "used" and may be used to start
executing the task. 
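The first-fit search itself is simple. This standalone sketch (invented names, heap objects standing in for the kernel's in-stack control blocks) mirrors the logic of walking the singly linked list for the first free block that is large enough.

```cpp
// One control block per stack area, linked like the kernel's list.
struct Block {
    bool used;       // is this area occupied by a task's stack?
    unsigned size;   // size of the area in bytes
    Block *next;     // next control block in the list
};

// First-fit: return the first free block of at least `size` bytes,
// or nullptr if no free block qualifies.
Block *first_fit(Block *head, unsigned size) {
    for (Block *b = head; b != nullptr; b = b->next)
        if (!b->used && b->size >= size)
            return b;
    return nullptr;
}
```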
When a stack is no longer needed, the control block is marked "free" and
possibly merged with a following or preceding free block. Figures 1(d), 1(e),
and 1(f) are examples of this. 


The Implementation


The interface to the kernel is shown in Listing One (task.h) and the
implementation in Listing Two (task.c). In the kernel, a task is simply a
function that takes a void pointer as an argument and doesn't return anything;
see the typedef for TaskFnp.
The recursive function eat() allocates the stacks and manages the control
blocks (of type Task). It is called with a pointer to a control block, and a
specified amount of stack to allocate. The control-block parameter must be a
local variable of the calling function, since the control-block address is
used to measure how much stack has been used so far. The measurement is
performed by the function dist(), which returns the number of bytes between
two control blocks. This function takes into account that the stack may grow
up on some platforms and down on others. The eat() function calls itself
recursively until the distance between the original control block and its own
local control block equals or exceeds the requested size. The allocated area
will therefore usually be slightly larger than requested, depending on the
size of the control block. After this, eat() marks the area as free, inserts
the control block in the global list of control blocks, adjusts the size of
the calling control block, and makes a longjmp() (not a return) back to it. 
The eat() function has now allocated a stack and is ready to execute a task
using that stack. It first waits until a new task is started. When eat()
continues, the control block contains the requested size of the stack, a
pointer to the task function, and the task argument. eat() checks if the block
is big enough to hold another stack; if so, it calls itself with a pointer to
its local control block to split the area in two. It then waits for the task
to be scheduled, after which the task function is called. If the task function
returns, the task has finished and the stack area is released by marking the
block as free. If the following or preceding block is also free, the newly
freed block will be merged with them. Once again, eat() has a free stack area,
and it loops back to wait for a new task to be started.
The kernel is initialized by calling task_init(). A control block is
initialized with the assumed size of the entire remaining stack, and eat() is
called to reserve an area for the main task. The control block is then copied
into the global variable main_task, to avoid losing the information when
task_init() returns. main_task now serves as the head of the linked list and
will contain the state of the main task.
New tasks are started with task_start(). This function scans through the
linked list of control blocks, looking for a free block large enough for the
requested stack size. If this succeeds, the control block is activated by
setting the task parameters and jumping to it. When the eat() function has
finished the initialization, the control block is scheduled for execution and
task_start() returns True. If no free block is large enough, the function
returns False.
As noted earlier, a task is a function that takes a void pointer as argument
(TaskFnp). You can call task_start() with the same function pointer several
times to start multiple instances of the same task, possibly with different
parameters.


Scheduling Algorithm


This kernel uses a simple round-robin scheduling algorithm. A global "ready"
queue contains a list of all the tasks currently ready to run. The scheduler
always selects the first task in the ready queue as the next task to run when
performing a task switch. If the ready queue ever becomes empty, a deadlock
has occurred and the program is exited with an error message. The check is
done in schedule().
Class Queue maintains the singly linked list of tasks. A pointer head points
to the first element in the list, and a pointer tail, to the last element.
Each task in the queue contains a pointer chain to the next element; see
Figure 2. This structure makes it easy for the member functions first() and
append() to remove elements from the beginning and add elements to the end.
The constructor for Queue initializes the head and tail pointers to indicate
an empty queue.
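The head/tail bookkeeping just described can be sketched in isolation. This is an illustrative reconstruction consistent with the description (the kernel's actual code is in Listing Two and may differ in detail); a plain Node stands in for Task, keeping only the chain pointer that threads the queue.

```cpp
// Minimal element type; the kernel chains Task objects the same way.
struct Node {
    int id;
    Node *chain;   // next element in the queue
};

class TaskQueue {
    Node *head = nullptr, *tail = nullptr;
public:
    // Remove and return the first element, or nullptr if the queue is empty.
    Node *first() {
        Node *p = head;
        if (p != nullptr) {
            head = p->chain;
            if (head == nullptr) tail = nullptr;  // queue is now empty
        }
        return p;
    }
    // Add an element at the end, preserving FIFO order.
    void append(Node *p) {
        p->chain = nullptr;
        if (tail != nullptr) tail->chain = p;
        else head = p;                            // was empty
        tail = p;
    }
};
```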

When a new task is started, it is inserted at the end of the ready queue. A
task may allow other tasks to run by calling task_next(). The kernel will then
save the state of the currently running task and call schedule() to switch to
the next task. The currently running task is always pointed to by cur_task.


Semaphores


Semaphores synchronize the execution of tasks and typically control the access
to system resources. A semaphore consists of a nonnegative counter and a
"wait" queue similar to the ready queue. Listing Two shows that class
Semaphore is derived from Queue and has an integer count as member. The
counter is initialized to 0 in the Semaphore constructor.
The two operations possible on a semaphore are implemented by two member
functions. wait() decrements the counter if it is greater than 0; if not, the
running task performing the wait() is appended to the semaphore's wait queue.
The task is now blocked on the semaphore, and will not be released until
another task performs a signal() operation on it. The signal() operation moves
a task from the wait queue to the ready queue if the wait queue is nonempty;
otherwise, the counter is incremented.
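The two operations reduce to a few lines. This self-contained sketch captures the counting semantics just described, substituting a std::deque of task IDs for the kernel's intrusive queues (an illustrative simplification, not the article's implementation).

```cpp
#include <deque>

struct Sema {
    int count = 0;                 // nonnegative semaphore counter
    std::deque<int> waiting;       // task IDs blocked on this semaphore
    std::deque<int> &ready;       // the scheduler's ready queue
    explicit Sema(std::deque<int> &rq) : ready(rq) {}

    // wait(): consume a unit if one is available; otherwise block the
    // calling task by moving it onto the semaphore's wait queue.
    void wait(int task) {
        if (count > 0) --count;
        else waiting.push_back(task);
    }
    // signal(): wake one blocked task if any; otherwise bank the unit.
    void signal() {
        if (!waiting.empty()) {
            ready.push_back(waiting.front());
            waiting.pop_front();
        } else {
            ++count;
        }
    }
};
```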


The Dining Philosophers


To demonstrate the kernel facilities, Listing Three (philos.c) shows a
solution to the "dining philosophers," a classic synchronization problem. Five
philosophers are seated at a round table. Between every two philosophers lies
a fork. Each philosopher repeats the following cycle: He thinks and then he
eats. To eat, he must acquire the forks on both sides of him. The problem is
to synchronize access to the forks and avoid deadlock and livelock.
There are five independent activities in this program, one for each
philosopher. The program thus starts up five instances of a philosopher task,
each instance having a different ID as parameter. (The ID is an integer on the
main stack, not a global variable. This would not be possible if the stack
were changed by moving it.)
The time spent thinking and eating is nondeterministic. This is modeled by
calling task_next() a random number of times. The access to each fork is
conveniently controlled with a semaphore: A task waits on the semaphore to
acquire the fork and signals on the semaphore to release the fork.
A simple solution, where each philosopher waits first for the left, then for
the right fork, can lead to deadlock. If each philosopher in turn takes the
left fork, the right fork will never become available because it has been
taken by the philosopher to the right. The solution used here is taken from
Principles of Concurrent Programming, by M. Ben-Ari (Prentice-Hall, 1982), and
can be shown to be correct. The additional semaphore room ensures that
deadlock will not occur. 


Conclusion


I've tested the kernel presented here (along with the C version) without
problems on a wide variety of platforms, including DOS, Windows, OS/2, and
UNIX (Irix 5.2, UnixWare), and as an NLM on a Novell file server. It should
run on any platform that lets you set the program's stack size appropriately
and has a nondestructive implementation of setjmp() and longjmp().
Because of the non-preemptive multitasking and because all stacks reside in
the normal stack area, you can use most tools like debuggers and profilers. If
the compiler implements some kind of stack check, this can be used to indicate
that the total stack area is exceeded (that is, a stack was allocated, but
there wasn't room for it). Setting the total_stack parameter in task_init()
appropriately avoids this.
The kernel has intentionally been kept simple, but may easily be extended. I
have used the same principles to implement a portable version of a more
sophisticated C++-based multitasking kernel with more advanced facilities for
synchronization and communication between tasks.
Figure 1: (a) Stack before initialization; (b) stack after initialization; (c)
starting a new task a with stack size 5000; (d) starting a new task b with
stack size 3000; (e) a has finished, leaving b between two free blocks; (f) a
is started again with a smaller stack size.
Figure 2: Tasks in the ready queue are chained together. New tasks are
inserted at the tail. 
Example 1: Saving the current state and jumping to state b.
#include <setjmp.h>
jmp_buf a, b;
void f()
 {
 // ...
 if( setjmp( a ) == 0 )
 {
 longjmp( b, 1 );
 }
 // return here with longjmp( a, 1 )
 }

Listing One
// task.h -- Portable Multitasking in C++.
// Written by Stig Kofoed, 1995.
#ifndef TASK_H
#define TASK_H
#include <setjmp.h>
#define FALSE 0
#define TRUE 1
typedef void (*TaskFnp)( void *arg ); // task function pointer
struct Task
 {
 jmp_buf jmpb; // jump state
 int used; // used or free
 unsigned size; // size of block
 Task *next; // pointer to next control block
 TaskFnp fnp; // pointer to task function
 void *arg; // argument to task function
 unsigned stack_size; // requested stack size
 Task *chain; // next task in ready or semaphore queue

 };
class Queue
 {
 private:
 Task *head, *tail; // pointer to first and last element
 public:
 Queue();
 Task *first();
 void append( Task *p );
 };
class Semaphore : private Queue
 {
 private:
 int count; // semaphore counter
 public:
 Semaphore();
 void signal();
 void wait();
 };
extern Task main_task, *cur_task;
void task_init( unsigned total_stack, unsigned main_stack );
int task_start( TaskFnp fnp, void *arg, unsigned stack_size );
void task_next();
#endif

Listing Two
// task.c -- Portable Multitasking in C++.
// Written by Stig Kofoed, 1995.
#include <setjmp.h>
#include <stdio.h>
#include <stdlib.h>
#include "task.h"
#define MIN_STACKSIZE 400 // minimum stack size
Task main_task, *cur_task; // main and current task
static Queue ready; // ready queue
static jmp_buf tmp_jmpb; // temporary jump buffer
static void schedule() // run next task
 {
 cur_task = ready.first();
 if( cur_task == NULL ) // no task to run
 {
 puts( "Deadlock!" );
 exit( 1 );
 }
 longjmp( cur_task->jmpb, 1 ); // restore state of next task
 }
static unsigned dist( Task *from, Task *to )
 {
 char *c1, *c2;
 c1 = (char *) from;
 c2 = (char *) to;
 if( c1 > c2 ) // stack grows down
 return( (unsigned) (c1 - c2) );
 else // stack grows up
 return( (unsigned) (c2 - c1) );
 }
static void eat( Task *p, unsigned size )
 {
 unsigned d;

 Task t;
 d = dist( p, &t );
 if( d < size ) // eat stack
 eat( p, size );
 t.size = p->size - d; // set sizes
 p->size = d;
 t.used = FALSE;
 t.next = p->next; // set link pointers
 p->next = &t;
 if( setjmp( t.jmpb ) == 0 ) // wait
 longjmp( p->jmpb, 1 );
 for( ;; )
 {
 if( t.stack_size + MIN_STACKSIZE <= t.size ) // test size
 {
 if( setjmp( t.jmpb ) == 0 ) // split block
 eat( &t, t.stack_size );
 }
 t.used = TRUE; // mark as used
 if( setjmp( t.jmpb ) == 0 ) // wait
 longjmp( tmp_jmpb, 1 );
 (*t.fnp)( t.arg ); // run task
 t.used = FALSE; // mark as free
 if( t.next != NULL && !t.next->used )
 {
 t.size += t.next->size; // merge with following block
 t.next = t.next->next;
 }
 p = main_task.next; // loop through list
 if( p != &t ) // if not first block
 {
 while( p->next != &t ) // locate previous block
 {
 p = p->next;
 }
 if( !p->used ) // if free
 {
 p->size += t.size; // then merge
 p->next = t.next;
 }
 }
 if( setjmp( t.jmpb ) == 0 ) // save state
 schedule();
 }
 }
void task_init( unsigned total_stack, unsigned main_stack )
 {
 Task tmp;
 tmp.size = total_stack; // initialize total stack area
 tmp.next = NULL;
 if( setjmp( tmp.jmpb ) == 0 ) // reserve main stack
 eat( &tmp, main_stack );
 main_task = tmp; // copy to global variable
 main_task.used = TRUE;
 cur_task = &main_task;
 }
int task_start( TaskFnp fnp, void *arg, unsigned stack_size )
 {
 Task *p;

 for( p = main_task.next; p != NULL; p = p->next ) // find free block
 {
 if( !p->used && p->size >= stack_size )
 {
 p->fnp = fnp; // set task parameters
 p->arg = arg;
 p->stack_size = stack_size;
 if( setjmp( tmp_jmpb ) == 0 ) // activate control block
 longjmp( p->jmpb, 1 );
 ready.append( p );
 return( TRUE );
 }
 }
 return( FALSE ); // not enough stack
 }
void task_next()
 {
 if( setjmp( cur_task->jmpb ) == 0 ) // save state
 {
 ready.append( cur_task );
 schedule(); // run next task
 }
 }
Queue :: Queue()
 {
 head = NULL;
 }
Task *Queue :: first() // return first element in queue
 {
 Task *p;
 if( (p = head) != NULL )
 head = head->chain;
 return( p );
 }
void Queue :: append( Task *p ) // append element to queue
 {
 if( head == NULL )
 head = p;
 else
 tail->chain = p;
 tail = p;
 p->chain = NULL;
 }
Semaphore :: Semaphore() // initialize semaphore
 {
 count = 0;
 }
void Semaphore :: signal() // signal on semaphore
 {
 Task *p;
 p = first();
 if( p != NULL ) // a task was waiting
 ready.append( p );
 else // no task was waiting
 count++;
 }
void Semaphore :: wait() // wait on semaphore
 {
 if( count > 0 ) // don't block

 count--;
 else // block on semaphore
 {
 if( setjmp( cur_task->jmpb ) == 0 )
 {
 append( cur_task );
 schedule();
 }
 }
 }

Listing Three
// philos.c -- Portable Multitasking in C++.
// Example: The Dining Philosophers.
// Written by Stig Kofoed, 1995.
#include <stdio.h>
#include <stdlib.h>
#include "task.h"
#define N 5 // number of philosophers
Semaphore forks[N], room;
void wait()
 {
 int i, n;
 n = rand() % 10; // simulate random duration
 for( i = 0; i < n; i++ ) // by switching to other tasks
 task_next(); // a random number of times
 }
void think( int n )
 {
 printf( "philosopher %d is thinking\n", n );
 wait();
 }
void eat( int n )
 {
 printf( "philosopher %d is eating\n", n );
 wait();
 }
void philosopher( int *n ) // philosopher task
 {
 for( ;; )
 {
 think( *n );
 room.wait(); // avoids deadlock
 forks[ *n ].wait(); // acquire forks
 forks[ (*n + 1) % N ].wait();
 eat( *n );
 forks[ *n ].signal(); // release forks
 forks[ (*n + 1) % N ].signal();
 room.signal();
 }
 }
int main()
 {
 int i;
 int id[N];
 task_init( 13000, 2000 ); // initialize tasks and semaphores
 for( i = 0; i < N; i++ )
 {
 id[i] = i; // philosopher ID

 task_start( (TaskFnp) philosopher, &(id[i]), 2000 );
 forks[i].signal();
 }
 for( i = 0; i < N-1; i++ )
 room.signal();
 for( i = 0; i < 1000; i++ ) // run the simulation for a while
 task_next();
 return( 0 );
 }






















































Using MAPI for Interapplication Communication


Delphi and VB do lunch




William Stamatakis


Bill is a Windows application developer at a major investment bank in New York
City and can be reached on CompuServe at 72274,1165.


Windows programmers are finding it increasingly necessary to develop
applications that communicate with one another. These applications are not of
the exotic, embedded-systems variety, but straightforward commercial
applications like sales-order tracking. Methods available for implementing
interprocess communication in Windows applications include facilities provided
by the Windows environment, such as DDE (Dynamic Data Exchange) and NDDE
(Network Dynamic Data Exchange), as well as a few third-party add-ons. DDE
lets Windows applications establish a conversation that permits a continuous,
automatic data exchange. DDE is limited to applications running locally in
your Windows environment; NDDE lets your application establish a conversation
with another app across a LAN.
App-Link, a VBX control from Synergy Technologies (Essex Junction, VT), is a
third-party add-on that is similar to Microsoft's NDDE, but easier to program.
However, App-Link currently cannot route beyond a single LAN, because it is
limited to using NetBIOS for remote communication.
As an alternative interprocess-communication technique, the Messaging
Application Program Interface (MAPI) is underrated and often overlooked, yet
it is one of the easiest to work with. MAPI is an API developed by Microsoft
that has garnered widespread industry support. MAPI is not for everyone,
especially if you need high-bandwidth, low-latency, or fine-granularity
interapplication communications. But in many situations, the communications
requirements are modest, and therefore MAPI is a good fit.
In this article, I'll discuss using MAPI for interapplication communication,
and present two programs, one written in Microsoft Visual Basic and the other
in Borland's Delphi, that can communicate via the MAPI interface.


About MAPI


When the topic of MAPI arises, three APIs are usually mentioned: the Common
Messaging Call Application Program Interface (CMC API), Simple MAPI, and
Extended MAPI. The CMC API contains ten high-level messaging functions that let
you create a mail-enabled application. It is a cross-platform API; therefore,
it is designed to be independent of the messaging service (Microsoft Mail, IBM
Profs, and the like), the operating system, and the underlying hardware. As a
result, applications on UNIX, OS/2, Windows, DOS, and Macintosh can implement
this technology. It was developed in alliance with the X.400 API Association
standards organization and e-mail vendors and users. For more on MAPI, see
"Using the Microsoft Mail API," by Jim Conger (DDJ, August 1994).
Simple MAPI contains a set of 12 messaging functions for creating mail-enabled
Windows applications. Simple MAPI functions make available capabilities such
as sending, receiving, and addressing messages. Messages can contain file
attachments and OLE objects.
Extended MAPI contains a richer set of messaging functions for more-complex
messaging schemes. It includes features like smart forms, which can replace
the standard send and receive forms, plus functionality to link information
entered in smart-form fields with other applications.
Using MAPI in your applications has a number of advantages: 
If your organization already has an e-mail system that supports MAPI, the
additional cost may be zero. Microsoft Mail is of course MAPI compliant; other
popular e-mail systems, such as Lotus cc:Mail, are also MAPI compliant or will
be within the next year. If your organization does not have MAPI-compliant
e-mail, the startup cost is reasonable: less than $50 per user for a ten-user
license.
Unlike most other interapplication messaging alternatives, the size of the
message is unlimited. Just as you can send a 20-page memo to your colleagues
via e-mail, so can your Windows app send a long list of SQL transactions to
another program.
In most of the other messaging systems, the application enters a busy state
while a message is being sent; the user must wait until the transmission is
complete before continuing to work. MAPI offloads message management to the
Mail Server, freeing up the application almost immediately after the message
is sent. Your Windows app almost instantly receives a response from the Mail
Server indicating success or failure.
Due to the messaging infrastructure, the recipient application does not have
to be running at the time the source application is sending the message. Just
like an e-mailed memo, the message goes to the intended recipient's mailbox.
Once the intended recipient application logs in, it will immediately receive
all messages, and can act on them at that time.
Using MAPI lets you send messages to a recipient app whose mailbox resides in
a different post office in some distant location. Within the mail server, the
Message Transfer Agent (MTA) ensures delivery of a message to another post
office across a wide area network (WAN). For instance, an application in New
York could communicate with one in London. The beauty of MAPI is that the
underlying mechanisms are transparent to both the end user and the developer.
Yet another worthwhile consequence of relying on your company's e-mail
infrastructure is the remote dial-up capability that is now an option with
most e-mail systems. With remote dial-up, users working off-site can log in
through a dial-up line, download e-mail to portable computers, disconnect
from the e-mail server, and read and write messages off-line. Your
applications can follow a similar scenario. For example, in the case of a
sales-tracking application, a salesperson out in the field can book orders on
a laptop, and the database app running on the laptop can transparently call
headquarters, upload new data and receive updated inventory information from
the host.


A Language-Independent Approach 


MAPI is designed to be independent of all Windows-based programming/macro
languages and almost all currently available e-mail services (Microsoft Mail,
IBM Profs, Lotus cc:Mail, and so on). Thus, multiple Windows apps written in
dissimilar programming/macro languages and using different e-mail services can
communicate with each other through MAPI. Obviously, there are many ways to
mix and match front-end and/or mail-enabled clients to back-end e-mail
services.
To demonstrate, I've written two simple Windows applications--one in Visual
Basic, the other in Borland Delphi. In the remainder of this article, I'll
describe their implementation.
The basic principle in creating a MAPI-compliant application is that almost
all the API calls require a valid session handle as one of the parameters.
Therefore, a MAPI session must first be established by logging into an e-mail
service. If successful, a session handle will be returned. 
Creating a mail-enabled application in Visual Basic is straightforward. The
online help is complete, and there is no need to refer to the hard-copy
documentation. Obviously, you must first have a MAPI-compliant e-mail service
installed on your network and access to valid user accounts. Microsoft Visual
Basic 3.0 Professional comes with a VBX called "MSMAPI.VBX," which contains
two controls: a session control (MapiSession) and a message control
(MapiMessages). MapiSession establishes a mail session by logging in. Once
logged in, your program uses MapiMessages to perform all e-mail functions
(compose a message, send a message, get new messages, and so on). No methods
or events are associated with these controls; all actions are performed
through the setting of properties on them. "Sender," the application I've
written, simply sends messages to valid user accounts on a given e-mail
service; see Listing One.
The code in the Form_Load event handler establishes a MAPI session. It first
sets the MapiSession1.LogonUI property to True, which activates the default
mail-login dialog when the MAPI session control attempts to connect to the
e-mail service. Next, the default user ID and password are set to null. Then,
the code attempts a MAPI session login, via the statement
MapiSession1.Action=SESSION_SIGNON. After the session is established, the MAPI
message control is informed of the SessionID, via
MapiMessages1.SessionID=MapiSession1.SessionID.
Once the message control is enabled, the application can successfully send
messages to any valid user account. If the end user enters a valid user in the
To: field, plus the desired Subject and Message, and clicks on the Send
button, the cmdSend_Click event is invoked.
In the cmdSend_Click event, several steps are taken prior to sending a
message. First, a Visual Basic error handler is activated to trap all errors
caused during the steps to send a message. Any time an error is encountered,
it automatically goes to the ErrHandler label, where a dialog box is
displayed, showing the description of the error in the body of the dialog and
the error number in the dialog title bar. The message-sending procedure is
terminated at this point. This error handling is minimal; you'll likely want
to implement a more graceful sequence.
To create a message, set the MapiMessages1.Action variable to MESSAGE_COMPOSE,
the type of recipient to RECIPTYPE_TO, and the recipient name to the name
typed in by the end user in the To: field. Follow the same sequence for the
Subject: field. The body of the message is set by the statement
MapiMessages1.MsgNoteText=txtMessage.Text. Finally, the message is sent to the
intended recipient: MapiMessages1.Action=MESSAGE_SEND.
When the user exits the Sender application, the QueryUnload event is invoked.
Here, the code in the event handler must disconnect the MAPI session, as shown
in the listing. 


The Delphi Counterpart


The application I've written in Delphi, "Receiver," shows messages sent to the
mailbox that Receiver logs on to, and, if the user requests it, the body of a
given message. Creating a mail-enabled application with Delphi is not quite as
easy as with Visual Basic, but simple nevertheless. Although some VBXs on the
market work with Delphi, this particular one doesn't. Therefore, I developed a
Delphi "unit" that contains wrapper functions for all the API calls found in
the MAPI SDK. Although my initial effort was a bit tedious, any Delphi
application that I develop from this point on will be built as easily as with
a VBX. It is quite possible that the Delphi wrapper will outperform the MAPI
VBX, because the wrapper lacks the typical VBX overhead. 
The code discussed here is shown in Listing Two. To establish a session with
an e-mail service, the end user clicks on the Logon button, which in turn
triggers the btnLogClick event handler. As with the VB application, the code
must establish a MAPI session--in this case, by calling the MAPILogon()
function shown in Example 1(a). Two of the parameters are worthy of note. The
fourth parameter specifies options for the MAPI session. In this case, the
default e-mail-service logon dialog appears when the attempt to connect to the
e-mail service is made (MAPI_LOGON_UI), and once a session is activated, all
new messages pending for the account are downloaded into its mailbox
(MAPI_FORCE_DOWNLOAD).
The sixth parameter, hSess, is assigned a valid session handle, if the login
to the e-mail service was successful. All subsequent MAPI message calls will
require this session handle.
Next, if MAPILogon() returns SUCCESS_SUCCESS, the application fills in the
Messages list box with subjects from all the messages in the current account.
This happens in the procedure PopulateMessageListBox, which makes two MAPI
calls. The first, MAPIFindNext(), searches for the next message to put in the
list box, starting at message ID CurrMsgId, and stores the found message's ID
in MsgId; see Example 1(b).
The message is then read into a MAPI message structure that contains all
pertinent information relating to the message, such as the Subject, Message
Body, Sender, and so on; see Example 1(c). The MsgId is the key parameter
here. The MAPIReadMail function expects a pointer to a pointer to a MAPI
message structure, which in this case is represented as @pMMsg.

The code can then access the different components of the message. In this
application, the Subject section of this message is added to the Messages list
box via the statement lstMessages.Items.Add(StrPas(pMMsg^.lpszSubject)).
To see the details of a message, the end user invokes the Show Details dialog,
which triggers the btnShowDetailsClick event handler. A specified message must
first be located using the MAPIFindNext() function. Once found, the message is
read into a MAPI message structure using the MAPIReadMail() function. Now, all
components of the message are available to the application; in this case the
Receiver, Sender, Subject, and Message Body are displayed in a Show Details
dialog. The code that populates the dialog is shown in Example 2.
Finally, to terminate a MAPI Session, the end user must click on the Logoff
button, which triggers the btnLogClick event handler. This event handler calls
the MAPILogoff function to terminate the session.
Example 1: (a) Establishing a MAPI session in Delphi; (b) locating the
message; (c) reading the message into a structure.
(a)
MAPILogon(0, '', '', MAPI_LOGON_UI or MAPI_FORCE_DOWNLOAD, 0, hSess)

(b)
MAPIFindNext(hSess, 0, nil, @CurrMsgId, MAPI_GUARANTEE_FIFO, 0, @MsgId)

(c)
MAPIReadMail(hSess, 0, @MsgId, MAPI_PEEK or MAPI_ENVELOPE_ONLY, 0, @pMMsg)
Example 2: Populating the Show Details dialog.
with dlgDetails do
 begin
 edtTo.Text := StrPas(pMMsg^.lpRecips^.lpszName);
 edtFrom.Text := StrPas(pMMsg^.lpOriginator^.lpszName);
 edtSubject.Text := StrPas(pMMsg^.lpszSubject);
 edtText.Text := StrPas(pMMsg^.lpszNoteText);
 ShowModal;
 end

Listing One
' -----------------------------------------------------------------------
' Sender App -- MAPI sender app written in VB by William Stamatakis, 1995.
' -----------------------------------------------------------------------
Option Explicit
' ------------------------------------------------------------------------
Sub cmdExit_Click ()
 Unload Me
End Sub
' ------------------------------------------------------------------------
Sub cmdSend_Click ()
 On Error GoTo ErrHandler:
 ' Compose a new message
 MapiMessages1.Action = MESSAGE_COMPOSE
 ' The recipient is the primary recipient (To:)
 MapiMessages1.RecipType = RECIPTYPE_TO
 ' Set Recipient Name
 MapiMessages1.RecipDisplayName = txtTo.Text
 ' Store the Subject section of the message
 MapiMessages1.MsgSubject = txtSubject.Text
 ' Store the main body of the message
 MapiMessages1.MsgNoteText = txtMessage.Text
 ' Send the message recipients specified
 MapiMessages1.Action = MESSAGE_SEND
 Exit Sub
 ErrHandler:
 MsgBox Error$, , "Error#: " & Err
 Exit Sub
End Sub
' ------------------------------------------------------------------------
Sub Form_Load ()
 ' Use the default Mail Login Dialog
 MapiSession1.LogonUI = True
 MapiSession1.UserName = ""
 MapiSession1.Password = ""
 ' Login
 MapiSession1.Action = SESSION_SIGNON

 ' Set the Session ID in the MapiMessage control equal to
 ' the new valid Session ID established by the MapiSession
 ' login, resulting in activating all message function
 ' capability on the current session id.
 MapiMessages1.SessionID = MapiSession1.SessionID
End Sub
' ------------------------------------------------------------------------
Sub Form_QueryUnload (Cancel As Integer, UnloadMode As Integer)
 ' Logoff before exiting Sender
 MapiSession1.Action = SESSION_SIGNOFF
End Sub

Listing Two 
{----------------------------------------------------------------------}
{ Receiver App, written in Borland Delphi by William Stamatakis, 1995. }
{----------------------------------------------------------------------}
unit Receive;
interface
uses
 SysUtils, WinTypes, WinProcs, Messages, Classes, Graphics, Controls,
 Forms, Dialogs, MAPI, ExtCtrls, StdCtrls, Buttons, Tobclist, Details;
type
 TfrmReceiver = class(TForm)
 lstMessages: TOBCListBox;
 Label1: TLabel;
 btnLogon: TBitBtn;
 Bevel1: TBevel;
 btnLogoff: TBitBtn;
 btnShowDetails: TBitBtn;
 btnExit: TBitBtn;
 procedure btnLogClick(Sender: TObject);
 procedure FormCreate(Sender: TObject);
 procedure btnShowDetailsClick(Sender: TObject);
 procedure btnExitClick(Sender: TObject);
 private
 procedure PopulateMessageListBox;
 public
 { Public declarations }
 end;
var
 frmReceiver: TfrmReceiver;
 hSess: LHANDLE; { MAPI session handle }
 slstMsgIds: TStringList; { var of type String list }
 MMsg: TMapiMessage; { var of type MAPI message structure }
 pMMsg: ^TMapiMessage; { pointer of type MAPI message structure }
implementation
{$R *.DFM}
{------------------------------------------------------------------------}
procedure InitMsgStruct; { Initialize MAPI message structure }
begin
 with MMsg do
 begin
 ulReserved := 0;
 lpszSubject := '';
 lpszNoteText := '';
 lpszMessageType := nil;
 lpszDateReceived := nil;
 lpszConversationID := nil;
 flFlags := 0;

 lpOriginator := nil;
 nRecipCount := 0;
 lpRecips := nil;
 nFileCount := 0;
 lpFiles := nil;
 end;
 pMMsg := @MMsg; { point at the static structure; a new() here would leak }
end;
{------------------------------------------------------------------------}
procedure TfrmReceiver.PopulateMessageListBox;
 { Fill Messages listbox with the Subject section of all messages
 from the currently logged in account's mailbox. 
 }
var
 MsgId, CurrMsgId: string;
 i: Shortint;
begin { Initialize variables }
 MsgId := '';
 CurrMsgId := '';
 slstMsgIds := TStringList.Create;
 { Loop through all messages in account's mailbox }
 while MAPIFindNext(hSess, 0, nil, @CurrMsgId, MAPI_GUARANTEE_FIFO,
 0, @MsgId) = SUCCESS_SUCCESS do
 begin
 { Read messages into the MAPI message structure pointer, pMMsg. }
 if MAPIReadMail(hSess, 0, @MsgId, MAPI_PEEK or
 MAPI_ENVELOPE_ONLY, 0, @pMMsg) = SUCCESS_SUCCESS then
 begin
 { Add message subject to Messages listbox }
 lstMessages.Items.Add(StrPas(pMMsg^.lpszSubject)); 
 { Add corresponding message id to string list object }
 slstMsgIds.Add(MsgId);
 end;
 CurrMsgId := MsgId;
 end;
end;
{------------------------------------------------------------------------}
procedure TfrmReceiver.btnLogClick(Sender: TObject);
{ Event handler proc for Logon and Logoff buttons }
begin
 { Logon button pressed}
 if (Sender = btnLogon) then
 begin
 { Logon to E-Mail Service }
 if MAPILogon(0, '', '', MAPI_LOGON_UI or MAPI_FORCE_DOWNLOAD, 0, 
 hSess) = SUCCESS_SUCCESS then
 begin
 PopulateMessageListBox; 
 { Fill Messages listbox with the Subject section of 
 all messages from the account's mailbox that was 
 just logged into.
 }
 btnLogon.Enabled := False; { Disable Logon Button }
 btnShowDetails.Enabled := True; { Enable Show Details Button }
 btnLogoff.Enabled := True; { Enable Logoff Button }
 end;
 end
 else { Logoff button pressed}

 { Logoff from E-Mail Service }
 if (MAPILogoff(hSess, 0, 0, 0) = SUCCESS_SUCCESS) then
 begin
 hSess := 0; { Reset MAPI session handle }
 lstMessages.Clear; { Empty out Messages listbox }
 slstMsgIds.Clear; { Empty string list object }
 btnShowDetails.Enabled := False;{ Disable Show Details button}
 btnLogoff.Enabled := False; { Disable Logoff button }
 btnLogon.Enabled := True; { Enable Logon button }
 end;
end;
{---------------------------------------------------------------------}
procedure TfrmReceiver.FormCreate(Sender: TObject);
begin
 InitMsgStruct; { Initialize MAPI message structure }
end;
{---------------------------------------------------------------------}
procedure TfrmReceiver.btnShowDetailsClick(Sender: TObject);
 { When "Show Details" button is clicked, show Details Dialog, 
 which contains the details of the currently selected message 
 in the lstMessages listbox.
 }
var
 MsgId: string; { Message Id of currently selected message }
 i: LongInt; { subscript var } 
begin { Set MsgId to the message id that corresponds }
 { to the message being sought }
 i := lstMessages.ItemIndex - 1; 
 { Set i = to the list item index
 of the currently selected item in
 the lstMessages listbox }
 if i < 0 then
 MsgId := ''{ Based on i set MsgId to message id located in }
 else { slstMsgIds which corresponds directly with list item }
 MsgId := slstMsgIds.Strings[i]; { in the lstMessages listbox }
 { Locate the message based on the message id selected. If found
 then read the message. If read successfully, then fill in the Detail
 Dialog prior to showing it. 
 }
 if MAPIFindNext(hSess, 0, nil, @MsgId, MAPI_GUARANTEE_FIFO, 0,
 @MsgId) = SUCCESS_SUCCESS then
 if MAPIReadMail(hSess, 0, @MsgId, MAPI_PEEK or MAPI_SUPPRESS_ATTACH, 
 0, @pMMsg) = SUCCESS_SUCCESS then
 begin
 with dlgDetails do
 begin
 edtTo.Text := StrPas(pMMsg^.lpRecips^.lpszName); { To: }
 edtFrom.Text := StrPas(pMMsg^.lpOriginator^.lpszName); { From: }
 edtSubject.Text := StrPas(pMMsg^.lpszSubject); { Subject:}
 edtText.Text := StrPas(pMMsg^.lpszNoteText); { Text: }
 ShowModal; { Show dialog }
 end;
 end;
end;
{------------------------------------------------------------------------}
procedure TfrmReceiver.btnExitClick(Sender: TObject);
begin
 if btnLogoff.Enabled then
 btnLogClick(btnLogoff);

 self.Close;
 end;
end.



























































Writing ODBC Drivers


A plug-compatible approach to client/server development




Dennis R. McCarthy


Dennis is an independent consultant who specializes in workflow- and
document-management technologies. He is currently working with the XSoft
division of Xerox on their InConcert workflow product. He can be contacted at
mccarthy@acm.org.


The Open Database Connectivity (ODBC) specification is intended to provide a
standard interface between applications and data sources. When an application
developer accesses data via ODBC, the application is easily portable to
multiple data sources. When a data-source vendor provides an ODBC driver, the
data source is instantly integrated with a wide variety of ODBC-enabled
applications and tools.
InConcert, a workflow manager from Xerox's XSoft division, is a software
system that facilitates the execution of business processes by office workers
and computers. As such, it stores business-process definitions and the state
of processes currently being executed. Additionally, InConcert provides an API
through which applications can retrieve and update this information.
However, this API does not support the emerging visual paradigm of developing
client/server applications. Using tools like PowerBuilder and Access, you
build applications visually by drawing controls onto windows, setting control
properties, and writing code in a scripting language. Calling a C or C++
function in a DLL is "out of paradigm." To better integrate InConcert with
visual tools (and thus reduce development costs), we're investigating a
"componentware" strategy. The idea is to enable users of client/server
development tools to access InConcert services within the tool's programming
paradigm by delivering these services through components that are "plug
compatible" with those tools. An ODBC driver is one such component.
Developing an ODBC driver can be an expensive proposition. Before committing
resources, management wanted to know the costs and benefits of having an ODBC
driver for InConcert. We decided to develop a proof-of-concept prototype on a
limited budget, taking only two weeks for the driver and demos. The goal was
to demonstrate the benefits of an ODBC driver, and to gain experience on which
we could base a credible estimate of the cost of developing a product
component. After the prototype was complete, a decision could be made
regarding its inclusion in the product plans.
We considered three approaches to developing our prototype ODBC driver:
Microsoft's ODBC SDK.
PageAhead's Simba Engine.
Syware's Dr. DeeBee ODBC Driver Kit.
Developing a driver starting from just the ODBC SDK requires in-depth
knowledge of ODBC that we didn't possess, and our budget didn't afford the
time or money to hire an expert. We could have developed the prototype with
the PageAhead Simba engine within a couple of weeks, but the price and time
required for royalty negotiations ruled it out. The Dr. DeeBee ODBC Driver Kit
was the best match for the scope of our project.


Dr. DeeBee ODBC Driver Kit


The Dr. DeeBee ODBC Driver Kit consists of a single diskette, printed
technical documentation, and brochures. The diskette contains ODBC driver
source code, a 16-bit driver and installer, and a 32-bit driver and installer.
The drivers are compiled versions of the source code. The kit comes with one
hour of telephone support (additional support is available for a fee). The
driver that you create using the kit is redistributable royalty free (subject
to limitations specified in the license agreement).
The technical documentation is a brief 17 pages. It contains an architecture
diagram, a listing of function names by source file, the supported SQL
grammar, and an implementation strategy. You are expected to refer to the ODBC
SDK documentation for explanations of ODBC concepts and functions. The
appendix on implementation strategy is the most useful part of the
documentation. It provides a 12-step outline for implementing your driver.
To build the driver from the source code, you need a C development environment
and the Microsoft ODBC SDK. XSoft uses Microsoft Visual C++ (Version 1.5), and
has a Level 2 subscription to the Microsoft Developer Network. The set of
CD-ROMs that come with this subscription includes the ODBC SDK and an online
copy of the Microsoft ODBC 2.0 Programmer's Reference and SDK Guide.
To install the kit, you simply copy all of the files from the diskette to a
directory on your hard drive. (A setup program has since been added.) I was
able to build and install the driver from the source code using nmake without
any problems.


Designing Your Driver


The source code that comes with the Dr. DeeBee ODBC Driver Kit implements an
ODBC driver for dBase files. All of the low-level file- and record-access
functions are in one source file: ISAM.C. To implement your ODBC driver, you
essentially replace the implementations of the 27 functions in that file. In
my case, that meant implementing these functions with calls to the InConcert
API.
My first step was to express the objects and operations defined by the
InConcert API in terms of SQL tables and statements. The API is object
oriented and defines its workflow services in terms of 14 object classes. Each
class defines operations that can be invoked through API functions. Some of
these operations get or set properties and relationships, while others change
the state of objects. An ODBC driver for InConcert must express objects as
rows in tables, and operations on those objects as SQL statements.
I chose to limit the scope of my prototype driver to two InConcert object
classes: users and pools. An InConcert pool is a group of users who share a
work queue. There is a many-to-many membership relationship between users and
pools that is expressed as three tables in the relational model: users,
members, and pools. My driver translates SQL statements on these tables into
calls to the InConcert API. For example, when an application inserts a row
into the Users table, the driver translates this into the API call that
creates an InConcert user. Table 1 summarizes the mapping from SQL tables and
statements to InConcert objects and operations.
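To make the mapping concrete, here is a minimal C sketch of how such a translation might dispatch. The function names, table constants, and stand-in bodies are illustrative only; they are not the driver kit's code or the actual InConcert API.

```c
/* Illustrative table ids; not the driver kit's real constants. */
enum Table { TABLE_USERS, TABLE_MEMBERS, TABLE_POOLS };

static int users_created = 0;   /* record the effect for this sketch */
static int members_added = 0;

/* Stand-ins for the InConcert calls that create a user and add a
 * user to a pool (hypothetical names). */
static void create_user(const char *name)
{
    (void)name;
    users_created++;
}

static void add_member(const char *user, const char *pool)
{
    (void)user;
    (void)pool;
    members_added++;
}

/* An INSERT on a table becomes the object operation from Table 1. */
static int translate_insert(enum Table t, const char *a, const char *b)
{
    switch (t) {
    case TABLE_USERS:   create_user(a);    return 0;
    case TABLE_MEMBERS: add_member(a, b);  return 0;
    default:            return -1;         /* POOLS omitted for brevity */
    }
}
```

The real driver, of course, parses the SQL statement first; this sketch starts after that point, at the dispatch step.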
As mentioned previously, the documentation for the Dr. DeeBee ODBC Driver Kit
includes a 12-step plan for implementing your driver. Each step specifies the
ISAM functions to implement and the ODBC functions to call to test your
implementation. The Microsoft ODBC SDK includes an ODBC Test program that
enables you to call ODBC functions without writing code. Before starting, I
annotated each step with the InConcert API calls that would be needed to
implement the ISAM functions. This helped to identify the fields that I needed
to add to the ISAM data structures for InConcert objects.


Implementing Your Driver


I followed the implementation strategy outlined in the driver-kit
documentation (skipping steps that did not apply to my driver). Table 2 lists
the steps with the functions implemented in each step and the InConcert API
functions called.
The first step was to implement the ISAMOpen and ISAMClose functions and test
them using the ODBC Test application (Full Connect and Full Disconnect
commands in the Connect menu). The functions call the InConcert API to open
and close an InConcert server session. I ran into my first problem when I
tried to compile and link my driver after adding calls to the InConcert API.
The C binding for the InConcert API requires compiler and linker settings
different from those used by the makefile that comes with the driver kit.
It took some experimentation and a few calls to the Dr. DeeBee support line to
find a combination of settings that satisfied both the ODBC driver and the
InConcert API. This took more time than writing the code.
The bulk of the logic in the driver consists of mapping SQL data-manipulation
statements (which manipulate rows and columns) to InConcert API functions
(which operate on objects). The ISAM component of the driver is expected to
implement the notion of the "current record." The ISAMGetData, ISAMPutData,
ISAMUpdateRecord, and ISAMDeleteRecord functions all act on the current
record. The ISAMNextRecord function advances the current record through a
table.
The InConcert API defines three data types for each object class: object, set,
and iterator. The API functions that retrieve users and pools return them as a
set. You use an iterator to successively get the elements from the set. The
iterator provides the notion of a "current element" in the set. When you call
the iterator function to get the current element, it is returned as a user or
pool object.
The first call to ISAMNextRecord calls the InConcert API to retrieve a set of
users or pools (depending on which table is opened), creates an iterator for
that set, and uses the iterator to get the first user or pool in the set.
Subsequent calls to ISAMNextRecord advance the iterator to the next element of
the set and get that user or pool. When there are no more elements in the set
for the iterator to return, the ISAMNextRecord function returns EOF.
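The iterator protocol just described can be sketched in C. This is a simplified stand-in, not the kit's ISAM code: the InConcert set is reduced to an element count and the iterator to an index.

```c
#define ISAM_OK  0
#define ISAM_EOF 100   /* illustrative end-of-table code */

/* Simplified cursor; all names are illustrative. */
struct Cursor {
    int fetched;   /* has the set been retrieved yet? */
    int pos;       /* iterator position */
    int count;     /* number of elements in the set */
};

static int next_record(struct Cursor *c)
{
    if (!c->fetched) {    /* first call: retrieve set, create iterator */
        c->fetched = 1;
        c->pos = 0;
    } else {
        c->pos++;         /* later calls: advance the iterator */
    }
    return (c->pos < c->count) ? ISAM_OK : ISAM_EOF;
}
```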
Things are a bit more complicated for the MEMBERS table. InConcert has API
functions for retrieving all pools and retrieving all members of one pool. The
ISAMNextRecord function has a nested loop for the MEMBERS table. The outer
loop uses an iterator for the set of all users, and the inner loop uses an
iterator for the set of pools to which the current user from the user-set
iterator belongs.
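The shape of that nested iteration can be sketched as follows, with pool membership reduced to a per-user count; the names are illustrative rather than the driver's.

```c
/* Count the MEMBERS rows a nested iteration would produce: the outer
 * loop stands in for the user-set iterator, the inner loop for the
 * pool-set iterator of the current user. */
static int count_member_rows(int n_users, const int pools_per_user[])
{
    int rows = 0;
    for (int u = 0; u < n_users; u++)
        for (int p = 0; p < pools_per_user[u]; p++)
            rows++;
    return rows;
}
```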
The ISAMGetData function retrieves one column value from the current record;
see Listing One. This function is implemented by nested switch statements,
with a case for each column in each table. Each case contains a call to an
InConcert API function that retrieves a property of the current object (user
or pool). ISAMGetData is called in processing a SELECT statement and in
processing DELETE or UPDATE statements that have WHERE clauses.
The ISAMDeleteRecord function is called to delete the current record when
processing a DELETE statement. It contains a switch statement with a case for
each table. For the USERS and POOLS table, it deletes the current user or pool
object. For the MEMBERS table, it removes the current user object from the
current pool object.
The ISAMPutData function updates one column value from the current record. It
is also implemented by nested switch statements, with a case for each column
in each table. Id columns cannot be updated, and the UPDATE statement is not
allowed on the MEMBERS table. The cases for non-id columns of the USERS and
POOLS tables each contain a call to an InConcert API function that updates a
property of the current object (user or pool). ISAMPutData is called in
processing both UPDATE and INSERT statements.
When processing an INSERT statement, the driver first calls ISAMInsertRecord,
then calls ISAMPutData for each column value specified in the INSERT
statement, and finally calls ISAMUpdateRecord. When rows are being inserted
into the USERS or POOLS table, the ISAMInsertRecord calls the InConcert API to
create a new user or pool (using default property values defined by the
driver) and makes the new user or pool the current object. Subsequent calls to
ISAMPutData change the property values for the new user or pool, and the call
to ISAMUpdateRecord does nothing. INSERT statements on the MEMBERS table are
handled differently, since the API call to add a user to a pool takes the user
and pool objects as inputs. The call to ISAMInsertRecord clears the current
objects. The calls to ISAMPutData retrieve the user and pool identified by the
column values (names or ids). The call to ISAMUpdateRecord calls the InConcert
API to add the user to the pool.
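As a sketch of the call order only (with hypothetical trace logging, not the kit's code), the driver's INSERT processing for a two-column row looks like this:

```c
#include <string.h>

static char trace[64];   /* records the call order for this sketch */

static void insert_record(void) { strcat(trace, "I"); }  /* create object */
static void put_data(void)      { strcat(trace, "P"); }  /* set a column  */
static void update_record(void) { strcat(trace, "U"); }  /* finish insert */

/* Process an INSERT with ncols column values, in the order described
 * above: InsertRecord, PutData per column, then UpdateRecord. */
static void process_insert(int ncols)
{
    insert_record();
    for (int i = 0; i < ncols; i++)
        put_data();
    update_record();
}
```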
Coding the InConcert API function calls revealed a deficiency in the Dr.
DeeBee ODBC Driver Kit. The InConcert API functions return a status object.
Every call to these functions is followed by a test of the status returned. In
case of an error, a numeric code and text message can be obtained from the
status object. However, the ISAM interface provided no means to return this
information to higher levels of the driver. Syware agreed this is a problem,
and it sent me instructions for changing the ISAM interface and the driver's
error-handling logic. In the interest of time, I decided to defer making these
changes until the driver was working.

At this point, the driver did what I wanted, at least when called from the
ODBC Test application. Comparing my ISAM.C file to the original that came with
the kit showed that I had written 750 lines of code, most of which was
produced by copying and pasting similar code. At each step, I first wrote and
tested only the code necessary to implement the functionality for one table
(or one column in a table). Once that code was working properly, I copied and
modified it to handle the other tables and their columns. For example, in the
ISAMGetData function, I first implemented and tested the cases for the Id and
Name columns of the Users table. The remaining cases were implemented by
copying one of these two cases and changing the InConcert API function called.


ODBC Driver Costs and Benefits


To demonstrate the benefits of having an ODBC driver for InConcert, I used
Crystal Reports and Microsoft Access (primarily because they were already
installed on my PC). The Crystal Reports demo showed how to create a report
(pools and their members) with a graphical user interface (GUI), without
programming. The Microsoft Access demo showed how to create a GUI for managing
users and pools, again without programming. This demo consists of attaching
the driver's USERS and MEMBERS tables to an Access database, running the
Access Form Wizard to create a master/subform form for the USERS and MEMBERS
tables, setting the join fields in the subform's property sheet, running the
form, creating a user (by inserting a row into the USERS table), and adding
the user to pools (by inserting rows into the MEMBERS table).
The Access demo revealed a deficiency in my driver. Access does not allow you
to update an attached table unless it has a key. My driver was not reporting
any keys to Access, so Access did not allow me to insert records into the
Users or Members tables. ODBC drivers list indexes through the SQLStatistics
function. Dr. DeeBee support explained how to modify the source code in
RESULTS.C to list the primary key indexes for my tables, and then Access
allowed me to update the USERS and MEMBERS tables. (The driver kit has since
been upgraded so that SQLStatistics reports indexes based on the ISAM data
structures that define your tables.)
The demos communicated to my audience the benefits of having an ODBC driver
for InConcert. Seeing was believing when it came to developing applications
and reports without programming. Management's next question, however, was,
What's the level of effort needed to make this a product? The prototyping
effort gave me a solid basis for estimating: one week of labor spent on the
prototype driver multiplied by the inverse of the proportion of the InConcert
API the prototype covered, multiplied by the usual fudge factor for going from
prototype to production software. 
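With made-up numbers (one week of prototype labor, a prototype covering a tenth of the API, a 3x prototype-to-product factor), the arithmetic is simply:

```c
/* The estimate described above, with illustrative inputs:
 * 1 week * (1 / 0.1) * 3 = 30 weeks. */
static double estimate_weeks(double proto_weeks, double coverage, double fudge)
{
    return proto_weeks * (1.0 / coverage) * fudge;
}
```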


Conclusions


My goal was to produce a prototype of an ODBC driver for InConcert in a
limited amount of time. I was not an ODBC expert, and the project budget
didn't include the time to become one or the money to hire one. I could have
bought off-the-shelf tools if the price weren't too high, but contract
negotiations over royalties were beyond the scope of the project. The Dr.
DeeBee ODBC Driver Kit was a perfect match for my situation at the business
level.
At the technical level, I was able to follow Dr. DeeBee's recipe for
developing my driver. The bulk of my work was in implementing the ISAM
functions over the InConcert API. The kit is not perfect: In two cases I had
to make changes in other components of the source code (error reporting,
listing indexes), but I was able to do this within my time budget. It would be
better if future releases of the driver kit handled these functions through
the ISAM interface. Will we use the Dr. DeeBee ODBC Driver Kit to implement
the product version of the InConcert ODBC driver? This is still an open
question. PageAhead's Simba Engine is an attractive alternative for developing
production quality ODBC drivers. But even if we choose that route, using the
Dr. DeeBee driver kit was still worthwhile. It definitely makes driver
development feasible for those who have not been initiated into the mysteries
of ODBC.


For More Information


Dr. DeeBee ODBC Driver Kit
Syware Inc.
P.O. Box 91 Kendall
Cambridge, MA 02142
617-497-1376
Table 1: Expressing the InConcert model as SQL tables and statements.

         Users Table         Members Table                      Pools Table
Rows     One per user        One for each member of each pool   One per pool
Columns  Id, Name, Address   UserId, PoolId                     Id, Name
SELECT   Retrieve users      Retrieve pools and their members   Retrieve pools
INSERT   Create a user       Add a user to a pool               Create a pool
UPDATE   Change a user's     --                                 Change a pool's
         name/address                                           name
DELETE   Delete a user       Remove a user from a pool          Delete a pool
Table 2: Implementation steps.

Step          ISAM Function Implemented   InConcert Function Called
Connect       ISAMOpen                    CIcClient_beginApplication
                                          CIcClient_directInitialize
              ISAMClose                   CIcClient_terminate
                                          CIcClient_endApplication
List Tables   ISAMGetTableList
              ISAMGetNextTableName
              ISAMFreeTableList
List Columns  ISAMOpenTable               CIcUser_init, CIcUserSet_init
              ISAMCloseTable              CIcPool_init, CIcPoolSet_init
SELECT        ISAMRewind                  CIcUser_list
              ISAMNextRecord              CIcPool_list
              ISAMGetData                 CIcUser_getPools
                                          CIcUser_getName
                                          CIcUser_getEMailAddress
                                          CIcPool_getName
UPDATE        ISAMPutData                 CIcUser_setName
              ISAMUpdateRecord            CIcUser_setEMailAddress
                                          CIcPool_setName
                                          CIcPool_addUser
INSERT        ISAMInsertRecord            CIcUser_create
                                          CIcPool_create
DELETE        ISAMDeleteRecord            CIcUser_deleteUser
                                          CIcPool_deletePool
                                          CIcPool_removeUser

Listing One
SWORD INTFUNC ISAMGetData(LPISAMTABLEDEF lpISAMTableDef,
                          UWORD icol,
                          SDWORD cbOffset,
                          SWORD fCType,
                          PTR rgbValue,
                          SDWORD cbValueMax,
                          SDWORD FAR *pcbValue)
{
    CIcString icString = CIcString_initNullString();
    CIcStatus icStatus = CIcStatus_init();
    CIcStatusSeverity severity;
    CIcPool icPool = CIcPool_init();
    CIcUser icUser = CIcUser_init();

    switch (lpISAMTableDef->tableId) {
    case USER_TABLE:
        severity = CIcUserSetIterator_getElement(lpISAMTableDef->icIterator,
                                                 icUser, icStatus);
        if (severity != CIC_SUCCESS)
            return ISAM_ERROR;
        switch (icol + 1) {
        case USER_ID_COLUMN:
            CIcObject_export(icUser, rgbValue);
            *pcbValue = 32;
            break;
        case USER_NAME_COLUMN:
            severity = CIcUser_getName(icUser, icString, icStatus);
            if (severity != CIC_SUCCESS)
                return ISAM_ERROR;
            CIcString_copy(icString, rgbValue);
            *pcbValue = CIcString_length(icString);
            break;
        case USER_ADDRESS_COLUMN:
            severity = CIcUser_getEMailAddress(icUser, icString, icStatus);
            if (severity != CIC_SUCCESS)
                return ISAM_ERROR;
            CIcString_copy(icString, rgbValue);
            *pcbValue = CIcString_length(icString);
            break;
        default:
            return ISAM_ERROR;
        }
        break;
    case MEMBER_TABLE:
        switch (icol + 1) {
        case MEMBER_USER_ID_COLUMN:
            severity = CIcUserSetIterator_getElement(lpISAMTableDef->icIterator,
                                                     icUser, icStatus);
            if (severity != CIC_SUCCESS)
                return ISAM_ERROR;
            CIcObject_export(icUser, rgbValue);
            *pcbValue = 32;
            break;
        case MEMBER_USER_NAME_COLUMN:
            severity = CIcUserSetIterator_getElement(lpISAMTableDef->icIterator,
                                                     icUser, icStatus);
            if (severity != CIC_SUCCESS)
                return ISAM_ERROR;
            severity = CIcUser_getName(icUser, icString, icStatus);
            if (severity != CIC_SUCCESS)
                return ISAM_ERROR;
            CIcString_copy(icString, rgbValue);
            *pcbValue = CIcString_length(icString);
            break;
        case MEMBER_POOL_ID_COLUMN:
            severity = CIcPoolSetIterator_getElement(lpISAMTableDef->icIterator2,
                                                     icPool, icStatus);
            if (severity != CIC_SUCCESS)
                return ISAM_ERROR;
            CIcObject_export(icPool, rgbValue);
            *pcbValue = 32;
            break;
        case MEMBER_POOL_NAME_COLUMN:
            severity = CIcPoolSetIterator_getElement(lpISAMTableDef->icIterator2,
                                                     icPool, icStatus);
            if (severity != CIC_SUCCESS)
                return ISAM_ERROR;
            severity = CIcPool_getName(icPool, icString, icStatus);
            if (severity != CIC_SUCCESS)
                return ISAM_ERROR;
            CIcString_copy(icString, rgbValue);
            *pcbValue = CIcString_length(icString);
            break;
        default:
            return ISAM_ERROR;
        }
        break;
    case POOL_TABLE:
        severity = CIcPoolSetIterator_getElement(lpISAMTableDef->icIterator,
                                                 icPool, icStatus);
        if (severity != CIC_SUCCESS)
            return ISAM_ERROR;
        switch (icol + 1) {
        case POOL_ID_COLUMN:
            CIcObject_export(icPool, rgbValue);
            *pcbValue = 32;
            break;
        case POOL_NAME_COLUMN:
            severity = CIcPool_getName(icPool, icString, icStatus);
            if (severity != CIC_SUCCESS)
                return ISAM_ERROR;
            CIcString_copy(icString, rgbValue);
            *pcbValue = CIcString_length(icString);
            break;
        default:
            return ISAM_ERROR;
        }
        break;
    default:
        return ISAM_ERROR;
    }
    CIcString_free(icString);
    CIcStatus_free(icStatus);
    CIcPool_free(icPool);
    CIcUser_free(icUser);
    return NO_ISAM_ERR;
}


























































Data Models, CASE Tools, and Client/Server Development


Creating a two-way link between a CASE tool and DBMS




Tim Wittenburg


Tim, who is a team leader at AmeriData Consulting, has designed and developed
client/server applications for six years and is the author of Photo-Based 3D
Graphics in C++ (John Wiley & Sons, 1995). Tim can be reached on CompuServe at
70403,3570.


Building client/server applications often involves getting business experts
together with modelers, GUI designers, and developers. The activities of these
teams are usually divided into phases. In the first phase, a modeling team
gathers requirements from key persons within the organization. Process models
and/or a data model are then constructed using a computer-aided software
engineering (CASE) tool. Once the data model is completed, the modeling team
passes off the project to the development team. This begins the second phase,
in which the requirements and data model are merged into a functioning
information-management system. 
The transition between phases is critical because intangibles (user
expectations, for instance) can be lost in the shuffle. In this article, I'll
present DBA Assistant, a tool designed to smooth the transition between the
project phases in two ways: 
Transforming the data model developed in the first phase directly into a
physical database. 
Reverse-engineering the schema of an existing ODBC-compliant database and
exporting it to a CASE tool that prepares a physical entity-relationship
model. Exporting schemata is useful if the database moves from one database
management system (DBMS) to another.
Depending on the modeling tools, a direct link between the data model and the
DBMS may or may not exist. If not, database administrators (DBAs) must
manually enter the entire database definition into the DBMS. Despite the
important differences between a logical model and its physical counterpart,
a one-to-one correspondence exists for many of the entities and attributes
in the two models. Because that correspondence is typically high, significant
time can be saved if a data definition language (DDL) description is
generated directly from the data model. This is precisely what my DBA
Assistant program does.
When there is no clean break between the modeling and development efforts, it
is desirable to keep the data model synchronized with the physical database
definition as the model and database are refined. In many situations, no
convenient synchronization mechanisms exist. DBA Assistant can reduce the
synchronization effort.
DBA Assistant also lets the modeling team move the data model into an
approximation of a functioning database suitable for efficient prototyping of
client/server application screens. The drag-and-drop screen-building features
of most GUI development tools (PowerBuilder, Access, Visual Basic, and the
like) can only be used if an underlying database is available. Having a
representative database allows prototype screens to be prepared for the
purpose of exploring and resolving GUI design issues before full-scale
development begins. 
DBA Assistant, which is written in Microsoft Access Basic, provides a two-way
link between a CASE tool (System Architect from Popkin Software, in this case)
and a target DBMS. The program provides two essential capabilities: 
Enabling a logical data model to be translated into a physical database, as in
Figure 1. 
Enabling documentation of an existing database by exporting a database
structure such that it can be imported into a CASE tool and translated into an
entity-relationship diagram; see Figure 2.
A database schema can be imported into a CASE tool for generating an ER
diagram. This is useful when a database has been maintained over a long period
of time, perhaps by several individuals. Microsoft Access's data access object
(DAO) encapsulates many aspects of a relational database (including its
schema) in a set of objects which have been exposed in the Access Basic
development environment. DBA Assistant traverses the DAO to translate schema
information into a series of text files that can be imported into System
Architect, from which a physical ER diagram or model can be prepared. In
addition, a text file is created to contain the database schema as represented
by a series of ANSI SQL Create Table statements. 
Most of Access's internal objects have been organized into the DAO hierarchy
and can be manipulated using the Access Basic language. My approach utilizes
properties of the DAO to describe the physical structure of a selected
database. My goal is to collect a description of all the tables in an Access
database (including descriptions of each column associated with a database
table) along with the data types and column lengths. To accomplish this, I'll
focus on two DAO collections--TableDefs and Fields. 


DAO Collections and Schema Exports


The TableDefs collection contains a description of all tables visible to an
Access database, including attached tables (for which read permission has been
granted). The number of tables in TableDefs is a property of the collection:
numTables=localDB.tableDefs.count, where localDB is a pointer to the local
Access database. Table names can be retrieved by
tableName=localDB.tableDefs(i).name, and the entire list of tables can be
obtained by varying i from 0 to numTables-1. Once a table is identified, the
Fields collection lets you identify its attributes or fields.
After the table name is obtained from TableDefs, you must determine which
fields are associated with the table. The number of fields in each table in
TableDefs can be obtained with numFields=localDB.tableDefs(i).fields.count.
You get the properties of each field in the table using
property=localDB.tableDefs(i).fields(j).prop. By varying j from 0 to
numFields-1, you can get the properties of each field in a table in TableDefs.
For example, to get the name of the first column in the first table, enter
fieldName=localDB.tableDefs(0).fields(0).name. You can also get the data type
and length of the field in bytes this way.
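The traversal is just two nested loops. Here is a minimal C rendering of the same idea; the structures are stand-ins for the DAO collections, not DAO itself.

```c
/* Stand-ins for the DAO collections: each table has a name and fields. */
struct Field    { const char *name; };
struct TableDef { const char *name; int num_fields;
                  const struct Field *fields; };

/* Visit every (table, field) pair, as readAccessDB does over
 * TableDefs and Fields, counting the dictionary rows produced. */
static int build_dictionary(const struct TableDef *tables, int num_tables)
{
    int rows = 0;
    for (int i = 0; i < num_tables; i++)
        for (int j = 0; j < tables[i].num_fields; j++)
            rows++;   /* a real tool records name, type, and length here */
    return rows;
}
```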
Listing One presents the Access Basic function readAccessDB, which traverses
the DAO, extracting the database schema from the DAO TableDefs and Fields
collections. The resulting schema information is stored in the dbDictionary
Access database table. Once the database dictionary is created by
readAccessDB, the CASE-tool-compatible text files and the DDL files are
created.
To export to System Architect, you first create the entity.txt file (all files
are available electronically; see "Availability," page 3) that lists the
entities to appear in the ER model. Next, create the datastrc.txt file, which
contains System Architect data structures that link the appropriate attributes
for each entity. A third text file, element.csv, is created in comma separated
value (CSV) format and lists all the data elements in the entire database,
along with the associated table and data type. Finally, table.sql is created;
it contains a description of the database using DDL statements. Each of these
text files is created in the procedure readAccessDB in Listing One.
The schema-export approach assumes a one-to-one correspondence between
physical database tables and entities in a physical ER diagram. Similarly, a
one-to-one correspondence is assumed between the columns in a database table
and attributes associated with an entity in a physical model. This means that
primary and foreign keys appear as attributes in the physical ER model
prepared from the exported schema. DBA Assistant assumes that the primary key
in each table is indicated by the prefix PK_ followed by the name of the
table. (For example, the primary key for the dbDictionary table would be
PK_dbDictionary.) When a member of the DAO Fields collection conforms to this
assumption, DBA Assistant adds a "@" prefix to the column name while creating
the exported-schema text files. System Architect then interprets the column as
a primary key and indicates it as such in the physical ER diagram.
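The naming convention can be sketched directly. The helper below is illustrative, not DBA Assistant's actual code.

```c
#include <stdio.h>
#include <string.h>

/* Write the exported column name into out: a column named
 * "PK_" + table name gets an "@" prefix so System Architect
 * treats it as a primary key. */
static void export_column_name(const char *table, const char *column,
                               char *out, size_t outsz)
{
    char pk[128];
    snprintf(pk, sizeof pk, "PK_%s", table);
    if (strcmp(column, pk) == 0)
        snprintf(out, outsz, "@%s", column);
    else
        snprintf(out, outsz, "%s", column);
}
```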
At this point, you can generate a DDL statement that defines a database table
and all its columns; in other words, it defines an entity and its attributes,
together with a portion of the metadata. This requires the SQL Create Table
statement, a permutation of the schema information obtained earlier by
traversing the DAO (for example, CREATE TABLE table (field1 type [(size)]
[index1] [, field2 type [(size)] [index2] [, ...]])). The Access Basic
procedure readAccessDB generates the file table.sql at the same time the
CASE-compatible text files are generated.
Since Access's set of database data types differs from those that can be
selected in System Architect, you need a mapping function to transform the
data type of each Access database attribute into a corresponding data type
compatible with System Architect. To build this, you simply use a look-up
scheme; see the mapSATypeToAccessType function in Listing Two, which returns
the Access data type that most closely corresponds to a System
Architect-compatible data type supplied as the function's argument.
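A lookup of this kind is easy to sketch in C; the type pairs below are illustrative only, not the actual table in mapSATypeToAccessType.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative type pairs; the real mapping in Listing Two covers
 * System Architect's full set of C storage types. */
struct TypeMap { const char *sa_type; const char *access_type; };

static const struct TypeMap type_maps[] = {
    { "char",   "TEXT"   },
    { "long",   "LONG"   },
    { "double", "DOUBLE" },
};

static const char *map_sa_type(const char *sa_type)
{
    for (size_t i = 0; i < sizeof type_maps / sizeof type_maps[0]; i++)
        if (strcmp(type_maps[i].sa_type, sa_type) == 0)
            return type_maps[i].access_type;
    return "TEXT";   /* default when no mapping is found */
}
```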


Using DBA Assistant


DBA Assistant is an Access application. To use the tool's schema-export
feature, open DBA Assistant. When Figure 3 appears, press the "Export a dB
Schema" button; Figure 4 will appear. Choose the Access database from which to
export by pressing the upper "Locate" button and selecting the desired .mdb
file. Specify the directory in which to place the exported schema text files
by pressing the lower "Locate" button. When the "Export the DB" button is
pressed, the schema of the selected Access database is exported. When the
procedure is complete, the four text files described previously should exist
in the same directory as the Access application.
To import definitions into System Architect, first create a new System
Architect encyclopedia. Select "Import Data" from the Definition menu; Figure
5 appears. Select "Entity" from the first combo box, then Text. Supply the
name of the entity text file produced by DBA Assistant (entity.txt) and click
the "OK" button. Repeat this for the data-structure file. Next, elements are
imported by choosing ".CSV" format and "Data" element from the appropriate
combo boxes. Enter the pathname of the element file created by DBA Assistant
(element.csv in Figure 5) and click "OK." 
At this point, all of the relevant information has been imported into System
Architect and the encyclopedia needs to be updated. Select "Dictionary Update"
from the System Architect File menu and answer "yes" to the question: Do you
wish to update all definitions? At this point a new ER diagram can be created,
and you can create entities by dragging and dropping onto it. 


Exporting the Data Model from System Architect


Building an Access database from an ER diagram begins by exporting a physical
data model from System Architect. First you export the model's definition,
then import the physical model into Access using DBA Assistant. A new
ODBC-compliant
database is created from the model by generating and executing the appropriate
DDL statements. 
You can export the data-model elements from System Architect. ER model
attributes marked by the modeling team as primary keys in System Architect are
indicated in the element.csv file by an @ sign preceding the element name. 


Creating a Database from a Model



A database is created by reading the exported data model in element.csv into
Access using readElementsfile, which reads the ER model definition and
populates the dbDictionary table. Next, the entities and attributes defined in
dbDictionary generate the tables in the Access database file; see the Access
Basic procedure makeAccessTables. These procedures appear in Listing Two.
The procedure mapSATypetoAccessType performs a lookup-style substitution of
System Architect C Storage types to Access database data types. Character
strings longer than 255 are converted to the Access memo type. You may wish to
add other types of mappings for Access types such as Date/Time and Yes/No. 
The Access Basic procedure makeAccessTables constructs an SQL Create Table
statement from information contained in the internal dbDictionary table. This
statement executes in the remote database using the ODBC connection created
between DBA Assistant and the remote Access database file identified by the
user. This procedure could easily be extended to accept an ODBC data-source
name instead of an Access .mdb file name. In this way, databases could be
created in any ODBC-compliant DBMS.
The System Architect data-element export file contains only data elements. Any
additional elements designated as keys in the model are exported in the System
Architect entities export file. To transfer the keys, the entity.csv file is
entered into the DBA Assistant. When "Create Database" is pressed, the data
element and entity .csv files are processed to remove extra carriage returns.
This converts the entity export from System Architect into a valid .csv file.
This file is then imported into Access via TransferText. The resulting table
has the name "entity" and contains one record for each entity defined in the
System Architect model. The first column indicates the entity name; the second
contains the description of all the attributes associated with this entity.
Unique keys have an @ prefix, and foreign keys are indicated by the keyword
"FKFROM."
The Access Basic CreateKeys procedure reads the entity table, identifies all
keys of both types, and saves the key definitions in the dbKey Access table.
Once the keys are saved, the additional key columns are added to the existing
Access table definitions using the SQL Alter Table statement.
An associative entity or class defined in the data model may have attributes
that consist entirely of keys. In this case, the associative entity would not
have been defined in System Architect's data-element export file because
technically, none of the entity's attributes are data elements--they are keys.
Consequently, the entity name indicated in the System Architect entity export
file may not exist in the partially created database. In this case, issuing an
SQL Alter Table statement to the database for a table which does not exist
will result in an error. Fortunately, you can detect this error and respond
appropriately. The procedure first issues an Alter Table statement. If a
"table doesn't exist" error results, it constructs a SQL Create Table
statement using only the single key column. Subsequent key columns defined in
the model for this entity will be added using alternative table statements as
additional records in the dbKey table are processed. Since the key attributes
are not contained in the data element's exported file, there is no way to
determine the key column's data type from the information exported from System
Architect. Consequently, DBA Assistant assumes that all keys are of type long
integer. A workaround is to encode the data-type information in the
data-element name; for example, add a _Date suffix to the data-element name
for keys of type Date. Then, a mapping function can be constructed in Access
to identify the suffix and create the keys with the desired type information.
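That workaround might look like the following hypothetical helper, which infers a key's type from a name suffix and otherwise defaults to a long integer.

```c
#include <string.h>

/* Infer a key column's type from a suffix on the data-element name
 * (the "_Date" convention suggested above), defaulting to a long
 * integer as DBA Assistant does. */
static const char *key_type_from_name(const char *name)
{
    size_t n = strlen(name);
    if (n >= 5 && strcmp(name + n - 5, "_Date") == 0)
        return "DATETIME";
    return "LONG";
}
```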
Finally, for each key column indicated in the dbKeys table, CreateKeys creates
a simple database index to speed performance during queries. Data-integrity
constraints can be placed on each index to indicate that the column is, say,
mandatory, or must be unique. DBA Assistant creates the simplest type of index
using the SQL Create Index statement as in CREATE INDEX IndexName ON TableName
(ColumnName);. A test data model (element.csv) is provided so you can
experiment with creating databases.


Summary


DBA Assistant is useful for bridging the gap between modeling efforts and
creating the physical database upon which client/server applications are
constructed. Tools such as DBA Assistant can make client/server development
projects easier to manage because they reduce the time and effort required to
synchronize a physical ER model with its corresponding database.


For More Information


System Architect
Popkin Software & Systems
11 Park Place 
New York, NY 10007
800-732-5227

Microsoft Access 
Microsoft Corp.
One Microsoft Way
Redmond, WA 98052
206-882-8080
Figure 1: Path from a DBMS to a CASE tool.
Figure 2: Path from a CASE tool to a DBMS.
Figure 3: Opening DBA Assistant screen.
Figure 4: DBA Assistant Export screen.
Figure 5: System Architect Import Data screen.

Listing One
Sub readAccessDB (myDB As Database, theControl As Control, rootDirectory)
' save the structure of an Access database using the DAO hierarchy
' Access --> CASE Tool
 Dim crlf, dbName, numTables, counter, sql, tableName, foreignKeys
 Dim sqlTableName,createTable,theTypeCode,theLength,theName,theColumnName
 Dim thePureColumnName, sqlCol
 Dim theDescription
 crlf = Chr(10)
 Dim localDB As Database
 Set localDB = dbengine.workspaces(0).databases(0)
 Dim i As Integer, j As Integer, cols As Integer
 Dim theType
 DoCmd Hourglass True
 If Mid(rootDirectory, Len(rootDirectory), 1) <> "\" Then
 rootDirectory = rootDirectory & "\"
 End If
 Open rootDirectory & "entity.txt" For Output As #1
 Open rootDirectory & "datastrc.txt" For Output As #2
 Open rootDirectory & "element.csv" For Output As #3
 Open rootDirectory & "table.sql" For Output As #4
 Print #3, "Name,Description,Domain,Comments,Length,Business Unit,Column Name,Database,Table,Version,C Storage Type,C Storage Occurrences,Storage Class,Storage Picture,Display Picture"
 dbName = myDB.name
 numTables = myDB.tabledefs.count
 counter = 0
 sql = "delete from dbDictionary;" ' clear the dictionary
 localDB.Execute (sql)
 For i = 0 To numTables - 1
 tableName = myDB.tabledefs(i).name
 theControl = "Exporting table: " & tableName
 DoEvents
 foreignKeys = ""
 If (tableName <> "Paste Errors") And (Mid(tableName, 1, 4) <> "MSys") Then 'ignore system tables
 Print #1, "<<" & tableName & ">>"
 Print #2, "<<" & tableName & ">>"
 Print #1, tableName
 cols = myDB.tabledefs(i).fields.count
 sqlTableName = "[" & tableName & "]"
 createTable = "drop table " & sqlTableName & ";"
 Print #4, createTable
 createTable = "create table " & sqlTableName & " ("
 Print #4, createTable
 For j = 0 To cols - 1
 theTypeCode = myDB.tabledefs(i).fields(j).type
 theLength = myDB.tabledefs(i).fields(j).size
 theName = myDB.tabledefs(i).fields(j).name
 theColumnName = myDB.tabledefs(i).fields(j).name
 theDescription = " " ' none for now. see TechNet article Q109136
 theType = "Undefined"
 Select Case theTypeCode
 Case DB_SINGLE
 theType = "Float"
 Case DB_TEXT
 theType = "Char"
 Case DB_DOUBLE
 theType = "Float"
 theLength = 8
 Case DB_MEMO
 theType = "Char"
 theLength = 1024
 Case DB_DATE
 theType = "Char"
 theLength = 8
 Case DB_LONG
 theType = "Long"
 theLength = 4
 Case DB_BYTE
 theType = "Byte"
 Case DB_INTEGER
 theType = "integer"
 theLength = 2
 Case DB_BOOLEAN
 theType = "Char"
 theLength = 1
 Case DB_CURRENCY
 theType = "Float"
 theLength = 4
 Case Else
 theType = "Undefined:" & theTypeCode

 End Select
' identify the primary key to system architect by adding the @ prefix
 theColumnName = theName
 thePureColumnName = theName
 If Len(theName) > Len(tableName) And Mid(theName, 4, Len(tableName)) = tableName And Mid(theName, 1, 3) = "PK_" Then theColumnName = "@" & theName
 If Len(theName) > Len(tableName) And Mid(theName, 4, Len(tableName)) = tableName And Mid(theName, 1, 3) = "FK_" Then foreignKeys = foreignKeys & theName
 Print #3, """" & theName & """,,,,""" & theLength & """,Engineering,""" & theColumnName & """,""" & dbName & """,""" & tableName & """,,""" & theType & """,,,,"
 sql = "insert into dbDictionary (tableName,columnName,columnType,width,description) "
 sql = sql & "values (""" & tableName & """,""" & thePureColumnName & """,""" & theType & """," & theLength & ",""" & theDescription & """);"
 localDB.Execute (sql)
 If j < cols - 1 Then
 Print #2, """" & theColumnName & """ + "
 Else
 Print #2, """" & theColumnName & """"
 End If
 sqlCol = Mid(thePureColumnName, 1, 18)
 createTable = "[" & sqlCol & "]" & " " & UCase$(theType)
 If theType = "Char" Then 'exported character types are "Char"
 createTable = createTable & " (" & theLength & ")"
 End If
 If j <> cols - 1 Then createTable = createTable & ","
 Print #4, createTable
 counter = counter + 1
 Next j
 createTable = ");"
 Print #1,
 Print #2,
 Print #4, createTable
 Print #4, "Create unique index PKI_" & sqlTableName & " on " & sqlTableName & "(" & "PK_" & sqlTableName & ");"
 Print #4, "Grant all on " & sqlTableName & " to public;"
 Print #4, ' sql statement
 End If
 Next i
 localDB.Close
 Close #1
 Close #2
 Close #3
 Close #4
 DoCmd Hourglass False
End Sub
Sub readElementsFile (elementsFilePath, theControl As Control)
 Dim sql, recordCounter, errCounter
 Dim myDB As Database
 Dim i As Integer, j As Integer, cols As Integer
' DoCmd Hourglass True
 Set myDB = dbengine.workspaces(0).databases(0)
 sql = "delete from dbDictionary" 'clear the existing db definition
 myDB.Execute sql
 Open elementsFilePath For Input As #1
 Dim aName, description, domain, Comments, length, BusinessUnit, columnName, aDatabase, aTable, Version, CStorageType, CStorageOccurrences, StorageClass, StoragePicture, DisplayPicture, a1, a2, a3
 recordCounter = 0
 Open "DBAssist.log" For Output As #2
 Do While Not EOF(1)
 Input #1, aName, description, domain, Comments, length, BusinessUnit, columnName, aDatabase, aTable, Version, CStorageType, CStorageOccurrences, StorageClass, StoragePicture, DisplayPicture, a1, a2, a3
 If recordCounter > 0 Then
' If Mid(columnName, 1, 1) = "@" Then
' columnName = Mid(columnName, 2, Len(columnName))
' End If
 If Len(aTable) = 0 Or Len(columnName) = 0 Or Len(CStorageType) = 0 Then
 Print #2, "Table, Column or Storage Type not defined. record: ", recordCounter, aName, description
 errCounter = errCounter + 1
 Else
 theControl = aTable & ", " & columnName
 DoEvents
 sql = "insert into dbDictionary "
 sql = sql & "(columnName,width,tableName,columnType,description,sysArchColName) "
 sql = sql & "values (""" & columnName & """," & length & ",""" & aTable & """,""" & CStorageType & """,""" & description & """,""" & aName & """ );"
 myDB.Execute sql
 End If
 End If
 recordCounter = recordCounter + 1
 Loop
 myDB.Close
 Close #1
 Close #2
 DoCmd Hourglass False
 If errCounter > 0 Then theControl = "Errors occurred. check DBAssist.log"
End Sub
Sub readEntity (theControl As Control)
' create indices from information in the entity table
 Dim sql, recordCounter, errCounter
 Dim i As Integer, j As Integer, cols As Integer
' DoCmd Hourglass True
 Dim atsign, keyCounter, tableName, columnName, keyType, keyName
 Dim localDB As Database
 Set localDB = dbengine.workspaces(0).databases(0)
 localDB.Execute "delete from dbKey"
 
 Dim sn As Recordset
 sql = "select * from entity"
 Set sn = localDB.OpenRecordset(sql, DB_OPEN_SNAPSHOT)
 If sn.recordcount = 0 Then
 MsgBox "Entity table is empty"
 Exit Sub
 End If
 keyCounter = 1
 Dim theLine, firstquote, numItems As Integer
 recordCounter = 0
 sn.MoveFirst
 Dim tableDescription As String

 Do While Not sn.EOF
 tableName = sn("name")
 If Not IsNull(sn.description) Then
 tableDescription = sn.description
 numItems = getNumItems(tableDescription, "+")
 For j = 1 To numItems
 theLine = getItem(tableDescription, "+", j)
 atsign = InStr(theLine, "@")
 If atsign > 0 Then
 If atsign > 1 Then 'beginning of new table definition
 columnName = Mid(theLine, InStr(theLine, """") + 1, 99)
 columnName = Mid(columnName, 1, InStr(columnName, """") - 1)
 
 keyType = "Unique"
 If InStr(theLine, "FKFROM") > 0 Then keyType = "Foreign"
 keyName = "index" & keyCounter
 keyCounter = keyCounter + 1
 theControl = "Reading key: " & keyName & " on " & columnName
 DoEvents
 sql = "insert into dbKey "
 sql = sql & "(keyName,tableName,columnName,keyType) "
 sql = sql & "values (""" & keyName & """,""" & tableName & """,""" & columnName & """,""" & keyType & """);"
 localDB.Execute sql
 Else 'this line is part of a composite key
 firstquote = InStr(theLine, """")
 columnName = Mid(theLine, firstquote + 1, InStr(firstquote + 1, theLine, """") - (firstquote + 1))
 keyType = "Unique"
 keyName = "index" & keyCounter
 keyCounter = keyCounter + 1
 If InStr(theLine, "FKFROM") > 0 Then keyType = "Foreign"
 theControl = "Reading key:" & keyName & " on " & columnName
 DoEvents
 sql = "insert into dbKey "
 sql = sql & "(keyName,tableName,columnName, keyType) "
 sql = sql & "values (""" & keyName & """,""" & tableName & """,""" & columnName & """,""" & keyType & """);"
 localDB.Execute sql
 End If
 Else
 ' no at sign - check for foreign key
 Dim foreignKey, k, firstChar
 firstChar = -1
 foreignKey = InStr(theLine, """ / FKFROM")
 If foreignKey > 0 Then
 ' check if the first char is a quote
 If Mid(theLine, 1, 1) = """" Then
 columnName = Mid(theLine, 2, InStr(2, theLine, """") - 2)
 Else
 ' Look for the columnName. Search backward from token until a " is found
 For k = foreignKey - 1 To 1 Step -1
 If Mid(theLine, k, 1) = """" Then
 firstChar = k + 1
 Exit For
 End If
 Next k
 columnName = Mid(theLine, firstChar, foreignKey - firstChar)

' Debug.Print columnName
 End If
 keyType = "Foreign"
 keyName = "index" & keyCounter
 keyCounter = keyCounter + 1
 theControl = "Reading foreign key:" & keyName & " on " & columnName
 DoEvents
 sql = "insert into dbKey "
 sql = sql & "(keyName,tableName,columnName, keyType) "
 sql = sql & "values (""" & keyName & """,""" & tableName & """,""" & columnName & """,""" & keyType & """);"
 localDB.Execute sql
 End If
 End If
 Next j
 End If
 recordCounter = recordCounter + 1
 sn.MoveNext
 Loop
 sn.Close
 localDB.Close
 DoCmd Hourglass False
End Sub

Listing Two 
Option Compare Database 'Use database order for string comparisons
Option Explicit
Sub createAccessTables (accessDBPath, theControl As Control)
' Create an Access database by reading the DB dictionary, generating and
' executing SQL DDL statements
' CASE Tool --> Access
'
On Error GoTo DBErrHandler
Dim localDB As Database, remoteDB As Database
Dim localSN As Recordset, remoteSN As Recordset
Dim localSQL, remoteSql
Set localDB = dbengine.workspaces(0).databases(0)
Set remoteDB = OpenDatabase(accessDBPath)
localSQL = "select * from dbDictionary order by tableName"
Set localSN = localDB.OpenRecordset(localSQL, DB_OPEN_SNAPSHOT)
If localSN.recordcount = 0 Then
 MsgBox "DB dictionary is empty. Cannot continue"
 Exit Sub
End If
Dim oldTableName, tableName, inType, outType, inLength,outLength,theColumnName
Dim lengthPhrase, sql
Dim createPrimaryKeys, indexColName
Dim theDescription, SAName
localSN.MoveFirst
tableName = localSN.tableName
oldTableName = tableName
'start the new SQL statement
remoteSql = "create table [" & tableName & "] ([" & getKeyName(tableName) & "] Counter, "
createPrimaryKeys = True
Do Until localSN.EOF
 If tableName <> oldTableName Then
 remoteSql = Mid(remoteSql, 1, Len(remoteSql) - 1) 
 remoteSql = remoteSql & ");" 

 theControl = "Creating table: " & oldTableName
 DoEvents
 sql = "drop table [" & oldTableName & "]"
 remoteDB.Execute (sql)
nextStatement:
 remoteDB.Execute (remoteSql) 'create table in remote DB
 ' using standard SQL, create a primary key for this table
 If createPrimaryKeys Then
 remoteSql = "create unique index " & "pki_primary" & " on [" & oldTableName
 remoteSql = remoteSql & "] ([" & getKeyName(oldTableName) & "]) with primary;"
 remoteDB.Execute (remoteSql)
 End If
 'start the new SQL statement
 remoteSql = "create table [" & tableName & "] ([" & getKeyName(tableName) & "] Counter, "
 End If
 inType = localSN.columnType
 inLength = localSN.Width
 theColumnName = localSN.columnName
 theDescription = localSN.description
 SAName = localSN.sysArchColName
 Dim aType, aLength
 ' SA does not have a Date/Time type. Retrieve from SA logical attribute name
 Call mapColNameToType(SAName, aType, aLength)
 Call mapSATypeToAccessType(inType, inLength, outType, outLength)
 If aType = "Date" And outType = "Text" Then outType = aType
 If aType = "Currency" And outType = "Float" Then outType = aType
 lengthPhrase = ""
 If outType = "Text" Then lengthPhrase = " (" & outLength & ") "
 remoteSql = remoteSql & "[" & theColumnName & "] " & outType & lengthPhrase & ","
 oldTableName = tableName
 localSN.MoveNext
 If Not localSN.EOF Then tableName = localSN.tableName
Loop
Exit Sub
DBErrHandler:
Select Case Err
 Case 3376 'Table doesn't exist
 Case Else
 MsgBox Error(Err)
 MsgBox remoteSql
End Select
Resume nextStatement
End Sub
Sub createKeys (mdbPath, theControl As Control)
 On Error GoTo errHandler
 Dim remoteDB As Database, localDB As Database
 Dim sn As Recordset
 Dim sql, indexSql,aTableName,aKeyName,aType,aKeyType,aColumnName,loopCounter
 Set localDB = dbengine.workspaces(0).databases(0)
 Set remoteDB = OpenDatabase(mdbPath)
 indexSql = ""
 loopCounter = 1
 sql = "select * from dbKey order by tableName, keyType desc"
 Set sn = localDB.OpenRecordset(sql, DB_OPEN_SNAPSHOT)
 If sn.recordcount = 0 Then

 MsgBox "DB key table is empty"
 Exit Sub
 End If
 Do While Not sn.EOF
 aTableName = sn.tableName
 sql = "alter table [" & aTableName & "] Add column "
 aKeyName = sn.keyName
 aKeyName = "index" & loopCounter
 aKeyType = sn.keyType
 aColumnName = sn.columnName
 aType = "Long"
 sql = sql & "[" & aColumnName & "] " & aType
 theControl = "Adding " & aKeyName & " to table: " & aTableName
 DoEvents
 remoteDB.Execute (sql)
 ' create simple keys
 indexSql = "create index " & aKeyName & " on [" & aTableName & "] ([" & aColumnName & "] "
 indexSql = indexSql & ")"
 remoteDB.Execute (indexSql)
 indexSql = ""
 loopCounter = loopCounter + 1
 sn.MoveNext
 Loop
Exit Sub
errHandler:
Select Case Err
 Case 3380 'Ignore the error: Column already exists
 Case 3375 'Ignore the error: Table already has an index named ...
 Case 3376 'Handle the error: Table doesn't exist
 ' We assume: This table doesn't exist because all its columns are keys. 
 ' Create the table
 indexSql = "create table [" & aTableName & "] ([" & aColumnName & "] long )"
 remoteDB.Execute (indexSql)
 Case Else
 MsgBox Error(Err)
 MsgBox sql
End Select
Resume Next
End Sub
Function getKeyName (columnName)
 Dim keyName
 keyName = "PK_" & columnName
 getKeyName = keyName
End Function
Sub mapAccessTypeToAccessType (inType, inLength, outType, outLength)
 outType = inType
 outLength = inLength
End Sub
Sub mapColNameToType (colName, aType, aLength)
' This mapping assumes the column names consist of several words separated
' by spaces. The last grouping of letters is a mnemonic which indicates the
' type and length of the variable
Static inTypes(16), outTypes(16), lengths(16)
Dim numValues
numValues = 16
inTypes(0) = "ADD"
inTypes(1) = "CD"
inTypes(2) = "Ct"

inTypes(3) = "DT"
inTypes(4) = "DESC"
inTypes(5) = "FT"
inTypes(6) = "HR"
inTypes(7) = "ID"
inTypes(8) = "NM"
inTypes(9) = "NO"
inTypes(10) = "PCT"
inTypes(11) = "RT"
inTypes(12) = "TYP"
inTypes(13) = "YR"
inTypes(14) = "YN"
inTypes(15) = "Amt"
outTypes(0) = "Text"
outTypes(1) = "Text"
outTypes(2) = "Long"
outTypes(3) = "Date"
outTypes(4) = "Memo"
outTypes(5) = "Long"
outTypes(6) = "Integer"
outTypes(7) = "Text"
outTypes(8) = "Text"
outTypes(9) = "Long"
outTypes(10) = "Double"
outTypes(11) = "Double"
outTypes(12) = "Text"
outTypes(13) = "Integer"
outTypes(14) = "Yes/No"
outTypes(15) = "Currency"
lengths(0) = "50"
lengths(1) = "3"
lengths(2) = ""
lengths(3) = ""
lengths(4) = ""
lengths(5) = ""
lengths(6) = ""
lengths(7) = "15"
lengths(8) = "50"
lengths(9) = ""
lengths(10) = ""
lengths(11) = ""
lengths(12) = "3"
lengths(13) = ""
lengths(14) = ""
lengths(15) = ""
' get the type code from the column name
Dim i As Integer
Dim lastSpace
' find the last space in the columnName
lastSpace = 0
For i = 1 To Len(colName)
 If Mid(colName, i, 1) = " " Then lastSpace = i
Next i
If lastSpace = 0 Then
 MsgBox "Column name must contain at least one space: " & colName
 Exit Sub
End If
Dim inType
inType = Mid(colName, lastSpace + 1, Len(colName))

'Debug.Print inType
Dim found, j As Integer
found = 0
For j = 0 To numValues - 1
 If inType = inTypes(j) Then
 found = 1
 aType = outTypes(j)
 aLength = lengths(j)
' Debug.Print aType & ":"; aLength
 Exit Sub
 End If
Next j
If found = 0 Then
' MsgBox "mapColNameToType. Unknown type: " & inType
 aType = ""
 aLength = ""
End If
End Sub
Sub mapSATypeToAccessType (inType, inLength, outType, outLength)
' This mapping converts an SA "C Type" to a valid Access Type
' An input type of Char with length > 255 is converted to Access type Memo
Static inTypes(10), outTypes(10)
Dim numValues
numValues = 9
inTypes(0) = "char"
inTypes(1) = "int"
inTypes(2) = "unsigned"
inTypes(3) = "char near *"
inTypes(4) = "char far *"
inTypes(5) = "long"
inTypes(6) = "unsigned long"
inTypes(7) = "float"
inTypes(8) = "double"
outTypes(0) = "Text"
outTypes(1) = "Integer"
outTypes(2) = "Long"
outTypes(3) = "Text"
outTypes(4) = "Memo"
outTypes(5) = "Long"
outTypes(6) = "Long"
outTypes(7) = "Single"
outTypes(8) = "Double"
Dim found, j As Integer
found = 0
For j = 0 To numValues - 1
 If inType = inTypes(j) Then
 found = 1
 outType = outTypes(j)
 outLength = inLength
 If outType = "Memo" Then outLength = ""
' Debug.Print outType & ":"; outLength
 Exit Sub
 End If
Next j
If found = 0 Then
 MsgBox "Unknown type: " & inType
End If
End Sub
Sub preprocessEntityFile (entityFilePath, processedFilePath)

 On Error Resume Next 'turn off error reporting
 ' remove the extra carriage returns to make a valid .csv file, then
 ' import the file into access for further processing
 Dim recordCounter, errCounter
 Dim i As Integer, commasPerRecord As Integer
 Dim lineLength As Integer
 Dim theLine As String, lineBuffer As String
 Dim totalCommas As Integer, commaCount As Integer
 Open entityFilePath For Input As #1
 Open processedFilePath For Output As #2
 
 commasPerRecord = 12
 lineBuffer = ""
 totalCommas = 0
 Do While Not EOF(1)
 Line Input #1, theLine
 lineLength = Len(theLine)
 commaCount = 0
 For i = 1 To lineLength
 If Mid(theLine, i, 1) = "," Then
 commaCount = commaCount + 1
 End If
 Next i
 totalCommas = totalCommas + commaCount
 If totalCommas < commasPerRecord Then
 lineBuffer = lineBuffer & theLine
 Else
 lineBuffer = lineBuffer & theLine
 Print #2, lineBuffer
 lineBuffer = ""
 totalCommas = 0
 End If
 Loop
 Close #1
 Close #2
 DoCmd SetWarnings False
 Dim localDB As Database
 Set localDB = dbengine.workspaces(0).databases(0)
 localDB.Execute ("drop table entity")
 DoCmd TransferText , , "Entity", processedFilePath, True
 DoCmd SetWarnings True
 localDB.Close
 On Error GoTo 0 'turn on error reporting
End Sub
DDJ


PROGRAMMING PARADIGMS


The Boston Marathon




Michael Swaine


I had a good feeling when I saw that the rental car was a convertible. Ah,
yes. This was going to be a pleasant MacWorld Expo for a change. Not like past
Boston sweat-a-thons. And not like the San Francisco MacWorld earlier this
year, at which vendors fussed about their booths, trying not to look worried
or defensive, and at which there was a depressing dearth of Neat New Stuff.
Despite the fact that this Boston MacWorld was occurring a scant few weeks
before the second coming of the Messiah (also known as "Windows 95"), it
promised to be an upbeat show.
Then my natural cynicism and paranoia, momentarily lulled by the breeze
blowing through my beard as I sped along Mass 3 with the top down, kicked in.
Convertible? I hadn't asked for a convertible. Then who had? Upbeat show? Says
who?
Paranoia was not slow with an answer. On the eve of the biggest media blitz in
the history of the world, Apple desperately needed some good press. It
couldn't hope to make much noise amid the August cacophony produced by the
Windows 95 release, but it certainly didn't want a downbeat story going out
from Boston's MacWorld Expo. It wanted happy little journalists, writing happy
little stories about the upbeat mood at the show. And, looking at the online
reports being filed daily on the net by my fellow journalists, I could see
that Apple was getting just what it wanted. Could Apple have arranged things
so that the Boston trip was just a little more pleasant for the press? Is that
how I happened to be given a convertible? Were all of us fourth estaters being
subtly manipulated to write upbeat stories?
No. Not possible. Apple probably wouldn't be quite that Machiavellian, but
more to the point, it simply couldn't have engineered the positive tone of the
press reports. It couldn't have done this because the chief reason for the
positive press reports was the (relatively) balmy weather in Boston this year.
Usually, Boston in August is Waterworld with cabs: humidity higher than that
at the ocean floor, yet somehow not actually raining. Before you get out of
Logan Airport, your clothes are as soggy as a Nepali knapsack. On the trek
between the show's two sites you feel like you're carrying an extra 30 pounds,
and you start to think that a dip in Boston Harbor might not be a bad idea.
How polluted could it be? As you head back to the airport, the glue in the
seams of your press-kit bags lets go.
This year, in marked contrast, Boston was what a fair reporter would have to
call balmy: not over 98 percent humidity, not over 95 degrees. Veteran
journalists checked their armpits in amazement. No, Apple didn't arrange for
this year's balmy Expo. Apple can't control the weather. Only Microsoft can do
that.


The New Evangelists


MacWorld Expo Boston is always held in two sites, placed as far apart as
possible, as dictated by the Boston cab drivers' powerful union. Anyway,
that's my theory. I took the long cab ride to Bayside, not just because I feel
compelled to walk every aisle of every show I attend, but particularly because
that's where Developer Central was. This is the second Developer Central under
the joint sponsorship of MacTech magazine and Apple. MacTech's ebullient
editor-in-chief/peppy publisher, Neil Ticktin, and his merry cohorts do a good
job. Their enthusiasm helps spread Apple's message that it is now a more
developer-friendly company.
There was some room for improvement. You still hear the horror stories--in
fact, you can still read them in MacTech magazine, in the August issue, for
example--about Apple's failings in developer relations. 
But Apple is trying hard to change. Change its behavior, you may be asking, or
change its image? Both, apparently.
Toward both these ends, it has rehired Guy Kawasaki as head evangelist. The
Windows world and the UNIX universe have nothing like Guy Kawasaki. Guy
promotes himself shamelessly, including waging a public campaign to get
himself on Apple's board of directors, yet he has high credibility. He writes
books on marketing and, as though that subject were not fluffy enough, pads
them with dating advice, yet he continues to be taken seriously. He quits
Apple and rails at it in print, yet they hire him back. Give him this: few
people have given more intense thought than Guy to what kind of company Apple
ought to be. His vision of Apple is coherent and has been articulated often:
It depends on good developer relations. Despite what I said about his
self-promotion, I'd say that Guy's return to Apple is good news for
Mac-platform developers.
Not to be outdone, Power Computing (Austin, TX), the first company to sell a
Mac clone, has also hired itself a head evangelist. I know him well: Bob
LeVitus, former editor of Macazine and columnist for MacUser, should do right
by developers. Bob's a good guy. 
But back to the Expo: By the time I'd done the aisles of the expanded
Developer Central, I felt like Steve Jasik looked. Steve, who knows more about
Apple's system software than most Apple programmers, sits at his Developer
Central workstation twice a year, demoing his debugger and disassembler to all
comers. This demoing chiefly consists of staring at hex dumps as they scroll
by. On my way out, I got snagged by one of Neil's minions. Or maybe it was one
of Apple's minions; anyway, a minion. The minion put a CD in my hand and
melted back into the ambient throng.


Was it Live or Was it HTML? 


The disk was named "Virtual Dev Central-08/95." It turned out to contain
everything that the real Dev Central contained, except for Jasik's hex dumps,
much of it in HTML. Netscape HTML. At least I didn't see any blinking text, or
any of those annoying unpaid ads for Netscape with which so many Web page
authors adorn their work.
Anyway, apparently I could have stayed in my air-conditioned hotel room and
visited Developer Central virtually. If I had, here's some of what I would
have seen: Apple was showing off QuickDraw 3D and OpenDoc at the show, among
other development tools. The company announced OpenDoc Development Release 3,
including OpenDoc Development Framework (ODF) 1.0d9. This is a C++
object-oriented framework for building OpenDoc stand-alone applications; sort
of MacApp for OpenDoc.
Newton developers will probably already know that Apple has spun off BookMaker
from the latest release of the Newton Toolkit, which is supposed to allow much
faster development of Newton apps on a Mac. It features the promised selective
compilation option that lets developers profile code to see what sections
would benefit from being compiled to machine code, and a compiler to produce
that ARM code. 
Prices have been cut for some developer products and services, and this is
particularly welcome in the Newton realm: The upgrade is just $99.00, NTK 1.5
is $299.00, and BookMaker 1.1 is $199.00. I watched the Pippin demo for a
while. You know, Apple's planned entry into the set-top box fray? The real
thing; there was no demo on the CD-ROM. I was puzzled. Surely they don't
intend to compete with game boxes? Tell me they plan to turn this thing into a
net-access device or something like that. Apple's analog to Microsoft Network,
eWorld, is going scriptable. Apple is now publishing a proposed Apple Events
suite for eWorld-like transactions. Between now and the end of the year, Apple
is folding AppleLink, or its users anyway, into eWorld, so eWorld is going to
see some increased attention, at least in the developer community.


Guide Gets Guides


And then there's Guide.
It's more than a little ironic that the anarchic world of Microsoft Windows
(anarchic in comparison to Apple's, that is) has more consistent online
application documentation than Apple. Guide is Apple's second attempt at
providing something along these lines. Balloon help, a good system for a very
limited purpose, has had trouble igniting interest among developers; even, to
Apple's mortification, among developers at Claris.
I've written about Guide here before. It's Apple's help engine, designed for
creating and running interactive, step-by-step help systems that can be built
on top of applications or custom software and authoring systems without any
modification to the underlying software. Or: It's a tool for creating custom
help systems designed to address the specific needs of your customers and
custom software solutions. Or: It's another technology Apple wants third-party
application developers to spend their weekend evenings (or whatever time they
still have unallocated) implementing. 
Apple makes the distinction that a Guide is an electronic teacher rather than
an electronic book, like most Windows and Mac online documentation. Guides are
designed to lead the user through learning how to do something. Dev Central
showcased several tools to make Guide development easier. Not that Guides are
particularly hard to create; Guide is organized around a simple scripting
language. StepUp Software (Dallas, TX) built a product called "Guide Composer"
using two scripting tools for HyperCard: Double-XX from Itty-Bitty Computers
and WindowScript from SDU.
In case you need help developing Guides, there are now three books on the
subject: Danny Goodman's Apple Guide Starter Kit, by Danny Goodman and Jeremy
Joan Hewes (Addison-Wesley, 1995); Apple Guide Complete, by Apple Computer
(Addison-Wesley, 1995); and Real World Apple Guide, by Jesse Feiler (M&T
Books, 1995). Each includes a CD-ROM with tools for creating guides. Danny and
Jeremy's is probably the best introduction for novices; the Apple book is fat
and authoritative; and the Jesse Feiler book has tips on creating customized
Apple Guide applications with MPW, MacApp, CodeWarrior, and Think C.


Web Development Gets Tamed


The most interesting products at the show were PageMill and SiteMill from
Ceneca Communications (Palo Alto, CA). PageMill is just a Web-page editor, but
with SiteMill, it becomes a serious Web-site development environment. SiteMill
automatically checks links throughout your Web site to ensure that they don't
get broken. Since you can pretty much expect links to break sooner or later,
this is really important for true Web-site management, as opposed to vanity
home-page development.
But SiteMill also displays all your pages, images, page titles, and so forth.
It automatically fixes changed links when possible and warns you when it can't
(if the resource is unreachable). It alerts you about unused resources. And it
allows drag-and-drop link creation. It is very cool. The latest version of
WebSTAR from StarNine (Berkeley, CA), formerly Chuck Shotton's WWW-based
MacHTTP, was being demonstrated at the show. This version is faster and can be
configured remotely from any Web browser. StarNine was even prouder of a study
that showed that 66 percent of the commercial Web servers in use are running
WebSTAR or MacHTTP. When you look at all sites, not just commercial sites,
this dominance entirely disappears, but the commercial-site figures support
their claim that their product line is to Web publishing today what PageMaker
initially was to desktop publishing.


CompuServe Gets Hip



I have always thought of the powers that be at CompuServe as a bunch of
clueless Midwest mainframers. (I mean this in the nicest possible way. I
myself was once a clueless Midwest mainframer.) But I have to admit that these
Ohioans threw the best party at the Expo. It was held at Mama Kin, a club that
was purchased by the rock band Aerosmith as a venue, not for their own
performances, but to showcase new artists. There were hints that Aerosmith
frontman Steven Tyler might show up there, although that didn't happen. (For
those of you who don't know, Tyler is the guy who looks just like Mick Jagger,
but isn't. Mick Jagger, of course, is the singer of the Windows 95 theme song,
"Start Me Up," which Mac fanatics now refer to as "You make a grown man cry.")
But Tyler wasn't even missed. There was too much going on.
The party had an Upstairs/Downstairs ambiance: Two upstairs rooms were
reserved for VIPs like me, and stocked with lots of faux Chee-tos and other
culinary delights. Bouncers politely kept the riff-raff downstairs.
Down there, two bands played to a packed house. Morphine, a very hot, bluesy
band comprising guitar, sax, and drums, went over very big. And very loud: at
one point Peter, a longtime friend and former Apple employee, came over and
shouted in my ear that he didn't think these guys were used to playing to an
audience most of whom were standing right in front of the speakers. At least I
think that's what he said.
In the front bar area, there was a very hip artist. Known as "The Butt
Sketcher," he was doing free, 2-1/2 minute drawings of party-goers that they
could take home with them. The gimmick: All his drawings are southern
exposures of people facing north. It's all he does, and he apparently makes a
very good living at it. We do live in an age of specialization.
But a party isn't defined just by what goes on inside. Any guests who
attempted to leave early quickly discovered that CompuServe had booked the
party into a venue next door to Fenway Park in the middle of a Red Sox game.
Or not the middle, exactly: The game let out halfway through the party. Most
guests decided not to fight the Sox fans for the cabs, and stuck around for
another hour. Leaving then, they found themselves stepping gingerly between
the motorcycles of the fifty or so Hell's Angels who had shown up and were
lounging about the entrance.
All in all, it was a party to remember.


No Place Like Home


Back home in the Bay area, I was favored with a report from the place where
the software really meets the road: the Second Annual Robot Wars at Fort Mason
Center in San Francisco. Although I attended last year's Robot Wars, I missed
this year's event, but at least I got the virtual experience. My friend,
Jurgen, attended and shot video of the whole fracas, and brought the video to
Stately Swaine Manor that evening. He had videotaped the carnage to show on
his Web site, which is at http://www.ix.de.
The event, you may recall from my report last year, is basically gladiatorial
combat among homemade robots, mostly radio-controlled machines built from
washtubs and 2x4s and a dizzying variety of junk-heap and machine-shop
materials. The combatants rip at each other with chain saws and bash each
other with iron arms, all the time dodging nets and swinging wrecking balls
that the sponsors of the event include to give some additional excitement to
the already wild action. This year, the show was bigger, with more sponsors
and more participants, and ran two days instead of one, but it was just as
insane.
As we sat at my kitchen table discussing the battling robots in the video that
Jurgen had shot that morning and that he would be publishing on his Web site
later, it struck me how much of my life involves technology that was the stuff
of science fiction when I was a child. As John Rennie, editor-in-chief of
Scientific American, says in the 150th anniversary issue of his magazine, "the
future is now not even when it used to be." I half expect to wake up one
morning to find that the future has slid right past the present and into the
past. See you in cyberspace.















































C PROGRAMMING


A C++ Ray-Casting Engine




Al Stevens


Last spring's Dr. Dobb's Sourcebook of Games Programming (May/June 1995)
included the article "Ray Casting in C++," by Mark Seminatore. The article was
timely for me, because at the time I was casting about (har, har) for a
ray-casting algorithm to use in a C++ book about Windows games programming.
Ray casting is the technique that many 3-D game programs use to render and
animate scenes that occur within a maze of perpendicular walls of a uniform
height. The algorithm uses texture mapping to render walls, doors, props, and
sprites. The ray-casting algorithm Mark implemented supports scenes similar to
those in Wolfenstein, an early 3-D maze game. More-contemporary games use
more-complex and better-optimized algorithms to support texture-mapped floors
and ceilings, stairs, sloping and angular hallways, and player movement in
three axes (pitch, yaw, and roll). My requirements are satisfied by the
earlier technology, which is fortunate, since those techniques have been
published and are widely understood.
Mark's article addressed the concepts behind ray casting in some detail. He
also explained how to use assembly language to optimize the algorithm. Two
books, Tricks of the Game Programming Gurus, by LaMothe, Ratcliff, Seminatore,
and Tyler (Sams Publishing, 1994) and Gardens of Imagination, by Christopher
Lampton (Waite Group Press, 1994) also provide explanations of ray casting and
source code that implements the algorithm. Mark's ray caster and the one in
the Gurus book are similar, although he did not write the chapter on ray
casting.
I downloaded Mark's program and dissected it to see how adaptable it was to my
needs. The algorithm is well implemented, but I wanted to change or add
several things. Mark built his ray caster as a C library with functions that
you can call from a C or C++ program. I wanted mine to be pure C++ because I
wanted to make many modifications and I am more comfortable with C++. I also
made changes to the following items:
The program supports only one viewport size, a 240x120 window on a 320x200
graphics display, and I wanted a resizable viewport, up to full screen.
Mark's ray caster does not support doors that open and close in the maze.
There is no support for props in the maze scene, such as potted plants,
tables, and so on.
The player is confined to the inner corridors of the maze. My program needs to
take the player outside the maze to view the outer walls.
There is no provision for displaying a map of the maze, which I need, at least
for testing.
There is no support for inserting animated characters in the scene. In fact,
none of the published algorithms include that feature, which limits the kind
of game you can design. What fun is wandering around an unpopulated maze? You
need monsters, bad guys, and banjo players, and that implies support for
animated sprites.
The ray caster in the Gurus book uses matching pairs of wall tiles with one of
each pair being darker than the other. I modified the maze logic to include
this feature. The darker tiles display on the x walls, and the lighter tiles
display on the y walls. This adds to the effect of depth in the maze.
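The paired-tile trick reduces to choosing which member of a light/dark pair to render based on the wall's orientation. Here is a minimal sketch of the idea; the numbering scheme is illustrative (the dark mate of tile n is assumed to live at bitmap number n+1), not the engine's actual layout:

```cpp
// Pick the texture for a wall slice from a light/dark tile pair.
// Numbering is hypothetical: the dark mate of tile n sits at n + 1.
enum class WallAxis { X, Y };

int ShadedTile(int lightTile, WallAxis axis)
{
    // x walls get the darker member of the pair, y walls the lighter,
    // which fakes a fixed light source and deepens the 3-D effect.
    return axis == WallAxis::X ? lightTile + 1 : lightTile;
}
```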
Finally, the C ray caster does not support corridors longer than some maximum
length, which depends on the current viewing angle. I corrected that problem
at the expense of CPU cycles by using floating-point math for the distance
computation. If you build a maze with short corridors and no large outside
spaces, you can recover those cycles by removing the LONGCORRIDORS definition
from consts.h.
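The corridor-length limit comes from doing the distance computation in fixed point; the floating-point form trades cycles for range. A sketch of the trade-off, with illustrative helpers rather than the engine's actual math:

```cpp
#include <cmath>
#include <cstdint>

// Ray length in floating point: correct for corridors of any length.
double FloatRayLength(double dx, double dy)
{
    return std::sqrt(dx * dx + dy * dy);
}

// Ray length in 16.16 fixed point: faster on a 486, but the shift
// overflows 32 bits once the distance nears 2^15 units, which is what
// caps the corridor length in the fixed-point version.
int32_t FixedRayLength(int32_t dx, int32_t dy)
{
    int32_t d = (int32_t)std::sqrt((double)dx * dx + (double)dy * dy);
    return d << 16;   // 16.16: high word holds the whole-unit distance
}
```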
I used Wolfenstein as a model for a 3-D maze engine to allow programmers to
design game programs with all of its features. It's not there yet, but it's
close, close enough to be the subject of a "C Programming" column project.
I used Mark's code only to determine how the algorithms work. My code is a
complete rewrite in C++ rather than a wrapper around a C library. This
decision was not meant to diminish his achievement, which is significant, but
rather to produce an enhanced library, one with the features that I need in my
simulation.
My first objective was to match the performance of the C version of the ray
caster by implementing only the features that Mark supports. In addition to
the C ray caster, he published an optimized version with the time-critical
functions rewritten in assembly language. It's really fast, but his C version
is plenty fast enough on my 4DX2-66V. That configuration, which was
leading-edge technology not long ago, has sunk to the low end of mainstream
hardware in these days of behemoth operating systems and compilers. I figure
that with the old 486 as a target, I'll be okay. The project will eventually
be ported to a Windows 95 program, and anyone with anything much slower than a
486/66 won't like Windows 95, anyway. For that reason, I decided to hand code
only those optimizations that I can do in C++.
The first version of "Raycast," my unimaginative name for the project, matched
the frames-per-second performance reported by Mark's C program when it
terminates. By tweaking some loops, I was able to outrun the earlier program,
but as I add features, the engine is becoming progressively slower. My
objective is to achieve a rate of frames per second fast enough that a game
has sufficient residual processing cycles between frames to do game-specific
chores, such as keeping score, moving sprites around, playing sound effects,
and so on.
The Raycast engine published here supports DOS programs. The final version, to
be published in the book, will run under Windows, probably with the fast video
drivers that come with the new Microsoft Game SDK. I'll discuss the engine's
interface from a programmer-user's point of view. The code published with this
column shows how to use the engine. The code for the engine itself can be
downloaded or sent for.


The Maze


Listing One is maze.dat. Each character position represents a cell in the
maze. The "1" characters are wall tiles. In this example I use only one wall
tile, although there could be many. If you wanted tapestries, walls with
graffiti, walls with pictures, and walls with textures other than the dull,
prison-like concrete block walls in my example, you would build appropriate
PCX files and add them to the game. The "D," "d," "E," and "e" characters in
the maze represent doors. The "D" and "E" doors are those that you face in the
y direction; the "d" and "e" doors are faced in the x direction. There is one
"G" character in the maze, which represents a flower pot. The logic for
displaying this flower pot and other props and characters with transparent
qualities is not yet complete. I'll update the library when I finish that
part.
The maze consists of 64 lines of text with 64 characters each. The blank lines
in the maze have 64 space characters.


Modifying System Parameters


Listing Two is consts.h. This file contains constant values and some inline
functions that the program uses. You can modify some of these values to change
how the program operates. I've already discussed the LONGCORRIDORS definition.
The viewangle constant defines the width of the scene that you can see. It can
be any value between 30 and 90, but 60 seems to provide a realistic rendering.
There are four values for specifying the default viewport, which I have set to
a full-screen view. The values identify the screen coordinates for the
upper-left corner of the viewport and its width and height. The program uses
the videomode, screenheight, and screenwidth constants. To use a different
mode and resolution, you must modify the VGA class to operate with those
parameters.
Four constants specify the closest you can get to a wall when moving around
and the furthest you can be from a door and still open it. There are constants
that identify bitmap numbers associated with texture tiles. Those constants
figure prominently in some inline functions later in the file.
The mazewidth and mazeheight constants specify the dimensions of the maze in
tiles. The tilewidth and tileheight constants specify the dimensions of each
texture-mapping tile in pixels. All those values are 64 in the existing
implementation.
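Those power-of-two dimensions are no accident: with 64-unit tiles, converting a world coordinate to a maze cell reduces to a shift, and finding the column within a tile to a mask. A sketch of the arithmetic, assuming the 64-unit tiles above:

```cpp
// World-to-cell conversion for 64-unit tiles: dividing by the tile
// size is a right shift of 6 bits, and the remainder (the texture
// column within the tile) is a mask of the low 6 bits.
inline int TileIndex(int worldCoord)  { return worldCoord >> 6; }
inline int TileColumn(int worldCoord) { return worldCoord & 63; }
```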
There are three constants each for the number of increments of movement for
each step and the number of degrees of rotation for each turn when the game
player moves around in the maze. There are Faster and Slower functions in the
RayCaster class that use these values.
The algorithm uses the inline functions isWall, isDoor, isProp,
isAutomaticDoor, and isTransparent to determine the properties of a tile. If
you add texture-mapping tiles to the game beyond those in the example, you
must modify not only the bitmap numbers mentioned earlier, but also the inline
functions. A prop is a stationary tile with transparent parts. The image for a
prop is cast as a flat surface positioned in the center of the tile space. A
prop always faces the player. An automatic door is one that opens when you
move close to it. Other doors must be opened and closed by a call to the
RayCaster::OpenCloseDoor function. The isTransparent function returns True if
the tile has transparent properties. Doors, props, and (later) sprites have
transparent properties. The Maze class is sensitive to the bitmap numbers that
you assign, so if you add texture tiles, you must look into that code as well.
It should be obvious what code needs to be modified.


The Keyboard


The RayCaster class encapsulates only the video parts of a game program. To
demonstrate the engine, you need to use the keyboard. Keyboard operations in a
game program usually differ from those in other programs. The player can hold
down several keys at once and the program must acknowledge them appropriately.
To support that kind of keyboard operation, I built a Keyboard class. Listing
Three is keyboard.h, which declares the class. The listing begins with a
number of constants to define key values that are not in the ASCII character
set. For example, there is no ASCII value for the Alt key. After declaring an
object of type Keyboard, the program can call two functions to interrogate the
keyboard.
The wasPressed function tests to see if the key specified in the argument is
now being pressed. If so, the function waits until the key is no longer being
pressed and then returns True; otherwise it returns False.
The isKeyDown function returns True immediately if the key is being held down
at the time of the call. The function does not wait for the key to be released
before returning.
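The difference between the two queries is easy to see against a plain key-state array. The real class tracks state in an interrupt handler and spins until the key actually comes up; this toy model pokes the state by hand and stands in for "wait for release" with a simple clear:

```cpp
// Toy model of the Keyboard class's two query semantics. The down[]
// array plays the role of the state table the interrupt handler keeps.
struct KeyState {
    bool down[128] = {};

    // Immediate test: is the key down right now?
    bool isKeyDown(int ky) const { return down[ky]; }

    // Consume a press: report it once, then treat the key as released.
    // (The real wasPressed waits for the key to come up instead.)
    bool wasPressed(int ky)
    {
        if (!down[ky])
            return false;
        down[ky] = false;
        return true;
    }
};
```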


The Ray Caster



Listing Four is raycast.h, the header file that declares the RayCaster class.
It's a complicated class, and I'll describe its public interface. You
construct a RayCaster object by passing the name of the maze file, a pointer
to an array of char pointers that specify the PCX files of the texture-mapping
tiles, the initial x- and y-coordinates, and viewing angle of the player. You
can also specify a ViewPort object as the last argument to start out with
other than the default full-screen ray caster. Many games use part of the
screen to display other things.
The DrawFrame function displays one frame on the screen based on those values.
The SetPosition function changes those values, and GetPosition retrieves the
current settings. Four functions move the player one step through the maze in
the four cardinal directions relative to the player's current view angle. Two
functions rotate the player a specified number of degrees to the right or
left. If the degrees argument is 0, the functions use the values from
consts.h. The ToggleMap function displays and hides the maze map. The
OpenCloseDoor function opens or closes a door if the player is close enough to
one. The isInDoorway function returns True if the player is in the specified
doorway. The Slower and Faster functions change the player's speed of motion.
If you are debugging, the TraceFrame function enables trace and tracend macros
to display values on the standard output device during the next execution of
DrawFrame.


The Game Program


Listing Five is main.cpp, which, together with maze.dat and a set of PCX
files, constitutes an example game. An array of character pointers in main.cpp
identifies the PCX files. The order of this array must correspond with the
constants defined in consts.h. Therefore, WALL01.PCX represents bitmap tile
number 1, and so on. The array of ViewPort objects represents a set of
viewport sizes that this game allows the user to step through.
The program starts by declaring a RayCaster object from the free store. Then
it loops until the player presses the Esc key. Each time through the loop, the
program calls the DrawFrame function. Then it tests to see if the player
has pressed any important keys. The up- and down-arrow keys move the player
forward and backward. The right- and left-arrow keys rotate the player one
increment unless the Alt key is also being pressed, in which case the player
moves sideways. The spacebar opens and closes doors, the Ins key toggles the
map display, the F and S keys make the player move faster and slower, and the
plus and minus keys increase and decrease the size of the viewport. Changing
the viewport size involves a lot of recalculating, so the program simply
deletes the current RayCaster object and constructs a new one with the new
ViewPort object.
The try/catch mechanism is compiled only when you are debugging. Otherwise,
exception handling is disabled by the makefile to avoid the overhead that
exceptions impose on each function with automatic class objects.
The game program calls VGA::ResetVideo when the game is over to depart from
graphics mode. The RayCaster destructor does not do this automatically: This
prevents changes in viewport size from invoking video mode resets and the
attendant flicker.


The PCX Files


The PCX files are built with a paint program. They have a resolution of 64x64
pixels and 256 colors. They should all use compatible palettes. Palette color
0 is reserved for transparent parts of the image.
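Reserving palette index 0 for transparency lets the blitter treat it as a color key: source pixels with index 0 are simply skipped, so whatever is already behind the door or prop shows through. A minimal sketch of that kind of copy (not the engine's actual blitter):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Color-keyed copy: palette index 0 marks a transparent pixel, so it
// is never written, leaving the destination pixel (the wall behind a
// door or prop) visible through the hole.
void TransparentBlit(const std::vector<uint8_t>& src,
                     std::vector<uint8_t>& dst)
{
    const std::size_t n = std::min(src.size(), dst.size());
    for (std::size_t i = 0; i < n; ++i)
        if (src[i] != 0)
            dst[i] = src[i];
}
```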


Another Reason to Avoid gotos


One of the unavoidable aspects of teaching C involves the dreaded goto
statement. For many years now, programmers have been conditioned to shrink in
horror from any code that uses goto. Language gurus preach loudly that goto is
unnecessary, unstructured, and unwise. Structured programming zealots assert
with authority that any algorithm known to mankind and other species can be
expressed using only the three constructs of structured programming: sequence,
selection, and iteration. I and many others have fallen under the sway of that
ideology. Yet most modern programming languages, including Pascal, which was designed
to teach structured programming, support a goto statement, implying that there
must be a reason for it. Therefore, when we teach those languages, we first
teach the behavior of the goto statement, then we teach caution in its use.
Many books teach that goto in C can be used to jump out of nested loops, where
a simple break statement would break out of only the innermost loop. To
preserve structure, you can cause the outer loops to respond to the breaking
condition, but that usually requires extra Boolean variables or tests that a
simple goto can avoid. Example 1 illustrates this point. If there are
situations where you want to break out of the j loop without breaking out of
the i loop, Example 1 does not work. You need to contrive a Boolean variable
and test it, as in Example 2.
There is nothing wrong with either of these programming idioms. They preserve
structure, avoid gotos, and work. However, when optimization and performance
are your goals, the propriety of structure can help to defeat your purpose. If
the loops exist in a time-critical operation, such as the sweeps in a ray
caster, such code can seriously degrade performance in the name of structure.
The initialization and testing of that bool variable in Example 2 costs
processor cycles. A peek at the assembly-language code generated by the
compiler illustrates this point. Example 3 shows that using a goto produces
tighter code, which might make the difference between a successful
time-critical program and a turkey.
Last week, I taught an introductory C course to a class of 15 aerospace
engineers. The course is the first in a series of two one-week sessions that
teach C and C++ to people who already understand programming. Presumably, they
will take the intervening three weeks to practice C and become familiar with
its syntax so that they can tackle the C++ extensions comfortably.
When we got to the goto exercises, I discussed what I just presented in the
three examples and then reinforced the structured-programming position by
stating that in over ten years of publishing code, I had used the goto
statement only in exercises that illustrate its behavior.
Then I added that my experiences with the ray caster might well change
that track record. I explained that I might finally have found a valid reason
for using the goto statement in a C or C++ program, and pointed to one such
statement on the projected screen. I said, however, that if the
structured-programming gods could hear me say that, they would no doubt strike
me down with a lightning bolt. At that very moment, the bulb burned out in the
overhead projector.
You have been warned.


Source Code


The source-code files for the Raycast project are free. You can download them
from the DDJ Forum on CompuServe, on the Internet by anonymous ftp, and from
DDJ Online; see "Availability," page 3.
If you cannot get to one of the online sources, send a 3.5-inch diskette and
an addressed, stamped mailer to me at Dr. Dobb's Journal, 411 Borel Avenue,
San Mateo, CA 94402, and I'll send you the source code. Make sure to include a
note specifying which project you want. The code is free, but if you care to
support my Careware charity, include a dollar for the Brevard County Food
Bank.
Example 1: Causing outer loops to respond to the breaking condition.
for (i = 0; i < MAXI; i++) {
    for (j = 0; j < MAXJ; j++) {
        if ( breaking_condition )   // whatever it is
            break;
        // ...
    }
    if (j < MAXJ)                   // contrived condition
        break;
    // ...
}
Example 2: Testing a Boolean variable.
for (i = 0; i < MAXI; i++) {
    bool breaking = false;          // contrived variable
    for (j = 0; j < MAXJ; j++) {
        if ( breaking_condition ) {
            breaking = true;
            break;
        }
        // ...
    }
    if (breaking == true)
        break;
    // ...
}
Example 3: A goto can produce tight code for time-critical programs.
for (i = 0; i < MAXI; i++) {
    for (j = 0; j < MAXJ; j++) {
        if ( breaking_condition )
            goto breakout;
        // ...
    }
    // ...
}
breakout:
// ...

Listing One
 
 11111111111111111111111111E1111111111111111111111111111111 
 1 e 
 111D111111111111111111111 111111111111111111111111111111 
 1 1 1 1 
 1 1 1 1 
 1 d 11111111111111 1111111111111111111 d 1 
 1 1 11 11 1 1 
 1 1 11111 1111111 111111111111111 11 1 1 
 1111111 11 111111111D111 11 1111111 
 1 1 111111 111 1 1111111111111111 1 1 
 1 1 111111111 1111 1 11 1 1 
 e d 11111111 111111 1111111111111 11111 d 1 
 1 1 1111 1 1 1111 1 1 
 1 1 1111 1 1 1111 1 1 
 1111111 1111 111D111 1 d 1111 1111111 
 1 1 1111 1 1 1 1 1 1 1 
 1 1 1111 1 1 1 1 1 1 1 
 1 d 1111 1 1 1 1 1 d 1 
 1 1 1111 111G111 d 1 1 1 1 
 1 1 1111 1 1 1 d 1 1 
 1111111 1111 1 1 1 1 1111111 
 1 1 1111111111111111111D111111111111 11111 1 1 
 1 1 1111 1 1111 1 1 
 1 d 1111 1111 d e 
 1 1 1111111111111111111D111111111111111111 1 1 
 1 1 1111 1 1 1 1111 1 1 
 1111111 1111 1111111111111111 1 1 1 1 1111 1111111 
 1 1 1111 1 1 1 11 1 1111 1 1 
 1 1 111111111 1 111111111111 1 1 11 1111 1 1 
 1 d 1111 1 1 111 1111 d 1 
 1 1 111111 111111 11D111111111111 11111111 1 1 
 1 1 111111 11 1 1 1 1 
 1111111 1111111111111111 11111111D111111111111 111D111 
 1111111 111 
 1111111 1111111 
 1111111111111111111111111111111E11111111111111111111111111 
 
 11111111111 

 1 1 
 1 d 
 1 1 
 1 1 
 11111111111 
 

Listing Two
// ------------ consts.h
#ifndef CONSTS_H
#define CONSTS_H
// data types
typedef short INT; // 16 bits
typedef unsigned short UINT; // 16 bits
typedef long LONG; // 32 bits
typedef unsigned long DWORD; // 32 bits
typedef unsigned short int bool;
const bool false = 0;
const bool true = 1;
// these are values to change to modify the raycasting algorithm
#define LONGCORRIDORS
const INT viewangle = 60; // viewing angle
const INT defx = 0; // default x
const INT defy = 0; // default y
const INT defviewwidth = 320; // default viewing width
const INT defviewheight = 200; // default viewing height
const INT videomode = 0x13; // 320 x 200 x 256
const INT screenwidth = 320; // horizontal screen resolution
const INT screenheight = 200; // vertical screen resolution
const INT hitwallforward = 64; // closest you can get to a wall
const INT hitwallsideways = 8; // closest you can get to a wall
const INT hitwallbackwards = 64; // closest you can get to a wall
const INT doordistance = 128; // furthest from door to open door
const INT maxoverlays = 20; // maximum transparent tile overlays
const INT mindistance = 10; // minimum distance cast to wall
const INT maxdistance = 2048; // maximum distance cast to wall
const INT maxheight = 1024;
const INT bitmapcount = 30; // maximum number of tile bitmaps
const INT doorjam1 = 3; // bitmap # for doorjam
const INT doorjam2 = 5; // bitmap # for doorjam
const INT door1 = 7; // bitmap # for door1
const INT door2 = 16; // bitmap # for door2
const INT flowerpot = 25; // bitmap # for flowerpot
const INT openinterval = 8; // pixels per open interval
const INT autoclose = 25; // frames until auto door close
const INT transparencies = door1; // this and higher can be transparent
const INT mazewidth = 64; // maze width
const INT mazeheight = 64; // maze height
const INT tilewidth = 64; // width of a bitmap tile
const INT tileheight = 64; // height of a bitmap tile
const INT fastspeed = 10; // increments per step
const INT mediumspeed = 12;
const INT slowspeed = 15;
const INT fastrotation = 6; // degrees of rotation per turn
const INT mediumrotation = 4;
const INT slowrotation = 1;
const INT ceilingcolor = 90;
const INT floorcolor = 139;
// self-adjusting values
const INT mazexmax = mazewidth * tilewidth;

const INT mazeymax = mazeheight * tileheight;
const LONG mazexmaxl = ((LONG)mazexmax << 16); 
const LONG mazeymaxl = ((LONG)mazeymax << 16);
inline bool isWall(INT bmpno)
{
 // ---- modify this function if wall tiles are added
 return bmpno > 0 && bmpno < doorjam1;
}
inline bool isDoor(INT bmpno)
{
 // ---- modify this function if door tiles are added
 return bmpno >= door1 && bmpno <= door2 + 8;
}
inline bool isProp(INT bmpno)
{
 // ---- modify this function if prop tiles are added
 return bmpno >= flowerpot;
}
inline bool isAutomaticDoor(INT doorno)
{
 // ---- modify this function if door tiles are added
 return doorno >= door2 && doorno <= door2 + 8;
}
inline bool isTransparent(INT tileno)
{
 return tileno >= transparencies;
}
#ifdef NDEBUG
#define Assert(p) ((void)0)
#else
void MyAssert(char* cond, char* file, int line);
#define Assert(p) ((p) ? (void)0 : MyAssert(#p, __FILE__, __LINE__))
#endif
#endif

Listing Three
// -------- keyboard.h
#ifndef KEYBOARD_H
#define KEYBOARD_H
#include <dos.h>
#include "consts.h" 
const int homekey = 71+128;
const int pgupkey = 73+128;
const int endkey = 79+128;
const int pgdnkey = 81+128;
const int f1key = 59+128;
const int f2key = 60+128;
const int f3key = 61+128;
const int f4key = 62+128;
const int f5key = 63+128;
const int f6key = 64+128;
const int f7key = 65+128;
const int f8key = 66+128;
const int f9key = 67+128;
const int f10key = 68+128;
const int f11key = 87+128;
const int f12key = 88+128;
const int uparrow = 72+128;
const int dnarrow = 80+128;

const int rtarrow = 77+128;
const int lfarrow = 75+128;
const int inskey = 82+128;
const int delkey = 83+128;
const int altkey = 56+128;
const int pluskey = 78+128;
const int minuskey = 74+128;
const int esckey = 27;
class Keyboard {
 static void interrupt (*oldkbint)(...);
 static void interrupt newkbint(...);
 static bool kys[128];
 static int scancodes[256];
 static unsigned char scancode;
public:
 Keyboard();
 ~Keyboard();
 bool wasPressed(int ky);
 bool isKeyDown(int ky);
};
inline bool Keyboard::isKeyDown(int ky)
{
 return kys[scancodes[ky]];
}
#endif

Listing Four
// ------------- raycast.h
#ifndef RAYCAST_H
#define RAYCAST_H
#include "trigtabl.h"
#include "tables.h"
#include "maze.h"
#include "pcx.h"
#include "vga.h"
#include "map.h"
#include "consts.h"
// ----- one vertical ray-cast slice
struct Slice {
 UINT bmptile; // tile number
 UINT distance; // distance from player
 UINT column; // tile column to render
 INT mazeindex; // index into maze of tile position
};
// ------ ray caster viewport
struct ViewPort {
 INT x, y; // upper left origin
 INT viewwidth, viewheight; // viewport dimensions
};
// ------ default viewport values (defined in consts.h)
const ViewPort fullscreenviewport = {
 defx,
 defy,
 defviewwidth,
 defviewheight
};
// ---- ray caster class
class RayCaster {
 VGA vga; // video object
 TrigTables trigtables; // trig tables

 HeightTable heighttable; // height table
 ScaleTable scaletable; // vertical scale table
 Maze maze; // the game's maze
 Map map; // displayable map
 Slice slice; // one ray-cast slice
 Slice* slices; // -> an array of slices
 INT overlaycount; // nbr of transparent tiles hit
 PCXBitmap bmps[bitmapcount];
 INT *ocounts; 
 char *screenbuffer; // -> the screen buffer
 INT x, y, angle; // player's position
 INT viewwidth, viewheight; // viewport dimensions
 LONG sinangle, cosangle; // 
 INT rayangle; // angle of the ray
 INT itilex, itiley; // coordinates of tile
 LONG ltilex, ltiley; // coordinates of tile
 DWORD raylength; // length of ray
 bool mapon; // true if map is displaying
 INT xmazeindex; // index into x maze
 INT ymazeindex; // index into y maze
 INT hitwallmargin; // margin for hit detection
 INT speed, rotation; // movement speed controls
 // --- private functions
 INT EastWest(INT angle); // true if facing due east or west
 INT NorthSouth(INT angle); // true if facing due north or south
 bool CastSlice(INT i); // cast one slice
 UINT CastXRay(); // cast an X ray
 UINT CastYRay(); // cast a Y ray
 enum direc { forward, backward, rightward, leftward };
 void Move(direc dir); // move the player 1 step
 void CvtRayLength(DWORD& raylength);
 void SetAngles();
 bool SameCell(INT mazeindex1, INT mazeindex2);
 bool SameCoord(INT c1, INT c2);
 INT Degree(INT x); // cvt degrees to viewport angle
 INT unDegree(INT d); // cvt viewport angle to degrees
public:
 RayCaster(char* mazename, char* pcxs[],
 INT px,
 INT py,
 INT pangle,
 ViewPort vp = fullscreenviewport);
 ~RayCaster();
 // ---- draw one frame of the maze
 void DrawFrame();
 // ---- set the player's position and angle facing
 void SetPosition(INT xp, INT yp, INT angl);
 // ---- get the player's current position and angle facing
 void GetPosition(INT& xp, INT& yp, INT& angl);
 // ---- functions to move the player
 void MoveForward();
 void MoveBackward();
 void MoveLeftward();
 void MoveRightward();
 // --- functions to rotate the player
 void RotateRight(INT degrees = 0);
 void RotateLeft(INT degrees = 0);
 // ---- turn the map on and off
 void ToggleMap();
 // ----- open or close the door immediately in front of player

 void OpenCloseDoor();
 // ----- test if player is in the doorway of specified door
 bool isInDoorway(INT doorid);
 // ----- command player to change speed
 void Slower();
 void Faster();
#ifndef NDEBUG
 bool tracing;
 void TraceFrame();
#endif
};
inline INT RayCaster::Degree(INT x)
{
 return trigtables.Degree(x);
}
inline INT RayCaster::unDegree(INT d)
{
 return (INT)((LONG)d * viewangle / viewwidth);
}
inline INT RayCaster::EastWest(INT angle) {
 return (angle == Degree(90) ||
 angle == Degree(270));
}
inline INT RayCaster::NorthSouth(INT angle)
{
 return (angle == Degree(0) ||
 angle == Degree(180));
}
inline void RayCaster::MoveForward()
{
 Move(forward);
}
inline void RayCaster::MoveBackward()
{
 Move(backward);
}
inline void RayCaster::MoveLeftward()
{
 Move(leftward);
}
inline void RayCaster::MoveRightward()
{
 Move(rightward);
}
inline void RayCaster::RotateRight(INT degrees)
{
 if (degrees == 0)
 degrees = rotation;
 angle += Degree(degrees);
 if (angle >= Degree(360))
 angle -= Degree(360);
}
inline void RayCaster::RotateLeft(INT degrees)
{
 if (degrees == 0)
 degrees = rotation;
 angle -= Degree(degrees);
 if (angle < 0)
 angle += Degree(360);

}
inline void RayCaster::ToggleMap()
{
 mapon ^= true;
}
inline void RayCaster::SetAngles()
{
 sinangle = trigtables.SinAngle(angle);
 cosangle = trigtables.CosAngle(angle);
}
// ------- debugging slice tracing functions
#ifndef NDEBUG
 #define trace(x) if(tracing)cout<<(#x ": ")<<(x)<<' '
 #define tracend() if(tracing)cout<<endl
inline void RayCaster::TraceFrame() {
 tracing = true;
}
#else
 #define trace(x) ((void)0)
 #define tracend() ((void)0)
#endif
#endif

Listing Five
// ---------- main.cpp
#ifndef NDEBUG
#include <time.h>
#include <iostream.h>
#endif
#include "raycast.h"
#include "keyboard.h"
// ---- pcx files
// ---- the order of these entries must match consts in consts.h
static char *pcxs[] = {
 "wall01.pcx",
 "wall02.pcx",
 "doorjam1.pcx",
 "doorjam1.pcx",
 "doorjam2.pcx",
 "doorjam2.pcx",
 "door1.pcx",
 "door2.pcx",
 "door3.pcx",
 "door4.pcx",
 "door5.pcx",
 "door6.pcx",
 "door7.pcx",
 "door8.pcx",
 "door9.pcx",
 "door10.pcx",
 "door11.pcx",
 "door12.pcx",
 "door13.pcx",
 "door14.pcx",
 "door15.pcx",
 "door16.pcx",
 "door17.pcx",
 "door18.pcx",
 "flpot1.pcx",

 "flpot2.pcx",
};
// ---- view ports: changed by pressing + and -
static ViewPort vps[] = { // x y wd ht (position and size)
// --- -- --- ---
 { 120, 75, 80, 50 },
 { 110, 69, 100, 62 },
 { 100, 62, 120, 74 },
 { 90, 57, 140, 86 },
 { 80, 50, 160, 100 },
 { 70, 52, 180, 112 },
 { 60, 45, 200, 124 },
 { 50, 37, 220, 136 },
 { 40, 30, 240, 150 },
 { 30, 23, 260, 162 },
 { 20, 15, 280, 174 },
 { 0, 0, 320, 200 },
};
const int nbrvps = sizeof vps / sizeof(ViewPort);
int vpctr = nbrvps - 1; // viewport subscript
int main()
{
 RayCaster* rp = 0;
 INT x = 360, y = 950, angle = 0;
#ifndef NDEBUG
 char* errcatch = 0;
 // ---- for computing frames/per/second
 long framect = 0;
 clock_t start = clock();
 try {
#endif
 // ----- ray caster object
 rp = new RayCaster("maze.dat", pcxs, x, y, angle, vps[vpctr]);
 // ---- keyboard object
 Keyboard kb;
 while (!kb.wasPressed(esckey)) {
 // ----- draw a frame
 rp->DrawFrame();
#ifndef NDEBUG
 framect++;
#endif
 // ----- test for player movement commands
 if (kb.isKeyDown(uparrow))
 rp->MoveForward();
 if (kb.isKeyDown(dnarrow))
 rp->MoveBackward();
 if (kb.isKeyDown(rtarrow)) {
 if (kb.isKeyDown(altkey))
 rp->MoveRightward();
 else
 rp->RotateRight();
 }
 if (kb.isKeyDown(lfarrow)) {
 if (kb.isKeyDown(altkey))
 rp->MoveLeftward();
 else
 rp->RotateLeft(); }
 // -------- open and close door commands
 if (kb.wasPressed(' '))
 rp->OpenCloseDoor();
 // ----- command to turn the map on and off
 if (kb.wasPressed(inskey))
 rp->ToggleMap();
 // ----- commands to change player movement speed
 if (kb.wasPressed('f'))
 rp->Faster();
 if (kb.wasPressed('s'))
 rp->Slower();
 // ----- commands to change the size of the viewport
 if (kb.wasPressed(pluskey)) {
 if (vpctr < nbrvps-1) {
 rp->GetPosition(x, y, angle);
 delete rp;
 rp = new RayCaster("maze.dat",
 pcxs, x, y, angle, vps[++vpctr]);
 }
 }
 if (kb.wasPressed(minuskey)) {
 if (vpctr > 0) {
 rp->GetPosition(x, y, angle);
 delete rp;
 rp = new RayCaster("maze.dat",
 pcxs, x, y, angle, vps[--vpctr]);
 }
 }
#ifndef NDEBUG
 if (kb.wasPressed(delkey))
 // ------- turn on tracing for one frame
 rp->TraceFrame();
#endif
 }
#ifndef NDEBUG
 }
 catch (char* errmsg) {
 errcatch = errmsg;
 }
 // ---- get current position to report for testing
 if (rp)
 rp->GetPosition(x, y, angle);
 clock_t stop = clock();
#endif
 delete rp;
 VGA::ResetVideo();
#ifndef NDEBUG
 cout << "Frames/sec: "
 << (int) ((CLK_TCK*framect) / (stop-start)) << endl;
 cout << "Position (x, y, angle) :"
 << x << ' ' << y << ' ' << angle << endl;
 if (errcatch)
 cerr << errcatch << endl;
#endif
 return 0;
}

ALGORITHM ALLEY


Permutation Generation Using Matrices




Mani G. Iyer


Mani is a senior programmer/analyst at a mutual-funds company in Boston. He
has a master's degree in computer science from the University of Bombay,
India, and is a CCP. Mani can be reached at iyer@usa1.com.


In his classic paper "Permutation Generation Methods" (Computing Surveys 9,
1977), Robert Sedgewick surveys the various permutation generation methods
published until then, categorizing them as follows:
Methods based on exchanges in which the n! permutations of n are obtained by a
series of (n!-1) exchanges. Algorithms proposed by M.B. Wells, J. Boothroyd,
B.R. Heap, S.M. Johnson, H.F. Trotter, and F.M. Ives fall into this category.
Algorithms not based on exchanges. Some algorithms, for instance, use cyclic
rotations to obtain the n! permutations. One such algorithm is by G.W. Langdon
(I arrived at his method independently). Other algorithms proposed by Fischer
and Krause and R.J. Ord-Smith generate lexicographic permutations.
Sedgewick discusses the different classes of algorithms and details the
analysis and implementation of the most prominent. For example, Langdon's
cyclic method can be implemented with only a few computer instructions and can
be made to run very fast on computers with hardware-rotation capabilities.
The algorithm I propose here generates permutations in a novel way, using
cyclic rotations (implemented without hardware dependencies) and matrices. The
only restrictions on the implementation language are the availability of a
"string copy" function and efficient pointer manipulations.


The Algorithm


The first step toward generating all n! permutations of 1,2,...,n is to
generate the pivot permutations. A pivot permutation is obtained by certain
rules and is used to generate an nxn matrix, which in turn yields 2n
permutations. Consequently, you need n!/2n pivot permutations. Before
discussing pivot-permutation generation, however, certain definitions are in
order.
A "permutation" is a sequence of pi, where i=1,2,...,n and pi is a unique
integer between 1 and n, inclusive. The function right rotate(f,l) of a
permutation yields a sequence where pf=pl and pi=pi-1 for i=f+1,...,l. 
The function left rotate(f,l) of a permutation yields a sequence where pl=pf
and pi=pi+1 for i=f,...,l-1. A "full-right rotate" and a "full-left rotate"
are special cases where f=1 and l=n. 
A "cyclic-permutation matrix" is an nxn matrix whose first row is a pivot
permutation and whose ith row is obtained by a full-right rotate of the
(i-1)th row for i=2,...,n. In all rotations, f (first) is </=l (representing
last).
Pivot permutations are obtained by performing rotations, right or left, but
only in one direction throughout the process. For the purpose of this
discussion, I'll stick with right rotate, although left works equally well.
Successive pivot permutations are generated using (n-1) right rotates with
l=(n-1). The nth right rotate will yield the starting sequence; you can refer
to these as (n-1)th order pivots. Using each of the (n-1) pivots, generate the
(n-2)th order pivots using (n-2) right rotates with l=(n-2). Continue the
process until l=2, at which point no more pivot permutations can be generated,
leaving you with n!/2n pivot permutations. The fact that the nth right rotate
of an nth order pivot permutation yields the original sequence in the (n+1)th
order implies that the third-order pivots are the ones actually used to
generate the nxn matrices.
For a given third-order pivot permutation, using this pivot permutation as the
first row of an nxn matrix, generate the other (n-1) rows of the
cyclic-permutation matrix by performing (n-1) full-right rotates. After
creating the matrix, read the rows and columns of the matrix to yield 2n
permutations. Repeating the process for the other pivot permutations will
yield (n!/2n)x2n=n! permutations.
Figure 1 and Figure 2 show the algorithm generating all permutations of
1, 2, 3, 4, and 5 (that is, n=5). Reading the rows and the columns of the 12
matrices in Figure 2 yields the 120 permutations of {1,2,3,4,5}. The
algorithm can be coded recursively in an elegant manner; see Example 1.
Listing One is the complete C implementation of the permutation-generation
algorithm. The array num, which is one relative, is defined as a character
array to facilitate using character pointers in the memcpy functions and
building the cyclic-permutation matrices via an array of pointers.
In rightRotate(f,l), the right rotates are performed by cloning the substring
of num indexed by f and having length l into the array temp, which is zero
relative. The final result is obtained by getting the string from temp,
indexed by (l-1) and having length l. For example, if num={0,3,2,4,1,5} and a
rightRotate (1,4) needs to be performed, then the first and second memcpy
functions from num to temp would yield temp equal to {3,2,4,1,3,2,4,1}. The
string indexed by 3 and having length 4 in temp is copied back into num
indexed by 1, thus yielding num as equal to {0,1,3,2,4,5}.
In createCyclicMatrix(), the entire array num is cloned into temp, and the
pointer array p is created by pointing to the different indexes of temp. Since
a cyclic-permutation matrix is created by performing (n-1) full-right rotates
on the pivot permutation, it is only natural to create the matrix as an array
of pointers. For example, if a cyclic-permutation matrix needs to be created
for the pivot permutation stored in num as {0,2,4,1,3,5}, then temp would be
equal to {2,4,1,3,5,2,4,1,3,5} and p would be created as described in Table 1.
The permutations are obtained by reading the n rows and n columns of the
cyclic-permutation matrix stored in p.


Salient Features of the Algorithm


Any of the n! permutations of n can be used as an initial permutation to
generate the n! permutations of n. This is because every permutation of n must
be either a row or a column of a cyclic-permutation matrix and any row or
column of the matrix can be used as a pivot permutation to generate the
matrix.
The cyclic-permutation matrix can be created using full-left rotates, but the
matrix will have to be created bottom up; that is, the pivot permutation is
the nth row of the matrix, and for i=(n-1),...,1, the ith row is created by
performing a full-left rotate on the (i+1)th row.
The ith order pivot permutations can be created using one of the following:
Right rotate (1,i).
Left rotate (1, i).
Right rotate (n-i+1, n).
Left rotate (n-i+1, n).
In fact, between different orders of pivot permutations, the direction of
rotates can be changed. The direction cannot be changed within the same order.
In the cyclic-permutation matrix, the primary diagonals or the secondary
diagonals, depending on the direction of rotates used (primary for right and
secondary for left), have the same number. This is true for the ixi matrix
obtained by the ith order pivot permutations; for example, the matrix obtained
by (1,i) elements or (n-i+1) elements of the i pivot permutations. The
uniqueness of the permutations can be attributed to this property.


Conclusion


Sedgewick notes that the exchange method by B.R. Heap will run fastest on most
computers. In Data Structures and Program Design in C (Prentice-Hall, 1991),
Robert L. Kruse, Bruce P. Leung, and Clovis L. Tondo claim that the
linked-list, or simple-exchange algorithm is at least as efficient as the Heap
method. Both the Heap and linked-list algorithms have been implemented using
the C programs supplied by Kruse et al. in their book. I've used these
programs for performance comparisons to the Matrix method, on a 486-based PC
running under DOS and a Sun workstation under Solaris. The results are as
follows:
If the processing of a given permutation is implemented as a simple printf
function resulting in mere enumeration of the permutations, the Matrix method
is only slightly faster than the Heap and linked-list methods. This slight
difference may be attributed to the integer conversion involved in the
printing of a character.
If the processing of a given permutation is implemented as a dummy function,
the Matrix method is faster than the other two methods by a factor of two.
If parallelism were available, the algorithm could be made to run much faster,
since the generation of cyclic-permutation matrices from the pivot
permutations would be treated as independent processes, thus making larger
values of n tractable.

Figure 1: Generating pivot permutations
 Fifth-order Pivot    Fourth-order Pivot    Third-order Pivot
 1 2 3 4 5 4 1 2 3 5 2 4 1 3 5
 1 2 4 3 5
 4 1 2 3 5
 3 4 1 2 5 1 3 4 2 5
 4 1 3 2 5
 3 4 1 2 5
 2 3 4 1 5 4 2 3 1 5
 3 4 2 1 5 
 2 3 4 1 5
 1 2 3 4 5 3 1 2 4 5
 2 3 1 4 5
 1 2 3 4 5
Figure 2: Cyclic-permutation matrices generated by the 12 pivot permutations.
[ 2 4 1 3 5 ] [ 1 2 4 3 5 ] [ 4 1 2 3 5 ] [ 1 3 4 2 5 ] [ 4 1 3 2 5 ]
[ 5 2 4 1 3 ] [ 5 1 2 4 3 ] [ 5 4 1 2 3 ] [ 5 1 3 4 2 ] [ 5 4 1 3 2 ]
[ 3 5 2 4 1 ] [ 3 5 1 2 4 ] [ 3 5 4 1 2 ] [ 2 5 1 3 4 ] [ 2 5 4 1 3 ]
[ 1 3 5 2 4 ] [ 4 3 5 1 2 ] [ 2 3 5 4 1 ] [ 4 2 5 1 3 ] [ 3 2 5 4 1 ]
[ 4 1 3 5 2 ] [ 2 4 3 5 1 ] [ 1 2 3 5 4 ] [ 3 4 2 5 1 ] [ 1 3 2 5 4 ]
[ 3 4 1 2 5 ] [ 4 2 3 1 5 ] [ 3 4 2 1 5 ] [ 2 3 4 1 5 ] [ 3 1 2 4 5 ]
[ 5 3 4 1 2 ] [ 5 4 2 3 1 ] [ 5 3 4 2 1 ] [ 5 2 3 4 1 ] [ 5 3 1 2 4 ]
[ 2 5 3 4 1 ] [ 1 5 4 2 3 ] [ 1 5 3 4 2 ] [ 1 5 2 3 4 ] [ 4 5 3 1 2 ]
[ 1 2 5 3 4 ] [ 3 1 5 4 2 ] [ 2 1 5 3 4 ] [ 4 1 5 2 3 ] [ 2 4 5 3 1 ]
[ 4 1 2 5 3 ] [ 2 3 1 5 4 ] [ 4 2 1 5 3 ] [ 3 4 1 5 2 ] [ 1 2 4 5 3 ]
[ 2 3 1 4 5 ] [ 1 2 3 4 5 ]
[ 5 2 3 1 4 ] [ 5 1 2 3 4 ]
[ 4 5 2 3 1 ] [ 4 5 1 2 3 ]
[ 1 4 5 2 3 ] [ 3 4 5 1 2 ]
[ 3 1 4 5 2 ] [ 2 3 4 5 1 ]
Example 1: Recursive algorithm for generating permutations by the matrix
method.
matrixPermute (n)
{
 if (n == 3)
 createCyclicMatrix and return;
 for (n - 1) times
 {
 rightRotate (1, n - 1);
 matrixPermute (n - 1);
 }
}
Table 1: Simulation of the cyclic-permutation matrix by an array of pointers.
i p[i] *(p[i]+0) *(p[i]+1) *(p[i]+2) *(p[i]+3) *(p[i]+4) 
0 temp+5 2 4 1 3 5
1 temp+4 5 2 4 1 3
2 temp+3 3 5 2 4 1
3 temp+2 1 3 5 2 4
4 temp+1 4 1 3 5 2

Listing One
#include <stdio.h>
#include <stdlib.h>
#include <memory.h>
#define MAX 20
char num[MAX + 1];
int n; 
main (int argc, char *argv[])
{
 int i;

 void matrixPermute (), createCyclicMatrix (), rightRotate ();
 if (argc != 2)
 {
 fprintf (stderr, "Usage: permute <string>\n");
 exit (1);
 }
 n = atoi (argv [1]);
 if (n < 3 || n > MAX)
 {
 fprintf (stderr, "number must be between 3 and %d\n", MAX);
 exit (1);
 }
 for (i = 1; i <= n; ++i)
 num [i] = i;
 matrixPermute (n);
}
void matrixPermute (int k)
{
 int i, temp;
 if (k == 3) 
 {
 createCyclicMatrix ();
 return;
 }
 temp = k - 1;
 for (i = 0; i < temp ; ++i) 
 {
 rightRotate (1, temp);
 matrixPermute (temp);
 } 
}
void createCyclicMatrix ()
{
 char *p[MAX], temp[2*MAX];
 int i, j;
 /* create the cyclic permutation matrix P as an array of pointers */ 
 memcpy (temp, num + 1, n);
 memcpy (temp + n , num + 1, n);
 
 for (i = 0; i < n; ++i)
 p[i] = temp + n - i;
 
 /* generate the 2n permutations from the cyclic permutation matrix P */
 for (i = 0; i < n; ++i)
 {
 /* print the ith row */
 for (j = 0; j < n; ++j)
 printf ("%d ", *(p[i] + j));
 printf ("\n");
 /* print the ith column */
 for (j = 0; j < n; ++j)
 printf ("%d ", *(p[j] + i));
 printf ("\n");
 }
}
void rightRotate (int f, int l)
{
 char temp [2*MAX], *saveptr;
 int i;

 saveptr = num + f;
 memcpy (temp , saveptr, l);
 memcpy (temp + l, saveptr, l);
 memcpy (saveptr, temp + l - 1, l);
}

PROGRAMMER'S BOOKSHELF


Four Feynman Books




Michael Swaine


Toward the end of August 1954, Richard Feynman was peeking over the top of a
copy of the journal Advances in Physics at an attractive librarian in the
Caltech library.
He had come to the library expressly to look at the librarian, which seemed to
him a pleasant way to pass a boring afternoon; the issue of Advances in
Physics was just a cover for his girl-watching. But an article by Herbert
Frhlich that posed a problem involving slow electrons moving in a polarizable
crystal caught his interest. In the article, Frhlich claimed that solving the
problem would go a long way toward an understanding of superconductivity.
Feynman didn't see how this problem had anything to do with superconductivity,
a subject that attracted him as much as the librarian, but it was a pretty
little problem, anyway. He started playing with it as he walked back to his
office.
Deciding the problem would make a good research assignment, Feynman began
explaining the problem to his graduate assistant. "I think there must be a
variational principle of some kind for estimating path integrals," he told the
student, "I think you should try to find it."
The student asked Feynman how he should approach the problem, and Feynman
worked through some equations. Staring at the result, the student asked,
"Doesn't that just solve the problem?"
It did. Feynman had solved the problem while explaining it. It was a
difficulty he often had with graduate students; he enjoyed solving problems
too much to give them away.
Feynman's solution proved to be quite powerful and useful. He wrote to
Fröhlich in early September, telling him of the librarian incident and
describing his solution to the problem. Now, he went on, "what do we have to
do to understand superconductivity?"
That story, recounted in Jagdish Mehra's The Beat of a Different Drum: The
Life and Science of Richard Feynman, is a revealing view of the kind of person
Richard Feynman was.
Who Richard Feynman, the scientist, was should require no explanation,
although some readers may know of him only for one or two of his achievements.
His work on the atomic bomb at Los Alamos during World War II. His 1965
Nobel prize for his fundamental work in quantum electrodynamics. Feynman
diagrams, which changed the way physicists look at physics. The Feynman
Lectures, which changed the way the subject is taught.
But Feynman the person was arguably at least as interesting as Feynman the
scientist. He was, in his own witty self-characterization, a curious fellow.


A Curious Fellow


A problem confronts anyone wanting to tell Feynman's story: There are a number
of Feynman stories, and Feynman himself has already told all the best ones.
To say that there are a number of Feynman stories is an understatement. Anyone
who tried to write at any length about Feynman without telling some of those
stories would not be doing justice to the man. He really was a curious fellow,
both in the sense of being perceived as eccentric and in the sense of
approaching life with an insatiable scientific curiosity.
But the perceived eccentricity was apparently just a consequence of the way
the man Feynman chose to live his life: He pursued, with a wide-eyed
innocence, whatever subjects appealed to him, whether or not they were in his
area of specialization, whether or not they seemed to others to be proper
matters of scientific interest. Example: picking locks on safes containing
top-secret files at Los Alamos during the war, merely to amuse himself. He
apparently actually lived his life by the motto that became the title of his
second popular book: What do you care what other people think?
Still, Feynman probably did care what other people thought of him in at least
one sense: He enjoyed being perceived as eccentric. He collected the best
stories about his eccentricities in two autobiographical books: Surely You're
Joking, Mr. Feynman! Adventures of a Curious Character, and What Do You Care
What Other People Think? Further Adventures of a Curious Character.
In these books, you can read about Feynman picking locks at Los Alamos,
sniffing footprints to see how bloodhounds did it, playing the drums, dancing
the samba, and doing cube roots in his head. You can see his drawings. And you
can read about how he solved the mystery of the Challenger disaster.
But you won't come to know the man and his work.


A Biography of a Scientist


Both James Gleick and Jagdish Mehra have set out to tell the story of this
extraordinary man, but they take different paths.
Gleick's book, Genius: The Life and Science of Richard Feynman, is a fairly
conventional biography, beginning with Feynman's Russian and Polish immigrant
parents in Far Rockaway, New York, early in this century. Gleick follows
Feynman from a boyhood of science experiments and radio repairing, through
college and marriage and a strange isolation in the New Mexico desert during
World War II, and on to fame and achievement at Cornell and Caltech.
Gleick makes a good story of it.
Early in his adult life, Feynman faced a great tragedy in a setting of great
drama. A young man meets a woman and falls in love. She discovers she is dying
of tuberculosis. War breaks out. They marry and move away from everything and
everyone they know to live in the desert under severe secrecy. While she lies
dying slowly in a hospital in the desert, he labors miles away with the best
physicists in the world in a feverish rush to build the Armageddon weapon that
they have been told America needs to end the war.
Gleick tells that story right and gives it its proper place in Feynman's life
story.
He also deals well with Feynman's reaction to these events. It's possible to
see Feynman as a cold and unfeeling man who shrugged off his wife's death with
remarkable ease. It's also possible to see him as something quite different.
Gleick lets us see both sides, drawing no conclusions.
Although he doesn't shrink from the science in Feynman's life, Gleick is
primarily telling a life story.


A Scientific Biography


Mehra's book is something else.
The Beat of a Different Drum is an account of a scientist's life, written by a
scientist, and treating the subject's work as fully as important as the other
aspects of his life. Given that Richard Feynman was one of the most important
and prolific scientists of recent time, this results in a book with a lot of
pretty heavy science.
Mehra had already written biographies of physicists Heisenberg, Dirac, and
Pauli when Feynman asked him to do for him what he had done for them. The Beat
of a Different Drum is based on many extensive interviews with Feynman
regarding all aspects of his life and work. Mehra had other sources, too, of
course: He talked with relatives about Feynman's life, and to scientists like
Murray Gell-Mann about his science. The book is certainly well researched.
The completeness with which he covers the science is especially impressive. He
seems to have devoted a chapter to every significant research program of
Feynman's. Take, for example, his chapter on what Feynman called "the only law
of nature I could lay a claim to"--the theory of weak interactions.
Mehra begins with seven pages of historical background on the problem, going
back to Marie and Pierre Curie, before Feynman enters the picture at the Sixth
Rochester Conference on High Energy Nuclear Physics at Rochester, New York, in
April 1956.
His entrance is typical Feynman. By chance, he finds himself rooming with
experimenter Martin Block, who tosses an offhand question at Feynman as they
are about to turn in.
The question concerns the theta-tau puzzle, a hot topic in physics that year.
Two particles, referred to as "theta" and "tau," are identical with respect to
key properties, leading to the conclusion that theta and tau are actually just
different names for the same particle. But studying how the particles decay
leads to the conclusion that they differ in intrinsic parity, meaning that
they can't be the same particle.
The best theoreticians of the field, including Murray Gell-Mann, had been
wrestling with this apparent paradox without success. In their room that
night, Block says to Feynman, "What is this big deal about the parity thing?
Maybe they are the same particle and [parity is not conserved]."

Feynman seems about to tell him how dumb he was, Block recalled later, but
then he begins to think about it. The two sit up half the night hashing it
out, and the next morning Feynman stands up in front of the great
theoreticians and proposes the idea.
Gell-Mann and the other theoreticians don't completely ignore the brash young
man, but they don't jump up and down in excitement, either. Nevertheless,
Feynman tackles the puzzle in earnest when he returns to Caltech. Mehra tells
of one shining moment when Feynman jumps up in the middle of a meeting and
shouts "I understand everything!" The following year, Feynman and Gell-Mann
coauthor the crucial Physical Review paper on the matter.


A Question of Style


The two biographies differ in style and structure.
Gleick has a distinctive and engaging writing style.
Mehra writes with admirable clarity when he is explaining Feynman's physics,
although when he discusses Feynman's life his style often fails to bring out
the drama of the events. Worse, he occasionally sounds too much like Richard
Feynman. Mehra conducted numerous interviews with Feynman for the book, so at
most points, he had Feynman's own utterances to draw upon. It appears that he
fell to the temptation too often. Feynman's style, when it appears in
quotations, really is refreshingly direct and unaffected. When it creeps into
the narrative of the book, it is annoyingly colloquial, repetitive, and
semiliterate. Here's Gleick:
Long afterward, when they were old men, after they had shared a Nobel Prize
for work done as rivals, they amazed a dinner party by competing to see who
could most quickly recite from memory the alphabetical headings on the spines
of their half-century-old edition of the Encyclopedia Britannica.
Here's Feynman:
In ancient Egypt and Greece the priests and oracles used to look at the veins
in sheep's livers to forecast the future, and that's the kind of pictures I
was drawing to describe physical phenomena. I thought that if they really turn
out to be useful it would be fun to see them in the pages of the Physical
Review. I was conscious of the thought that it would be amusing to see these
funny-looking pictures in the Physical Review.
And here's Mehra, sounding like Feynman:
California had a law that all schoolbooks used by all the kids in all public
schools of the state had to be chosen by the State Board of Education. So they
had a committee, the Curriculum Commission, to examine the books and give them
advice on which books to approve.
The books differ, too, in structure. By hanging the whole story on the human
chronology of Feynman's life, Gleick is able to construct a more cohesive
narrative than Mehra. You can see it in as simple a thing as their chapter
heads. Gleick has six, with titles like "Far Rockaway" and "Caltech." Mehra's
book, in contrast, has 26 chapters, most of them with titles like
"Action-at-a-distance in electrodynamics: the Wheeler-Feynman theory." and
"The space-time approach to quantum electrodynamics."
Despite its shortcomings as a human story, Mehra's book is a remarkable record
of Feynman's work. If you want to know about Feynman and his physics, read
Mehra. If you want to know the story of Feynman's life, read Gleick. But if
what you really want is to know the best Feynman stories, then you'd better
read Richard Feynman.
The Beat of a Different Drum
Jagdish Mehra
Oxford University Press, 1994, 630 pp., $35.00 ISBN 0-19-853948-7
What Do You Care What Other People Think?
Richard P. Feynman, as told to Ralph Leighton 
Bantam, 1988, 255 pp., $9.95 ISBN 0-553-34784-5
Genius
James Gleick
Pantheon, 1992, 533 pp., $14.00 ISBN 0-679-40836-3
Surely You're Joking, Mr. Feynman!
Richard P. Feynman, as told to Ralph Leighton
Bantam, 1985, 322 pp., $4.50 ISBN 0-553-25649-1

SWAINE'S FLAMES


From the Stately Swaine Manor Mailbox


Roger P. Kovach of Bolinas, California, writes to say that my description of
MRML ("mind reading markup language") filled him with nostalgia. "Back in 1959
when I was programming a Philco Transac 2000...the necessities of programming
and debugging mothered my invention of a new instruction, mnemonic code RPI,
Read Programmer's Intention. Obviously, this saved many an hour of agonizing
effort."
Students of the history of computing may remember Kovach as the inventor of
Kovach's Bit Squeezer, which created narrower bits so that more stuff could be
stored in limited memory.
A lot of readers sent in their favorite vanity net addresses, real and
imaginary, but while there were some decent tries (lunatic@fringe,
god@heaven.gov, thebestcar@thebestprice, editor@large, not@home,
FWIW@LeastITried), nothing really inspired came over the transom.
The reason may be, as Mark Clouden suggests, that, "unlike a vanity license
plate, the vain one is not restricted to a handful of letters/numbers. In an
email name there is nothing to make you look twice, think a bit, then smack
your forehead (ow!) in recognition." Mark's address: iCode@dfm.com. Which he
does.
Many readers submitted their own net addresses in my no-contest for shortest
net address, including Tony Godshall (eight characters, counting the @ and
the period) and Rickard Lind (seven characters). Tony matches, and Rickard
beats, the previous record, so they will be receiving their no-prizes in
no-time. But I'm sure Rickard's address is not the shortest. Any more entries?
After coming across his own name in this column, Mike Morton wrote, "I hadn't
realized you considered me a no-prize winner in that anagram contest a ways
back. Perhaps you should officially notify winners that they've won no prize?
But I think you could encourage more people to enter your contests if you got
more specific about what prize they won't get if they win. Not giving them a
bell would allow you to offer a 'no-bell prize'. It wouldn't come with a large
sum of money, wouldn't get you a trip to Sweden, and wouldn't require you to
give a speech to some stuffy academy. I think your readers would go wild...." 
Mike's name was mentioned on the same NPR program on which Will Shortz (the
puzzle expert who wrote the riddles for the Riddler in Batman Forever)
appeared. "My near-association with Will Shortz," Mike goes on, "hasn't gotten
me near a share of Batman Forever's eight trillion dollars, so if you can get
me a deal for a cut of that, I'll gladly share with you."
Done, Mike. Glad I could help. While you're waiting for the check to arrive,
I'll add to your fame and commemorate one of the most absurd IPOs ever by
publishing your Top Ten Anagrams for "Netscape Communications," along with the
copyright notice you've taken to insisting on since I made you famous by
publishing your earlier anagrams:


Top Ten Anagrams for "Netscape Communications"


(Copyright (c) 1995 by the author, Mike Morton <mike@morton.com>. All rights
reserved. You may reproduce this, in whole or in part, in any form provided
you retain this paragraph unchanged.)
 10. Companies can't consume it.
 9. I cannot compute sans mice.
 8. Can't access 'net...I'm on opium.
 7. Um, options scam can entice.
 6. Net's uncommon capacities.
 5. Connect communities, ASAP.
 4. Mosaic IPO, etc., can stun men.
 3. Optimum 'net access: An icon.
 2. Connect it up; amass income.
And the number one anagram for "Netscape Communications":
 1. Mosaic, minus neat concept.
Michael Swaine
editor-at-large
MikeSwaine@eworld.com

OF INTEREST
Aspect Software has released dbWeb, a tool that provides 32-bit ODBC database
connectivity to HTTP Web servers running under Windows NT. The software
provides full insert/update/delete capabilities as well as query-by-example
record selection for dynamic SQL and stored procedures. 
HTTP servers supported by dbWeb include EMWAC HTTPS, Purveyor, Website, and
Netscape. Platforms supported are Microsoft SQL Server, Access, Sybase,
Oracle, and any ODBC-compliant database. dbWeb 1.0 sells for $695.00.
Aspect Software Engineering
2800 Woodlawn, Suite 100
Honolulu, HI 96822
808-539-3782
http://www.aspectse.com
Lever Software has released Mr. Phelps, a tool that provides copy protection
for C/C++ applications. In particular, the software provides functions for
copy protection, selective disabling, and installation. With Mr. Phelps, you
can limit usage for a specific date, number of days, or number of runs.
Executables are disabled via deletion or through a password. Support for
multiple-file installation is provided. The Windows/DOS tool comes with
complete C source code and is priced at $99.00.
Lever Software Systems
19 Clinton Place
Utica, NY 13501
800-638-7250
Wintertree Software has announced Version 3.4 of its Sentry Spell-Checker
Engine. Version 3.4 adds Delphi support, improved dictionary control and
management, and a visual look compatible with other Windows spell checkers.
The engine supports four types of dictionaries: ignore-type (words always
considered correctly spelled); exclude-type (words always considered
incorrect); auto-change (words automatically replaced); and conditional-change
(words replaced only after confirmation). It is available as either a Windows
DLL, or in ANSI C source for all other platforms. A companion Dialogs DLL can
check and correct the contents of standard Windows edit controls, including
Windows 95 Rich Text Edit controls. French, Italian, German, and Spanish
dictionaries are also available.
Wintertree Software 
69 Beddington Avenue
Nepean, ON
Canada K2J 3N4
613-825-6271
Micro System Designs is making its FixFlop program available at no charge.
This small, memory-resident program patches MS-DOS to add support for the
1.68-MB format used to distribute new Microsoft products. This allows users to
back up the distribution disks for Windows 95 and other new products. MSD's
disk-copy program, DiskDupe, has also been enhanced to support the new format.
FixFlop and a shareware version of DiskDupe are available on CompuServe (GO
MSDESIGNS, Library 13), on the World Wide Web at http://www.msd1.com/msd1, or
via FTP at ftp.msd.com in pub/msd1.
Micro System Designs 
10062 Miller Avenue, Suite 104
Cupertino, CA 95014
408-446-2066
PC Crisis Line has announced an independent end-user telephone-support service
designed to supplant vendor-based technical-support lines. Support costs $3.00
per minute, with a two-minute minimum. 
PC Crisis Line
490 San Antonio Road, Suite G
Palo Alto, CA 94306
800-828-4358
VB Code Master 1.5 is a Visual Basic add-in that adds an object-oriented
browser to VB's design environment and automates coding tasks such as
error handling and dialog construction. While VB Code Master Professional
sells for $129.00, a "lite" version is available at no charge at
http://www.mindspring.com/~fbunn/teletech.html or in CompuServe's MSBASIC
library 9 (CDMLITE.EXE).
Teletech Systems
750 Birch Ridge Drive
Roswell, GA 30076
404-475-6985
CThrough++ from BISS is an integrated development environment for C/C++
programmers working in the OS/2 Workplace Shell that, among other
capabilities, allows you to graphically design class hierarchies and represent
inheritance graphs. The compiler-independent toolset also provides
class-browsing mechanisms, networking capabilities, activity logging, version
control, class archiving and retrieving, and automatic application generation
(EXEs and DLLs). CThrough++, available for OS/2 2.1 and IBM's C/Set++, sells
for $495.00. 
BISS GmbH
Chaukenweg 12
D-26388 Wilhelmshaven, Germany
+49-4423-92890
100031.1733@compuserve.com
The Software Wedge is a program that automatically captures serial data at up
to 56 Kbaud, feeding it into any Windows, DOS, NT, or Windows 95 application,
either as keystrokes or via DDE. The Wedge can be programmed to apply a
variety of transformations to incoming data, and to output commands through
the serial port to control attached instruments.
In addition to data parsing, filtering, and formatting, the Software Wedge
supports math and string functions, 50 hot keys, and a virtual instrument mode
for testing.
The Wedge program comes for both Windows (WinWedge Pro 3.0, $395.00) and DOS
(DOSWedge Pro, $295.00). 
T.A.L. Technologies Inc.
2027 Wallace Street
Philadelphia, PA 19130
215-763-7904
Integrated Data Systems and Portable Graphics have entered into a strategic
alliance to codevelop software products that deliver 3-D graphics and VR
interfaces to Internet users, content providers, and corporations developing
World Wide Web sites. 
IDS will use its system-design and networking tools to develop new technology
and software tools that can deliver 3-D information to computers through the
Internet. Portable Graphics will provide IDS with Open Inventor and OpenGL, as
well as experience in visualization and multiplatform portability. 
The first software apps planned include: VRealm, a WWW browser that
incorporates advanced image, video, audio, animation, and VR techniques;
VRealm Builder, an authoring package that allows the authoring of 3-D objects
and "worlds" to be viewed with the VRealm browser or any other browser
compliant with the VRML 3-D graphical standard; and VRML Internet extensions
that will be bundled into the Open Inventor SDK. Platforms to be supported
include UNIX, Solaris, Windows NT, Windows 95, OS/2 Warp, and Macintosh System
7.5.
Integrated Data Systems Inc.
6001 Chatham Center Drive, No. 300
Savannah, GA 31405
912-236-4374
Portable Graphics
2201 Donley Drive, No. 365
Austin, TX 78758-4838
512-719-8000
Mathematica can now do fuzzy logic, with the help of the Fuzzy Logic Pack.
This set of fuzzy-logic design tools is intended to help you quickly design
and model fuzzy systems. The toolkit includes a variety of built-in functions
that let you define inputs and outputs, create fuzzy sets, and represent them
in either 2-D or 3-D graphical form. The Fuzzy Logic Pack, which is
written in the Mathematica language, comes with source code. The software
requires Mathematica 2.2 and is priced at $695.00.
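Toolkits of this kind typically build on membership functions, which map a crisp input to a degree of membership between 0 and 1, combined with min/max connectives. A minimal illustration in Python (not the Fuzzy Logic Pack's actual Mathematica API; all names here are hypothetical):

```python
def triangular(a, b, c):
    # Triangular membership function: 0 outside [a, c], rising to 1 at b.
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
    return mu

# A hypothetical "warm" fuzzy set over temperatures (degrees Celsius)
warm = triangular(10, 20, 30)

# The usual fuzzy connectives take the min/max of membership degrees
def fuzzy_and(mu1, mu2):
    return lambda x: min(mu1(x), mu2(x))

def fuzzy_or(mu1, mu2):
    return lambda x: max(mu1(x), mu2(x))
```

A temperature of 15 degrees, for instance, is "warm" to degree 0.5 under this membership function.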
Wolfram Research
100 Trade Center Drive
Champaign, IL 61820
217-398-0700
Todd Enterprises has begun shipping its CDR-XPRESS, an integrated hardware and
software system for duplicating recordable CDs. Fully configured, the system
can duplicate and verify up to 18 CDs per hour. CDR-XPRESS's bit-for-bit
duplication scheme allows you to copy CDs for any operating system--Macintosh,
UNIX, DOS, OS/2, and proprietary systems. The minimum PC-based system sells
for $19,750.00.
Todd Enterprises
31 Water Mill Lane
Great Neck, NY 11021
516-487-3976
Autodesk has begun shipping prerelease versions of its AutoCAD Runtime
Extension (ARX), an object-oriented programming environment for AutoCAD
Release 13. The ARX SDK, which supports both C++ and COM-based programming
interfaces, provides a CAD engine that is the basis for creating software
encompassing the entire design-workflow process. The Industry Foundation
Classes (IFC) are the first component of object-based tools being developed
jointly by Autodesk and the Industry Alliance for Interoperability (IAI), and
they are being implemented using ARX. The IFC and their Guidelines define, at
a basic level, the items most commonly used in buildings, and create
descriptions of intelligent, real-world objects that can be passed among
software applications from different vendors--from design to estimating,
through engineering to construction, and on to facilities management. These
real-world objects have relationships to each other that let them react
intelligently to changes in the project model. 
Autodesk
111 McInnes Parkway
Sausalito, CA 94903
415-507-5000
Virtual Genetic Systems has released its Hyperspace Search and Recovery
Vehicle (HSRV), a genetic algorithm framework written in C++. HSRV supports a
variety of encoding schemes, genetic operators, and population structures.
Many common phenotypes are supported using predefined chromosomes and genetic
operators (integer and real-valued vectors, text strings, algebraic and
regular expressions, and geometric shapes). A definition of the solution
domain and a fitness function are all that is required to construct a genetic
algorithm. User-defined phenotypes, encoding schemes, genetic operators,
population structures, and operator sequences are easily assembled to address
user-specific needs. HSRV currently supports Borland C++ 4.x, Watcom C/C++
10.x, Microsoft Visual C++ 2.x, and GNU C++ 2.5.8. 
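The claim that an encoding plus a fitness function suffice to construct a genetic algorithm is easy to illustrate. Here is a minimal sketch in Python rather than HSRV's C++ API (function and parameter names are hypothetical, not HSRV's):

```python
import random

def genetic_algorithm(fitness, length=16, pop_size=40, generations=60):
    # Genome: fixed-length bit vector; fitness maps genome -> score (higher is better)
    pop = [[random.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)        # rank by fitness
        survivors = pop[:pop_size // 2]            # truncation selection (elitist)
        children = []
        while len(survivors) + len(children) < pop_size:
            p1, p2 = random.sample(survivors, 2)   # pick two parents
            cut = random.randrange(1, length)      # one-point crossover
            child = p1[:cut] + p2[cut:]
            child[random.randrange(length)] ^= 1   # point mutation
            children.append(child)
        pop = survivors + children
    pop.sort(key=fitness, reverse=True)
    return pop[0]

# Toy "one-max" problem: fitness is simply the count of 1 bits
best = genetic_algorithm(fitness=sum)
```

Swapping in a different chromosome type or operator sequence only means changing the representation and the crossover/mutation steps, which is the kind of substitution a framework like HSRV packages up.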
Virtual Genetic Systems
2981 Highway 66
Ashland, OR 97520
503-488-8906
Microtec Research has released a PowerPC version of its developer kit. The
Spectra Advanced Developer's Kit includes a C/C++ compiler, source-level
debugging capabilities, graphical build tools, and the VRTX real-time
operating system. The kit also includes RISC optimization software, which
Microtec licensed from Apogee. Additionally, the kit includes an
object-oriented run-time library that facilitates working with kernel objects
such as tasks, queues, and semaphores. The kit is initially available only for
SPARCstation platforms, starting at $14,795.00.
Microtec Research
2350 Mission College Boulevard
Santa Clara, CA 95054
408-980-1300
Digigami has announced Weblisher, a publishing tool for the World Wide Web
that integrates into word processors and automatically converts standard
documents into HTML. The Weblisher Wizard steps you through a conversion
process for document styles, embedded pictures, or OLE objects, specifying
global page properties such as headers and footers. Once installed, Weblisher
adds a "Save As Web Pages" option to your word processor's File menu.
Weblisher supports HTML 1.0, 2.0, and 3.0, HTML+, and Netscape's HTML
extensions. Web pages
created with the tool are compatible with most Web browsers, including NCSA
Mosaic, Spyglass Enhanced NCSA Mosaic, Netscape Navigator, WinWeb, and Cello.
Weblisher sells for $495.00, and supports Microsoft Word 6.0 and 2.0, Ami Pro
3.x, Lotus WordPro, and WordPerfect 6.1. 
Digigami
7514 Girard Avenue, Suite 1-440
La Jolla, CA 92037
619-551-9559
http://www.digigami.com/weblisher
SQA has announced the availability of SQA LoadTest, a tool for load, stress,
and multiuser testing of Windows client/server applications. SQA LoadTest
supports client/server applications running on TCP/IP, NETBIOS, or IPX/SPX
networks. SQA LoadTest is offered as an add-on product for users of SQA
TeamTest, or bundled as part of SQA Suite, SQA's integrated testing tool.
SQA LoadTest can be used to test any server (UNIX, Windows NT, or OS/2)
connected to Windows clients. SQA LoadTest allows test scripts from SQA Robot
to be immediately distributed to Windows client machines over the network
without any changes. Test results are automatically logged in the integrated
SQA Test Repository for rapid access and analysis by all members of a testing
and development team. Pricing for SQA LoadTest starts at $10,000 for the
five-station version.
SQA Inc. 
10 State Street
Woburn, MA 01801
800-228-9922
Emultek has announced the release of Rapid Design 3.0, a Windows-based tool
for prototyping embedded system UIs. It allows you to quickly create an
interactive working prototype that operates exactly like the real-world system. 
Rapid Design 3.0 comes with an extensible object library. Objects are the
building blocks of any Rapid simulation. These prefabricated, reusable
components can be physical items such as switches, dials, displays, pipes,
system schemata, or drawings. Objects can also be nonphysical items such as
timers, numbers, strings, or data stores. By applying Rapid's visual
technique, these objects are linked to create fully interactive simulations in a
completely code-free environment. Rapid Design 3.0 sells for $6,000.00.
Emultek Inc.
284 Racebrook Road
Orange, CT 06477 
800-368-5835
75052.2265@compuserve.com
PowerBasic 3.2 now provides support for pointer variables, which let
programmers address arbitrary memory locations. Version 3.2 also allows
underscores in variable names, along with enhanced communications support,
including support for the 16550 UART chip. 
PowerBasic Inc.
316 Mid Valley Center
Carmel, CA 93923
408-659-8000


EDITORIAL


Shock Treatment


One measure of a technology's importance is the number of people it ultimately
affects. Granted, the space program didn't land thousands of people on the
moon, but it did introduce a generation to the wonders of powdered orange
juice, not to mention putting Teflon-coated skillets in kitchens around the
world. By this criterion, what's going on in the electric-utility
industry may outweigh any number of pop technologies, even media darlings like
the Internet and World Wide Web. 
Three factors are driving the current interest in electrical-power
distribution: market penetration, deregulation, and smart technology. How many
homes or businesses have you been in recently that didn't have electricity?
With a market penetration of virtually 100 percent, enabled by years of
government-granted monopolies, even Bill Gates is envious. 
And like the telecommunications industry of a decade or two ago, the
electrical-power retail-distribution industry is undergoing fundamental
changes, primarily deregulation at both the state and federal levels.
Historically, regulatory compacts between utilities and government required
that utilities sell electricity to customers within specified service areas,
with prices and profits regulated by government. Now, regulators are pushing
for a competitive market which would, among other things, eliminate
traditional service areas, allowing customers to buy electricity from any
provider. In some scenarios, utilities would be broken into independent
companies--those which generate power and those that distribute it. San Diego
Gas & Electric is already doing this, as it offloads low-margin generating
facilities to focus on high-margin transmission and distribution. With the
growing need to efficiently shift electricity from one power grid to another
in a competitive marketplace, industry leaders have realized that, as much as
anything else, success depends on building effective networks. (Just this
fall, for instance, Continental Power Exchange cut a deal with the TVA to
trade energy through its real-time computer network.) 
Believing it knows something about networks, Novell has stepped into the fray
in an ambitious effort with UtiliCorp, a Missouri-based utility company.
Called the "Smart Energy Network Alliance," the Novell-UtiliCorp project will
develop and market applications that let you optimize energy use on a 24-hour,
real-time basis by turning ordinary electrical lines into computer networks.
Ideally, to establish network communications, you'd simply plug an intelligent
device into an electrical outlet. 
At the heart of the proposal is Novell's Embedded Systems Technology (NEST)
Powerline, which uses existing power lines for bidirectional data transfer at
up to 2 Mbits/sec. Novell claims this data rate is about 20 times that of
previous techniques, and powerful enough to handle more than a dozen PCs
without installing special cables. At the other end of the line, utility
companies could use NEST Powerline technology to automate and manage specific
devices, allowing for demand-side power management. Of course, this requires
NEST-enabled devices or adapters (which would have embedded chips costing
about $1.00 each) that effectively turn individual devices into nodes on a
LAN. Analysts estimate that if appliances were connected to intelligent
power-management systems, the average household would save maybe $250 per
year. But power management isn't the only application for smart electricity
systems. Refrigerating systems, for instance, could signal problems with
compressors or other components back to a repair shop. Meter reading for
billing purposes could also be automated. 
To turn the Smart Energy Network Alliance vision into a reality, Novell is
seeding developers by cutting NEST-related licensing fees, and UtiliCorp will
likely slash charges to customers in Missouri, West Virginia, Minnesota,
Kansas, and Canada. Both companies plan on selling devices that send and
receive information. At the same time, the alliance will try to convince
appliance and office-equipment manufacturers to buy into the scheme, thereby
providing blades for Smart Energy razors.
And Novell isn't alone in this arena. Microsoft is working with TCI--the
nation's largest cable company--and Pacific Gas & Electric to develop
applications for managing energy use in home appliances. Likewise, Echelon has
installed processor-network nodes, which let electrical outlets and appliances
communicate with each other, into hundreds of Texas homes. 
The upshot of this process is that within the next few years we'll be able to
choose electrical suppliers, just as we now choose telephone-service
providers. Likewise, commercial customers will be able to negotiate a single
low rate from utilities. UtiliCorp, for instance, is already supplying
electricity to over 400 Service Merchandise stores in 37 states, and Detroit
Edison is trying to cut a similar deal for all General Motors facilities. 
From personal power generators to intelligent appliances, new opportunities
are emerging that don't necessarily target the desktop--but which may dwarf
today's desktop markets. In short, you may be shocked by the opportunities
that lie in the years ahead. 
Jonathan Erickson, editor-in-chief


LETTERS


The Study was Flawed


Dear DDJ,
The Stanford study about PhDs and jobs that Jonathan Erickson referred to in
his October 1995 "Editorial" is faulty, at least when it comes to
computer-science majors. Instead of 50 percent of PhDs not finding jobs that
require a PhD, only 3.6 percent found themselves in that situation. Check out
http://cra.org/. Follow the "What's New" link and look at the section "Major
Error...". Do the industry a favor and mention the revised number.
Paul Long 
plong@perf.com 
http://www.teleport.com/~pciwww/
DDJ responds: Thanks for your note, Paul. We followed your advice and went
over to http://cra.org/ for a look-see. What we found was an interesting
story, thanks to Ed Lazowska (lazowska@cs.washington.edu) of the University of
Washington, Seattle. 
After meeting with Jeffrey D. Ullman from Stanford and Robert W. Ritchie from
Hewlett-Packard, William Massy, the senior author of the study, discovered
that a critical input parameter to his model--the total number of doctorates
employed in the field--was incorrect for computer science. 
Historical data from the National Science Foundation (NSF) revealed a steady
increase to 19,800 for the two years prior to the year used as input data, and
5376 for that year. This anomaly was not noted in preparing the input data for
the model. The anomaly was due to a change in definition by those in NSF
responsible for collecting the data. Massy has recently reconsidered and
decided to treat the figure of 5376 as inappropriate for use in his model, and
to extrapolate the historical data to a figure of 21,000 instead.
As a result, computer science now shows an employment gap of only 3.6 percent
instead of the published 50.3 percent. This number is one of the lowest of all
fields, and suggests that the projections of 15 years ago, calling for a
build-up to 1000 computer science PhDs per year, are in remarkable agreement
with the revised Massy-Goldman estimate of current demand.
Massy says he is going to release the corrections to everyone who received the
report. We also expect Massy and Goldman to issue a full, revised report
shortly.
It's worth noting that the Massy-Goldman report was just that--a technical
report, totally unrefereed and replete with numerous obvious typographical
errors, in addition to the methodological glitch in the case of computer
science and engineering. Reports in The New York Times and other publications
would lead you to believe that the Massy-Goldman study was a "survey," when in
fact the study involves a first-order Markov model. The study is perfectly
clear on this point, so the misleading statements are solely due to The New
York Times and other publications that reported on the study.
Nevertheless, Lazowska went on to tell us that all is not rosy in
computer-science PhD land--jobs are ever-harder to come by, and students
continue to be educated for jobs that are less likely to be found. He sums up
by reporting that, while the report is badly in error quantitatively, it's
important to acknowledge that there are indeed problems to be addressed.


GNU C/C++


Dear DDJ,
I've been using GNU's C++ compiler for both DOS (DJGPP) and Linux for a year
now. In his August 1995 "C Programming" column, Al makes some great comments
about the compiler--its price tag, for instance. For a high-school student
with no money, GNU's compiler offers an excellent way for me to develop my
programming skills. Some of his comments, however, are a bit off.
First, if you set the environment variables C_INCLUDE_PATH,
CPLUS_INCLUDE_PATH, and LIBRARY_PATH, the compiler will be able to find the
standard libraries. Optionally, you can point the DJGPP variable at the file
DJGPP.ENV in the base directory, which automatically sets up those variables and
much more. 
Second, if you specify -o filename on the command line, the compiler will send
its output to filename, instead of a.out. Finally, you can run the file just
by typing go32 filename, which runs the COFF-format file, without any other
postprocessing. You can also type coff2exe filename and convert the file to an
EXE format.
Wesley Griffin
Sykesville, Maryland
Dear DDJ,
The September 1995 Dr. Dobb's Journal included the article "Examining C/C++
Compilers," by Tim Parker. I won't comment on the reviewer's evaluations,
because independent evaluations are the purpose of a review, but I would like
to clarify a few points.
Cygnus Support has no special authority over GNU software releases, and no
special relationship with the GNU project or the Free Software Foundation.
Cygnus's staff contributes substantial amounts of work to GNU software
development, and we appreciate this; but the Free Software Foundation does not
give a special status to any company. We'd be equally grateful to Ready-to-Run
if they were to help develop GNU software. If you contribute to GNU
development, we'll appreciate your help, too.
The review mentioned two places to buy copies of the GNU C compiler and
related software, but it omitted the Free Software Foundation itself. The
Foundation is a tax-exempt charity for software development; but in addition
to donations, we raise much of our funds by selling freely redistributable
books and CD-ROMs. 
The Foundation sells a CD-ROM of compilation software, including binaries for
Solaris, SunOS 4, and HP-UX as well as sources; the price is $220.00 if an
organization is paying or reimbursing, and $55.00 if an individual is buying
with his own money. When you buy a CD-ROM from the Free Software Foundation,
most of the price goes toward development of more free software. To contact
the Foundation, write to Free Software Foundation, 59 Temple Place, Suite 330,
Boston, MA 02111-1307 USA, 617-542-5942, fax: 617-542-2652,
fsforder@gnu.ai.mit.edu. 
The review referred in passing to "nasty conditions" for use of the GNU C
Library in proprietary applications. Readers might appreciate knowing
precisely what these conditions are. Here are the requirements for
distributing a proprietary application on Solaris that uses the GNU C Library:
The GNU C library source has to be provided to the users in some way.
The GNU C library must be a separate shared library, so that the user can
modify the library and run the application with the modified library. (Other
alternatives are also available, but this one is most convenient.)
The user must be legally permitted to modify the application for the user's
own use (though modification can be grounds to cancel the warranty).
The user must be allowed to do reverse engineering to debug these changes.
(None of these conditions apply to using the GNU C compiler, linker, and
assembler--only the GNU C library.)
The other compilers reviewed have their own conditions for use; readers might
want to compare the conditions and judge which are more nasty. The review did
not help readers make the comparison. Like most reviews, it rated the various
compilers on criteria such as speed, functionality, reliability, and price. It
did not rate them on the criterion of what a user can legally do with them
after "buying" a copy.
We can guess, in general, what kinds of restrictions the other compilers have.
It is surely forbidden (not to be identified with "wrong" or "immoral") to
give a copy to your friend. You may be forbidden to install them on several
machines, or on machines at different physical locations; you may even be
forbidden to have several users run the compiler at one time on one machine.
GNU software has no such restrictions, because it is free software. The word
"free" refers to freedom, not price: Legal conditions for using GNU software
are always designed to protect users' freedom from intermediaries who might
strip it away. You are free to redistribute copies; you are free to install
GNU software on any number of machines, anywhere at all; and once it is
installed, any number of users can run it.
You are also free to read the source code for the GNU compiler, and even to
make changes, or commission a service company of your choice to make changes
to your specifications. You can't do this at all with a proprietary compiler.
When reviews neglect this aspect of how programs differ, they forget in effect
that not all programs are proprietary. I hope that future reviews in Dr.
Dobb's Journal will rate all programs on user freedom along with speed,
quality, and price.
Richard Stallman
Free Software Foundation
Cambridge, Massachusetts
rms@gnu.ai.mit.edu


Putting Oconomowoc on the Map 


Dear DDJ,
As former president of the Oconomowoc Senior High School Computer Club, 1980
and 1981, and even then a subscriber to a magazine then known for
"orthodontia" as well as computer "calisthenics," I must protest several
points in your October 1995 "Swaine's Flames."
First, Oconomowoc is not populous enough to have its own broadcast news
station. Even the stations in Madison and Milwaukee have ordinary-looking
reporters. Why, just last night, I saw a chubby reporter. And Madison even had
an all-female news team.

"Oconomowocians?" Oconomowoc residents refer to themselves as "Coonies" after
the high school's mascot, a raccoon. Live bait and computers? No way. Live
bait and doughnuts, or live bait and video rentals, maybe. No Safeways,
either. 
Oconomowoc is situated between several very nice lakes, so that homeless
marine biologist should be able to hitch-hike to the University of
Wisconsin-Madison to get a job as a limnologist, or even as a cabdriver, if he
has his doctorate.
I get the joke, though. Why did you pick Oconomowoc? Did you think that
humorous references to Oshkosh are over-used? 
As for Windows 95, a friend said "It makes a grown man cry." Personally, I
think it's just a scam to generate more cash for Bill, as well as to teach
people that they actually wanted to buy Windows NT.
John Foust
Jefferson, Wisconsin 
syndesis@beta.inc.net


Kudos for Linus


Dear DDJ,
Congratulations on recognizing Linus Torvalds for his work with the Linux
operating system. Linus is a god to us Linux disciples.
It was almost an insult to compare him to this other guy, Alexander Stepanov.
I realize that DDJ has always been off on its own tangent, ever since the
"Running Light Without Overbyte" days, but this is too much.
It is by no means clear that C++ templates are a usable language feature. Of
course, I'm a C++ skeptic from way back. I'm still waiting to see a case study
where the claimed maintenance benefits of OOP are proven. But I've even read
articles by serious C/C++ authors who suspect templates might be too
complicated to be usable.
And what's this about a scholarship for a deserving student? Linus is a
student! Give him the money. And he learned UNIX all by himself, not like
Stepanov who had to have Bjarne explain it to him in person.
Charles Hall
Raleigh, North Carolina
DDJ responds: Thanks for your letter, Charles, but you're the first person
we've heard from who didn't think Stepanov deserved recognition for
spearheading a major body of work that will shape the future direction of
software development. This doesn't detract from our admiration of Torvalds,
who deserves accolades as well.


Play it on the Radio


Dear DDJ,
While I enjoyed Al Stevens' October 1995 "C Programming" column on MidiFitz,
one of his comments about Microsoft's MFC shows a serious communication
problem. Al clearly doesn't understand the "radio-button" metaphor. Maybe he
is too young to remember the AM car radios that are the physical model. In
those radios, you had a number of buttons, but only one station was selected
at a time.
MFC properly returns an index from the first button. Perfect for indexing into
an array or driving a switch statement. On those old radios, you can't select
multiple buttons, so having multiple Boolean variables is the wrong model.
MFC does support exactly this type of multivalued logical field; they are
called "check boxes." You just can't expect radio buttons to work the way
check boxes do.
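The distinction is easy to sketch outside MFC: a radio group models one exclusive choice, so a single index or enumeration driving a dispatch is the natural representation, while check boxes model independent options, so one Boolean apiece is natural. A hypothetical Python sketch (MFC's DDX itself is C++, and these names are invented for illustration):

```python
from enum import Enum

# A radio group yields one exclusive selection: a single index/enum value.
class Band(Enum):
    AM = 0
    FM = 1

def tune(band):
    # The single selected value drives a dispatch, like a C switch statement
    return {Band.AM: "540-1600 kHz", Band.FM: "88-108 MHz"}[band]

# Check boxes yield independent options: one Boolean apiece.
options = {"stereo": True, "loudness": False}
```
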
It is likely that Al's complaint about MFC's documentation is valid. There is
an entire minor publishing industry that helps folks understand MFC. But Al
started with the wrong model. His expectations were wrong, not MFC's
implementation.
Pat Farrell 
http://www.isse.gmu.edu/students/pfarrell
Al responds: Of course I remember the old-style car-radio buttons. They were
popular with my generation because they were easy to find in the dark by
groping forward from the back seat. But then, maybe you are too young to have
watched submarine races from the back seat of a '50 Ford.
It was not clear to me at the time what you could do with the value passed via
DDX into a variable from a radio-button group. It wasn't clear how the visual
designer decides which button has which indexed value. The programmer
interface is less than intuitive to an MFC novice in this case.
All that notwithstanding, I disagree with your position that multiple bools
are the wrong model. The radio-button metaphor is for the user. The programmer
has different needs. For example, if I decide that a check box fits the bill
better than two radio buttons, I'd like to be able to do it without changing a
lot of code. Each radio button in a group has its own identity. It's not
asking much to be able to tell if a specific button is the pushed one without
considering its position in a group. (You can do it in D-Flat.) As I said in
the column, it's easier to forgo the DDX solution and implement that kind of
flexibility myself.


Numeric Typos


Dear DDJ,
In the October 1995 issue of DDJ, Louis Plebani's article "Common-Fraction
Approximation of Real Numbers" contained two typographical errors that made
the mathematics of an otherwise very enjoyable article a little more difficult
than it should have been.
In the first paragraph of the second page, the equation ab-bc=-1 should be
ad-bc=-1. Continuing in the same sentence, k=c+d should be k=b+d.
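For readers curious about the corrected relations: these are the classic mediant identities. If a/b < c/d bracket the target with bc-ad=1 (the letter's ad-bc=-1), their mediant (a+c)/(b+d) lies between them and tightens the bracket. A minimal Python sketch of this idea (not Plebani's code; the function name is invented):

```python
from fractions import Fraction

def best_fraction(x, max_den):
    # Stern-Brocot search: keep bounds a/b < x < c/d with b*c - a*d == 1
    # (equivalently ad - bc = -1), refining via the mediant (a+c)/(b+d).
    # Assumes x >= 0.
    a, b, c, d = 0, 1, 1, 0
    while b + d <= max_den:
        num, den = a + c, b + d          # the mediant of the two bounds
        if x < num / den:
            c, d = num, den              # mediant becomes the upper bound
        else:
            a, b = num, den              # mediant becomes the lower bound
    if d == 0:                           # x exceeded every mediant tried
        return Fraction(a, b)
    lo, hi = Fraction(a, b), Fraction(c, d)
    return lo if abs(lo - x) < abs(hi - x) else hi

# For example, best_fraction(3.14159265, 100) returns Fraction(311, 99)
```

The invariant bc-ad=1 guarantees every mediant is already in lowest terms, which is what makes this approach attractive for common-fraction approximation.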
Please continue with mathematically oriented articles; they are extremely
informative.
Alan J. Livingston
Copiague, New York
74150.1754@compuserve.com


Visual Programming in 3-D


Cube provides true visual programming via executable graphics




Marc Najork


Marc is a member of the research staff at Digital Equipment Corp.'s Systems
Research Center in Palo Alto, CA. He can be reached at najork@src.dec.com.


As illustrated by names like "Visual C++," "Visual Basic," "VisualAge,"
"Visual Objects," and the like, the current fashion among programming
environments is to include the word "visual" wherever possible in the tool's
moniker. Strictly speaking, most of these packages are not really visual, in
that they do not use a purely (or even predominantly) visual notation to
represent a computation. Instead, the visual aspects often serve as a
rudimentary graphical scaffold on which pieces of program text are hung. 
There is a range of approaches among these packages. Visual C++ provides
little more than a unified GUI to conventional program-development tools
(compiler, text editor, debugger, and resource editor). In tools such as
Visual Basic, the primary interface is a direct-manipulation UI builder that
allows you to lay out widgets in a window without having to write any
code--until later in the game. Packages such as VisualAge even allow you to
build simple applications in a purely visual way, by connecting predefined
components (represented by icons) with line segments. However, in all of these
packages, you eventually arrive at a point where you must resort to a language
such as C++ or Basic to get the job done. 
A true visual-programming language, on the other hand, can be considered
"executable graphics," with no hidden text. Although commercial instances of
visual-programming languages are scarce, visual programming as a discipline
has been around almost as long as computer graphics: William Sutherland (now
director of research at Sun Microsystems) implemented the first
visual-programming environment in 1965, two years after his brother Ivan
created "Sketchpad," the first computer-graphics application. Sutherland's
system was far ahead of its time: It had an integrated development
environment, where you could use a pen-based graphics editor to interactively
draw a dataflow diagram, then immediately execute and debug it. In fact, it
must have appeared quite outlandish to most, given that at the time, the TX-2
at MIT's Lincoln Labs was the only computer in the world that had the required
graphics hardware.
Indeed, the field of visual programming remained mostly dormant until the
mid-1980s, when graphics hardware became widely available. Since then, there
has been considerable interest in the research community (the IEEE Symposium
on Visual Languages just went into its eleventh year), and there is a small
but growing number of commercial software systems that meet the stringent
definition of a visual-programming language; for example, National
Instruments' LabVIEW or Pictorius's Prograph.
If commodity graphics hardware was the enabling factor for the success of
visual languages, then we can expect that the eventual arrival of cheap,
high-quality, virtual-reality hardware will foster a new crop of
visual-programming languages--ones that use a three-dimensional instead of a
two-dimensional notation.
This hypothesis was the motivation for my work on Cube, which to my knowledge
is the first 3-D visual programming language. My goals in this project were to
show that programming in 3-D is feasible, investigate whether the third
dimension can provide a richer level of expression (as opposed to being a mere
eye-catcher), and gain a better understanding of what new tools and techniques
are needed for building and using 3-D programming environments. My ambitions
did not go so far as to build a full-strength system. The current
implementation of Cube is still very much a prototype, and the Cube programs I
have written are the classic toy examples found in entry-level programming
textbooks. However, the language, despite being purely graphical, has
first-class computational strength, incorporating such notions as recursion,
higher-order predicates, and user-defined types. In a few respects, Cube's
computational expressiveness goes beyond some conventional text languages; for
example, it allows multiple solutions to a single computation.


Visual Language Pros and Cons 


Not surprisingly, the arguments for the merits of visual languages are as old
as visual languages themselves. 
The human mind is visually oriented. Evolution has equipped us with a powerful
visual cortex; humans can process visual information rapidly, and are very
good at discovering graphical relationships (such as connectivity or
inclusion) in complex pictures. So, visual notation provides for fast
information transfer, and moves part of the mental workload from the cognitive
to the perceptual level. 
Graphical representations provide a syntactically rich language. The elements
of a picture have a multitude of attributes, such as shape, size, position,
orientation, color, and texture, all of which can be used to encode meaning.
Also, graphical representations allow for concrete metaphors, such as icons in
place of names. 
There are also a number of problems associated with graphical representations:
Screen space. Visual languages use screen real estate much less frugally than
textual ones. However, this problem can be alleviated by use of
high-resolution displays.
Input. While it might be faster to read a visual program than a textual one,
it takes longer to write one--most people type faster than they draw. However,
only a small fraction of a programmer's time is actually spent on entering
code. A much larger fraction is spent on designing and debugging, and on
understanding other people's code.
Naming. It is harder to come up with good icons than with good names.
Presumably, this is a cultural rather than an intrinsic problem.


Why 3-D?


A number of arguments have been made, by me and others, for 3-D over 2-D
visual languages:
More information. Three dimensions give us one more axis along which to
convey semantic information. (In Cube, you use the third dimension to stack
2-D dataflow diagrams on top of each other.) 
Easier layout. The metaphor most commonly used in visual languages is the
dataflow metaphor, in which programs are represented by boxes connected by
lines. In 2-D, it is often impossible to avoid intersecting lines (not every
graph is planar), whereas in 3-D, such intersections can easily be avoided.
Space efficiency. A 3-D representation alleviates the screen-space problem.
Although the physical screen real estate remains unchanged, the virtual space
of a 3-D image is larger than that of a 2-D image. The user can access
different parts of the space by looking at it from different positions and
angles. The drawback, however, is that parts of the picture may be occluded at
times.
Virtual reality. Three-dimensional notations are well suited, if not ideal,
for programming in virtual realities.


Temperature Conversion


To illustrate programming with a truly visual language, I'll first present a
program that converts temperature values from Celsius to Fahrenheit and vice
versa. Recall that x degrees Celsius correspond to 1.8x+32 degrees Fahrenheit.
The program in Figure 1(a) describes this relation. It consists of two opaque
green "predicate cubes," labeled with the symbols "*" and "+", respectively,
and four transparent green "holder cubes." Two of the holder cubes are filled
with opaque green "value cubes" that are labeled "1.8" and "32," respectively.
The other two holder cubes are empty: The left one is meant to receive a
Celsius temperature value, and the right one, a Fahrenheit value.
In Cube, addition and multiplication are viewed as ternary predicates rather
than binary functions. This is similar to writing plus(a,b,c) in place of
a+b=c. So, each of the two predicate cubes has three arguments (also known as
"ports") shown as labeled indentations in the cube's side walls. The label of
each port identifies the argument as being a, b, or c in the notation above.
The two "input" ports of the multiplication predicate are connected by a
"pipe" to the left empty holder cube and to the holder cube containing the
value 1.8; the "output" port of the multiplication predicate is connected to
one "input" port of the addition predicate, its other "input" port being
connected to the holder cube containing the value 32, and its "output" port
being connected to the right empty holder cube.
Cube, like most other visual languages, is based on a dataflow metaphor.
Holder cubes and ports of predicate cubes are connected by pipes, and values
flow through these pipes. Pipes are undirected; data simply flows from full to
empty holder cubes and ports. Cube differs in this respect from most other
dataflow languages, which use directed connections. Also, there is no real
distinction between the "input" and the "output" of a predicate. If you put a
value cube (say, 10) into the empty holder cube on the left and run the
program, the system will fill the right holder cube with the value 50.
Likewise, if you put a value (say, 68) into the right empty holder cube, as in
Figure 1(b), the system will fill the left holder cube with the value 20; see
Figure 1(c). 
If you are familiar with Prolog, you'll find Cube's undirected dataflow
resembles the undirected nature of logic variables in Prolog. Indeed, the
underlying semantics of Cube are quite similar to those of Prolog. 
This example has introduced some important syntactic elements of Cube (holder
cubes, predicate cubes, value cubes, ports, and pipes), and has explained the
dataflow metaphor and how Cube programs are undirected (that is, input and
output arguments are often interchangeable). However, it did not show us a
convincing use of the third dimension; the temperature-conversion program is
essentially a flat dataflow diagram. 
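The undirected relation between the two holder cubes can be sketched in
conventional code. The following Python fragment (an illustrative analogy, not
Cube itself) solves the relation 1.8x+32 in whichever direction a value is
supplied, just as the program fills whichever holder cube is empty:

```python
def convert(celsius=None, fahrenheit=None):
    """Solve fahrenheit = 1.8 * celsius + 32 in either direction,
    mimicking Cube's undirected dataflow. Exactly one argument
    (one "filled holder cube") should be supplied."""
    if celsius is not None:
        return 1.8 * celsius + 32      # flow left to right
    if fahrenheit is not None:
        return (fahrenheit - 32) / 1.8 # flow right to left
    raise ValueError("one holder cube must be filled")
```

Calling `convert(celsius=10)` yields 50, and `convert(fahrenheit=68)` yields
20, matching the two runs described above.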


The Classic Factorial



The next example is a program for computing the factorial of a number. We can
define "factorial" recursively as follows: The factorial of 0 is 1 (this is
called the "base case"), and the factorial of n (where n > 0) is n times the
factorial of n-1 (the recursive case). 
Figure 2(a) shows the Cube definition of factorial: A transparent green cube,
called a "predicate definition cube," with the icon "!" on its top. Inside the
definition cube are two transparent boxes, called "planes," that are stacked
on top of each other. Each plane contains a dataflow diagram. There are two
pipes coming out of each plane and leading into two ports set into the side of
the definition cube. I'll call the left pipe the "input pipe" and the right
pipe the "output pipe."
Once defined, the factorial predicate can be used anywhere within a Cube
program, including inside its own definition. During execution, every opaque
predicate cube referring to factorial is replaced by (a copy of) the
transparent cube defining it. Values that flow through the "input" port into
the cube are split up and flow in parallel into each plane, feeding into the
dataflow diagram within that plane. In the process of evaluation, a plane
might fail, in which case it is taken out of the computation, or it might
succeed, in which case its result flows out of the plane towards the output
port of the predicate.
From this description, it is apparent that the evaluation of a Cube program
can yield any number of solutions. If all planes of a predicate fail, then
there are no solutions; if more than one plane succeeds, then there are
multiple solutions. As it happens, all the programs presented here yield, at
most, one solution.
The upper plane in Figure 2(b) describes the base case. The plane contains two
holder cubes, the left one containing the value 0, and the right one
containing 1. The left holder cube is connected to the input pipe, and the
right holder cube is connected to the output pipe. During execution, the
outside world must either send the input pipe a value compatible with the
value inside the connected holder (in this case, 0) or not send any value at
all; otherwise, the plane will fail. The same holds true for the output pipe.
The lower plane in Figure 2(c) describes the recursive case. The input pipe
enters the plane on the left side and splits up. One of its ends connects to
one port of a predicate cube labeled ">"; the other port of this cube is
connected to a holder cube containing the value 0. This will ensure that
during execution, the value flowing through the input pipe is greater than 0
(and cause the plane to fail if it isn't). 
The input pipe also leads into one input port of a subtraction predicate,
whose second input port is connected to a holder cube containing 1. The output
of the subtraction predicate flows into the input port of the factorial
predicate (that is, a recursive use of the predicate we are about to define),
and the output of factorial flows into one input port of a multiplication
predicate. The other input port is connected to the input pipe, and its output
port is connected to the output pipe. 
Having defined factorial, you can use it in the same manner as addition or
multiplication. Figure 2(d) is just such an application. The program contains
the factorial-definition cube and a predicate cube referring to it. The input
port of this predicate is connected to a holder cube containing the value 5,
while its output port is connected to an empty holder cube. Running the
program will fill the empty holder cube with the value 120; see Figure 2(e).
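The plane-by-plane evaluation just described can be mimicked in Python (a
sketch of the semantics, not Cube's interpreter): each plane either fails,
contributing nothing, or succeeds and contributes a solution to the result
set.

```python
def factorial(n):
    """Evaluate the two planes of the Cube factorial predicate and
    collect the solutions each contributes."""
    solutions = set()
    # Upper plane (base case): succeeds only when the input is 0.
    if n == 0:
        solutions.add(1)
    # Lower plane (recursive case): succeeds only when the input
    # is greater than 0, as enforced by the ">" predicate cube.
    if n > 0:
        for sub in factorial(n - 1):
            solutions.add(n * sub)
    return solutions
```

For the program in Figure 2(d), `factorial(5)` yields the single-element
solution set {120}.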


Mapping a Predicate Over a List


The final example is the Cube analog of map, the classic textbook example of a
higher-order function. The standard version of map takes a unary function f
(such as factorial) and a list [x1, ... , xn] as its arguments, and returns a
list [f(x1), ... , f(xn)]. Alternatively, you can give the following recursive
definitions of map: "Applying map to a function f and the empty list yields
the empty list" (the base case), and "Applying map to a function f and a list
with head h and tail t yields a list whose head is f(h) and whose tail is the
result of applying map recursively to f and to t" (the recursive case).
Recall that Cube is a logic programming language, and that results are treated
as arguments. So, the map predicate cube takes three arguments: a binary
predicate (such as factorial) and two lists (the second list being the
"output"). There are two special value cubes for describing lists: a cube that
denotes the empty list, and another cube that attaches a value to the front of
a list. (In Scheme and other functional languages, these two "constructors"
are referred to as nil and cons.) The two value cubes are not Cube primitives;
they are derived from a user-supplied type-definition cube that describes the
list type. 
Figure 3(a) shows the predicate-definition cube for map. It has two ports in
its sides that take the input and the output list, and a third port set into
its top that takes the binary predicate. The port is labeled "-->"; you can
use the argument not only by routing a pipe to it, but also through a
predicate cube labeled with the same icon. Finally, the definition cube
contains two planes, one for the base case and one for the recursive case.
Pipes run from the ports for the input and the output list into both planes. 
The lower plane in Figure 3(b) describes the base case. The plane contains two
holder cubes, each of which contains a value cube representing the empty list.
The left holder cube is connected to the input pipe, and the right holder
cube, to the output pipe. The plane will fail if the outside world puts a
nonempty list into either pipe; otherwise, it will succeed and put empty lists
into both pipes.
The upper plane in Figure 3(c) describes the recursive case. The input pipe
enters the plane from the left and connects to a holder cube that is filled
with a cons value cube. The output pipe enters the plane from the right and
connects to a holder cube that is filled with another cons value cube. True to
the spirit of Cube's bidirectional nature, the cons cube can be used to
compose a list (that is, to attach an element in front of a list) as well as
to decompose it (that is, to split a list into head and tail). 
If a list flows through the input pipe into the left holder cube, it is split
into a head and a tail. The head value flows out through the upper pipe and
into the input port of the predicate cube labeled "-->", the binary predicate
passed in as an argument. The tail value flows out through the lower pipe and
into the input port of the map predicate (a recursive reference to the
predicate we are currently defining). The output ports of the binary predicate
and of map flow to the right cons cube, where they are composed into a list.
This list finally leaves the plane through the output pipe on the right. So,
the meaning of this plane can be stated as follows: "Given a nonempty list,
split it into a head and a tail, apply the user-supplied binary predicate
"-->" to the head and map recursively to the tail, and combine the results
into a new list."
Figure 3(d) is an example of the map predicate. The program contains the
map-definition cube and a predicate cube referring to it. The input port of
map is connected to a holder cube filled with the list [1,2,3], the output
port is connected to an empty holder cube, and the factorial predicate is
slotted into the port on the top of map. The program is supposed to apply
factorial to each element of the list [1,2,3]. Running the program will cause
the result--the list [1,2,6]--to flow into the holder cube on the right.
Figure 3(e) shows a close-up of the result.
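The two planes of map translate directly into a recursive function. The Python
sketch below (again an analogy, not Cube code) mirrors the base and recursive
cases, with the function argument playing the role of the predicate slotted
into the top port:

```python
def factorial(n):
    """The predicate passed into map's top port."""
    return 1 if n == 0 else n * factorial(n - 1)

def cube_map(pred, lst):
    """Sketch of Cube's map predicate."""
    # Base case (the lower plane): an empty list maps to an empty list.
    if not lst:
        return []
    # Recursive case (the upper plane): split into head and tail,
    # apply pred to the head, recurse on the tail, and cons the results.
    head, tail = lst[0], lst[1:]
    return [pred(head)] + cube_map(pred, tail)
```

Running `cube_map(factorial, [1, 2, 3])` produces [1, 2, 6], the list that
flows into the right holder cube in Figure 3(e).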


The Implementation of Cube


All figures in this article were created by a prototype implementation of a
Cube environment written in Modula-3. It consists of four major functional
components: a renderer, a type-inference system, an interpreter, and a
rudimentary editor.
The renderer does not rely on any dedicated 3-D hardware. I use a variation of
the z-buffer algorithm combined with alpha blending to perform hidden-surface
removal and to deal with transparent surfaces. The renderer delivers very
realistic pictures, but is rather slow: It takes about three seconds on a
233-MHz Digital AlphaStation 400 4/233 to render Figure 1 at a resolution of
640x512 pixels--too slow to navigate the scene at interactive speeds.
Therefore, the renderer displays a wireframe rendition of the scene whenever
the viewpoint or the scene content changes, and starts (or restarts) a
background thread to generate the high-quality rendition. If the scene remains
unchanged long enough for the background thread to complete its task, the
high-quality image eventually replaces the wireframe rendering. 
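The wireframe-then-refine strategy can be sketched as follows (a Python
illustration with hypothetical names; the actual renderer is written in
Modula-3): draw a cheap wireframe immediately, start a background thread for
the expensive rendition, and abandon that thread's result if the scene changes
before it finishes.

```python
import threading

class ProgressiveRenderer:
    """Sketch of the two-pass display strategy described above."""
    def __init__(self, draw_wireframe, render_high_quality, show):
        self.draw_wireframe = draw_wireframe
        self.render_high_quality = render_high_quality
        self.show = show
        self._cancel = threading.Event()
        self._worker = None

    def scene_changed(self, scene):
        self._cancel.set()               # abandon any stale rendering
        if self._worker:
            self._worker.join()
        self.draw_wireframe(scene)       # immediate feedback
        self._cancel = threading.Event()
        cancel = self._cancel
        def work():
            image = self.render_high_quality(scene)
            if not cancel.is_set():      # scene still unchanged?
                self.show(image)         # replace the wireframe
        self._worker = threading.Thread(target=work)
        self._worker.start()
```

If `scene_changed` is called again before the worker completes, the stale
image is simply discarded, just as Cube restarts its background thread.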
The type-inference system translates the external representation of a Cube
program into a simpler, intermediate representation, and then employs the
Hindley-Milner type-inference algorithm (the algorithm used by virtually all
statically typed functional languages) on it. The Hindley-Milner algorithm not
only verifies that a given program is type correct, it also infers the type of
each variable used. Cube uses these inferred types to provide feedback to the
user: It fills each empty holder cube with a type cube representing the
inferred type of the holder cube. For example, the empty holder cubes in
Figure 1 would be filled with the type cube representing the real numbers.
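At the heart of Hindley-Milner inference is unification, which can be sketched
in a few lines (illustrative only; this fragment omits the occurs check and
let-generalization a real implementation needs). Type variables are lowercase
strings, concrete types are capitalized strings, and predicate types are
tuples:

```python
def unify(a, b, subst):
    """Unify two types under a substitution, returning the extended
    substitution, or raise TypeError if they cannot be unified."""
    a, b = subst.get(a, a), subst.get(b, b)   # resolve bound variables
    if a == b:
        return subst
    if isinstance(a, str) and a[0].islower(): # a is a type variable
        return {**subst, a: b}
    if isinstance(b, str) and b[0].islower(): # b is a type variable
        return unify(b, a, subst)
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):                # unify componentwise
            subst = unify(x, y, subst)
        return subst
    raise TypeError(f"cannot unify {a} with {b}")
```

Unifying the ternary addition predicate ("Real", "Real", "Real") with a use
site ("Real", "Real", "t0") binds the variable t0 to Real, which is how the
empty holder cubes in Figure 1 come to be annotated with the real-number type
cube.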
The interpreter translates the internal representation of Cube programs into
another intermediate representation, and then evaluates this intermediate
form. Cube is an inherently concurrent language: The various planes of a
predicate definition and the various predicates in each plane are evaluated in
parallel. The interpreter simulates this concurrency by using a time-slicing
approach. 
Evaluating a Cube program yields a (possibly empty) set of solutions that the
user can interactively browse and examine. When the user selects a particular
solution, the interpreter fills each empty holder cube with a value cube
representing the computed instantiation of the holder cube.
Finally, the editor allows the user to interactively construct new Cube
programs. The current system implements only a core part of the needed
functionality (that is, the ability to create predicate and holder cubes, to
fill them with values, and to connect them through pipes). 
When I implemented Cube, I had no access to virtual-reality hardware, so the
input is entirely mouse based. As it turns out, editing 3-D scenes with a 2-D
input device is extremely cumbersome. A more-appropriate input device, such as
a data glove, would speed up the editing process considerably.


Conclusion


The proliferation of low-cost, high-resolution graphics displays has laid the
groundwork that will allow visual languages to move into the mainstream of
computing. I believe that their initial success will be in providing intuitive
programming environments with a shallow learning curve to nonprogrammers or
casual programmers: For instance, by enabling laboratory technicians to
construct virtual lab instruments (as done by LabVIEW), by enabling scientists
to build customized data visualizations (as done by AVS and IRIS Explorer), or
by enabling end users to easily connect software components.
Professional programmers, on the other hand, still prefer textual
environments, potentially with GUI front ends (as in Visual C++) and
containing GUI builders (as in Delphi). However, visual languages are making
some inroads here as well, for instance, in the form of the dataflow language
that is part of VisualAge.
At the moment, 3-D visual programming is still the domain of research. This
might change if virtual-reality hardware becomes commonplace. In fact, looking
at the history of graphical user interfaces and of 2-D visual languages, I
believe it is actually quite likely to change.
The Cube project has not produced the programming environment of the next
century (nor was it intended to), but it has shown the feasibility of 3-D
visual programming, and it has helped in identifying the research problems
that need to be tackled in order to make programming in virtual realities a
(more than virtual) reality.
For more information on Cube, visit
http://www.research.digital.com/SRC/personal/najork/cube.html on the World
Wide Web.
Figure 1(a): The temperature-conversion program.
Figure 1(b): Converting 68 degrees Fahrenheit to Celsius.
Figure 1(c): The temperature-conversion program after evaluation.
Figure 2(a): The factorial predicate.
Figure 2(b): The upper plane.
Figure 2(c): The lower plane.
Figure 2(d): Computing the factorial of 5.
Figure 2(e): The factorial predicate after evaluation.
Figure 3(a): The map predicate.
Figure 3(b): The lower plane.
Figure 3(c): The upper plane.
Figure 3(d): Mapping factorial over the list [1,2,3].
Figure 3(e): The map predicate after evaluation.




































































Visually Constructing Delphi Components


Creating visual tools




Al Williams


Al, a consultant specializing in software development, training, and
documentation, can be contacted at 72010.3574@compuserve.com.


Borland's Delphi visual-programming environment has a split personality. It
lets you drag-and-drop Visual Component Library (VCL) components around the
screen without deriving new classes. Yet building those VCL components is
anything but visual. To create a component, you have to write naked Pascal
code, understand class derivation, and more.
In this article, I'll present a form-based Delphi program called "CompBld"
that helps you write components. It won't make the process as visual as
application development, but it's far better than starting from scratch. Along
the way, I'll examine how to write Delphi components (including a custom
property editor) and how to use objects in applications. 


Inside VCL


When you add a component to a form, Delphi automatically creates a variable to
hold an instance of the component. You then customize the component by
altering its properties; see Figure 1. Contrast this with environments such as
OWL or MFC, where you subclass existing classes to customize them. When
building applications with Delphi, you may never need to resort to class
derivation. 
As a component developer, however, you must subclass. Each time you create a
new component, you'll derive it from an existing class (TComponent or one of
its subclasses). You'll also have to be aware of some other Delphi issues:
 All Delphi objects are references to objects that Delphi dynamically creates
on the heap. In TForm1.Button.Text:='Close';, for instance, the Button field
is a pointer to a heap object. The compiler automatically dereferences the
pointer, so you don't need to use the clumsy pointer notation. This is similar
to a C++ reference variable.
 Components have properties (pseudo-variables), methods (function calls), and
events (message-handling functions). As a component designer, you need to
define these items and control their visibility to other portions of the
program.
 Properties may have side effects. When a component user reads or modifies a
property, Delphi can call a function. For example, setting a button's Enable
property to True modifies the property and also calls code that makes a
standard EnableWindow() call.
 Properties and methods may be private (visible only inside a unit), protected
(visible inside a unit and to descendant classes), public (visible globally),
or published (like public, but available in the Object Inspector at design
time).


Anatomy of a Component


A component is simply a class derived from TComponent or a subclass of
TComponent. Writing a component is a non-visual operation. You can use
Delphi's Component Expert, but it writes little code for you. To run Component
Expert, select New Component from the File menu. Example 1 is the code you can
expect it to generate.
The class(TComboBox) syntax in Example 1 indicates that the component derives
directly from TComboBox. From this skeleton, you'll need to fill in any
private variables you want to use, and add properties, methods, and
occasionally, events. Methods are just ordinary procedure or function
definitions. Properties, however, are unlike anything you've used in other
programming languages. They are variables, function calls, and more, all
rolled into one. Events are simply a special form of properties.


More About Properties


Properties are part of what makes Delphi so powerful. A component can create
properties that appear on the Object Inspector in the design environment. In
the class definition in Example 2, the Delimiter property will appear in the
Object Inspector window because of the published keyword preceding it. The
default value is a comma. When a component user reads or writes this property,
it will directly access the FDelimiter variable. If you want a read-only
property, omit the write method. The default value is optional, and only works
for ordinal and small-set types.
In this case, a property is little more than a variable. However, you can also
define read and write procedures that execute at the proper time. In Example
3, DynamicProp defines functions to control access to the property. When the
component user writes x:=TDDJComponent.DynamicProp;, you can think of it as
x:=TDDJComponent.GetDynProp;. Of course, you couldn't really call GetDynProp
directly, since it is private to the class, but the result is the same.
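Readers coming from other languages may recognize the pattern: Python's
property mechanism is a close analogue (this is a hedged sketch in Python, not
Delphi code; the class and accessor names are stand-ins).

```python
class DDJComponent:
    """Sketch of a Delphi-style property: reads and writes are routed
    through accessor methods, as with Example 3's DynamicProp."""
    def __init__(self):
        self._delimiter = ","            # default value, as in Example 2

    @property
    def delimiter(self):
        return self._delimiter           # read accessor

    @delimiter.setter
    def delimiter(self, value):
        # A write accessor may have side effects; here it just
        # validates the new value before storing it.
        if len(value) != 1:
            raise ValueError("delimiter must be a single character")
        self._delimiter = value
```

As in Delphi, the component user writes what looks like a plain variable
access (`c.delimiter = ";"`), and the accessor runs behind the scenes.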


Methods and Events


To define a method, just write an ordinary function or procedure in the public
or protected declaration sections. Because of properties, you won't use
methods as often as you do in other languages. Consider a custom edit control
where you want to encrypt the contents. You could write a method to do the
encryption. But if you published an encrypt property instead, the write
function would perform the encryption operation when the component user
changed the property to True. Properties are powerful; you'll often use them
instead of directly calling methods. This better encapsulates your
implementation and makes the values available at design time.
As a component user, you assign methods to an event handler using the Object
Inspector. This simply assigns a method pointer to a special event property.
As a component builder, you must not do this. If you use the event handler,
you'll conflict with any user attempts to intercept the event.
To solve this problem, you need to know how Delphi events work. Consider the
OnKeyDown handler. Inside components that process WM_KEYDOWN (the underlying
message handler), you'll find a KeyDown handler. This method uses the message
keyword to indicate it handles a Windows message. KeyDown checks to see if the
OnKeyDown property is set. If so, KeyDown calls the routine pointed to by the
property.
To change default event processing for WM_KEYDOWN, you override KeyDown, not
OnKeyDown. This allows users to continue to assign events as usual. Defining a
custom event is somewhat more difficult.
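The dispatch pattern just described can be sketched language-neutrally (a
Python analogy with stand-in names, not VCL code): the component's KeyDown
method performs default processing and then calls the user's handler if one is
assigned, so a component builder overrides the method, never the event
property.

```python
class Control:
    """Sketch of Delphi's event dispatch."""
    def __init__(self):
        self.on_key_down = None          # event property, owned by the user

    def key_down(self, key):
        # ... default message processing would go here ...
        if self.on_key_down:             # call the user's handler, if any
            self.on_key_down(self, key)

class BeepingControl(Control):
    # A component builder customizes behavior by overriding key_down,
    # never by seizing on_key_down from the component user.
    def key_down(self, key):
        print("beep")                    # custom default processing
        super().key_down(key)            # keep the user's handler working
```

Because the override calls the inherited method, users can continue to assign
their own handlers exactly as before.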
To define a new event, you have to define a custom message, message structure,
event type, and message-handling procedure. You'll also want to publish a
property for user-defined handlers, as in the component that needs to handle a
custom WM_USER message in Example 4.


Handling Default Values


You can specify a default value for any ordinal or small-set property using
the default keyword. This may not work as you expect, however. While the
Object Inspector initializes the property's value to this default, Delphi never saves
the default values when it writes your project code. You must set the default
values in the object's constructor, and be sure to use the same value in both
the default clause and the constructor. Example 5 is a typical constructor for
a component with two default properties.



Automating the Process


Designing a reasonable user interface for entering method and event
definitions can be difficult because you have to handle a varying number of
arguments. Luckily, it is easy to write methods and override existing events;
you essentially write self-contained functions. You rarely need to create new
events. The most tedious (and most common) task is writing properties. You
have to declare a variable to hold the results and possibly create read and
write functions. You also need to write a constructor to handle the default
values. Why not automate these steps?
Figure 2 shows CompBld in action. The left portion of the screen is similar to
the Component Expert (except for the comments field). However, the right side
gives you a complete tool for creating properties. Simply enter the property
name and select the type. CompBld will automatically provide read and write
variable names. If you want a read or write method, type another name in the
appropriate slot. If you want a read-only property, delete the entry in the
write field. Finally, supply any default value and select public or published.
Press the Add button (or press Enter) and the property will appear in the list
box. To alter a property (or delete it), select it in the list box and make
any changes.
Listing One is the main CompBld program (SDIMAIN.PAS). The bulk of the program
deals with UI issues. The real meat is in the SaveItemClick procedure. This
ugly piece of code writes the skeleton component file. I wanted to prevent
this code from knowing too much about the property's user interface. The
SaveItemClick procedure deals with a TPropVals object. This is a good example
of the power of a user-defined class in a Delphi application. This object
knows how to transfer data from components on the form to a data structure and
back (the SetValues and GetValues methods). The property list box stores a
list of these objects in its Objects array (a standard part of the list box).
When SaveItemClick needs to work with the properties you define, it simply
walks the Objects array. The PAddClick handler copies the correct values into
the Objects array. When you click on an entry in the list box, PBoxClick
reverses the process. It copies the data from the Objects array to the
controls on the form (using the TPropVals.SetValues method).


Improving CompBld with a Custom Component


The first version of CompBld had values hardcoded in the combo boxes. A
more-general implementation would read values from an INI file. Since this is
a useful variation on a combo box, it calls for a custom component. These are
the design criteria for the TIniCombo component:
The control should act just like an ordinary combo box.
 Each TIniCombo will have a Filename, Section, and Item property, as well as a
Separator property (a character property for the item delimiter).
When the TIniCombo initializes, it should read a string from the file using
the appropriate section and item name. The combo box then fills with the
values as delimited by the separator character. The ReRead method also
performs this function.
Calling the Write method places the current contents of the combo box back in
the INI file in the same format.
TIniCombo will provide a custom property editor for the Filename property,
which will bring up the standard file-open dialog, but will only store the
base filename. Programs should assume an INI file is in the Windows directory,
which may differ between machines.
Initially, the ReRead method may seem superfluous. Why not read the file each
time the Filename property changes? The problem is that the filename is not
the only key to the data. You also need the section and item strings. If you
made any of these properties refresh the control, users would need to set the
properties in a specific order.
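In Python terms, the ReRead and Write steps might look like the sketch below
(configparser standing in for Delphi's TIniFile; the function names are
hypothetical): fetch one delimited string keyed by section and item, split it
into the item list, and join it back on the way out.

```python
import configparser

def read_combo_items(filename, section, item, separator=","):
    """Sketch of TIniCombo's ReRead step: fetch one delimited string
    from an INI file and split it into a combo box's item list."""
    ini = configparser.ConfigParser()
    ini.read(filename)
    raw = ini.get(section, item, fallback="")
    return [part for part in raw.split(separator) if part]

def write_combo_items(filename, section, item, items, separator=","):
    """The inverse Write step: join the current items and store them back
    in the same section and item, in the same delimited format."""
    ini = configparser.ConfigParser()
    ini.read(filename)
    if not ini.has_section(section):
        ini.add_section(section)
    ini.set(section, item, separator.join(items))
    with open(filename, "w") as f:
        ini.write(f)
```

Note that both operations need the filename, section, and item together,
which is exactly why no single property change can safely trigger a refresh
on its own.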
When I added TIniCombo to CompBld, I also extended the UI. When you enter a
value in a combo box that doesn't appear in the list, the code adds it to the
list (see ComboExit). If you press Ctrl-Del while in a combo box, it will
remove the item from the list (this occurs in the ComboKey procedure). In
either case, CompBld writes the combo-box data back to the INI file.


Inside TIniCombo


Listing Two is TIniCombo. Since a TIniCombo looks just like a TComboBox, it
derives directly from this class. If you wanted to hide some parts of the
normal combo box, you'd need to derive from TCustomComboBox, a special class
that defines almost everything as protected. You could then change the
attributes in a derived subclass. In a derived class, you can publish items
from your base class, but you can't unpublish them. In this case, you don't
want to change anything, so start with TComboBox.
Delphi has many objects that aren't visible. One of these (TIniFile) is
exactly what TIniCombo needs to read and write to an INI file. The only
problem with TIniFile is that it uses standard Pascal strings, which are
limited to 255 characters. However, Delphi also supports NULL-terminated
strings (using character arrays and type PChar). You can use these strings and
the normal Windows API calls to break the 255-character limit, provided you
also rewrite all the string-manipulation code in TIniCombo.
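A minimal sketch of that PChar approach (the buffer size, section, item, and filename here are invented for illustration; they are not part of TIniCombo):

```pascal
{ Sketch only: reading an INI entry into a PChar buffer through the
  Windows API, sidestepping the 255-character limit of standard Pascal
  strings. All names and the 1K buffer size are assumptions. }
procedure ReadLongEntry;
var
  Buf : array[0..1023] of Char;
begin
  GetPrivateProfileString('MySection', 'MyItem', '',
    Buf, SizeOf(Buf), 'MYAPP.INI');
  { Buf now holds a NULL-terminated string; manipulate it with the
    PChar routines (StrLen, StrScan, and so on) rather than the
    standard string functions }
end;
```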
The TIniCombo design calls for the box to initialize itself on creation. The
obvious method would be to set everything up in the constructor, but this
won't work. Regardless of the settings in Object Inspector, the object is
empty when the constructor runs, because Delphi hasn't loaded the property
values yet. The correct place to initialize is during the Loaded procedure. Be
sure to use inherited to call the base class in case it needs to initialize,
too. The TIniCombo.Loaded routine checks the Filename property. If it is not
empty, the code calls ReRead to set everything up.


Custom Property Editors


Delphi allows you to write custom editors for properties. One example of a
property editor is that supplied by Delphi for the Font property. Pressing the
button on this property opens a dialog allowing you to select a font. However,
many property editors are not so visual. For example, a property of type Char
uses an editor that you may never notice. Its sole purpose is to translate
non-printable characters to hex notation (for example, Control-A is #01).
Writing a custom property editor isn't difficult. Simply derive a class from
TPropertyEditor, override a few procedures, and call RegisterPropertyEditor.
Your editor will translate between the property and a string representation
that Object Inspector displays. There are six methods to override in your
property editor:
- GetValue, which returns a string for Object Inspector to display.
- SetValue, which converts a string from Object Inspector into a property value.
- Edit, which displays a dialog box and returns a string.
- GetValues, which returns a list of values for enumerated properties.
- AllEqual, which determines if the user can set the property on multiple objects simultaneously.
- GetAttributes, which informs Object Inspector of the editor's capabilities.
A simple editor (like the one for Char properties) may only supply GetValue
and SetValue. If you want to supply GetValues, AllEqual, or Edit, you'll need
to override GetAttributes and return the proper values (see Table 1). These
values are elements of a set. Include the ones that apply to your editor (see
Listing Two for an example).
TPropertyEditor also provides several methods to read and write common types
(see Table 2). In many cases, GetValue and SetValue will simply call these
built-in methods. If you want to do range checking or other validation, you
can raise an EPropertyError to signal a problem.
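For instance, a hypothetical editor for an integer percentage property might delegate to the built-in ordinal methods and range-check on the way in (TPercentEditor and its 0..100 range are my own invention, not part of CompBld):

```pascal
type
  { Hypothetical editor for an Integer "percent" property }
  TPercentEditor = class(TPropertyEditor)
  public
    function GetValue : String; override;
    procedure SetValue(const Value : String); override;
  end;

function TPercentEditor.GetValue : String;
begin
  result := IntToStr(GetOrdValue);  { delegate to the built-in reader }
end;

procedure TPercentEditor.SetValue(const Value : String);
var
  n : LongInt;
begin
  n := StrToInt(Value);
  if (n < 0) or (n > 100) then
    raise EPropertyError.Create('Value must be between 0 and 100');
  SetOrdValue(n);  { delegate to the built-in writer }
end;
```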
TIniCombo defines a special type for the INI filename (TIniFilename). This
allows it to supply a custom editor that affects only this type. The property
editor brings up a file-open dialog. Since the user's PC may be set up
differently than the programmer's PC, you don't want to store the INI file's
directory. The TIniFileNameEditor object removes the directory name before
setting the property value. The component registers the editor when it
registers itself.
You can call RegisterPropertyEditor to achieve at least three different
effects:
- Register your editor to handle all properties of a given type by specifying only a type name.
- Restrict your editor to properties of a given type that occur in a particular component (or in components derived from it) by also specifying a component class.
- Limit the editor to properties with specific names.
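Sketched as calls, the three forms look like this (the first mirrors the registration in Listing Two; TMyEditor is a hypothetical editor class):

```pascal
{ 1. All TIniFileName properties, in any component: }
RegisterPropertyEditor(TypeInfo(TIniFileName), nil, '', TIniFileNameEditor);
{ 2. Only String properties occurring in TIniCombo or its descendants;
     TMyEditor is hypothetical: }
RegisterPropertyEditor(TypeInfo(String), TIniCombo, '', TMyEditor);
{ 3. Only String properties named 'Section': }
RegisterPropertyEditor(TypeInfo(String), nil, 'Section', TMyEditor);
```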


Conclusion


Building components is an important way to realize Delphi's full potential.
While Delphi gives app developers a first-class visual environment, it doesn't
do much for component builders. CompBld can take some of the work out of
creating new components.
Adding custom property editors to your components can make them even more
intuitive for users. The easier a component is to use, the more likely other
programmers are to incorporate it in their applications.
Component developers frequently create new classes. However, new classes can
simplify application development, too. The TPropVals class in CompBld, for
example, makes the code easier to understand and maintain.
Figure 1: Object Inspector.
Figure 2: CompBld in action.

Example 1: Code generated by Component Expert.
unit DDJ;
interface
uses
 SysUtils, WinTypes, WinProcs,
 Messages, Classes, Graphics,
 Controls, Forms, Dialogs,
 ExtCtrls;
type
 TDDJComponent = class(TComboBox)
 private
 { Private declarations }
 protected
 { Protected declarations }
 public
 { Public declarations }
 published
 { Published declarations }
 end;
procedure Register;
implementation
procedure Register;
begin
 RegisterComponents('Samples', [TDDJComponent]);
end;
end.
Example 2: Class definition.
type
 TDDJComponent = class(TComboBox)
 private
 FDelimiter : Char;
 published
 property Delimiter : Char read FDelimiter write FDelimiter
 default ',' ;
 end;
Example 3: The DynamicProp property calls functions.
type
 TDDJComponent = class(TComboBox)
 private
 function GetDynProp : Integer;
 procedure SetDynProp(const value :Integer);
 published
 property DynamicProp : Integer read GetDynProp write SetDynProp;
 end;
Example 4: A component that handles WM_USER.
{ Define message }
const WM_MYMSG=WM_USER;
{ Event structure (parses arguments) }
TMyMsg = record
 Msg: Cardinal;
 Unused : Word; { wParam }
 Flag : LongInt;{ lParam }
 Result : LongInt;
end;
{ Define event type }
TMyEvent = procedure(Sender : TObject;
 Flag : Boolean) of object;
 .
 .
 .
private
{ Event handler pointer }
 FOnMyEvent : TMyEvent;
{ Message handling procedure }
 procedure WMMyMsg(
 var Msg : TMyMsg); message WM_MYMSG;
{ Ordinary procedure receives parsed arguments
 and calls the event handler if set }
 procedure MyMsg(Sender : TObject; Flag : Boolean);
published
 { Event handler shows up in Object Inspector }
 property OnMyEvent : TMyEvent read FOnMyEvent
 write FOnMyEvent;
 .
 .
 .
procedure TXControl.WMMyMsg(var Msg:TMyMsg);
var
 flg : Boolean;
begin
 flg:=True;
 if Msg.Flag=0 then flg:=False;
 MyMsg(self,flg);
end;
procedure TXControl.MyMsg(Sender : TObject;
 Flag : Boolean);
begin
 if Assigned(FOnMyEvent) then FOnMyEvent(self,Flag);
end;
Example 5: Typical constructor for a component with two default properties.
 .
 .
 .
published
 property HighMark : Integer read FHighMark
 write FHighMark default 100;
 property LowMark : Integer read FLowMark
 write FLowMark default 33;
 .
 .
 .
constructor TCompDDJ.Create(AOwner: TComponent);
begin
 inherited Create(AOwner);
 HighMark:=100; { default value }
 LowMark:=33; { default value }
end;
Table 1: Property-editor attributes.
Attribute Definition
paValueList Call GetValues to retrieve enumerated values.
paDialog Call Edit to open a property-specific editor dialog.
paMultiSelect Allow the property to apply to multiple selected components.
paSubProperties This property has subproperties.
Table 2: Built-in property methods.
Property Types Read Write
Floating point GetFloatValue SetFloatValue
Event GetMethodValue SetMethodValue
Ordinal GetOrdValue SetOrdValue

String GetStrValue SetStrValue

Listing One
unit Sdimain;
{ DDJ Component Builder -- Al Williams }
interface
uses WinTypes, WinProcs, Classes, Graphics, Forms, Controls, Menus,
 Dialogs, StdCtrls, Buttons, ExtCtrls,SysUtils,
 IniCombo;
type
 TMainForm = class(TForm)
 MainMenu: TMainMenu;
 FileMenu: TMenuItem;
 SaveItem: TMenuItem;
 ExitItem: TMenuItem;
 N1: TMenuItem;
 SaveDialog: TSaveDialog;
 Help1: TMenuItem;
 About1: TMenuItem;
 StatusBar: TPanel;
 SpeedPanel: TPanel;
 SaveBtn: TSpeedButton;
 ExitBtn: TSpeedButton;
 CName: TEdit;
 Label1: TLabel;
 Label2: TLabel;
 Label3: TLabel;
 Label4: TLabel;
 CBase: TIniCombo;
 CGroup: TIniCombo;
 GroupBox1: TGroupBox;
 CProp: TEdit;
 PRead: TEdit;
 PType: TIniCombo;
 PWrite: TEdit;
 PDefault: TEdit;
 PAdd: TButton;
 PRemove: TButton;
 PBox: TListBox;
 Label5: TLabel;
 Label6: TLabel;
 Label7: TLabel;
 Label8: TLabel;
 Label9: TLabel;
 CComment: TMemo;
 BPublish: TRadioButton;
 BPublic: TRadioButton;
 New1: TMenuItem;
 NewBtn: TSpeedButton;
 procedure ShowHint(Sender: TObject);
 procedure ExitItemClick(Sender: TObject);
 procedure SaveItemClick(Sender: TObject);
 procedure About1Click(Sender: TObject);
 procedure FormCreate(Sender: TObject);
 procedure CNameChange(Sender: TObject);
 procedure CPropChange(Sender: TObject);
 procedure PAddClick(Sender: TObject);
 procedure PBoxClick(Sender: TObject);
 procedure PChange(Sender: TObject);

 procedure FormClose(Sender: TObject; var Action: TCloseAction);
 procedure PRemoveClick(Sender: TObject);
 procedure New1Click(Sender: TObject);
 procedure ComboExit(Sender: TObject);
 procedure ComboKey(Sender: TObject; var Key: Word; Shift: TShiftState);
 private
 { Private declarations }
 procedure Init; { make new document }
 public
 { No Public declarations }
 end;
 PType = (pPublic,pPublished); { Type of property }
{ This non-visual object represents one property associated with the
component.
 SetValues and GetValues methods transfer data from the form to/from object }
 TPropVals = class(TObject)
 PropName : String;
 PropType : String;
 ReadName : String;
 WriteName : String;
 Default : String;
 PrType : PType;
 public
 procedure SetValues(nm : TEdit; typ : TComboBox; r,w,d :
 TEdit; pubsh, publc : TRadioButton);
 procedure GetValues(nm : TEdit; typ : TComboBox; r,w,d :
 TEdit; pubsh, publc : TRadioButton);
end;
var
 MainForm: TMainForm;
implementation
uses About;
{$R *.DFM}
procedure TMainForm.ShowHint(Sender: TObject);
begin
 StatusBar.Caption := Application.Hint;
end;
procedure TMainForm.ExitItemClick(Sender: TObject);
begin
 Close;
end;
{ The save routine is the bulk of the code. It writes the unit name (based
 on the file name), the uses clause, and the type and implementation
 sections with stubs for all the functions. }
procedure TMainForm.SaveItemClick(Sender: TObject);
const
 cr = chr(10);
var
 fout : TextFile;
 i : Integer;
 ct : Integer;
 vals : TPropVals;
 unitname : String;
begin
 if SaveDialog.Execute then
 begin
 AssignFile(fout,SaveDialog.FileName);
 Rewrite(fout);
 unitname:=ExtractFileName(SaveDialog.Filename);

 i:=Pos('.',unitname);
 if i<>0 then
 unitname:=Copy(unitname,1,i-1); { remove extension }
 Writeln(fout,'Unit ' + unitname + ';'+cr);
 if CComment.Text <> '' then
 Writeln(fout,'{' + CComment.Text + '}'+cr);
 Writeln(fout,'interface'+cr);
 Writeln(fout,'uses SysUtils, WinTypes, WinProcs, Messages, Classes,');
 Writeln(fout,' Graphics, Controls, Forms, Dialogs, StdCtrls;'+cr);
 Writeln(fout,'type');
 Writeln(fout,CName.Text + '= class('+ CBase.Text +')');
 Writeln(fout,'private');
 ct := PBox.Items.Count;
 { dump property variables here }
 for i:=0 to ct-1 do
 begin
 vals:=TPropVals(PBox.Items.Objects[i]);
 Writeln(fout,' F'+vals.PropName + ':' + vals.PropType+';');
 end;
 { dump property functions here }
 for i:= 0 to ct-1 do
 begin
 vals:=TPropVals(PBox.Items.Objects[i]);
 if (vals.ReadName <> '') and (vals.ReadName <> 'F'+vals.PropName) then
 Writeln(fout,' function '+vals.ReadName+' : '+vals.PropType+';');
 if (vals.WriteName <>'') and (vals.WriteName <> 'F' + vals.PropName) then
 Writeln(fout,' procedure '+vals.WriteName+'( const Value : '+
 vals.PropType+');');
 end;
{ Write constructor header }
 Writeln(fout,cr+'public');
 Writeln(fout,' constructor Create(AOwner : TComponent); override;');
{ Public properties }
 for i:= 0 to ct-1 do
 begin
 vals:=TPropVals(PBox.Items.Objects[i]);
 if vals.PrType=pPublic then
 begin
 Write(fout,' property ' + vals.PropName + ' : ' + vals.PropType);
 if vals.ReadName<>'' then
 Write(fout,' read '+vals.ReadName);
 if vals.WriteName<>'' then
 Write(fout,' write '+vals.WriteName);
 if vals.Default<>'' then
 Write(fout,' default '+vals.Default);
 Writeln(fout,';');
 end;
 end;
{ Published properties }
 Writeln(fout,cr+'published');
 for i:= 0 to ct-1 do
 begin
 vals:=TPropVals(PBox.Items.Objects[i]);
 if vals.PrType=pPublished then
 begin
 Write(fout,' property ' + vals.PropName + ' : ' + vals.PropType);
 if vals.ReadName<>'' then
 Write(fout,' read '+vals.ReadName);
 if vals.WriteName<>'' then

 Write(fout,' write '+vals.WriteName);
 if vals.Default<>'' then
 Write(fout,' default '+vals.Default);
 Writeln(fout,';');
 end;
 end;
 Writeln(fout,'end;');
{ Take care of Register function and implementation }
 Writeln(fout,cr+'procedure Register;'+cr);
 Writeln(fout,'implementation');
 Writeln(fout,cr+'procedure Register;');
 Writeln(fout,'begin');
 Writeln(fout,' RegisterComponents('''+CGroup.Text+''',['+CName.Text+']);');
 Writeln(fout,'end;');
{ write constructor }
 Writeln(fout,cr+'constructor '+CName.Text+'.Create(AOwner:TComponent);');
 Writeln(fout,'begin');
 Writeln(fout,' inherited Create(AOwner);');
{ Set up defaults here }
 for i:= 0 to ct-1 do
 begin
 vals:=TPropVals(PBox.Items.Objects[i]);
 if vals.Default<>'' then
 Writeln(fout,' '+vals.PropName+':='+vals.Default+';');
 end;
 Writeln(fout,' { your code here }');
 Writeln(fout,'end;');
{ Write get/set functions }
 for i:= 0 to ct-1 do
 begin
 vals:=TPropVals(PBox.Items.Objects[i]);
 if (vals.ReadName<>'') and (vals.ReadName <> 'F'+vals.PropName) then
 begin
 Writeln(fout,cr+'function '+Cname.Text+'.'+vals.ReadName+
 ' : '+vals.PropType+';');
 Writeln(fout,'begin');
 Writeln(fout,' result:=F'+vals.PropName+';');
 Writeln(fout,'end;');
 end;
 if (vals.WriteName<>'') and (vals.WriteName <> 'F'+vals.PropName) then
 begin
 Writeln(fout,cr+'procedure '+Cname.Text+'.'+vals.WriteName+
 '(const Value : '+vals.PropType+');');
 Writeln(fout,'begin');
 Writeln(fout,' F'+vals.PropName+':=Value;');
 Writeln(fout,'end;');
 end;
 end;
 Writeln(fout,cr+'end.');
 CloseFile(fout);
 end;
end;
procedure TMainForm.About1Click(Sender: TObject);
begin
 AboutBox.ShowModal;
end;
procedure TMainForm.FormCreate(Sender: TObject);
begin
 Application.OnHint := ShowHint;

 Init; { Set up as new document }
end;
{ The TPropVal object contains information about a property }
{ Set up component values from a TPropVals object }
procedure TPropVals.SetValues(nm : TEdit; typ : TComboBox; r,w,d :
 TEdit; pubsh, publc : TRadioButton);
begin
 nm.Text:=PropName;
 typ.Text:=PropType;
 r.Text:=ReadName;
 w.Text:=WriteName;
 d.Text:=Default;
 if PrType = pPublished then
 pubsh.Checked:=True
 else
 publc.Checked:=True;
end;
{ Set up a TPropVal based on component values }
procedure TPropVals.GetValues(nm : TEdit; typ : TComboBox; r,w,d :
 TEdit; pubsh, publc : TRadioButton);
begin
 PropName:=nm.Text;
 PropType:=typ.Text;
 ReadName:=r.Text;
 WriteName:=w.Text;
 Default:=d.Text;
 if (pubsh.Checked) then
 PrType:=pPublished
 else
 PrType:=pPublic;
end;
{ Come here when name changes }
procedure TMainForm.CNameChange(Sender: TObject);
var en : Boolean;
begin
if CName.Text = '' then
 en:=false
else
 en:=true;
CProp.Enabled:=en;
PBox.Enabled:=en;
end;
{ Come here when property name changes }
procedure TMainForm.CPropChange(Sender: TObject);
var
 en : Boolean;
begin
if CProp.Text='' then
 en := false
else
 en := true;
PType.Enabled:=en;
PRead.Enabled:=en;
PWrite.Enabled:=en;
PDefault.Enabled:=en;
BPublish.Enabled:=en;
BPublic.Enabled:=en;
PAdd.Enabled:=en;
PRead.Text:='F'+CProp.Text;

PWrite.Text:='F'+CProp.Text;
PDefault.Text:='';
end;
{ Click on Add button }
procedure TMainForm.PAddClick(Sender: TObject);
var
 vals : TPropVals;
 idx : Integer;
 istring : String;
begin
 vals:=TPropVals.Create;
 vals.GetValues(CProp,PType,PRead,PWrite,PDefault,BPublish,BPublic);
 istring:=Cprop.Text;
 idx := PBox.Items.IndexOf(istring);
 if idx = -1 then { new item }
 idx:=PBox.Items.AddObject(istring,vals)
 else
 begin { replace item }
 PBox.Items.Objects[idx].Free;
 PBox.Items.Objects[idx]:=vals;
 end;
 PBox.ItemIndex:=idx;
 PBoxClick(nil); { Update remove button, et al. }
 ActiveControl:=CProp; { reset to property field }
 CProp.SelectAll;
end;
{ Click on list box }
procedure TMainForm.PBoxClick(Sender: TObject);
var
vals : TPropVals;
begin
if PBox.ItemIndex <> -1 then
 begin
 vals:=TPropVals(PBox.Items.Objects[PBox.ItemIndex]);
 vals.SetValues(CProp,PType,PRead,PWrite,PDefault,BPublish,BPublic);
 PRemove.Enabled:=True;
 end
else
 PRemove.Enabled:=False;
end;
{ Come here when any property field changes }
procedure TMainForm.PChange(Sender: TObject);
var
 vals : TPropVals;
 idx : Integer;
 istring : String;
begin
 vals:=TPropVals.Create;
 vals.GetValues(CProp,PType,PRead,PWrite,PDefault,BPublish,BPublic);
 istring:=CProp.Text;
 idx := PBox.Items.IndexOf(istring);
 if idx <> -1 then
 begin
 PBox.Items.Objects[idx].Free; 
 PBox.Items.Objects[idx]:=vals;
 end;
end;
procedure TMainForm.FormClose(Sender: TObject; var Action: TCloseAction);
begin

if MessageDlg('Really Quit?',mtConfirmation,[mbOK,mbCancel],0)=mrOK then
 Action:=caFree
else
 Action:=caNone;
end;
{ On Remove Button }
procedure TMainForm.PRemoveClick(Sender: TObject);
begin
Pbox.Items.Delete(PBox.ItemIndex);
CProp.Text:='';
PRead.Text:='';
PWrite.Text:='';
PDefault.Text:='';
PBoxClick(nil);
end;
procedure TMainForm.New1Click(Sender: TObject);
begin
if MessageDlg('Start new component?',mtConfirmation,[mbYes,mbNo],0)=mrYes then
 Init;
end;
procedure TMainForm.Init;
begin
 { empty form }
 CName.Text:='';
 CBase.Text:='TComponent';
 CGroup.Text:='Samples';
 CComment.Text:='';
 PBox.Clear;
 BPublish.Checked:=True;
 CProp.Text:='';
 PRead.Text:='';
 PWrite.Text:='';
 PDefault.Text:='';
end;
{ When combo box exits, add any new item to the list and write to INI file}
procedure TMainForm.ComboExit(Sender: TObject);
var
 cb : TIniCombo;
 temp : String;
begin
 cb:=Sender as TIniCombo;
 if cb.Items.IndexOf(cb.Text)=-1 then
 begin
 temp:=cb.Text;
 cb.Items.Add(temp);
 cb.Write;
 cb.ReRead;
 cb.ItemIndex:=cb.Items.IndexOf(temp);
 end;
end;
{ Look for ^Del and remove item from INI file and list }
procedure TMainForm.ComboKey(Sender: TObject; var Key: Word;
 Shift: TShiftState);
var
cb : TIniCombo;
begin
 if (Key=VK_DELETE) and (Shift=[ssCtrl]) then
 begin
 cb:=Sender as TIniCombo;

 cb.Items.Delete(cb.Items.IndexOf(cb.Text));
 cb.Text:='';
 cb.Write;
 end;
end;
end.

Listing Two
unit Inicombo;
interface
uses
 SysUtils,
 WinTypes,
 WinProcs,
 Messages,
 Classes,
 Graphics,
 Controls,
 Forms,
 Dialogs,
 StdCtrls,
 IniFiles,
 DsgnIntf;
type
 TIniFileName = String[255];
 TIniCombo = class(TComboBox)
 private
 FFilename : TIniFileName;
 FSection : String;
 FItem : String;
 FSeparator : Char;
 protected
 procedure Loaded; override;
 public
 procedure Reread;
 procedure Write;
 constructor Create(AOwner : TComponent); override;
 published
 property Filename : TIniFileName read FFilename write FFilename;
 property Section : String read FSection write FSection;
 property Item : String read FItem write FItem;
 property Separator : Char read FSeparator write FSeparator default ',';
 end;
TIniFileNameEditor=class(TPropertyEditor)
public
 function GetValue : String; override;
 procedure SetValue(const Value : String); override;
 function GetAttributes : TPropertyAttributes; override;
 procedure Edit; override;
end;
procedure Register;
implementation
procedure Register;
begin
 RegisterPropertyEditor(TypeInfo(TIniFileName),nil,'',TIniFileNameEditor);
 RegisterComponents('Samples', [TIniCombo]);
end;
constructor TIniCombo.Create(AOwner : TComponent);
begin

 inherited Create(AOwner);
 Separator:=','; { default value }
end;
{ Setup after everything is ready }
procedure TIniCombo.Loaded;
begin
 inherited Loaded;
 if Filename <> '' then Reread;
end;
procedure TIniCombo.Reread;
var
Ini : TIniFile;
Work : String;
Entry : String;
n : Integer;
begin
Ini:=TIniFile.Create(FFilename);
Work := Ini.ReadString(FSection,FItem,'');
if Work <> '' then Clear;
n:=Pos(FSeparator,Work);
while n<>0 do
 begin
 Entry:=Copy(Work,1,n-1);
 Items.Add(Entry);
 Delete(Work,1,n);
 n:=Pos(FSeparator,Work);
 end;
{ Add last one }
if Length(Work)<>0 then Items.Add(Work);
Ini.Free;
end;
procedure TIniCombo.Write;
var
Ini : TIniFile;
Work : String;
Entry : String;
n : Integer;
begin
Ini:=TIniFile.Create(FFilename);
{ build string }
Work:='';
for n:=0 to Items.Count-1 do
 begin
 Work:=Work + Items[n];
 if n <> Items.Count-1 then Work:=Work + FSeparator;
 end;
Ini.WriteString(FSection,FItem,Work);
Ini.Free;
end;
function TIniFileNameEditor.GetAttributes : TPropertyAttributes;
begin
 result:=[paMultiSelect,paDialog];
end;
function TIniFileNameEditor.GetValue : String;
begin
 result:=GetStrValue;
end;
procedure TIniFileNameEditor.SetValue(const Value : String);
begin
 SetStrValue(Value);
end;

procedure TIniFileNameEditor.Edit;
var
dlg : TOpenDialog;
dbuf : array[0..255] of Char;
begin
 dlg:=TOpenDialog.Create(Application);
 try
 dlg.Filename:=GetStrValue;
 dlg.DefaultExt:='INI';
 dlg.Filter:='Ini Files|*.INI|All Files|*.*';
 GetWindowsDirectory(dbuf,256);
 dlg.InitialDir:=StrPas(dbuf);
 if dlg.Execute=TRUE then
 SetStrValue(ExtractFileName(dlg.FileName));
 finally
 dlg.Free;
 end;
end;
end.



Extending Visual Basic's Comm Control


Adding Xmodem support




Michael Floyd


Michael is executive editor for DDJ and author of Developing Visual Basic 4
Communications Applications (Coriolis Group, 1995). He can be contacted at the
DDJ offices or 76703.4057@compuserve.com.


Support for OLE custom controls (OCXs) is a new feature in Visual Basic 4.0
(VB4). One OCX, the Visual Basic communications (Comm) control, hides the
low-level details of serial communications (fetching characters from the UART
and the like) while providing a high-level interface based on the event-driven
model. You simply write code in response to events. When a character comes
through the serial port, you can grab it using an Input method. Thus, the Comm
control works fine when streaming text from the serial port into a terminal
window. However, the Visual Basic Comm control does not directly support
protocols such as Xmodem for transferring binary files--a necessity for any
real communications program.
There are several options for supporting binary transfers. You can, for
instance, buy an enhanced, add-on Comm control from Crescent Software (the
original developer of the bare-bones Visual Basic Comm control) that supports
most popular file-transfer protocols--Xmodem, Ymodem, Zmodem, and the like.
Alternatively, you can write your own Comm control, or extend the minimal
control included with VB4. In this article, I'll describe how to extend the
Comm control by adding support for the Xmodem protocol (sometimes called
"Modem7"). Although this implementation only supports checksums for error
detection, you can easily add a cyclic redundancy check (CRC).


Comm-Control Crash Course


The Comm control supports both event-driven and polling methods to send and
receive data through the serial port. It generates a single event, OnComm,
which can trap events such as receiving data through the serial port and
errors such as a full transmit buffer. You can control how often the
CommEvReceive event is generated by setting the RThreshold property. For
example, setting RThreshold to 1 causes a CommEvReceive event to be generated
each time a character is received in the input buffer. The value of an event
or error is stored as an integer in the CommEvent property, so you can use
CommEvent to determine the most recent event or error. For polling, you can
disable the generation of the CommEvReceive event by setting the RThreshold
property to 0.
You can begin "talking" to the serial port in as few as ten lines of code.
Characters are sent using the Output function and received using the Input
function; see Example 1. You start by setting the CommPort property to
establish which serial port your modem is connected to. If you are on COM1:,
set CommPort=1. You then establish the port settings with the Settings
property. A typical setting might be Settings="9600,n,8,1". Before opening the
communications port, you also need to tell the Comm control how many
characters to fetch from the input buffer. The InputLen property sets or
returns the number of characters to be read when using the Input procedure.
Setting InputLen to 0 tells the communications control that the entire receive
buffer should be read when using the Input procedure. 


TinyComm


To demonstrate how to develop a serial-communications program, I've written a
terminal program called "TinyComm." One of the reasons I chose this name is
that the program's Xmodem code is based on that presented by Al Stevens in his
"C Programming" column (DDJ, February and March 1989). TinyComm is a minimal
implementation of a terminal program, but despite its simplicity, TinyComm is
more robust than the sample VBTERM application supplied with Visual Basic.
TinyComm allows you to select a remote system from a phone-book entry and dial
that system. Once connected, you can log in and interact with the remote
system. Here, I will focus on just a few of TinyComm's subroutines; the
complete project is available electronically (see "Availability," on page 3). 
TinyComm consists of three forms: a main window (TinyCom.frm), a
phone-book-dialer window (Phonebk.frm), and an About box (AboutBox.frm). The
main window consists of the terminal window and six pull-down menus. The
terminal window presents some interesting challenges. A text-box control
provides much of the functionality this window will require. However, some
modifications must be made to support communications. For instance, characters
typed into the text box must be sent to the serial port and echoed back on the
screen. Additionally, when data comes through the serial port, you need to
display it to the screen. You would also like to be able to resize the window,
and have the data adjust itself properly.
CommCtrl.OnComm() (I've named this instance of the control CommCtrl) uses a
Case statement to direct the program to the proper handler. When data is
received through the serial port, the CommEvReceive case is triggered.
CommEvReceive first grabs data from the communications control's receive
buffer and assigns it to the TerminalTxt string variable. This gives us a
chance to filter the data before displaying it on the screen. Next, the cursor
position in the terminal window is determined by calculating the length of
text in the Window and assigning that value to TerminalWindow.SelStart.
Normally, the SelStart property determines the starting point of text that has
been highlighted and selected in the text box. If no text has been selected,
as in this case, SelStart indicates the position to begin inserting text.
To handle backspace characters, you test for the ASCII character value 8 (&H8). If
a backspace is detected, you subtract 1 from the current cursor position and
assign that as the new SelStart position. On the screen, this moves the cursor
back one space and erases the character that was in this position. Finally,
the filtered text is displayed on the screen using SelText. Of course, there
are plenty of other things you could filter and handle here, including special
characters used in terminal emulation.
The other important form involves the phone book, an Access database (.MDB)
file. This version of TinyComm provides no direct interface to the phone book,
so you must add entries using the Data Manager. The PhoneBook form consists of
a data grid and five button controls, three of which are currently disabled.
The fourth button control closes the PhoneBook form, and the final button,
Dial, dials the phone number of the system currently selected in the data
grid.
The DialBtn_Click() subroutine retrieves the dialup information from the
database and passes Comm control settings and the phone number to CallNum().
DialBtn_Click() first disables the Dial and Quit buttons and enables the
Cancel button (which aborts the dialing process). Next, the database is opened
and the record pointer is moved to the first record in the database. I use
FindFirst to locate the NameStr in the database, so a SQL query is constructed
and stored in the Query variable. The phone number associated with NameStr is
placed in the Number variable and passed to CallNum for dialing. When control
returns from CallNum, the buttons are reset to their prior state. 


Xmodem Refresher


The Xmodem protocol is well documented and has been covered extensively in DDJ
(most recently in "Intelligent XYModem," by Tim Kientzle, December 1994).
Still, a quick refresher is in order. According to the Xmodem protocol, data
are broken into 128-byte chunks and packaged into blocks (or packets) for
transmission. Each data packet is prefixed with a special Start of Header
(SOH) character, a block number, and the one's complement of the block number.
This is followed by 128 bytes of data. Finally, a checksum character is
appended to the data packet. Figure 1 shows how the message block is packaged.
In general terms, the transfer begins with the receiver sending out a series
of Negative Acknowledgment (NAK) characters at ten-second intervals. When the
sending program sees a NAK, it sends the first block of data. The receiving
program examines the data block and checks it for problems. If there are none,
the receiver sends an Acknowledgment (ACK) character indicating the block is
fine. If, on the other hand, there is a problem with the transmission, a NAK
is sent. The sender responds to a NAK by resending the bad block, and to an
ACK, by sending the next block. The process continues until an End of
Transmission (EOT) character is received or the file transfer is aborted. 
When the first packet is received, the receiver examines the first byte for
the SOH character (01H). If the header character is found, the receiver
assumes that the message block is valid and begins the file transfer in
earnest. The receiving program next takes the block number and calculates the
one's complement to the block number. This value is compared to the one's
complement sent by the sending program. If the two values match, everything is
fine and the receiver extracts the 128 bytes of data. The one's complement is
computed in VB by performing a bitwise Not(). If the two values do not match,
there is an error in the transmission and the packet must be resent. The
receiving program notifies the sender by sending a NAK (&H15).
The next step in the process is for the receiver to calculate a checksum of
the 128 bytes and compare this value to the checksum that was sent by the
remote system. Assuming the two checksum values match and the one's complement
values match, the receiver sends an ACK character (&H6). The checksum is
calculated by summing the 128 data bytes and keeping only the low-order byte
of the result. Figure 2 shows the algorithm for
downloading a file using Xmodem.
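The two checks described above are simple bitwise arithmetic. A minimal sketch, written here in Pascal rather than Visual Basic for consistency with the listings elsewhere in this issue (the function names are invented for illustration and appear in neither article):

```pascal
{ Sketch of the Xmodem receiver arithmetic; not from either listing }
function BlockComplement(Blk : Byte) : Byte;
begin
  { one's complement of the block number, truncated to a byte }
  BlockComplement := (not Blk) and $FF;
end;

function DataCheckSum(var Data : array of Byte) : Byte;
var
  i : Integer;
  Sum : Word;
begin
  Sum := 0;
  for i := 0 to High(Data) do
    Sum := (Sum + Data[i]) and $FF;  { keep only the low-order byte }
  DataCheckSum := Sum;
end;
```

The receiver would compare BlockComplement applied to the block number against the complement byte in the packet, and DataCheckSum over the 128 data bytes against the trailing checksum byte; a mismatch on either triggers the NAK described above.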


Implementing Xmodem


Listing One shows the download_xmodem() routine, which is based on C code
written by Al Stevens. Thus, if you're a C programmer, you should be able to
follow the Visual Basic code without difficulty. However, while variables and
subroutines may follow the general structure of Al's code, there are some
significant differences. 
The download_xmodem() routine takes a file handle as an argument. Thus, the
calling routine must create a valid file handle and open the file prior to
calling download_xmodem(). The calling routine is also responsible for closing
the file after the file transfer is complete. The code in Figure 3 can be used
to call the download_xmodem() subroutine.
My version of download_xmodem() polls the serial port for input rather than
using event-driven methods. This is accomplished by first setting the InputLen
property to 1, which tells the control to receive characters through the comm
port, one at a time. Next, the Comm control's RThreshold property is set to 0,
thus disabling generation of the OnComm event. In TinyComm, disabling OnComm
stops output to the terminal window; unfortunately, it also disables every
other event and error message processed by that handler. I've disabled it here
anyway to simplify the discussion and keep the focus on Xmodem rather than
event processing.
Note, however, that the global TerminalMode variable has been set to False at
the beginning of Listing One. Code in OnComm's receive event handler checks
TerminalMode and disables output to the window when the variable is set to
False. 
One difference between the C and Visual Basic versions of download_xmodem()
shows up in the ReadComm() subroutine. To be useful to download_xmodem(), each
string character retrieved from the Comm control's Input method must be
converted to an integer value representing its ASCII equivalent. The
ReadComm() subroutine shown in Listing One grabs a string character from the
serial port, converts it to an integer, and returns the result. Visual Basic's
Asc() function performs the conversion. If a null string is encountered,
ReadComm() returns 0. 
Another difference involves the Delay() function (which Al calls the sleep()
function). Delay() is used to pause the system for a predetermined period of
time. For example, to initiate the file transfer, download_xmodem() sends a
NAK and checks the input buffer to see if an ACK response character has been
sent. If not, the subroutine waits approximately six seconds (I've shortened
the delay time), then sends out another NAK. Visual Basic provides a Timer
control that can be used for just this purpose. The Timer control uses the
PC's system clock to generate an event after a set period of time (specified
in milliseconds). However, the accuracy of the Timer control is limited by the
system clock, which generates a clock tick every 1/18th of a second. When the
Timer event is generated, I increment a global variable called SecondsElapsed.
The Delay() subroutine loops until the desired number of seconds have elapsed.

Delay() also periodically issues a DoEvents(), which hands control over to
Windows to process other events within the system. Without DoEvents(), the
system appears to be hung. I've found Delay() useful in many situations, and
have even included it as part of a scripting language for TinyComm that I call
"TinyScript.''


Conclusion



Clearly, I've only touched on TinyComm's highlights. You can add many
features, including more event and error handlers. Most terminal programs
support file capture, as well as ASCII file send and receive capability. I
have also shown only the basics of Xmodem support. There is, of course, a
complementary upload_xmodem() subroutine. In addition, it is rather easy to
add CRC support to the Xmodem subroutines, and you will undoubtedly want to
take full advantage of the visual controls supplied by Visual Basic. 
Example 1: Opening the serial port and initializing the modem.
CommCtrl.CommPort = 1
CommCtrl.Settings = "9600,n,8,1"
CommCtrl.InputLen = 0
CommCtrl.PortOpen = True
CommCtrl.Output = "ATZ" + Chr(13) + Chr(10)
Do
 DummyVar = DoEvents()
Loop Until CommCtrl.InBufferCount >= 2
InString$ = CommCtrl.Input
CommCtrl.PortOpen = False
Figure 1: Xmodem data packet.
Figure 2: Xmodem's download algorithm.
Send NAKs every 10 seconds until a packet is received
If packet received then check for SOH
If SOH then
 get block number
 calculate One's complement to block number
 compare complement to the complement sent in the packet
 If local complement <> remote complement then
 Send NAK and repeat process
 Else
 get 128 bytes of data
 calculate checksum
 compare local checksum to packet checksum
 If local checksum <> packet checksum then
 Send NAK and repeat process
 Else
 write data to file
 send ACK
 End If
 End If
Read next SOH
If SOH = EOT then transmission successful
If SOH = CAN then transmission aborted
If SOH = &H01 then get next packet (repeat)
Figure 3: Calling download_xmodem().
FileHandle = FreeFile 'Get the next free file handle
FileName = "SomeFile"
Open FileName For Output As FileHandle
download_xmodem (FileHandle)
Close

Listing One
' xmodem.bas -- Michael Floyd -- Dr. Dobb's Journal, December 1995.
Global Const RETRIES = 12
Global Const CRCTRIES = 2
Global Const PADCHAR = &H1A
Global Const SOH = &H1
Global Const EOT = &H4
Global Const ACK = &H6
Global Const NAK = &H15
Global Const CAN = &H18
Global Const CRC = "C"
Global tries, SecondsElapsed As Integer 
Global InBuffer As String

Sub Delay(Seconds) 
 SecondsElapsed = 0
 If Seconds < 1 Then
 Terminal.Timer1.Interval = 1000 * Seconds
 Else
 Terminal.Timer1.Interval = 1000
 End If
 Terminal.Timer1.Enabled = True 'Enable timer
 Do While SecondsElapsed <= Seconds
 If I Mod 10 = 0 Then DoEvents
 Terminal.Label1.Caption = SecondsElapsed
 I = I + 1
 Loop
 Terminal.Timer1.Enabled = False
End Sub
Sub download_xmodem(FileNum)
Dim buffer, Checksum, Block, RemoteChecksum, RemoteComplement, _
    RemoteBlockNumber, SOHChar As Integer
Dim ByteArray$(1 To 128)
 TerminalMode = False 'Disable output to terminal
 Block = 0
 SOHChar = 0
 fst = True
 Terminal.CommCtrl.InBufferCount = 0 'Flush the Input buffer
 Terminal.CommCtrl.InputLen = 1 'Receive one char at a time
 Terminal.CommCtrl.RThreshold = 0 'Disable generation of OnComm Event
 tries = 0
 TIMEOUT = 6
 test_wordlen
 ' send NAKs until the sender starts sending
 Do While (SOHChar <> SOH) And (tries < RETRIES)
 tries = tries + 1
 Terminal.CommCtrl.Output = Chr$(NAK)
 Delay 1
 SOHChar = ReadComm()
 If SOHChar <> SOH Then
 Delay 6
 End If
 Loop
 Do While tries < RETRIES
 ' -- Receive the data and build the file --
 Terminal.Label1.Caption = "Block " + Str(Block + 1)
 If Not (fst) Then
 TIMEOUT = 10
 SOHChar = ReadComm()
 If TimedOut() Then
 MsgBox "Timed Out"
 End If
 If SOHChar = CAN Then
 MsgBox "CAN Received"
 Exit Do
 End If
 If SOHChar = EOT Then
 Terminal.CommCtrl.Output = Chr$(ACK)
 MsgBox "EOT Received"
 Exit Do
 End If
 If SOHChar <> SOH Then
 If SOHChar = EOT Then

 Terminal.CommCtrl.Output = Chr$(ACK)
 MsgBox "EOT Received"
 Exit Do
 End If
 Do While (SOHChar <> SOH)
 If tries >= RETRIES Then
 MsgBox "SOH errors!"
 Exit Do
 End If
 tries = tries + 1
 Terminal.CommCtrl.InBufferCount = 0 'Flush Input buffer
 Terminal.CommCtrl.Output = Chr$(NAK)
 Delay 1
 SOHChar = ReadComm()
 Loop
 End If
 End If
 fst = False
 TIMEOUT = 1 ' Switch to one sec. timeouts
 
 RemoteBlockNumber = ReadComm() ' Read block number
 RemoteComplement = ReadComm() ' Read 1's complement
 Checksum = 0
 DLInfo.Label1.Caption = "Block: " + Str(RemoteBlockNumber) + _
 " SOHChar: " + Str(SOHChar)
 ' ---- data block -----
 Buf$ = "" ' Reset the block buffer for each new packet
 For I = 1 To 128
 buffer = ReadComm()
 Buf$ = Buf$ + Chr$(buffer)
 Checksum = Checksum + buffer
 Next
 Checksum = Checksum And 255
 ' ---- checksum from sender ----
 RemoteChecksum = ReadComm()
 ' --- Handle resent blocks ---
 If RemoteBlockNumber = Block Then
 FilePos = Seek(FileNum)
 Seek FileNum, FilePos - 128
 ' --- handle out of synch block numbers ---
 ElseIf RemoteBlockNumber <> (Block + 1) Then
 receive_error "No next sequential block", CAN
 Exit Do
 End If
 Block = RemoteBlockNumber
 ' --- test the block # 1's complement ---
 BlocksComplement = (Not RemoteBlockNumber And &HFF)
 If (RemoteComplement And &HFF) <> BlocksComplement Then
 receive_error "One's complement does not match", NAK
 End If
 ' --- test chksum or crc vs one sent ---
 If Checksum <> RemoteChecksum Then
 receive_error "non-matching Checksums", NAK
 End If
 ' --- write the block to disk ---
 For I = 1 To Len(Buf$)
 Print #FileNum, Mid(Buf$, I, 1); ' Trailing semicolon suppresses the CR/LF
 Next I
 Terminal.CommCtrl.Output = Chr$(ACK)
 Delay 0.5

 Loop
 If SOHChar = EOT Then
 MsgBox "Transfer Complete"
 Else
 MsgBox "Transfer Aborted"
 End If
 TIMEOUT = 10
 Terminal.CommCtrl.InBufferCount = 0 'Flush the buffer
 Terminal.CommCtrl.InputLen = 0 'Receive all chars in buffer
 Terminal.CommCtrl.RThreshold = 1 'Enable generation of OnComm Event
 TerminalMode = True 'Enable output to terminal
End Sub
 Function ReadComm() As Integer
 Dim Tmp As String
 ' ReadComm reads a character from the Comm control's input buffer
 ' and returns the ASCII value of that character. If a null string is
 ' encountered, ReadComm returns 0.
 
 If Terminal.CommCtrl.InBufferCount > 0 Then
 Tmp = Terminal.CommCtrl.Input
 If Tmp <> "" Then
 ReadComm = Asc(Tmp)
 Else
 ReadComm = 0
 End If
 Else
 ReadComm = 0
 End If
End Function
Static Sub receive_error(ErrorMsg, Rtn)
 tries = tries + 1
 If TIMEOUT = 1 Then
 MsgBox "error " + ErrorMsg
 End If
End Sub
Sub test_wordlen()
 Settings = Terminal.CommCtrl.Settings
 If InStr(Settings, ",8,") = 0 Then
 MsgBox "Must be 8 Data Bits"
 tries = RETRIES
 End If
End Sub
Function TimedOut() As Integer
 ' Stub: Ticker is hard-coded, so TimedOut always reports success.
 ' Replace Ticker with a real countdown (decremented in the Timer
 ' event) to detect genuine timeouts.
 Ticker = 1
 If Ticker = 0 Then
 TimedOut = True
 Else
 TimedOut = False
 End If
End Function



A C++ Integrator Class


C++ classes for solving differential equations




Darrel J. Conway


Darrel, who holds a PhD in physics, has worked in educational and industrial
settings building numerical models of gravitating systems ranging from black
holes to Earth-orbiting spacecraft. He can be contacted at
71203.1415@compuserve.com.


When modeling the behavior of systems, scientists and engineers must
frequently solve differential equations. In this article, I'll present a C++
implementation of a class hierarchy that facilitates this process by allowing
for rapid incorporation of new integration methods as they become available.
As an example of this hierarchy, I'll focus on adaptive stepsize Runge-Kutta
integrators. The resulting code contains two integrators: a fifth-order
integrator with fourth-order error control (discussed in Numerical Recipes in
C, by William Press, et al.), which I refer to as a "Runge-Kutta 4(5)
integrator," and the "Runge-Kutta 7(8) integrator" derived by J.H. Verner.
(Verner presents several different sets of coefficients for Runge-Kutta
integrators, along with a description of the methodology used to produce
them.) As you'll see, incorporation of the second integrator into the class
structure is a simple matter of entering the number of stages and the
coefficients for the integration algorithm.
The general problem I'll address involves a system of i variables, r, in some
initial configuration that can be represented by a set of differential
equations of the form in Example 1. In this case, t is the independent
parameter for the system, and r(i) is the ith variable of the system. I'll
solve for the values of the variables r at some other value of t. To
illustrate how the integrator classes can be used to solve this problem, I'll
set up and integrate the Newtonian three-body problem (see the accompanying
text box).
I designed the integration-class structure to be reusable without modification
by implementing the following features:
The integrators do not contain details about the system being integrated.
These details are passed to the integrator after an instance of the class is
created. Among other things, this means that the integrator classes do not
know the dimensions of the system being integrated until an integrator object
is instantiated.
The differential equations that provide derivative information to the
integrators are specified outside of the integrator classes. 
The integrator-class structure is flexible. For instance, if someone derives a
new integration implementation for one of the techniques implemented in the
class hierarchy, the new derivation can be incorporated into the class
structure with minimal coding. 
The integrators are reusable for many different sets of differential
equations. 


The Class Hierarchy


As Figure 1 illustrates, the integration algorithms can be divided into three
levels. All integration schemes share several features, including information
about the derivatives (represented generically by Example 1), the initial
conditions at the start of the integration, and a function to take the desired
integration step. These features are defined in the Integrator base class.
The integrators need access to information about the system being integrated.
This information is provided through the Derivative class and its children;
see Figure 2. Each type of integration scheme is derived from the Integrator
class; Figure 1 shows two such classes: the AdaptiveRungeKutta class and
PredictorCorrector class. Each class contains algorithm-specific details of
the integrator; for instance, the AdaptiveRungeKutta class contains pointers
to the arrays of coefficients needed for the Runge-Kutta algorithms, along
with functions used to evaluate the error incurred during a step and the
implementation of the integration stepping routine.
Specific integrators are derived from these intermediate classes. A typical
implementation at this level contains the coefficients and descriptions needed
to create an instance of the integrator. All of the code required to step the
integrator, including error control, is encapsulated in the intermediate
class. This formulation allows you to rapidly build multiple, yet different,
integrators of a given type. 
The code presented here, although written and tested using Borland C++ 4.5, is
"vanilla" C++, which should port easily to other development systems and
platforms. However, I haven't defined the functions for the copy constructor
or the assignment operator for the classes, so you'll have to write them if
you need them. 


The Integrator and Derivative Classes


Listing One is the class definition for the Integrator class, and Listing Two
presents its implementation. The Integrator class can be used as the
skeleton for any integrators you implement. It provides pointers to the memory
used by the initial state of the variables and the memory location for the
integrated variables, along with functions to set these pointers. The
Integrator class tracks the number of variables being integrated in the
dimension variable, which is set by the constructor for the class. Finally,
the Integrator class contains a pointer to Derivative, a base class that
provides the framework needed to obtain the derivative information in Example
1.
Integrator is a pure virtual class. Each integrator derived from it must
implement the Step() function, which initiates an integration and takes one
parameter: a floating-point value for the size of the integration step. Each
integration scheme calculates the step differently; Step() is declared in the
base class so that the interface is consistent across all the integrators.
The Integrator class cannot be used without corresponding derivative
information. This can be problematic if the resulting Integrator is specific
to the set of equations contained in the class. Consequently, I decided to
create the Derivative class. Like Integrator, the Derivative class is a
skeleton used to make the interface to the derivative information consistent;
see Listing Three. Derivative does not contain any data, and (aside from the
empty constructor and destructor) consists of a single pure virtual function
interface, GetDerivatives(), that takes as its input the state of the
independent variable and a pointer to the array of dependent variables whose
derivatives need to be calculated. The array of dependent variables is
replaced by the corresponding derivatives in GetDerivatives(). Thus, classes
derived from the Derivative class write the calculated information to a data
array owned by Integrator. This violates the encapsulation of the integrator's
internal data, but I decided to implement the integrator this way to improve
performance.
The connection between the derivative classes and the integrators makes them
logical members of a larger entity: the class that represents the physical or
mathematical system being modeled. Once created, the tight coupling between
the integrator and the derivatives is hidden from the rest of the program.
(This is the type of entity that I'll implement for the three-body system.)
Listings Four and Five present the ThreeBody class, a sample method for the
implementation of derivative information. ThreeBody is derived from the
Derivative class and inherits GetDerivatives(). The code implements the
derivatives needed for the three-body problem.
You still need to specify where the starting and ending data are stored at
each integration step. These data are set using the initState and finalState
pointers; pointers are set using InitState() and FinalState(). The memory
needed for these data is controlled outside of the Integrator class hierarchy.


The Adaptive Step Runge-Kutta and Runge-Kutta 4(5) Classes


Once you've set up the derivative information, the dimension of the
integration, and the memory locations for the integration data, the external
interfaces for the integrator are set. Next, you need to create a class that
captures the essence of an integration technique, but leaves out the
implementation details. I've implemented the adaptive stepsize Runge-Kutta
integrators in the AdaptiveRungeKutta class to illustrate this middle layer of
the class structure (see Listings Six and Seven).
Runge-Kutta integrators work by taking a series of substeps (or stages) from
the initial point and evaluating the derivatives of the variables at these
stages. The accumulated data calculated across the stages is used to generate
the final integrated data. For a total step of size h, you can write the ith
stage ki(n) for the nth variable y(n) in the integration scheme as in Example
2.
The arrays ai and bij contain constants specific to a particular
implementation of the Runge-Kutta algorithm. Example 2 describes how to
generate the ith stage when you have the preceding i-1 stages. The first stage
is generated using Example 3. After all of the stages have been generated, the
integration step is given by Example 4, where s is the number of stages in the
implementation and ci is a third set of constant coefficients. Example 2
through Example 4 are implemented in Listing Seven (see the Step() function).
You can control the accuracy of a Runge-Kutta implementation by performing two
independent integrations of the starting state across the same interval. If
you then subtract the results of these integrations, you will get a local
estimate of the error incurred by the numerical integration. This estimate can
be used to adapt the stepsize for the integration, and thus to control the
error at each step. Numerical Recipes provides a good summary of this form of
stepsize control, so I won't go into details. 
The error estimate D(n) for the nth variable can be calculated from two
Runge-Kutta steps of order m and m-1. If the implementation is performed
appropriately, the mth order step and the (m-1)th order step will use
identical Runge-Kutta stages. The error estimate will be simple to calculate
from the already determined values of the stages using Example 5, where ci* is
the coefficient needed for the (m-1)th order Runge-Kutta step. GetError() uses
this equation to determine the largest component of D and returns the value of
this component; see Listing Seven.
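In code, Example 5 amounts to a weighted sum over the stages. A minimal sketch follows; the function name and array layout are invented for illustration and do not match the Listing Seven member signatures:

```cpp
#include <cmath>
#include <cstddef>

// Error estimate from Example 5: for each variable n,
// D(n) = sum_i (c_i - c_i*) * k_i(n); return the largest |D(n)|,
// which is what the step-size controller compares to the tolerance.
double max_error(const double *c, const double *cstar,
                 const double *const *k,   // k[stage][variable]
                 std::size_t stages, std::size_t dim)
{
    double worst = 0.0;
    for (std::size_t n = 0; n < dim; ++n) {
        double d = 0.0;
        for (std::size_t i = 0; i < stages; ++i)
            d += (c[i] - cstar[i]) * k[i][n];
        if (std::fabs(d) > worst)
            worst = std::fabs(d);
    }
    return worst;
}
```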
Suppose you want to control error to some local accuracy a. You perform an
integration across a step of size h and find that the achieved accuracy was d.
If d>a, you need to decrease the requested step and repeat the integration
from the same initial state in order to preserve the desired accuracy of the
integration. On the other hand, if d<a, the step was good. You may want to
increase the stepsize the next time you take an integration step so that you
do not accumulate error from the addition of many small steps. (You may also
want to increase the stepsize to make the integration proceed more rapidly!)
In either case, you need a way to adapt the step. This adaptation is performed
using Example 6, where s is a "safety" factor, used to keep the new step hnew
from growing too large. A typical value for s is 0.9. NewStep() (Listing
Seven) performs this calculation and returns the new step.
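Example 6 reduces to a one-liner. In this sketch, the exponents (1/5 when the step grows, 1/4 when it must shrink) are the values usually quoted for a 4(5) pair; they are an assumption here, not numbers taken from the listings:

```cpp
#include <cmath>

// New step from Example 6: h_new = s * h * (accuracy/achieved)^p, with a
// safety factor s (typically 0.9) to keep the new step from growing too
// large. A shrinking step uses the slightly larger exponent 1/4.
double new_step(double h, double accuracy, double achieved)
{
    const double safety = 0.9;
    double p = (achieved > accuracy) ? 0.25 : 0.20;
    return safety * h * std::pow(accuracy / achieved, p);
}
```

When the achieved error exceeds the tolerance the result is smaller than h (repeat the step); otherwise it is larger (use it next time).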
One wrinkle to this algorithm that I've included in the implementation is the
ability to take fixed-size Runge-Kutta steps with full error control. I needed
a way to perform accurate integration and still produce integration steps that
are evenly spaced in time so that simulations would run correctly. A typical
implementation of a Runge-Kutta integrator allows the steps to adjust freely,
producing integrated steps that are not evenly distributed in time. If you are
interested in simulating gravitating systems, such a procedure will produce
large steps when the gravitating bodies are far apart and small steps when
they are close together. When the resulting points are plotted, the bodies
appear to move rapidly when they are far apart and slowly when they are close
together. In reality, the bodies move most rapidly when they are closest
together. More raw integrated points are produced in this region because the
forces are stronger, so the integrator needs to take smaller steps to preserve
the accuracy of the integration. The AdaptiveRungeKutta class uses a flag
named fixedStep to determine if the integrator should take fixed steps with
full error control. When fixedStep is true, Step() will loop through the
integration, taking multiple integration steps if necessary, and return
results when the requested total step has been performed. If fixedStep is
false, Step() will return the "raw" integration step taken by the integrator.
The AdaptiveRungeKutta class is pure virtual because the coefficients ai, bij,
and cj are never specified at this level of the class hierarchy.
SetCoefficients() is provided for this function but not implemented at this
level. Specific implementations of the adaptive Runge-Kutta integrators are
left to the third layer of the class hierarchy.
Each implementation of a Runge-Kutta integrator must set the powers used to
adjust the stepsize (see Example 6), the value of the safety factor s, and the
coefficients used to perform the integration. The file rk45.cpp (available
electronically; see "Availability," page 3) illustrates this procedure for the
RungeKutta45 class. The constructor for the class is called with an integer
specifying the dimension of the problem being integrated. The Runge-Kutta 4(5)
implementation is a six-stage integrator, so this parameter is passed along
with the dimension to the AdaptiveRungeKutta constructor. The RungeKutta45
constructor sets the exponents and the safety factor, and then calls
SetCoefficients() to set up the coefficients ai, bij, and cj. The coefficients
used by this implementation were derived by Cash and Karp; see Table 1. Most
of the work needed to implement the Runge-Kutta integrator is contained in the
AdaptiveRungeKutta class. The implementation-specific classes function
primarily to set up the implementation details. rk45.cpp is an example of this
technique. For the RungeKutta45 class, the class constructor sets the tables
of coefficients needed to perform the integration.



Solving the Three-Body Problem


The program starsys.cpp (available electronically) implements the Trinary
class used to collect the Derivative and Integrator classes together with the
arrays of initial- and final-state data into one class for integration. The
Trinary class is the "entity" discussed earlier. The driving program creates
an instance of this class and calls TakeAStep() with the desired stepsize in
order to tell the Trinary system to perform an integration step. 
The Trinary constructor initializes the initial state array with data
appropriate to a system consisting of three equal-mass bodies located equally
distant from each other, moving tangentially to the circle that passes through
the masses. This configuration is the starting point for one of the few known
analytic solutions to the three-body problem, so it is a good test case for
the integrators. The masses should move in such a way as to maintain the
symmetry of the initial conditions: Lines connecting the masses should always
form an equilateral triangle.
The symmetric problem is a good test case for the integrators, but it can be
rather boring. The Trinary class contains Star1(), Star2(), and Star3() to
change the initial conditions of the problem. Each function takes a pointer to
arrays of positions and velocities, along with a variable for the mass, so
that you can experiment with different configurations of the system.
stars.cpp (available electronically) is a simple driver for the Trinary
system. As it stands, this driver integrates the default configuration for
about an orbit and a half. Figure 3 shows the results of such an integration.
The conditions needed for a more-complicated three-body system are included in
the code in stars.cpp, but are commented out. You can generate the data for
this system by removing the comment tokens.
The file rk78.cpp incorporates the 7(8) integrator into the class structure.
Since the lowest level of the integrator hierarchy consists of the
implementation details, this class contains the same function calls as the
RungeKutta45 class. The Trinary class can use this integrator by changing the
preprocessor definition of RKTYPE in starsys3.h (available electronically)
from RungeKutta45 to RungeKutta78. Table 2 shows the results of integration of
the test problem through 750,000 simulated seconds for both of these
integrators.


Conclusions


The class structure here can be adapted to many different types of
differential equations. I have applied it to two- and three-body gravitating
systems, and to the Lorenz equations found in Chaos theory. Any continuous,
differentiable system that can be modeled using equations of the form given in
Example 1 can be solved using the integrators presented here. Finally, the
structure used to separate the Runge-Kutta algorithm from the
implementation-specific details can be extended to other types of integrators.


References


Cash, J.R. and A.H. Karp. "A Variable Order Runge-Kutta Method for Initial
Value Problems with Rapidly Varying Right-Hand Sides." ACM Transactions on
Mathematical Software, 16 (1990), 201.
Press, W.H., S.A. Teukolsky, W.T. Vetterling, and B.P. Flannery. Numerical
Recipes in C, Second Edition. Cambridge, U.K.: Cambridge University Press,
1992. 
Verner, J.H. "Explicit Runge-Kutta Methods with Estimates of the Local
Truncation Error." SIAM Journal on Numerical Analysis, 15 (1978). 
The Three-Body Problem
The three-body problem can be stated as follows: Given three gravitating
bodies in an initial configuration in space (for instance, the configuration
in Figure 4) with some specified initial velocities, find how these bodies
move over time. 
This problem is sufficiently complicated that no analytic solution exists for
the general problem. Special cases of initial conditions have solutions that
provide a good framework for testing integrators. For the three-body problem
in this article, the parameter t in Example 1 is time, and the integrator is
used to make the system evolve in time. The variables for the problem are the
positions x, y, and z and velocities vx, vy, and vz of these bodies,
represented in a Cartesian system. The state of each body can be represented
by the current value of t and six components representing the position and
velocity in space. Example 7 presents the differential equations for the ith
body.
These equations exist for each of the three bodies in the problem. G is
Newton's gravitational constant, the distance between body i and body j is
given by rij, and each body i has mass mi. We need to take the equations in
Example 7 along with an initial set of values for the positions and velocities
(x1, y1, z1, vx1, vy1, vz1, and so on) and integrate these equations
numerically to obtain the new values for the variables after a time step Dt.
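Written out explicitly (a reconstruction from the description above, consistent with Listing Five, where each mu is G times a mass), the Example 7 equations for body i are:

```latex
\frac{d\mathbf{r}_i}{dt} = \mathbf{v}_i,
\qquad
\frac{d\mathbf{v}_i}{dt}
  = -\,G \sum_{j \neq i} m_j\,
    \frac{\mathbf{r}_i - \mathbf{r}_j}{r_{ij}^{3}},
\qquad
r_{ij} = \lvert \mathbf{r}_i - \mathbf{r}_j \rvert
```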
One sample integration of this system is presented in this article. A second
solution is shown in Figure 5.
--D.J.C.
Figure 1: Integrator class hierarchy.
Figure 2: Derivative class structure.
Figure 3: Results of integration of the symmetric three-body system.
Figure 4: The three-body system.
Figure 5: Results of integration of an asymmetric three-body system.
Example 1: Differential equations solved by the integrator class.
Example 2: Generating the ith stage when you have the preceding i-1 stages.
Example 3: First stage generated (from Example 2). 
Example 4: Integration step. 
Example 5: Calculating the error estimate. 
Example 6: Performing adaptation.
Example 7: Differential equations for the ith body of the three-body problem.
Table 1: Coefficients for the RungeKutta45 class.
i = 1 2 3 4 5 6
ai 0 1/5 3/10 3/5 1 7/8
b2j 1/5
b3j 3/40 9/40
b4j 3/10 -9/10 6/5
b5j -11/54 5/2 -70/27 35/27
b6j 1631/55296 175/512 575/13824 44275/110592 253/4096
ci 37/378 0 250/621 125/594 0 512/1771
ci* 2825/27648 0 18575/48384 13525/55296 277/14336 1/4
Table 2: Comparison of final position from Runge-Kutta 4(5) and Runge-Kutta
7(8) integrations.
 Runge-Kutta 4(5) Runge-Kutta 7(8) Difference

x1 4697.00964137395 4697.00964144131 -0.00000006736
y1 36.5844745040766 36.5844743862259 0.0000001178507
z1 0 0 0 
x2 -2316.82173637934 -2316.82173651791 0.00000013857
y2 -4086.02190850241 -4086.02190850171 -0.0000000007
z2 0 0 0

x3 -2380.18790499465 -2380.18790492343 -0.00000007122
y3 4049.43743399842 4049.43743411535 -0.00000011693
z3 0 0 0

Listing One
// integrat.h -- Integrator class definition
#ifndef INTEGRAT_H
#define INTEGRAT_H
#include <math.h>
#include <stdio.h>
#include "derivs.h"
#define FALSE 0
#define TRUE 1
class Integrator {
 protected:
 int dimension; // Size of the "state" vector
 double *initState; // Pointer to state for integration
 double *finalState;// Pointer to integrated state
 // Pointer to derivative class
 Derivative *ddt;
 public:
 Integrator(int dim);
 virtual ~Integrator(void);
 // Access to the integration variables
 void InitState(double *istate)
 { initState = istate; }
 void FinalState(double *fstate)
 { finalState = fstate; }
 void Derivatives(Derivative *deriv);
 virtual double Step(double h) = 0;
};
#endif

Listing Two
// integrat.cpp -- Integrator class functions
#include "integrat.h"
Integrator::Integrator(int dim)
{
 dimension = dim;
}
Integrator::~Integrator(void)
{
 return;
}
void Integrator::Derivatives(Derivative *deriv)
{
 ddt = deriv;
}

Listing Three
// derivs.h -- Class for derivative skeleton for the integrator class
#ifndef DERIVS_H
#define DERIVS_H
#define TRUE 1
#define FALSE 0
#include <math.h>
class Derivative {
 private:
 // No private data or functions in the skeleton

 public:
 Derivative(void)
 { }
 virtual ~Derivative(void)
 { }
 virtual int GetDerivatives(double dt, double *where) = 0;
};
#endif

Listing Four
// 3body.h -- Class for gravitational forces for the example problem
#ifndef THREEBODY_H
#define THREEBODY_H
#include "derivs.h"
class ThreeBody : public Derivative {
 private:
 double mu1, mu2, mu3;
 double r1[3], r2[3], r3[3];
 double v1[3], v2[3], v3[3];
 public:
 ThreeBody(void);
 ~ThreeBody(void);
 int GetDerivatives(double dt, double *where);
 void Mu1(double m1) { mu1 = m1; }
 void Mu2(double m2) { mu2 = m2; }
 void Mu3(double m3) { mu3 = m3; }
};
#endif

Listing Five
// 3body.cpp -- gravitational forces for the example problem; provides 
// 1st derivative information of the state. Tailor code to match your problem.
#include "3body.h"
ThreeBody::ThreeBody(void)
{
 mu1 = 100;
 mu2 = 100;
 mu3 = 100;
}
ThreeBody::~ThreeBody(void)
{
 return;
}
int ThreeBody::GetDerivatives(double dt, double *where)
// Note: dt is not used here; it may be needed in other systems
{
 double d12, d23, d13; // Distances from origin
 double d12cubed, d23cubed, d13cubed;
 // Distances from origin cubed
 // where comes in with positions and velocities
 r1[0] = where[0]; r1[1] = where[1]; r1[2] = where[2];
 v1[0] = where[3]; v1[1] = where[4]; v1[2] = where[5];
 r2[0] = where[6]; r2[1] = where[7]; r2[2] = where[8];
 v2[0] = where[9]; v2[1] = where[10]; v2[2] = where[11];
 r3[0] = where[12]; r3[1] = where[13]; r3[2] = where[14];
 v3[0] = where[15]; v3[1] = where[16]; v3[2] = where[17];
 d12 = sqrt((r1[0] - r2[0]) * (r1[0] - r2[0]) +
 (r1[1] - r2[1]) * (r1[1] - r2[1]) +
 (r1[2] - r2[2]) * (r1[2] - r2[2]));

 d12cubed = d12 * d12 * d12;
 if (d12cubed == 0.0) // masses must be separated
 return FALSE;
 d23 = sqrt((r2[0] - r3[0]) * (r2[0] - r3[0]) +
 (r2[1] - r3[1]) * (r2[1] - r3[1]) +
 (r2[2] - r3[2]) * (r2[2] - r3[2]));
 d23cubed = d23 * d23 * d23;
 if (d23cubed == 0.0) // masses must be separated
 return FALSE;
 d13 = sqrt((r1[0] - r3[0]) * (r1[0] - r3[0]) +
 (r1[1] - r3[1]) * (r1[1] - r3[1]) +
 (r1[2] - r3[2]) * (r1[2] - r3[2]));
 d13cubed = d13 * d13 * d13;
 if (d13cubed == 0.0) // masses must be separated
 return FALSE;
 // and returns with velocities...
 where[0] = v1[0]; where[1] = v1[1]; where[2] = v1[2];
 where[6] = v2[0]; where[7] = v2[1]; where[8] = v2[2];
 where[12] = v3[0]; where[13] = v3[1]; where[14] = v3[2];
 // and accelerations
 // m1
 where[3] = - mu2 * (r1[0] - r2[0]) / d12cubed
 - mu3 * (r1[0] - r3[0]) / d13cubed;
 where[4] = - mu2 * (r1[1] - r2[1]) / d12cubed
 - mu3 * (r1[1] - r3[1]) / d13cubed;
 where[5] = - mu2 * (r1[2] - r2[2]) / d12cubed
 - mu3 * (r1[2] - r3[2]) / d13cubed;
 // m2
 where[9] = - mu1 * (r2[0] - r1[0]) / d12cubed
 - mu3 * (r2[0] - r3[0]) / d23cubed;
 where[10] = - mu1 * (r2[1] - r1[1]) / d12cubed
 - mu3 * (r2[1] - r3[1]) / d23cubed;
 where[11] = - mu1 * (r2[2] - r1[2]) / d12cubed
 - mu3 * (r2[2] - r3[2]) / d23cubed;
 // m3
 where[15] = - mu1 * (r3[0] - r1[0]) / d13cubed
 - mu2 * (r3[0] - r2[0]) / d23cubed;
 where[16] = - mu1 * (r3[1] - r1[1]) / d13cubed
 - mu2 * (r3[1] - r2[1]) / d23cubed;
 where[17] = - mu1 * (r3[2] - r1[2]) / d13cubed
 - mu2 * (r3[2] - r2[2]) / d23cubed;
 return TRUE;
}

Listing Six
// adapt_rk.h -- Runge-Kutta class definition
#ifndef ADAPT_RK_H
#define ADAPT_RK_H
#include "integrat.h"
class AdaptiveRungeKutta : public Integrator {
 protected:
 int stages; // Number deriv. evals in algorithm
 double *ai; // Portion of step at ith stage
 double **bij; // Portions of previous stages to use
 double *cj; // For RK step taken
 double *errorest; // Estimate of the error
 int fixedStep; // Boolean flag for fixed step mode
 double accuracy; // Maximum errorest for good step
 double safety; // Safety factor for adaptive steps

 double goodpower; // 1/(RK order)
 double badpower; // 1/(RK order - 1)
 // The following are internal workspaces for the algorithm
 double *accumulated;
 double *workstate;
 double **stepk;
 double GetError(double **stepk);
 double NewStep(double maxError, double h);
 double independent;// indep. variable (e.g. time)
 public:
 AdaptiveRungeKutta(int st, int dim);
 virtual ~AdaptiveRungeKutta(void);
 virtual void SetCoefficients(void) = 0;
 double Step(double h);
 void Accuracy(double goodness)
 { accuracy = goodness; }
 void FixedStep(int tf)
 { if (tf == 0) fixedStep = FALSE; else fixedStep = TRUE; }
};
#endif

Listing Seven
// adapt_rk.cpp -- Runge-Kutta functions
#include "adapt_rk.h"
AdaptiveRungeKutta::AdaptiveRungeKutta(int st, int dim) : Integrator(dim)
{
 int i;
 stages = st;
 fixedStep = TRUE;
 accuracy = 1e-7;
 safety = 0.9; // default safety factor (SetCoefficients() may override)
 independent = 0.0;
 // Allocate the coefficient arrays and workspaces
 ai = new double[stages];
 cj = new double[stages];
 errorest = new double[stages];
 bij = new double*[stages];
 for (i = 0; i < stages; i++)
 bij[i] = new double[stages];
 accumulated = new double[dimension]; // Working arrays
 workstate = new double[dimension];
 stepk = new double*[stages];
 for (i = 0; i < stages; i++)
 stepk[i] = new double[dimension];
}
AdaptiveRungeKutta::~AdaptiveRungeKutta(void)
{
 int i;
 if (ai)
 delete [] ai;
 if (bij) {
 for (i = 0; i < stages; i++)
 delete [] bij[i];
 delete [] bij;
 }
 if (cj)
 delete [] cj;
 if (errorest)
 delete [] errorest;
 if (accumulated)

 delete [] accumulated;
 if (workstate)
 delete [] workstate;
 if (stepk) {
 for (i = 0; i < stages; i++)
 delete[] stepk[i];
 delete [] stepk;
 }
}
double AdaptiveRungeKutta::Step(double h)
{
 double stepsize = h; // Next stepsize to take
 double desiredStep = h;
 double stepSoFar = 0.0; // Step taken to date
 double stepTaken; // Current step
 double maxErrorFound = 0.0;
 int i, j, k;
 int fixedPassed = TRUE;
 if (fixedStep) // If in fixed step mode, don't
 fixedPassed = FALSE; // break loop until desired step
 for (i = 0; i < dimension; i++)
 workstate[i] = initState[i];
 do {
 stepTaken = stepsize;
 for (i = 0; i < stages; i++) {
 // Fill accumulated with info from the previous stages
 for (j = 0; j < dimension; j++) {
 accumulated[j] = workstate[j];
 for (k = 0; k < i; k++) {
 accumulated[j] += bij[i][k] * stepk[k][j];
 }
 }
 // Calculate the data for the current stage
 if (!ddt -> GetDerivatives(independent + stepTaken * ai[i],
 accumulated))
 return 0.0; // Bad derivative ==> No step taken
 for (j = 0; j < dimension; j++)
 stepk[i][j] = stepTaken * accumulated[j];
 }
 // Here's the step used:
 for (j = 0; j < dimension; j++) {
 finalState[j] = workstate[j];
 for (i = 0; i < stages; i++)
 finalState[j] += cj[i] * stepk[i][j];
 }
 // Here we test the step
 maxErrorFound = GetError(stepk);
 // And calculate the next step to be taken
 stepsize = NewStep(maxErrorFound, stepTaken);
 // Now accumulate for fixed step mode
 if (fixedStep) {
 // Step was good
 if (maxErrorFound < accuracy) {
 stepSoFar += stepTaken;
 if (stepsize > desiredStep - stepSoFar)
 stepsize = desiredStep - stepSoFar;
 // Update the working state if not at the end
 if (stepSoFar < desiredStep)
 for (i = 0; i < dimension; i++)

 workstate[i] = finalState[i];
 // If at end, set flag
 else
 fixedPassed = TRUE;
 }
 }
 // Stop if step is good, else try again with the new step.
 } while ((maxErrorFound > accuracy) || !fixedPassed);
 if (fixedStep)
 return stepSoFar;
 return stepTaken;
}
double AdaptiveRungeKutta::GetError(double **stepk)
{
 int i, j;
 double biggestError = 0.0, currentError;
 for (i = 0; i < dimension; i++) {
 currentError = 0.0;
 for (j = 0; j < stages; j++)
 currentError += errorest[j] * stepk[j][i];
 currentError = fabs(currentError);
 if (currentError > biggestError)
 biggestError = currentError;
 }
 return biggestError;
}
double AdaptiveRungeKutta::NewStep(double maxError, double h)
{
 double newstep;
 if (accuracy >= maxError) // Step was good
 newstep = safety * h * fabs(pow(accuracy/maxError, goodpower));
 else // Step was too big
 newstep = safety * h * fabs(pow(accuracy/maxError, badpower));
 return newstep;
}




























Examining the Windows 95 Layered File System


Adding functionality to block devices




Mark Russinovich and Bryce Cogswell


The authors are researchers in the computer science department at the
University of Oregon. Mark can be reached at mer@cs.uoregon.edu and Bryce at
cogswell@cs.uoregon.edu.


One major difference between Windows 95 and its predecessors, Windows 3.1 and
Windows for Workgroups (WFW) 3.11, is how Windows 95 implements its file
systems. Windows 95 introduces a "layered" approach to file-system management,
dividing translation of a high-level file access to an actual physical request
into multiple, distinct parts. Unfortunately, this new organization has
created a plethora of new terminology and APIs. In addition, the Windows 95
Device Driver Kit (DDK) documentation is often vague, incomplete, and
misleading. 
In this article, we'll briefly discuss how Windows 3.1 and WFW 3.11 implement
their file systems, then present an overview of the Windows 95 file system.
Our exploration of the file system focuses on the vendor-supplied driver (VSD)
layer. VSDs are virtual devices (VxDs) that can hook onto the path of device
accesses for any block-based device such as a hard disk, CD-ROM, or floppy
drive. Microsoft designed the VSD layer to let third-party vendors add
functionality to the file system. An extensive API was added to the file
system so that VSDs can alter device requests (or create new ones), making it
possible to develop VSDs to perform functions ranging from block-device
monitoring and data encryption to mirrored or RAID disk management.
To demonstrate how a VSD is built, we'll describe the design and
implementation of a monitoring VSD that interfaces with a Win32 program to
display information about block-device accesses. Besides serving as a basis
for your own custom VSDs, the application will return useful information about
your block-device performance and show how to connect a Windows GUI program
with a virtual device.


Out With the Old


Windows 3.1 has the simplest file system of the Windows incarnations.
When a Windows or DOS application makes a request to read data from a file,
for instance, the request is sent to DOS, which then passes it to the BIOS. If
you're lucky, you have what's called a "Fast Disk"-compatible hard disk.
(Choose virtual-memory information from the Enhanced 386 information on your
control panel, and a check box will tell you if fast-disk access is possible,
and if so, turned on.) In that case, instead of using the BIOS to do disk I/O,
a virtual device called "WDCTRL.386" handles the request in 32-bit protected
mode, bypassing the slower real-mode BIOS. Here, your file request is
translated to a physical request by real-mode DOS, which has the request
serviced in protected mode by WDCTRL.386.
WFW 3.11 introduced the prototype of Windows 95 block-device management. When
a WFW or DOS program requests a file, the request is passed to a virtual
device called "IFSMgr.386," which passes it to VFAT.386, a virtual device that
implements the DOS file system in protected mode. After VFAT.386 has converted
the request to a logical device request, it sends it to IOS.386, the I/O
system supervisor. If the target hard disk is Fast Disk compatible, the
request is serviced in WDCTRL.386; otherwise, it is sent to the BIOS. Thus, in
WFW 3.11, if you have a Fast Disk-compatible disk, your file accesses are
handled entirely by VxDs, bypassing real-mode DOS completely, and giving you
maximum performance.


In With the New


Windows 95 takes the concept of protected-mode disk access a step further than
WFW. To maximize Windows 95 performance, Microsoft made it easy for hard-disk
manufacturers to make their own versions of WDCTRL.386-type drivers so that
disk access can bypass the BIOS. Microsoft also wants to allow Windows 95 to
seamlessly integrate any new or odd block-device hardware (a flash memory card
used as a disk, for instance) into Windows' file-system management scheme.
Therefore, Microsoft had to divide the WFW block-request path, which extends
from the application to the hardware, into much more specialized layers. The
new scheme is called the "Installable File System" (IFS).
The IFS is made up of 32 logical layers, each containing one or more virtual
devices, through which block-device requests pass. Fortunately for
performance, most layers are empty for typical hardware. For hard disks, a
file-system request will usually only pass through about five virtual devices
on the way to the hardware. Figure 1 shows how the layers are organized, while
Figure 2 shows a typical request path. The smallest numbers represent higher
layers of abstraction, with the topmost layer being the entry point to the
file system. Higher numbers are closer to the hardware with the highest number
(bottom layer) being the virtual devices that access the hardware directly.
The IO Supervisor (IOS) manages requests as they pass through the file-system
hierarchy. Each device on the chain can select requests based on the logical
or physical drive to which the request is directed. The devices can also view
the result of a request as it passes back up the chain to the application.
Furthermore, the VxDs on the chain can service requests themselves and not
pass them to lower levels, or they can generate requests themselves. The VFAT
virtual device handles many requests by reading or writing to a memory cache
via the VCACHE virtual device. 


Layers, Layers, Layers


At this point, we'll provide an overview of what occurs (or can occur) at each
level of the file system (again, see Figure 1). Remember that most block
devices do not require an entry at each level in the chain.
IFS Manager (IFSMgr) manages high-level I/O requests from applications. It
takes a call directed at a specific logical drive and passes it down the
correct call-down chain to the appropriate tracker, FSD, and so on.
Volume trackers work with groups of devices with identical removability rules.
For example, the CD-ROM volume tracker ensures that a CD with a file system on
it is in the drive before it will allow any requests to pass through to lower
layers.
File system drivers (FSDs) work with all devices of a particular type, such as
hard disks or CD-ROM devices. They take incoming logical requests generated by
IFSMgr and translate them into physical requests to pass to lower levels. In
addition, FSDs can initiate logical error recovery for devices such as disks.
VFAT.VXD is the standard FSD and is provided by Microsoft. VFAT takes a
request in the form of "read 20 bytes from file c:\foo.bar at offset 300" and
turns it into one or more physical sector reads from drive C. In other words,
it is what gives the file system its structure.
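The flavor of that translation can be sketched as follows. This is a deliberate simplification (a real FSD walks the FAT cluster chain; here we hypothetically assume the file's data is contiguous, and the names are ours, not the DDK's):

```cpp
#include <cstdio>

const unsigned BYTES_PER_SECTOR = 512;

// Simplified FSD-style translation: "read nBytes at byte offset within a
// file" becomes a range of physical sectors, assuming (hypothetically)
// that the file's data is contiguous starting at firstSector.
void LogicalToPhysical(unsigned firstSector, unsigned offset, unsigned nBytes,
                       unsigned *startSector, unsigned *sectorCount)
{
    unsigned first = offset / BYTES_PER_SECTOR;            // first sector hit
    unsigned last  = (offset + nBytes - 1) / BYTES_PER_SECTOR; // last sector hit
    *startSector = firstSector + first;
    *sectorCount = last - first + 1;
}
```

For the example request -- 20 bytes at offset 300 of a file starting at sector 1000 -- this yields a single-sector read of sector 1000; the same request at offset 500 would straddle two sectors.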
Type specific drivers (TSDs) work with all devices of a particular type. They
take a logical request generated by an FSD and translate it into a physical
sector request. They generally reside in the same layer as their corresponding
FSDs, but are lower in the chain.
SCSI-izers. SCSI devices require more-complex request packets than the
more-prevalent IDE/ESDI devices. SCSI-izers take a general
physical request and create a SCSI Request Block (SRB) that contains detailed,
SCSI-specific information about the request such as the Logical Unit Number
(LUN) and Target (SCSI targets can have up to eight LUNs, numbered 0 through
7, hanging off them).
Vendor-supplied drivers (VSDs). As mentioned, Microsoft created this special
layer for third-party developers. The VSD layer functionality is determined by
the VSD writer. Possible uses include: block-device monitors, low-level
secondary disk caches (caching in flash memory, for example), data encryption,
and RAID disk management.
SCSI port drivers take incoming requests and determine which SCSI miniport
driver should field them. Multiple SCSI adapter types can be present in the
same system, each of which may require a custom SCSI miniport driver. The SCSI port
driver is also in charge of initializing the miniport drivers.
SCSI miniport drivers (MPDs) are the hardware drivers for SCSI devices. They
manage the interrupt and I/O port-level interaction with the device to carry
out requests from above. They can also perform adapter-specific error
recovery.
Port drivers (PDRs) (for non-SCSI hardware) carry out the same functions as
the SCSI port and miniport drivers. They provide the 32-bit disk access that
previously was the sole domain of WDCTRL.386, interacting directly with the
hardware to perform I/O. 
Real mode mapper (RMM). With the introduction of plug-and-play BIOS, and by
including many hardware-specific port drivers, Windows 95 can provide 32-bit
access for most disk hardware. However, Windows 95 might be run on an older PC
with esoteric hardware, so it must make allowances for the case where it can't
provide a port driver to handle disk I/O in protected mode. A system might
also use real-mode disk-driver software that provides functionality not
available in the Windows 95 protected-mode counterpart. For these situations,
the last entry on the chain of protected-mode VxDs is an RMM instead of a port
driver. RMMs call down to a real-mode driver to perform hardware I/O and
return results up the file-system chain. Microsoft provides the RMM.
Real-mode drivers are hardware drivers required by the hardware or software
configuration of a particular system. Microsoft discourages use of real-mode
drivers because performance can suffer (due to the overhead of transitions
from protected to real mode and slower execution in real mode), but makes
allowances for them for flexibility and backward compatibility. Most PCs
running Windows 95 will not have real-mode drivers.
In general, the upper layers are written by Microsoft, while the lower layers
are provided by disk-drive manufacturers. The layer for programmers to play
with is the VSD. 


The Devmon Application


Devmon (short for "DEVice MONitor") is a block-device monitoring application
that demonstrates the design of a VSD. This application consists of a VSD
virtual device that monitors and times all block-device requests passing
through the VSD layer, and a Windows 95 32-bit GUI program that reads the
monitored data and displays it textually in a window. Besides serving as the
basis for your own VSD designs, Devmon (see Figure 3) contains a useful
example of how a virtual device and a 32-bit Windows 95 program can
communicate. In addition, Devmon will tell you about the characteristics of
all the block devices in your system, allow you to enable and disable
monitoring of requests to the various devices, and tell you how long each
request takes. (Complete source code, executables, and other binaries for
Devmon are available electronically; see "Availability," page 3.)
The Devmon Windows program initiates communication with the Devmon VSD through
the Win32 DeviceIoControl interface. This interface provides the only means
whereby a Win32 program can communicate with a virtual device. The first step
in establishing communication is the CreateFile command. The filename
parameter for this call must be the name of the virtual device to be opened.
Virtual device names differ from regular filenames because they contain an
initial two backslashes followed by a period, another backslash, and then the
name of the virtual device. For example, the name for the Devmon VSD is
\\.\devmon.vxd. Note that in a C string, a backslash is a special character,
so to specify one backslash, you must enter two in the string; for example,
\\\\.\\devmon.vxd. 
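The escaping is easy to get wrong, so it is worth spelling out what the doubled-up literal actually contains. A small sketch (ours, and testable anywhere, since it only exercises C string-literal rules):

```cpp
#include <cstring>
#include <cstdio>

// The escaped C literal below spells out the 14 characters \\.\devmon.vxd
// -- the device-name form that CreateFile expects for a virtual device.
const char *vxdName = "\\\\.\\devmon.vxd";

// On Windows 95 one would then open the VxD with something like:
//   HANDLE h = CreateFile(vxdName, 0, 0, NULL, 0,
//                         FILE_FLAG_DELETE_ON_CLOSE, NULL);
```

Each pair of backslashes in the source collapses to one character at run time, leaving the leading \\.\ prefix followed by the device name.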

After the file has been opened, the program can send commands to the virtual
device by calling DeviceIoControl (see Example 1) with the handle returned by
the CreateFile call. By using the buffers, the program can pass arbitrary
amounts of information back and forth with the device. The dwIoControlCode
parameter is a VxD-specific function code used to specify the operation to be
carried out by the VxD.
Upon starting, Devmon opens communication with the Devmon VSD and requests
that the VSD pass it the device control blocks (DCBs) of the physical devices
configured in the system. Devmon then creates a menu that allows the user to
select interesting information about the DCB and to enable and disable
monitoring of the DCB's device. 
Several times a second, the application performs a DeviceIoControl on the VSD,
asking it to pass copies of the latest device-access requests sent through the
IFS. These are printed in the main window and include a number for the
request, the request type (read, write, and so on), the logical drive to which
the request is directed (C:, for example), the sector at which the request is
directed (if appropriate), the number of sectors associated with the request
and finally, the time required to service the request. 


The Devmon VSD


The VSD layer is created by the IOS, which, at boot time, looks for VxDs in
the system\iosubsys directory under the Windows 95 main directory. It tries to
load as a dynamic VxD any file in that subdirectory with the extension VXD.
When the VSD receives the SYS_DYNAMIC_DEVICE_INIT call, it responds by
registering itself with the IOS. This is accomplished by calling the IOS_Register
VxD service and passing in a device registration packet (DRP). The DRP
structure is in Listing One and the code registering our monitoring VSD with
the IOS is in Listing Two. 
What makes a VSD a VSD and not a member of some other layer of the file system
is the load number it returns as the load-group-number (DRP_LGN) in the DRP.
This tells the IOS at which layer to put the VxD in the hierarchy. VSDs have
nine levels to choose from: 8-10 and 12-17. The placement of a VSD is somewhat
arbitrary, but a few guidelines can be followed:
The further down in the layers the VSD is placed, the fewer layers lie between
the VSD and the hardware. This can be important if you want to see requests
that will actually go all the way down to the hardware. If you are above a VSD
layer that is caching, your VSD will see requests that might be handled by the
other VSD's cache; if you are below that VSD, you will see only the requests
it can't handle and is passing down to the hardware. 
Layer 11 is the SCSI-izer layer, where incoming requests obtain a SCSI command
block containing additional information specific to SCSI devices. This is
significant if you wish to make new requests from your VSD.
The IOS responds to a VSD's registration by passing back a return value
indicating what it should do. If all is well with the DRP, the return value
will simply indicate success. If something is wrong, the VSD can be told to
unload itself. The registration also lets the VSD provide the IOS with the
address of the procedure to call to service asynchronous messages. This
address is passed as an entry in the DRP. After the registration, this
asynchronous event procedure (AEP) will act as the communications channel from
the IOS to the VSD. Another communications channel is returned in the DRP by
the IOS and is the address of the IOS procedure that the VSD can call with its
own requests. This is located in the DRP_ilb (IOS linkage block) field of the
DRP.
Once the registration has successfully completed, the VSD begins receiving
messages from the IOS through the AEP. The IOS can send about two dozen
different types of messages, but most VSDs will only be interested in a
handful. Usually, the most interesting are the AEP_CONFIG_DCB and
AEP_BOOT_COMPLETE messages. At system boot time, the IOS performs a
handshaking initialization with the file-system drivers. In this phase, the
system determines which physical and logical block devices are attached. The
IOS will send the VSD an AEP_CONFIG_DCB whenever it registers a new physical
device and provides the DCB. 
The DCB (see Listing Three) contains information about the physical device,
including its bus type and unit number, the number of heads, tracks, and
sectors contained on the device, and flags that specify device behavior. A VSD
can ignore the AEP_CONFIG_DCB messages or, if it wants to receive requests
that are associated with that particular DCB, it can send the IOS a request
message, asking it to insert the VSD on the device's call-down chain. IOS
request messages are data structures passed on the stack to the IOS's request
procedure (taken from the ILB). The Devmon VSD monitors all block devices, so
it has itself put on the call-down chain for every physical device that is
configured.
The AEP_BOOT_COMPLETE message tells the VSD that the file system has finished
initializing and that all devices have been registered. VSDs associated with
only certain types of devices can tell the IOS to unload the VSD if none of
the devices are present in the system.
After initialization, each block device in the system has a DCB associated
with it. That DCB specifies call-down and call-back chains pointing to the
VSDs through which file-system requests for that device will pass.
When the boot sequence is complete, the VSD begins getting requests through
the call-down chains on which it inserted itself. (The routine that's called
with requests was indicated by the VSD in the call-down insertion commands.)
The call-down routine receives a pointer to a data structure called the I/O
Packet (IOP), which contains another data structure called the I/O Request
(IOR). The IOP and IOR contain all the information about the particular
request including its type, pointers to buffers associated with it (such as
read or write buffers), and parameters indicating the physical sector on the
device that the request wants to access. 
Each VSD is responsible for calling the next VSD in the chain. The IOP
contains a pointer to the DCB associated with the request, and the DCB
contains a pointer to the current position in the call-down chain. To call the
next VSD, its address is read from the chain, the chain pointer in the DCB is
moved to the next location, and the address is called. The call-down data
structure is in Listing Four, and a code fragment demonstrating these steps is
in Listing Five.
Many VSDs will want to view requests not only as they pass down to the
hardware, but also as they return up the chain to the original caller. To do
this, the VSD must insert itself on the call-back chain. This chain is managed
by a list of data structures pointed to by the IOP_callback_ptr entry of the
IOP. To insert itself on the chain, the VSD must set the IOP_cb_address of the
call-back entry to the address of its call-back procedure. It must then move
the pointer to the next call-back entry for the layer just above it in memory.
These steps are in Listing Six. The Devmon VSD inserts itself on the call-back
chain for all requests so that it can determine how long a request took to be
serviced by the layers below it (which are essentially the device drivers and
hardware). 
If the VSD is servicing requests itself, it can indicate immediately to layers
above it that the request was serviced or postpone this notification until
later. To inform the layers above that a request has been serviced, the VSD
does not call the layer below; instead it simply calls up the call-back chain.
The IOP_callback_ptr is adjusted so that it points at the call-back entry for
the layer above the VSD, and then the VSD calls the IOP_cb_address procedure
with the IOP on the stack. If the VSD wishes to complete the request later, it
simply returns a 0 in the eax register and performs the callback when it wants
to indicate request completion. Listing Seven provides the callback data
structure, and Listing Eight demonstrates the code for performing the
callback.
The Devmon VSD uses an IOS feature called the "expansion area." This is a
block of data that the IOS allocates for each IOP for use by the devices in
the IOP's call-down chain. A VSD must tell the IOS that it wants some
expansion area allocated for it when it inserts itself on a DCB's call-down
chain at DCB-configuration time. The expansion area can be used for whatever
purpose the VSD desires; in the case of Devmon, it is used to pass a time
stamp from the call-down procedure to the call-back procedure. Thus, the VSD
can determine the time it took to complete a request by comparing the current
time in the call-back procedure with the time stored in the request's
expansion area. The address of the expansion area is computed by adding the
offset stored in the DCB_cd_entry's DCB_cd_expan_offset field to the IOP's
address. 
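The address computation itself is plain pointer arithmetic, as this mock-up shows. The structures are stand-ins of ours (the real IOP and call-down entry come from the DDK headers); what matters is the offset-from-IOP pattern Devmon uses to carry its time stamp from call-down to call-back:

```cpp
#include <cstring>

// Mock IOP: a fixed-size header followed by per-VSD expansion space.
// In the real file system, the IOS lays this out and the offset comes
// from the DCB_cd_expan_offset field of the call-down entry.
struct MockIOP {
    char header[64];     // stands in for the real IOP fields
    char expansion[16];  // expansion area granted to our VSD
};

// Expansion-area address = IOP address + recorded offset.
void *ExpansionArea(void *iop, unsigned expanOffset)
{
    return (char *)iop + expanOffset;
}
```

At call-down time the VSD would memcpy a time stamp into ExpansionArea(iop, offset); at call-back time it reads the stamp back from the same address and subtracts it from the current time.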
If a VSD has put itself on the call-back chain, it must service call-back
calls by continuing to call up the chain using the same method as that
described for initiating the call-back chain. The Devmon VSD's call-back
procedure stores a copy of the request, along with a time stamp, in a buffer
that it provides to the Devmon Windows program.
VSDs can initiate new device requests themselves. If disk mirroring were
desired, for example, a VSD would let requests to the primary drive pass
through as normal, but it would also initiate identical requests for writes to
the secondary drive. Initiating a new request requires that the VSD call the
IOS service that allocates a new IOP, fill in the IOP with the correct
parameters, and then initiate the request. The VSD can send the request to all
virtual devices on the target device's call-down chain, or just the devices
beneath itself. If the request is a new SCSI request that cannot be
constructed by copying a similar request, the VSD must make sure that the
SCSI-izer layer processes the request as described earlier.


Conclusion


The Windows 95 file system provides opportunities for third-party drivers to
add new functionality to block devices. Without a doubt, increasing use of
Windows 95 will mean a growing desire for data encryption and protection. The
Devmon VSD is a framework you can extend to take advantage of these coming
needs. 
Figure 1: File-system layers.
Figure 2: Typical file-system request chain.
Figure 3: Running the Devmon program.
Example 1: Format of DeviceIoControl call.
BOOL DeviceIoControl(
 HANDLE hDevice, // handle of the device
 DWORD dwIoControlCode, // control code of operation to perform
 LPVOID lpvInBuffer, // address of buffer for input data
 DWORD cbInBuffer, // size of input buffer
 LPVOID lpvOutBuffer, // address of output buffer
 DWORD cbOutBuffer, // size of output buffer
 LPDWORD lpcbBytesReturned, // address of actual bytes of output
 LPOVERLAPPED lpoOverlapped // address of overlapped structure
);

Listing One
typedef struct DRP { 
 CHAR DRP_eyecatch_str[8]; // eye catcher string
 ULONG DRP_LGN; // drivers load group
 PVOID DRP_aer; // pointer to async event routine
 PVOID DRP_ilb; // ILB virtual address
 CHAR DRP_ascii_name[16]; // Name of the device 
 BYTE DRP_revision; // driver revision
 ULONG DRP_feature_code; // Feature Code 
 USHORT DRP_if_requirements; // I/F Requirements
 UCHAR DRP_bus_type; // type of I/O bus if port driver
 USHORT DRP_reg_result; // Registration Results
 ULONG DRP_reference_data; // field passed in on initialize
 UCHAR DRP_reserved1[2]; // filler for alignment 

 ULONG DRP_reserved2[1]; // reserved
} DRP, *PDRP;

Listing Two
BeginProc VSD_Device_Init
 ; call IOS to register. Before returning: IOS will call our
 ; Async_Event routine with the following messages:
 ; AEP_INITIALIZE
 ; AEP_CONFIG_DCB
 push OFFSET32 _Drv_Reg_Pkt ;packet (DRP)
 VxDCall IOS_Register ;call registration
 add esp,04 ;Clean up stack
 ; decide our status based on the information that IOS gives us
 cmp _Drv_Reg_Pkt.DRP_reg_result,DRP_REMAIN_RESIDENT
 ; should we stay?
 je short VSD_Init_Done ; yes: return
 cmp _Drv_Reg_Pkt.DRP_reg_result,DRP_MINIMIZE ; should we minimize?
 je short VSD_Init_Done ; yes: we can't minimize any more than
 ; normal, so just return with success
 stc ; error
VSD_Init_Done:
 ret

Listing Three
typedef struct _DCB_COMMON {
 ULONG DCB_physical_dcb; // DCB for physical device
 ULONG DCB_expansion_length; // total length of IOP extension filled
 // Link fields follow
 PVOID DCB_ptr_cd; // pointer to calldown list
 ULONG DCB_next_dcb; // pointer to next DCB 
 ULONG DCB_next_logical_dcb; // pointer to next logical dcb
 // for physical device
 BYTE DCB_drive_lttr_equiv; // drive number (A: = 0, etc.)
 // set up by iosserv during logical
 // device associate processing.
 BYTE DCB_unit_number; // either physical drive number
 // (sequential drive number or'd
 // with 80h) or unit number within
 // tsd. set up by iosbid for disk
 // physical dcb's. set up by tsdpart
 // for disk logical dcb's. set up by
 // tsdaer for cdrom physical dcb's.
 USHORT DCB_TSD_Flags; // Flags for TSD
 // Volume Tracking fields follow
 ULONG DCB_vrp_ptr; // pointer to VRP for this DCB
 ULONG DCB_dmd_flags; // demand bits of the topmost layer
 ULONG DCB_device_flags; // was BDD_Flags
 ULONG DCB_device_flags2; // second set of general purpose flags
 ULONG DCB_Partition_Start; // partition start sector
 ULONG DCB_track_table_ptr; // pointer for the track table buffer
 // for ioctls
 ULONG DCB_bds_ptr; // DOS BDS corresp. to this DCB
 // (logical DCB's only)
 ULONG DCB_Reserved1; // reserved
 ULONG DCB_Reserved2; // reserved
 BYTE DCB_apparent_blk_shift; // log of apparent_blk_size
 BYTE DCB_partition_type; // partition type
 USHORT DCB_sig; // padding and signature
 BYTE DCB_device_type; // Device Type

 ULONG DCB_Exclusive_VM; // exclusive access handle to device
 UCHAR DCB_disk_bpb_flags; // bpb flags see defines below
 UCHAR DCB_cAssoc; // count of logical drives
 // associated with this logical DCB
 UCHAR DCB_Sstor_Host; // indicates a sstor host volume
 USHORT DCB_user_drvlet; // the userdriveletter settings
 USHORT DCB_Reserved3; // reserved
 ULONG DCB_Reserved4; // reserved
} DCB_COMMON, *PDCB_COMMON;
typedef struct _DCB {
 DCB_COMMON DCB_cmn;
 ULONG DCB_max_xfer_len; // maximum transfer length
 // Actual geometry data follows
 ULONG DCB_actual_sector_cnt[2]; // number of sectors as seen below 
 // the tsd. 
 ULONG DCB_actual_blk_size; // actual block size of the device 
 // as seen below the tsd. 
 ULONG DCB_actual_head_cnt; // number of heads as seen below 
 // the tsd. 
 ULONG DCB_actual_cyl_cnt; // number of cylinders as seen 
 // below the tsd. 
 ULONG DCB_actual_spt; // number of sectors per track as 
 // seen below the tsd. 
 PVOID DCB_next_ddb_dcb; // link to next DCB on DDB chain
 PVOID DCB_dev_node; // pointer to dev node for this device
 BYTE DCB_bus_type; // Type of BUS
 BYTE DCB_bus_number; // channel (cable) within adapter 
 UCHAR DCB_queue_freeze; // queue freeze depth counter
 UCHAR DCB_max_sg_elements;// max # s/g elements
 UCHAR DCB_io_pend_count; // number of requests pending for
 // this DCB (Vol Track Layer use)
 UCHAR DCB_lock_count; // depth counter for LOCK MEDIA
 // SCSI fields follow
 USHORT DCB_SCSI_VSD_FLAGS; // Flags for SRB builder 
 BYTE DCB_scsi_target_id; // SCSI target ID
 BYTE DCB_scsi_lun; // SCSI logical unit number 
 BYTE DCB_scsi_hba; // adapter number relative to port drv.
 BYTE DCB_max_sense_data_len; // Maximum sense Length 
 USHORT DCB_srb_ext_size; // miniport srb extension length 
 BYTE DCB_inquiry_flags[8]; // Device Inquiry Flags 
 BYTE DCB_vendor_id[8]; // Vendor ID string 
 BYTE DCB_product_id[16]; // Product ID string 
 BYTE DCB_rev_level[4]; // Product revision level 
 BYTE DCB_port_name[8];
 UCHAR DCB_current_unit; // used to emulate mltpl log. devices
 // with a single physical device
 ULONG DCB_blocked_iop; // pointer to requests for an inactive
 // volume
 ULONG DCB_vol_unlock_timer; // unlock timer handle
 UCHAR DCB_access_timer; // measures time between accesses
 UCHAR DCB_Vol_Flags; // Flags for Volume Tracking 
 BYTE DCB_q_algo; // queuing algorithm index 
 BYTE DCB_unit_on_ctl; // relative device number on ctlr
 ULONG DCB_Port_Specific; // bytes for PORT DRIVER use 
 ULONG DCB_spindown_timer; // timer for drive spin down 
 DCB_BLOCKDEV DCB_bdd;
} DCB, *PDCB;
// define the device control block (dcb) for logical disk devices
typedef struct _LOG_DCB {

 DCB_COMMON DCB_cmn;
} LOG_DCB, *PLOG_DCB;

Listing Four
typedef struct _DCB_cd_entry {
 PVOID DCB_cd_io_address; // addr of request routine
 ULONG DCB_cd_flags; // demand bits
 ULONG DCB_cd_ddb; // driver's DDB pointer
 ULONG DCB_cd_next; // pointer to next cd entry
 USHORT DCB_cd_expan_off; // offset of expansion area
 UCHAR DCB_cd_layer_flags; // flags for layer's use
 UCHAR DCB_cd_lgn; // load group number
} DCB_cd_entry, *pDCB_cd_entry;

Listing Five
 mov eax, [ebx].IOP_calldown_ptr ; get call down address
 mov eax, [eax].DCB_cd_next ; get next calldown entry
 mov [ebx].IOP_calldown_ptr, eax ; reset calldown pointer
 push ebx ; place IOP on stack
 call [eax].DCB_CD_IO_Address ; call next layer
 add esp, 4 ; restore stack

Listing Six
 ; insert our callback in the callback stack as default for all requests
 mov eax, [ebx.IOP_callback_ptr] ; set Callback
 mov [eax.IOP_cb_address], offset32 VSD_callback ; pointer
 ; use the calldown pointer as reference data in the callback entry
 ; so that we can find the offset in the IOP to the expansion area
 mov edx, [ebx.IOP_calldown_ptr] ; get CD ptr
 mov [eax.IOP_cb_ref_data], edx ; set reference
 ; add a callback entry to the callback stack, for the next layer
 add [ebx.IOP_callback_ptr],size IOP_callBack_entry ; move down

Listing Seven
typedef struct IOP_callback_entry {
 ULONG IOP_CB_address; // call back address
 ULONG IOP_CB_ref_data; // pointer to callback ref data
} IOP_callback_entry;

Listing Eight
 mov edi, [esi].IOP_callback_ptr
 sub edi, size IOP_CallBack_Entry ; point to next available
 ; call-back entry
 mov [esi].IOP_callback_ptr, edi ; update call-back Pointer
 ; IOP pointer is passed on the stack
 push esi ; IOP's offset
 call dword ptr [edi] ; make the call
 add esp, 4 ; restore stack


TERSE: A Tiny Real-Time Operating System


A signature-scheduled, dataflow OS for distributed embedded systems




Barry Kauler


Barry is a member of the computer and communication engineering department at
Edith Cowan University. He can be contacted at bkauler@scorpion.cowan.edu.au.


Since their introduction in the 1970s, microcontrollers have found their way
into embedded systems ranging from washing machines to airplanes. Today's
complex systems use multiple microcontrollers--each dedicated to a specific
control and/or monitoring application--distributed across a network. 
Because many microcontrollers may have only 1024 bytes of ROM and 64 bytes of
RAM, shoehorning both a real-time operating system and an application onto a
single device can be a challenge, to say the least. Clearly, a small, reliable
real-time operating system is a fundamental requirement for many
embedded-system projects. To that end, I've developed TERSE (short for "Tiny
Embedded Real-time Software Environment"), an operating system that's only
about 260 bytes in size without network support, and only 450 bytes with it.
Although originally written for the 8051 family of microcontrollers
(specifically, the Philips 87C750), TERSE can easily be ported to other
controllers. In this article, I'll focus on TERSE's design, implementation,
and use. The 8051 assembly-language source code for TERSE51 Version 1.13 is
available electronically from DDJ (see "Availability," page 3),
ftp://scorpion.cowan.edu.au/pub/terse, or
http://scorpion.cowan.edu.au/science/terse/terse.htm.
TERSE takes care of all message interaction between nodes on a distributed
system. TERSE fires (executes) a node when all required messages have arrived
(and in accordance with a schedule calculated at run time). It carries the
output messages to other nodes, even those on different processors.
Consider Figure 1(a), where only one processor unit (PU) is necessary. After
node 1 finishes, messages go off to nodes 2 and 3, which are then eligible to
fire (this is concurrency). However, since both nodes are on the same PU, they
can't really execute concurrently. Since TERSE doesn't support timeslicing,
one node would execute, then the other. However, you can simulate timeslicing
between nodes 2 and 3 by splitting each node into a serial string of nodes
using the scheduling-table setup to jump back and forth between each
concurrent path (or thread). Once both nodes finish, node 4 becomes eligible
to fire.
Before calling a node, TERSE puts the input messages into registers, so the
node knows exactly where they are. Specifically, the 8051 has registers R0 to
R7, and TERSE puts the input messages into R1 to R4, along with a Flags
variable that indicates whether or not the messages have arrived.
Just before a node exits to the operating system, it pushes the output
messages onto the stack. Each message contains the destination node, terminal,
and data so that TERSE will know where to send messages. 
Figure 1(b) illustrates a more complicated problem with three interacting
processors. PU3 is physically connected to a device ("GIZMO A") on which
mutual exclusion must be enforced. With TERSE, you instantiate nodes 1 and 2
(or more) internally with identical code. TERSE automatically allows only one
to execute. 


How TERSE Works


TERSE uses one interrupt on each PU. This works a bit like the Microsoft
Windows message queue--when a node exits, its messages are placed in the
buffer (or queue). This buffer lives in the stack, so messages are posted by
the PUSH instruction. Despite this, the stack can still be used in the normal
fashion, within each node.
When a message arrives from another PU, it is received by an interrupt
routine, which either puts it in the stack/buffer or forwards it to the next
PU in the ring.
To post messages to destination nodes, the messages are simply popped off the
stack, and TERSE automatically sends off any messages addressed to other PUs.
Some dataflow operating systems employ a static scheduling technique with a
fixed node-firing sequence. Since this is far too restrictive for a real-time
reactive embedded application, TERSE uses a combination of static and run-time
calculation of scheduling. In Figure 1(b), after node 1 exits it's logical to
fire node 2 first (so it can send a message off to PU3) and then fire node 3.
TERSE examines currently active "arcs" (data being output from one node and
sent to the next), particularly those that have messages on them, as well as
the order of the messages' arrival. With this information, TERSE calculates a
single 16-bit signature, then looks up this signature in a lookup table to
determine which node to fire next. Thus, the operating system is completely
deterministic.
Remember that the messages are all sitting in the stack in order of their
arrival. TERSE simply looks through this string of messages and calculates a
code (similar in concept to the CRC code used in disk storage) unique to that
string. This "signature-scheduling" technique is one of TERSE's
more-attractive features.
Consider again PU2 in Figure 1(b). If node 1 posts a message to node 2 and to
node 3 (in that order) when node 1 exits, then those two messages on the stack
form a unique pattern; hence a unique 16-bit signature. TERSE will see in the
lookup table that node 2 is to be fired next. If node 1 posts the message to
node 3 before pushing the message to node 2, a different pattern is formed and
a different signature is generated. For node 1 to control whether node 2 or
node 3 fires next, both signatures must be in the lookup table. This scenario
illustrates how TERSE allows run-time variations on the execution path--but it
can only happen if you specifically put those signatures into the lookup
table.
Another possibility is that node 1 is a rogue whose buggy internal code
erroneously outputs messages--in the wrong order, only one message, or none at
all. "Wrong" message patterns generate signatures
that are not in the lookup table, leading TERSE's signature scheduling
capabilities to automatically call node 0, which is always reserved as the
error handler. Node 0 can get a good idea of what is wrong by looking at the
messages on the stack. (TERSE avoids the "signature explosion" that could
otherwise result from messages arriving asynchronously from other PUs.)
In short, TERSE allows run-time scheduling, but only over paths that you have
allowed. If a node "goes wrong," TERSE knows this as soon as the node exits.
If a node gets hung up internally, the processor will need a different timeout
mechanism (a watchdog timer, for example); alternatively, TERSE's network
interrupt routine can have a timeout check. (TERSE acceptably handles any
other interrupts, too--they simply go to an ISR inside a node.)
TERSE also supports "notifications"--messages that take no part in the
signature calculation, and thus do not affect node firing even if they go to a
node. Notifications are an important extension of the classic dataflow model
for real-time reactive engineering applications. 
In terms of a classic dataflow diagram like Figure 2, a TERSE diagram
corresponds to the unbroken-line arcs and circles, while TERSE's
signature-scheduling table corresponds to the dashed arcs and circle.
Notifications, however, are not covered in the classic model of Figure 2.
These can be thought of as global variables, but without the disadvantages. A
notification, in global-variable terms, is only written to at one place in the
diagram, but can be read from many places--this is very important for stable
designs.


Putting it Together


To put the TERSE design to work, my prototype testbed was the Philips DS750
development kit that includes a board with the 87C750 and 87C752 EPROM
microcontrollers, an assembler, and a simulator that lets you test programs on
a PC or download them to the board over a serial cable. The 87C750 has 1 KB of
EPROM, 64 bytes of RAM, and various I/O. 
I designed TERSE51 to work with up to eight processors (connected in a simple
ring) using a parallel port--split into two nybbles, one going each way. The
87C750 does not have a UART, so I left that option out. (I'd like to see
someone modify TERSE for other network designs and topologies.
Microcontrollers with built-in network processors, Motorola's Neuron, and
Charles Moore's F21 are likely candidates.) 
Designing a TERSE-based system begins with scribbling flow diagrams on the
back of an envelope. The target system affects the design process. Basically,
a microcontroller is associated with a piece of hardware. Microcontroller A
might control Gizmo A, while Microcontroller B might monitor Widget B, and so
on. In some cases, two or more microcontrollers might be involved with one
physical entity; or microcontrollers of different speeds, instruction sets,
I/O, or memory may be required.
The top layer of software functionality, therefore, tends to follow the
physical functionality of the system, and this is how I recommend you design
the top-layer diagram. Remember that the internal bus of a processor is much
faster than the network, and it is usually easy to select the functionality of
each processor to meet speed/resource requirements.
Within a microcontroller, the flow-diagram should usually require only one
decomposition step. The top layer is a single node to represent the entire PU,
which decomposes to a large-grain dataflow diagram.
Various rules can be followed for this decomposition, such as a shared
function being a clone node, a shared resource, and so on. The objective is no
more than 32 nodes per PU, since that's all this version of TERSE supports. A
PU may have two or more isolated diagrams (no arcs between them), though TERSE
designs are based upon identifiable start and end nodes in each PU, so that
there is a clearly defined cycle.
Ideally, decomposition should proceed such that all communication between
parts of the program, within the PU or to another PU, takes place at the
dataflow messaging level. Interprocess communication directly from one node
to another through shared variables is a last resort.
If nodes have many pages of source code, decomposition should proceed until
each node contains small code segments. One printed page of source text, or
shorter, is reasonable.
Decomposition can proceed in other ways. You could delay the division of
software functionality onto specific processors and take a more-abstract
software approach first. 
Listing One illustrates how to write code for each node. Input messages are in
registers R1-R4, and Flags contains flags indicating the presence or absence
of a message. Normally, a message would be there, but TERSE allows the nodes
to fire even if there isn't. An example of this is the Notification message
(it also allows TERSE to perform a GOTO, if required). The POSTMSG macro makes
it easy to post messages in the correct format.
A node has full use of the stack, but the second, third, and fourth register
banks must not be used directly because they hold the stack (though it could
be shifted). All of the internal-RAM locations from 20h upward are free,
except for five reserved locations, one of which must be bit addressable.
Apart from writing the code for each node, you must also fill in the
signature-scheduling lookup table; see Listing Two. The first two rows show
the signature and the corresponding node that TERSE will execute. TERSE
locates the column with the correct signature and uses that index value to
perform a programmed jump via the nodeptr: array to go to the node.
Working out the signatures is fairly straightforward. The program SIG-CALC.ASM
(Listing Three) prompts you to enter the destination node and terminal of each
active message and returns the signature. (SIG-CALC.EXE, the executable
version, is available electronically.) Go through the flow diagram step by
step, plotting all allowable paths and entering a corresponding signature for
each.
It is easy to change TERSE51 to work with signatures smaller than 16 bits,
even down to eight bits; I recommend the 8-bit signature only for
single-processor designs due to the probability of "chaos" in a distributed
system.
I'm currently working on a GUI that will simulate designs and generate code.
In the meantime, I use National Instruments' LabVIEW for timing analysis.
With LabVIEW, I string the nodes of each PU along a time axis, with an
execution time entered for each node. The network is treated as a "virtual
PU," with its own time axis. The network nodes uncover any timing problems:
the cycle time of the receiver diagram must always be less than that of the
sender diagram. The LabVIEW solution is temporary but useful.



Conclusion


There are numerous possibilities for taking TERSE to the next level. For
instance, TERSE51 supports up to 32 nodes per processor. On a more powerful
system--say, an 80x86-based platform--TERSE's capabilities could be expanded.
You could timestamp messages and use an algorithm that enables all processors
to automatically synchronize their internal clocks, allowing two or more
processors to perform some operation at "exactly" the same time.
You could also implement a faster scheduling-table lookup. TERSE51 1.13 uses a
linear search, but you could presort entries in order of numerically ascending
signature value for faster lookup.
When scaling up, having a single lookup table per processor becomes a problem.
Consequently, you might want to use multiple independent diagrams on the one
processor, each with its own cycle times. These could be considered "logical
processors." 
The first network designed for TERSE51 is admittedly primitive. Porting TERSE
to a processor such as the Standard Microsystems Corp. COM20051, with its
built-in network coprocessor, will also improve performance.
Finally, it would be very useful to develop a program that analyzes the
scheduling tables and determines whether the diagram will always complete,
detects deadlocks, determines the reachability of all nodes, and--in
conjunction with the diagrams--performs timing analysis.
The electronic clearinghouse for TERSE-related information is at
ftp://scorpion.cowan.edu.au/pub/terse. Updates will be posted as TERSE
undergoes development. To participate in the further development of TERSE,
please upload any contributions to the /incoming directory and notify me via
e-mail.
Figure 1: (a) One-processor problem; (b) three-processor problem.
Figure 2: A conventional flow diagram, showing dataflow and control flow.

Listing One
node0: ;node-0 is the error-handler
;test for DEADLINE/SHUTDOWN/RESTART....
 jnb flags.0,nodead
 jnb flags.1,nosh2
 ;SHUTDOWN. If responding remote or local broadcast msg, it has
 ;already been forwarded, so now do local response...
 ;...PUT ALL I/O INTO A SHUTDOWN STATE
myself: ajmp myself
nosh2: jnb flags.2,nore2
 ;RESTART. If responding remote or local broadcast msg, it
 ;has already been forwarded, so now do local response...
 mov sp,#emptyst
 ajmp start1 ;restart program.
nore2: ;DEADLINE OVERRUN.
 ;could maybe broadcast a global shutdown, or some error msg...
 ajmp nodereturn ;do this to carry-on regardless.
nodead:
;test for wait-state....
 jnb flags.5,notwait
 ;can perform deadline timeout if required, else just return...
 ajmp nodereturn
notwait:
 ;...process more msgs
;default behaviour of node-0...
 mov sp,#emptyst ;dump any msgs left on stack.
;for consistency, error-handler should *not* post any msgs. 
;By emptying stack, after exit from here execution will restart 
;from whatever node corresponds to signature=0 in the lookup table.
 ajmp nodereturn
;...................................................................
node1: 
;assuming this is starting-node, it will not have any i/p msgs.
 ;...code...
 POSTMSG 1,2,1,0,#34h ;pu-1,node-2,term-1,not-notif.,immediate-data.
 POSTMSG 1,3,1,0,#00h ;pu1,node3,term1,not-notif,immediate-data.
 ajmp nodereturn
;...................................................................
node3:
 jnb flags.1,n31 ;jump if no msg on terminal-1.
 ;...msg is in r1...process it
n31:
 POSTMSG 1,4,1,0,32h ;pu1,node4,term1,notnotif,direct-data.
 ajmp nodereturn
;..................................................................

node2:
 jnb flags.1,n21 ;jump if no msg on terminal-1.
 ;... msg is in r1...process it
n21:
 POSTMSG 1,4,2,0,#00h ;pu1,node4,term2,0,immediate-data.
 ajmp nodereturn
;...................................................................
node4:
 jnb flags.1,n41 ;jump if no msg on terminal-1.
 ;...msg is in r1...process it
n41: jnb flags.2,n42 ;jump if no msg on terminal-2.
 ;...msg is in r2...process it
n42:
 POSTMSG 2,1,1,0,#56h ;remote message!
 POSTMSG 1,2,2,1,#11 ;notification, back to node-2,term-2.
 ajmp nodereturn

Listing Two
signatures: DB 255,00,00h,0A0h,0C8h, 012 ;starting node has signature=0.
sighigh: DB 255,00,03h, 00h, 00h, 034
nodenum: DB 0, 1, 3, 2, 4,WAIT 
nodeptr: ;these occupy 2 bytes each.
 ajmp node0 ;1st node in table is error-handler(always node-0).
 ajmp node1 ;this must be in ascending order
 ajmp node2 ;of node number.
 ajmp node3 ;
 ajmp node4

Listing Three
.MODEL SMALL
.STACK
.DATA
 DB 10 DUP(0)
asciitbl DB 7 DUP(0)
 DB "$","$"
intromsg DB 0Ah,0Dh
 DB "Type-in active-set of messages, ctrl-z to clear, ctrl-x to quit"
 DB 0Ah,0Dh
intr2 DB "Do not enter remote-output, nor Notification messages.",0Ah,0Dh
intro3 DB "FOR TERSE51: Node range = 1 - 32. Terminal range = 1 - 4."
 DB 0Ah,0Dh
 DB "FOR TERSE51: Signature-calc starts from latest msg on stack."
 DB 0Ah,0Dh,0Ah,0Dh,"$"
destnode DB "Destination-node: $"
destterm DB " Destination-terminal: $"
sigtxt DB " SIGNATURE:$"
newline DB 0Dh,0Ah,"$"
signature DW 0 ;16-bit signature
input DB 0 ;binary input
header DB 0 ;used for sig-calc
.CODE
start:
 mov ax,@DATA
 mov ds,ax
start2:
;intro message...
 mov ah,9
 lea dx,intromsg
 int 21h

start3:
;Ask for destination node....
 mov ah,9
 lea dx,destnode
 int 21h
 call getinput ;comes back with "input", in binary...
 mov al,input
 and al,00011111b
 shl al,1
 shl al,1
 shl al,1
 mov header,al
;ask for destination terminal...
 mov ah,9
 lea dx,destterm
 int 21h
 call getinput
 mov al,input
 dec al ;as terminal no. stored as 0 - 3.
 and al,011b
 shl al,1
 or header,al
;calculate the signature....
 mov ax,signature
 test ax,8000h
 jz sig1
 xor ax,0100010000010001b
sig1: xor al,header
 rol ax,1
 mov signature,ax
 
;convert it to ascii....
 mov ax,signature
 mov dx,0
 call bin2dec
 mov BYTE PTR asciitbl+7,"$" ;hex o/p stuffs this up.
;display result...
 mov ah,9
 lea dx,sigtxt
 int 21h
 mov ah,9
 lea dx,asciitbl
 int 21h
 ;also display signature in hex...
 mov bx,signature
 mov cl,4
 rol bx,cl
 mov ax,bx
 and ax,000Fh
 cmp al,9
 jbe xx
 add al,7
xx: add al,30h
 mov asciitbl+1,al
 rol bx,cl
 mov ax,bx
 and ax,000Fh
 cmp al,9
 jbe yy

 add al,7
yy: add al,30h
 mov asciitbl+2,al
 rol bx,cl
 mov ax,bx
 and ax,000Fh
 cmp al,9
 jbe zz
 add al,7
zz: add al,30h
 mov asciitbl+3,al
 rol bx,cl
 mov ax,bx
 and ax,000Fh
 cmp al,9
 jbe mm
 add al,7
mm: add al,30h
 mov asciitbl+4,al
 mov BYTE PTR asciitbl+5,"h"
 mov BYTE PTR asciitbl+6,0Ah
 mov BYTE PTR asciitbl+7,0Dh
 mov BYTE PTR asciitbl+8,"$"
 mov ah,9
 lea dx,asciitbl
 int 21h
 jmp start3
getout:
 mov ax,4C00h
 int 21h
getinput:
 mov input,0
getinput2:
 mov ah,0
 int 16h
 cmp ax,2C1Ah ;ctrl-z
 jne notcrtlz
 mov signature,0 
 mov ah,9
 lea dx,newline
 int 21h
 pop ax ;dump return address
 jmp start2
notcrtlz:
 cmp ax,2D18h ;ctrl-x
 jne notcrtlx
 mov ah,9
 lea dx,newline
 int 21h
 jmp getout
notcrtlx:
;test if outside range 0-9...
 cmp al,30h
 jb below0
 cmp al,39h
 ja above9
inrange:
;echo it to screen...
 mov ah,0Eh

 push ax
 mov bx,02
 int 10h
 pop ax
 and al,0Fh ;convert to binary
 mov ah,input
 and ah,0Fh ;see if 2nd char already there
 jnz second
 ;mov cl,4
 ;shl al,cl
 mov input,al ;save in lo-nibble.
 jmp getinput2
second:
 mov cl,4
 shl input,cl
 mov ah,input
 or al,ah ;combine them
 mov input,al ;save. ...value is in bcd format.
below0: ;any char outside range, terminates entry
above9: ; /
;convert input to binary....
 mov al,input
 and al,0F0h
 mov cl,4
 ror al,cl
 mov dl,10
 mul dl ;result --> ax
 mov bl,input
 and bl,0Fh
 add al,bl ;now have binary value.
 mov input,al ; /
 ret 
;...................................................................
bin2dec PROC ;requires binary number in dx:ax...
 lea di,asciitbl
 mov bx,di
 add di,6 ;8
 ;mov BYTE PTR [di],0
 ;dec di
ssss:
 mov cx,10
 div cx
 add dl,30h
sssss:
 mov [di],dl
 dec di
 mov dx,0
 cmp ax,0
 jne ssss
 mov dl," "
 cmp bx,di
 jbe sssss
 ret
bin2dec ENDP
 END start




An Application-Access Security Model


Designing security systems based on job-related roles




Mark Robinson


Mark is a PowerBuilder consultant with Toronto-based Data Management
Consultants (DMC), specializing in the delivery of custom Oracle solutions.
Mark can be reached at 75462.422@compuserve.com.


One of the fundamental roadblocks at the beginning of any client/server
development project is implementing access security. From user accounts and
menu accessibility to managing individual controls on a window, application
security is usually implemented as a mishmash of unrelated, nonuniform,
hardcoded, poorly considered attempts to block unwanted access to an
application's functionality. In this article, I'll present an integrated,
generic, reusable model of network security based on object-oriented concepts.
To illustrate these concepts programmatically, I'll provide examples written
in PowerBuilder that demonstrate the flexibility and power of this model. The
code is provided only to demonstrate communication methods between objects,
not as a fully coded security handler.
As client/server and network technologies grow, so does the target user group
of an application. Applications are continually breaking the traditional
limitations of being focused on specific functional areas. The current trend
is to develop enterprise-wide applications with a large, heterogeneous user
community. At the same time, companies are attempting to provide access to
their data on a need-to-know basis. There is an ever-growing need to approach
application/data accessibility as a concept on its own.


Security Goals


Access-security implementations have two main goals. The most important is to
prevent access to private information by unauthorized users. This can be
accomplished by blocking access to an application, module, window, or even a
control of a window. The second goal is to make the application easier to use
by not showing users functionality that does not have a direct bearing on
their job. This allows large applications to appear smaller and easier to use.
From a practical point of view, a security implementation must function as an
integrated unit. But during the analysis and design phases, most projects
focus only on what data must be secure, not how to secure it. When the
development phase starts, security is implemented on an ad hoc basis, with
each developer being responsible for the security of his or her own modules.
The underlying concept of this article is that security should be integrated
into the functional foundation of an application. Developers will interface
with a common security object to limit accessibility to their modules. This
object will know how to inform other objects about the current user's
accessibility privileges.


Object Leveling


Object-based applications form a natural hierarchy of objects. Application
objects contain window objects, which contain control objects. From the
perspective of a security model, all of these objects can be leveled, thus
allowing object-based security. Object leveling is the process of manipulating
all different classes of objects in the same manner by creating a common
interface. Thus, the security handler interacts with all windows, child
windows, buttons, and controls in the same way; it does not differentiate
between different classes of objects. Objects with very specific hierarchical
requirements can be handled by each object individually. The security handler
assumes the role of informant, informing objects of initial settings and mode
changes.


Security-Model Overview


In this security model, a relationship is formed between business-oriented
tasks and application-oriented objects. For our purposes here, I'll use two
groupings: roles and objects. Users can have one or more roles within a
company, and a role can be performed by one or more users. Objects can be
manipulated by one or more methods, and methods can be applied to one or more
objects; see Table 1.
A relationship between roles and objects is formed to define the methods used
when the object is instantiated. This relationship defines the accessibility
and the level of control that a role has over an object.
Designing a security system based on your own job-related roles allows your
company to use a common, enterprise-wide definition of data accessibility. A
warehouse clerk, for instance, can only update product inventory. Designing a
security system based on a set of your own common business objects allows you
to manipulate data in a consistent manner from any application. A
product-inventory object can be made accessible through any application and
will always look and behave the same way. By mapping roles to objects, your
company will benefit from a single security database with which all
applications will interact, as shown in Figure 1.


Security-Handler Components


The security handler is a self-contained object with two broad areas of
responsibility. First, it must have the ability to determine a user's roles
and accessible objects from the security database. Because this database is
intended to be corporate wide, it makes sense to maintain the security data in
its own database. Security settings are generally static, so the security
handler will read in all available information for this user. This is an
implementation choice that can be made during the build phase. The trade-off
here is local-memory storage requirements versus application speed.
Secondly, the security handler must be able to instruct objects to apply the
defined methods at the appropriate time. The security handler has two methods
of communicating with objects. The first and most common is Object Direct,
which is generally used when the object is first created and is about to be
displayed to the user; see Figure 2. Using this method, the security handler
informs a specific object about the rights and privileges of the current user.
The object will then configure itself accordingly. The second method of
communication is Object Broadcast, which allows the security handler to make
application-wide announcements; see Figure 3. Usually, these announcements
involve state changes within an application, such as a window going into
update mode. The security handler will broadcast a state-change message to all
parent windows, which will then process the message and inform all of their
dependent windows and controls.


Security Database


The database for this model contains the relationships formed between roles
and objects. The database is enterprise wide; if built carefully, these
relationships are valid for any application. For Object Direct communication,
the required fields are Role Name, Window Name, Control Name, Method Name, and
Attribute Value. Object Broadcast tends to be application specific and does
not rely on a database. When the application is started, the user's roles are
used to load the security data. If a user has more than one role, there may be
overlap in the security data. This is especially true for supervisory and
management roles. To resolve the overlap, the loading process should allow the
roles with the greatest accessibility to dominate. For example, a user may
perform clerk duties but also be a manager. The clerk role would prohibit
deletion, but the manager role would allow it, so the database should allow
the user to have delete capability.
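The "greatest accessibility dominates" rule can be sketched in C. This is only
an illustration of the merge idea: the article's security database stores
role/method/attribute rows, not bitmasks, and the PERM_* names and
merge_roles function are hypothetical.

```c
/* Hypothetical permission bits -- the security database described in the
   article stores role/method/attribute rows rather than bitmasks; this
   sketch only illustrates how overlapping roles could be resolved. */
enum { PERM_VIEW = 1, PERM_UPDATE = 2, PERM_DELETE = 4 };

/* OR together the permissions of every role the user holds, so the
   most permissive role wins for each capability. */
unsigned merge_roles(const unsigned *role_perms, int nroles)
{
    unsigned merged = 0;
    for (int i = 0; i < nroles; i++)
        merged |= role_perms[i];
    return merged;
}
```

With this scheme, a clerk role (view, update) combined with a manager role
(view, update, delete) yields delete capability, matching the clerk/manager
example above.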


Using PowerBuilder


To implement this model, all of your objects must be inherited from a base
object class that contains the necessary communication interfaces. The base
class for windows will be called w_base. This class provides the necessary
object leveling for the security handler. In general, it is poor programming
practice and counter to all object-oriented concepts for one object to
directly manipulate attributes of another. For practical purposes, you may
find it necessary to limit this rule to window boundaries. In other words, one
window is prohibited from directly altering attributes for objects in another,
but a window may directly alter the attributes of objects it itself contains.

Each object class will have two local functions--f_Set() and f_Get()--to
process information about object attributes. f_Set() will be used as the
primary entry point for each object. If one object needs to tell another
object to do something, it uses this function. Listing One is one way to
implement f_Set(). On the other hand, f_Get() (see Listing Two) is used by
other objects to ask questions about attribute values and operating states.
Using these two functions, the security manager can control the behavior of
objects by assuming the role of informant.
PowerBuilder provides customizable User Objects; see Table 2. The security
handler will be implemented as a nonvisual User Object with functions and
structures defined within it. The security handler will have two registration
functions, f_Register() and f_DeRegister(), and two communication functions,
f_Direct() and f_Broadcast().
f_Register(). When a window is opened it must register itself with the
security handler. The registration process allows the security handler to
store information about the window, and is mainly used for Object Broadcast.
Listing Three illustrates this function.
f_DeRegister(). When a window is closed, it must inform the security handler.
The deregistration process allows the security handler to remove information
previously registered; see Listing Four.
f_Direct(). During the process of instantiating a window, this function is
called to instruct the object to execute specific methods to control user
access. In addition, this function can be used for localized mode changes; see
Listing Five.
f_Broadcast(). Certain conditions may arise during program execution that
require that all objects be notified of a specific event or situation. This
function informs the top-level windows of the change, and each window is
responsible for informing any subordinate objects; see Listing Six.
As each window opens, it interacts with the security handler to configure
itself. The code for the interaction resides in the W_BASE Open event of the
base window object because every window must communicate with the security
handler; see Example 1(a). 
As each window closes, the security handler needs to deregister it. Example
1(b) shows the W_BASE Close event for the base window object. 


Managing the Security System


To manage the security system, you will need to write an application to define
roles, browse object libraries, match methods against objects, and store the
access privileges in your security database. Clearly, having common objects
and roles already identified and defined is beneficial. However, building the
security-management system itself is beyond the scope of this article.


Implementation Strategies


During the development phase of a project, all developers dislike being
hampered by application-security restrictions. This security manager can be
particularly annoying because it is both object and role based. I have found
it useful to have a temporary security manager that merely says "yes" to
everything. This means that you must have a specific security test during your
application-testing phase (you should, anyway). When it is time to create an
executable program, the real security-manager object can be substituted
without having to change any code.
Another factor to consider is the proportion of screens that require security.
If most screens are available to all users, assume that all screens are
initially active and have the security handler disable the restricted ones
based on the security settings. If most screens are restricted, assume that
all screens are initially inactive and have the security handler enable
access based on the security settings.
Providing communication between objects has many benefits beyond access
security. Using this model as a foundation, you can create an entire
communication infrastructure embedded within the base of objects of your
development environment.
Figure 1: A shared security database.
Figure 2: Security handler communicating via the Object Direct method.
Figure 3: Security handler communicating via the Object Broadcast method.
Table 2: PowerBuilder customizable User Objects.
Object Description
Custom A collection of standard PowerBuilder controls that you
 create to perform a specific operation, such as a
 tab-folder object or a customer-profile maintenance
 utility.
External An object that contains controls defined by an external
 DLL. When using multiple development packages, you can
 create common objects in a DLL and access them through
 external objects. Other possibilities include
 communication or network objects.
Nonvisual This type of user object encapsulates functions and
 attributes but is never visible. Used primarily to
 create specific processing functionality bundled into a
 single entity such as a window-manager object.
Standard A preexisting, customizable PowerBuilder control such
 as a button or edit control. By default, its attributes
 and events are initialized to the standard PowerBuilder
 settings for that object. Used primarily to create
 object classes for standard PowerBuilder objects.
VBX PowerBuilder supports Visual Basic controls with special
 attributes and events. You must create objects using
 Visual Basic or purchase them as object libraries.
Table 1: Terminology.
Term Definition
Role An identifier for a definable set of business-related
 activities that can be used to classify employees into 
 workgroups. More than one employee may perform a specific 
 role, and more than one role may be performed by a 
 specific employee. Examples of role designations include 
 accounting clerk, production foreman, and VP sales.
Method Describes an operation used to manipulate an attribute
 of an object. More than one method may affect the same
 attribute differently. Examples of methods include
 SetVisibility and SetValue.
Object A discrete unit containing its own attributes.
 Objects can be hierarchical as long as the composite
 object has attributes specific to itself that are not
 obtained from its component objects. Examples of
 objects are windows, buttons, and scroll bars.
Attribute A single unit of information used to describe
 a specific characteristic of an object. Examples of
 attributes are visible, horizontal position, and font.
Example 1: (a) W_BASE Open event; (b) W_BASE Close event.
(a)
// The window (this) registers itself as a parent
SecurityHandler.f_Register(This, True)
// The window requests the security handler to provide access information
SecurityHandler.f_Direct(This)

(b)
// The window (this) informs the security handler that it is closing
SecurityHandler.f_DeRegister(This)

Listing One
Boolean f_Set(sObject, sMethod, sAttributeValue)
Parameters Description
sObject A string identifying the object that is to be modified.
sMethod A string identifying the name of the method to be used to 
 modify an attribute of sObject.
sAttributeValue A string representation of the new value to set by sMethod.
Script
CHOOSE CASE sMethod
 CASE 'SHOW'
 CHOOSE CASE sObject
 CASE 'CB_SAVE'
 cbSave.Visible = TRUE
 CASE 'CB_DETAIL'
 cbDetail.Visible = TRUE
 END CHOOSE
 CASE 'HIDE'
 CHOOSE CASE sObject
 CASE 'CB_SAVE'
 cbSave.Visible = FALSE
 CASE 'CB_DETAIL'
 cbDetail.Visible = FALSE
 END CHOOSE
 CASE 'TEXT'
 CHOOSE CASE sObject
 CASE 'CB_SAVE'
 cbSave.Text = sAttributeValue
 CASE 'CB_DETAIL'
 cbDetail.Text = sAttributeValue
 END CHOOSE
END CHOOSE
RETURN TRUE

Listing Two
String f_Get(sObject, sAttributeName)
Parameters Description
sObject A string identifying the object that has the desired attribute.
sAttributeName A string identifying the attribute whose value is requested.
Script
string sAttributeValue
CHOOSE CASE sAttributeName
 CASE 'VISIBLE'
 CHOOSE CASE sObject
 CASE 'CB_SAVE'
 IF cbSave.Visible = TRUE THEN
 sAttributeValue = 'TRUE'
 ELSE
 sAttributeValue = 'FALSE'
 END IF
 CASE 'CB_DETAIL'
 IF cbDetail.Visible = TRUE THEN
 sAttributeValue = 'TRUE'
 ELSE
 sAttributeValue = 'FALSE'
 END IF
 END CHOOSE
 CASE 'TEXT'
 CHOOSE CASE sObject
 CASE 'CB_SAVE'
 sAttributeValue = cbSave.Text
 CASE 'CB_DETAIL'
 sAttributeValue = cbDetail.Text
 END CHOOSE
END CHOOSE
RETURN sAttributeValue

Listing Three
Boolean f_Register(wRegistrant, bParent)
Parameters Description
wRegistrant A window inherited from base window class, w_base. This window
 is to be registered by the security handler.
bParent A boolean flag indicating if this window is a top level window.
Script
MaxWindows ++
Registered[MaxWindows].Window = wRegistrant
Registered[MaxWindows].bParent = bParent
RETURN TRUE

Listing Four
Boolean f_DeRegister(wRegistrant)
Parameters Description
wRegistrant A window inherited from base window class, w_base. 
 This window is to be de-registered by the security handler.
Script
int i
FOR i = 1 to MaxWindows
 IF Registered[i].Window = wRegistrant THEN
 IF i = MaxWindows THEN
 MaxWindows --
 ELSE
 Registered[i].Window = Registered[MaxWindows].Window
 Registered[i].bParent = Registered[MaxWindows].bParent
 MaxWindows --
 END IF
 EXIT
 END IF
NEXT
RETURN TRUE
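Listing Four avoids shifting the whole registration array on deletion by
copying the last entry over the removed slot and shrinking the count. The
same swap-with-last idea reads like this in C (a sketch; RegEntry and
deregister are illustrative names, not part of the article's PowerBuilder
code):

```c
#define MAXREG 32

/* Mirrors Listing Four's trick: instead of shifting every element down,
   copy the last entry over the hole and shrink the count. Order is not
   preserved, which is fine for a registration table. */
typedef struct { int window; int is_parent; } RegEntry;

int deregister(RegEntry *reg, int *count, int window)
{
    for (int i = 0; i < *count; i++) {
        if (reg[i].window == window) {
            reg[i] = reg[*count - 1];  /* overwrite with last entry */
            (*count)--;
            return 1;
        }
    }
    return 0;  /* not registered */
}
```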

Listing Five
Boolean f_Direct(wTarget)
Parameters Description
wTarget A window inherited from base window class, w_base. This window
 defines the target for the security handler. In most cases, a 
 window will request security information about itself.
Script
int i
FOR i = 1 TO MaxEntries
 IF ClassName(wTarget) = Security[i].sWindowName THEN
 wTarget.f_Set(Security[i].sControlName, Security[i].sMethodName, &
 Security[i].sAttributeValue)
 END IF
NEXT
RETURN TRUE
Notes: MaxEntries is an instance variable in the security handler 
identifying how many entries were loaded from the security database.
The Security array is a structure containing the window names, control 
names, method names, and attribute values that were loaded from the security 
database.

Listing Six
Boolean f_Broadcast(sMessageId, sMessageValue)
Parameters Description
sMessageId A string identifying a pre-defined message to be broadcast.
sMessageValue A string containing a value (if necessary) to elaborate on 
 the message. This parameter is used if there is a variable 
 component to the broadcast message.
Script
int i
FOR i = 1 to MaxWindows
 IF Registered[i].bParent = TRUE THEN
 Registered[i].Window.f_Set("BROADCAST", sMessageId, sMessageValue)
 END IF
NEXT
RETURN TRUE
Notes: MaxWindows is an instance variable containing the number of windows
that are currently registered.
The Registered array is a structure containing a reference to each registered
window and a boolean flag that is TRUE if the window is a parent level window.


































VToolsD for VxD Development


A VxD toolkit for C/C++ programmers




Charles Mirho


Charles works in Silicon Valley, specializing in communications, multimedia,
and games. He can be reached at 70563.2671@compuserve.com.


VToolsD for Windows from Vireo Software is a C/C++ toolkit for writing virtual
device drivers (VxDs). It is designed as a replacement for Microsoft's Device
Driver Kit (DDK). In addition to a visual-programming environment, the toolkit
provides a code generator for dynamically loaded/unloaded device drivers.
VToolsD supports VxD development for Windows 3.1 and Windows for Workgroups
3.11 under both Microsoft Visual C++ and Borland C++. In addition, Vireo
recently released VToolsD for Windows 95, which supports features of the new
OS, such as Plug and Play.
VToolsD comes with libraries that "wrap" calls to the Virtual Machine Manager
(VMM) and other VxDs into a C calling syntax. It includes header files with
useful macros and definitions, and a C run-time library that implements
VxD-safe versions of the most-useful C functions (malloc, strcpy, sscanf, and
the like). The toolkit also includes source code to the libraries and a wizard
that creates a skeleton VxD. For C++ programmers, there are class libraries to
access VMM and other VxD services in an object-oriented way, as well as
classes to simplify VxD programming. 
To examine the VToolsD toolkit, I'll develop a Windows 3.1 VxD that implements
a simple autocorrect feature like those in word processors. The VxD looks for
the keyboard sequence <space>adn<space> in any VM, assumes the user made an
error, and replaces it with <space>and<space>. In other words, the VxD assumes
that the user meant to type the word "and" but transposed two of the
characters. I call this VxD "Spellx" because it endows even the lowliest text
editor with autocorrect capability. I wrote the VxD entirely in C, compiled it
with Microsoft Visual C++ Compiler Version 2.0 (C9 compiler), and debugged it
with the Nu-Mega Soft-ICE resident debugger. 


QuickVxD Wizard


When writing VxDs with VToolsD, you'll likely start with the QuickVxD wizard.
QuickVxD is simply a pair of dialog boxes for entering information about the
VxD you're creating. Once you provide the information, QuickVxD generates a
skeleton VxD, a header file for it, and a make file. You can't rerun QuickVxD
on the skeleton after making manual changes, because QuickVxD will not
preserve them. You can, however, generate a second skeleton and merge the
differences into your working copy. 
Figure 1 is the main dialog. The collection of controls in the upper left is
called "Device" and defines information in the VxD device-descriptor block.
This information includes the name of the VxD (which also becomes the VxD
filename, header filename, and make filename), the VxD ID value (if the VxD
exports services), the initialization order when VMM loads the VxD at Windows
launch time, and the major and minor version numbers of the VxD. The result of
completing this section is the line Declare_Virtual_Device(SPELLX) in the VxD
skeleton; see Listing One. Listing Two is the equivalent skeleton in C++.
The name SPELLX appears in the declaration. QuickVxD creates a header file for
the VxD containing declarations for the major and minor version numbers, the
device id, and the initialization order. These definitions are used to expand
the Declare_Virtual_Device macro. For Spellx, the major and minor versions are
1 and 0, the device id is undefined, and the initialization order is
VKD_INIT_ORDER+1. This produces the header file in Listing Three. These
definitions are used to expand the declaration of the device-descriptor block
as in Example 1. (The complete source code for the SPELLX.386 VxD is provided
electronically; see "Availability," page 3.) 
The main dialog includes options for creating the skeleton in C or C++ and
including or excluding debug information in the build. The debug symbols
supported are for Soft-ICE and WinDeb386 (the same as for the Microsoft C8 and
C9 compilers). The source code for the VxD did not exactly match up with the
executable instructions, but this is likely a problem with Soft-ICE, and the
discrepancies were minor. The dialog includes options for which entry points
the VxD should have: vendor-specific V86 mode, vendor-specified protected
mode, and the V86 and PM entry points called by VMM when the VxD registers a
device id with VMM. Checking off these boxes saves work because QuickVxD will
create empty entry points in the skeleton for you to fill in. If the VxD
exports any services, QuickVxD can list the prototypes for these exported
services in the skeleton.
Clicking on the Control Messages button causes the QuickVxD Control Messages
dialog to be displayed. Each check box corresponds to a VMM message for the
control dispatch routine to trap and process. For each box checked, a trap is
inserted into the control dispatch routine in the VxD skeleton. The trap will
call an (initially) empty handler function to process the event message. It's
your job to fill in these handlers with useful code.
Next, clicking the Generate Now button creates the C or C++ skeleton, header
file, and make file for the VxD. QuickVxD will ask you for the pathnames under
which to save these files. To build the VxD, open a DOS box and run nmake -f
<name of make file>. When I was writing Spellx, the example skeleton built on
the first try without warnings or errors.


The Spellx VxD


The Spellx VxD monitors all VMs for the keyboard sequence <space>adn<space>
and replaces the assumed erroneous characters. Of course, this can cause
problems if you really meant to type <space>adn<space>, but you could modify
the VxD to disable itself when another key sequence is typed. 
Spellx requires the exported services of the virtual keyboard driver (VKD).
Because it depends on keyboard services, Spellx should be loaded after VKD. I
set the initialization order equal to VKD_INIT_ORDER+1 to make sure Spellx is
loaded after VKD. Next, I opened the Control Messages dialog by clicking on
the Control Messages button. Spellx traps five control messages: DEVICE_INIT,
for one-time initializations when the VxD is loaded; SYS_VM_INIT, to create a
context for the virtual machine; VM_INIT, to create a context for each
additional VM that is created; SYS_VM_TERMINATE, to clean up the system VM
context; and VM_TERMINATE, to clean up a VM context when the VM is closed.
Spellx needs a context for each VM, because otherwise the program confuses
sequential keystrokes from different VMs and assumes the user was typing "adn"
when the user was actually just switching between several applications in a
way that looked like the offending sequence. Spellx records the keyboard
history for each VM separately, so that it can compare each VM's keyboard
activity to the offending pattern. This makes Spellx less prone to mistakes. I
generated output in C, with Soft-ICE debugging symbols.
Listing One is the skeleton generated by QuickVxD as a result of these
settings. After the device declaration, macros contained in the skeleton
declare control handlers for the five VMM control messages that the VxD
processes. The DefineControlHandler(MessageType, HandlerName); macro expands
to extern MessageType_type HandlerName;. For example,
DefineControlHandler(VM_INIT, OnVmInit); expands to extern VM_INIT_type
OnVmInit;. Parameter checking is not done by the prototype. This isn't a
problem because the call to the control handler is also generated by the
Wizard. Next comes the VxD control dispatcher, whose entry point has the
declaration BOOL ControlDispatcher (DWORD dwControlMessage, DWORD EBX, DWORD
EDX, DWORD ESI, DWORD EDI). 
The actual entry point to the control dispatch routine is in the VToolsD
prologue code, which is linked to the skeleton automatically by the make file.
The prologue code reformats the register information passed from VMM to the
control dispatcher into C parameters and then calls the skeleton entry point
ControlDispatcher.
Within the body of the control dispatcher, calls to the various event handlers
are implemented with message-cracker macros. Example 2 lists the five VMM
message traps Spellx requires. The START_CONTROL_DISPATCH and
END_CONTROL_DISPATCH macros expand into a switch statement on
dwControlMessage. Each ON_xxx macro expands into a case statement and call to
the appropriate handler for the message. For instance, the statement in
Example 3(a) expands to Example 3(b).
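The message-cracker expansion can be mimicked with ordinary C macros. The
following is a simplified sketch of the idea, not VToolsD's actual macro
text (the real dispatcher also forwards the EBX/EDX/ESI/EDI register values,
and VToolsD defines one ON_xxx macro per message rather than a generic one):

```c
/* Simplified stand-ins for VMM control-message values and handlers. */
enum { VM_INIT = 1, VM_TERMINATE = 2 };

/* START/END expand into a switch on the message; each ON_MESSAGE
   expands into a case statement and a call to the handler. */
#define START_CONTROL_DISPATCH  switch (msg) {
#define ON_MESSAGE(id, fn)      case id: return fn(vm);
#define END_CONTROL_DISPATCH    }

static int OnVmInit(int vm)      { return 100 + vm; }
static int OnVmTerminate(int vm) { return 200 + vm; }

int ControlDispatcher(int msg, int vm)
{
    START_CONTROL_DISPATCH
        ON_MESSAGE(VM_INIT, OnVmInit)
        ON_MESSAGE(VM_TERMINATE, OnVmTerminate)
    END_CONTROL_DISPATCH
    return 0;  /* unhandled message */
}
```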
Of course, the wizard only creates skeleton code--it can't write the code that
makes a VxD do something useful. However, the toolkit contains two libraries
that facilitate writing useful code. The first is a C run-time library full of
the functions that made C popular in the first place: calloc, sprintf, itoa,
strcpy, and so on. The second is a library of wrapper functions for calling
VMM and VxD services from C or C++. 


C Run-Time Library


The C run-time library is one of the more useful VToolsD features. The library
contains only a subset of the routines in a typical C library. The available
routines are those most useful for VxD programming--memory allocation, port
I/O, string manipulation, data-type conversion, and the like. I used the
memory functions calloc and free to create a linked list of keyboard contexts
for each VM in the system, including the system VM. I could have used the VMM
linked-list functions for this purpose, but the code is easier to understand
and maintain if standard linked-list routines are used instead of arcane calls
to VMM services.
The function to add a node to the list (AddVMContext) is called when the
system VM is created and whenever a new VM is created. Thus, AddVMContext is
called to create a new context whenever the control dispatcher receives a
VM_INIT or SYS_VM_INIT message from VMM. AddVMContext calls calloc to allocate
a new context node; see Example 4.
Each VM context contains the VM handle, key history buffer, buffer index, and
link to the next context. Each context also contains a field to record the
global shift state when each character N, D, and Space is pressed. The key
history buffer contains a record of the last MAXVMKEYS keys pressed in the VM.
MAXVMKEYS is defined as 5, which is just enough space to record the five-key
sequence of <space>adn<space> that the VxD wants to detect. The buffer index
wNextKey indicates the next space in the buffer to place a key. The shift
state fields are dwDShift, dwNShift, and dwSShift. When the keys D, N, or
Space are pressed, the VxD records the global shift-state of the keyboard in
these fields. 
AddVMContext takes a VM handle, allocates a new context using calloc, and puts
the VM handle into the context. It then links the context into the linked list
and returns a pointer to the new context. The complementary function
DeleteVMContext is called whenever any VM is destroyed. DeleteVMContext is
called to delete a VM context whenever the control dispatcher receives a
VM_TERMINATE or SYS_VM_TERMINATE message from VMM. DeleteVMContext frees the
context node associated with a VM; see Example 5. DeleteVMContext calls the
local function ThisVMContext to get a pointer to the node associated with the
given VM handle.
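A sketch of that context list in C, using calloc and free as described. The
function names (AddVMContext, DeleteVMContext, ThisVMContext) and some field
names (hVM, wNextKey) come from the article's text, but the exact structure
layout is my guess, and this version unlinks nodes directly rather than
going through ThisVMContext:

```c
#include <stdlib.h>

#define MAXVMKEYS 5

/* Per-VM context as described in the article (layout is a guess). */
typedef struct VMContext {
    void *hVM;                 /* VM handle */
    char  keys[MAXVMKEYS];     /* circular key-history buffer */
    int   wNextKey;            /* next slot in the buffer */
    struct VMContext *next;    /* link to the next context */
} VMContext;

static VMContext *head = NULL;

/* Called on SYS_VM_INIT / VM_INIT: calloc a zeroed node, link it in. */
VMContext *AddVMContext(void *hVM)
{
    VMContext *ctx = calloc(1, sizeof *ctx);
    if (!ctx) return NULL;
    ctx->hVM = hVM;
    ctx->next = head;
    head = ctx;
    return ctx;
}

/* Locate the context for a given VM handle. */
VMContext *ThisVMContext(void *hVM)
{
    for (VMContext *c = head; c; c = c->next)
        if (c->hVM == hVM) return c;
    return NULL;
}

/* Called on SYS_VM_TERMINATE / VM_TERMINATE: unlink and free the node. */
void DeleteVMContext(void *hVM)
{
    for (VMContext **pp = &head; *pp; pp = &(*pp)->next) {
        if ((*pp)->hVM == hVM) {
            VMContext *dead = *pp;
            *pp = dead->next;
            free(dead);
            return;
        }
    }
}
```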
The C run-time library allowed me to replace the VMM linked-list services with
my own using calloc and free. Of course, most VMM services cannot be easily
replaced, and so an easy method of calling VMM and VxD services in C is
necessary. That's where the wrapper functions come in.


Wrapper-Function Libraries


VToolsD comes with wrapper functions for most of the VMM and standard VxD
services--no more pushing arguments into high and low registers. VMM and other
VxD services are available through C function calls. These wrapper functions
are written in assembler, and the ones I examined take the C parameters off
the stack, put them in registers, call the VMM services, and place the return
value in EAX. Using the VKD virtual keyboard driver, I trapped the A, N, D,
Space, and Backspace keys with the wrapper functions. The wrapper function
VKD_Define_Hot_Key tells VKD to notify the VxD when a particular key is
pressed. Like most wrapper functions, VKD_Define_Hot_Key corresponds to the
service of the same name. Example 6(a) illustrates its syntax. Example 6(b)
will trap the A key.
The first parameter is the scan code of the key to trap, and the second is the
type of scan code to trap. Some keyboards have keys with extended scan codes.
I set the value of the second parameter to SCAN_EITHER, which traps both
normal and extended scan codes. The third parameter is the shift state of the
key to trap. The shift state defines, with great precision, the combination of
a regular key and Shift key (left shift, right shift, left or right control
key, left or right Alt key, caps lock, scroll lock, and other keys that shift
the value of a key). The shift state works like this: When a key is pressed,
the states of all shift keys on the keyboard are collected into a 16-bit word
called the "global shift state." This global shift state is ANDed with the
high word of the shiftstate parameter, and the result of the AND is compared
with the low-order 16 bits of the shiftstate parameter. If the result of the
AND matches the lower 16 bits, VKD calls the VxD's hot-key handler (defined in
the fifth parameter); otherwise, the handler is not called. I set shiftstate
(for both high and low words) to 0. That way, the global shift state is always
ANDed with 0, resulting in 0. This value is then compared with the low word,
which is also 0. In other words, the hot-key handler will always be called,
regardless of the keyboard's shift state.
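The mask-and-compare test reads naturally in C. A sketch, assuming (as the
text describes) that the shiftstate parameter packs the mask in its high
word and the comparand in its low word; hot_key_matches is an illustrative
name, not a VKD service:

```c
/* The VKD test as described: AND the 16-bit global shift state with the
   high word of shiftstate, then compare the result with the low word. */
int hot_key_matches(unsigned short global_shift, unsigned long shiftstate)
{
    unsigned short mask    = (unsigned short)(shiftstate >> 16);
    unsigned short compare = (unsigned short)(shiftstate & 0xFFFF);
    return (global_shift & mask) == compare;
}
```

With shiftstate set to 0 in both words, any global shift state ANDs to 0 and
matches the low word of 0, which is why Spellx's handler always fires.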
After the shiftstate parameter, the flags further define which keyboard events
will result in a callback to the hot-key handler. The flags specify whether
VKD will call the handler when a key is pressed, released, or autorepeated,
and when a shiftstate ends. I had problems with this parameter. I specified
CallOnPress | CallOnRepeat which, according to the documentation, should have
resulted in calls to the hot-key handler when a key was pressed or
autorepeated. However, the hot-key handler is called only when the key is
pressed. I'm not an expert on the VKD services, so I don't know if this is a
VKD bug, a bug in the VToolsD documentation or wrapper function, or my own
ignorance. Whatever the case, the VxD won't detect and reflect autorepeat
presses of the trapped keys.

The parameters RefData and MaxDelay define, respectively, data that can be
passed transparently during the callback (for example, a pointer to a unique
context for each hot key), and the maximum delay time allowed between
occurrence of the event and notification of the event handler. I didn't use
either parameter, so both are set to 0. VKD_Define_Hot_Key returns a handle
which uniquely identifies the hot key for calls to other VKD services.
The fifth parameter specifies a pointer to the hot-key handler that VKD will
call when the trapped keys are pressed. The wrapper libraries implement event
handlers using thunks, which are nothing more than chunks of a locked data
segment used as the entry point for the event handler. When VMM or a VxD calls
an event handler in another VxD, the arguments to the event handlers are
passed in registers. When the event handler is a C function, the register
arguments must be translated into stack-based arguments compatible with C. To
accomplish this, VToolsD uses thunks. To trap hot-key events, I passed the
address of my C event handler to VKD_Define_Hot_Key, and I also passed the
address of a thunk (in the last parameter). All wrapper functions that define
event handlers take a thunk argument, which is a pointer to an area of locked
memory. The wrapper function modifies this locked memory area with the code
necessary to translate the register arguments into C stack parameters. It also
inserts into the thunk a call to the entry point of the event handler in the
VxD. The wrapper function then passes the address of the thunk to VMM or the
other VxD, which detects the event to trap and calls the thunk. The thunk
reformats the register arguments into C stack arguments and calls the VxD
event handler. After the call, the VxD event handler returns to the next
instruction in the thunk. The thunk reformats any return values into the
appropriate registers and returns to VMM or the calling VxD.
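The real thunk is generated machine code living in a locked data segment, but
the translation it performs can be illustrated in plain C with a struct
standing in for the CPU registers. Everything here (Regs, the handler, the
return-in-EAX convention) is a simplified illustration, not VToolsD's actual
layout:

```c
/* Simulated register block: a real thunk receives its arguments in CPU
   registers; here a struct stands in for them. */
typedef struct { unsigned eax, ebx, ecx; } Regs;

/* An ordinary C event handler expecting stack-based arguments. */
static unsigned handler(unsigned scancode, unsigned handle)
{
    return scancode * 1000 + handle;
}

/* The "thunk": pull the register arguments out, call the C handler with
   normal stack parameters, and put the return value back in EAX. */
void thunk(Regs *r)
{
    r->eax = handler(r->eax, r->ebx);
}
```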
I declared the thunk for VKD_Define_Hot_Key in the locked data segment.
Because different event handlers require different arguments, the wrapper
library contains a unique template for each type of event handler available
from VMM and other VxDs. Unfortunately, the template for VKD_Define_Hot_Key
did not match the function prototype in the VToolsD documentation: Two of the
parameters were switched. 
In particular, the documentation defines the event handler for
VKD_Define_Hot_Key as VOID __stdcall HotKeyHandler(DWORD scancode, DWORD
hotkeyhandle, DWORD shiftstate, DWORD data, DWORD delaytime). The first
parameter is the scan code of the key that caused the event, the second is the
hot-key handle returned from VKD_Define_Hot_Key, the third is the global shift
state, the fourth is the reference data passed to VKD_Define_Hot_Key, and the
fifth is the delay between the event and the handler being called (this is
normally 0 unless you specifically request delay notification). However, the
thunk was reversing the order of the first two parameters, scancode and
hotkeyhandle, on the C stack, which created a lot of confusion. This bug was
especially frustrating because the thunk consists of code executing in the data
segment, so I could not set a break point on it when debugging. I talked to
the support staff at Vireo, who were very helpful and claimed this would be
fixed in the next release.


Processing Hot Keys


When called, the event handler first calls VKD_Get_Kbd_Owner, another wrapper
function, which returns the handle of the VM that currently owns the keyboard.
Because VKD_Get_Kbd_Owner is called from the event handler, it must be
asynchronous; that is, VKD must be reentrant with regard to this function. The
returned VM handle is used to locate the context for the VM by calling the
local function ThisVMContext. The key and global shift state are then
reflected into the VM with a call such as VKD_Reflect_Hot_Key(hVM, hHotKey,
dwShiftState);. The first parameter to VKD_Reflect_Hot_Key is the VM handle,
the second is the hot-key handle returned by VKD_Define_Hot_Key, and the third
is the global shift state passed to the event handler by VKD. The trapped key
is saved into the VM key-history buffer, which is analyzed in a circular
fashion, using a simple state machine, to determine if the sequence
<space>adn<space> has been typed. If this sequence is detected, then the
sequence <backspace><backspace>nd<space> is reflected into the VM. Reflecting
these keys into the VM replaces the erroneous "dn" sequence with the correct
"nd" sequence without disturbing the cursor position.
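The circular key-history check might look like this in C. A sketch: the
buffer size and the five-key pattern come from the article, but
push_key_and_check is my own simplification of the state machine it
mentions:

```c
#define MAXVMKEYS 5

/* Per the article: the last MAXVMKEYS keys are kept in a circular buffer
   and compared against " adn " (space, a, d, n, space). */
typedef struct {
    char keys[MAXVMKEYS];
    int  next;   /* index of the next slot to write */
} KeyHistory;

/* Returns 1 when the newly added key completes " adn ". */
int push_key_and_check(KeyHistory *h, char key)
{
    static const char pattern[MAXVMKEYS] = {' ', 'a', 'd', 'n', ' '};
    h->keys[h->next] = key;
    h->next = (h->next + 1) % MAXVMKEYS;
    /* Walk the buffer oldest-to-newest, starting at h->next. */
    for (int i = 0; i < MAXVMKEYS; i++)
        if (h->keys[(h->next + i) % MAXVMKEYS] != pattern[i])
            return 0;
    return 1;
}
```

On a match, Spellx reflects the corrective
&lt;backspace&gt;&lt;backspace&gt;nd&lt;space&gt; sequence into the VM, as
described above.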
Before saving a key into the VM key buffer, I check the global shift state. If
either the Alt key or the Ctrl key is pressed, the key is not saved into the
key buffer. Thus, key presses such as Ctrl-A and Alt-N will not be saved or
analyzed. The result is that sequences like Ctrl-A,N,Alt-D will not trigger
the replacement sequence, which could have potentially unpleasant results. It
is relatively simple to check the global shift state before saving and
analyzing the trapped keys. I simply mask the global shift state against the
value (SS_Alt | SS_Ctrl). If the result is nonzero, one or both of the Ctrl and
Alt keys are pressed and the key is not saved. The values of SS_Alt and
SS_Ctrl correspond to the bits VKD sets in the global shift-state variable
when Alt or Ctrl is pressed. I could not find documentation for SS_Alt or
SS_Ctrl in either the manual or the online help. They turned up in a file
called VKD.H in the VToolsD header files.
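That check can be sketched in a few lines. The bit values below are placeholders only; the real SS_Alt and SS_Ctrl masks come from VKD.H in the VToolsD headers, and the function name is invented for illustration.

```c
/* Illustrative bit values only -- the real SS_Alt and SS_Ctrl masks
   are defined in the VToolsD header VKD.H. */
#define SS_Alt  0x0001
#define SS_Ctrl 0x0002

/* Return nonzero if the trapped key should be saved and analyzed,
   i.e. if neither Alt nor Ctrl is down in the global shift state. */
int should_save_key(unsigned long dwShiftState)
{
    return (dwShiftState & (SS_Alt | SS_Ctrl)) == 0;
}
```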


Conclusion


Although VToolsD proclaims itself a tool for all levels of VxD designers,
it is more suited to beginner or intermediate developers. Even experts who
don't want to lose intimacy with their VxD may want to buy the toolkit just to
get the source code to the C run-time routines and the wrapper functions. Of
course, if you want to create object-oriented VxDs, VToolsD is a real
blessing. I'm somewhat skeptical that the benefits of object-oriented design
outweigh the negatives of larger code size and increased abstraction in a VxD
environment. I'm also skeptical that C++ is appropriate for VxDs at all, most
of which don't use enough objects or methods to make C++ worthwhile. But then
again, I'm more comfortable with C; programmers raised on C++ may have an
entirely different point of view.
Overall, VToolsD and other high-level environments for coding VxDs should help
open up the field of VxD development. If you have stayed away from VxDs until
now, you may find this just the opening you've been waiting for.


For More Information


VToolsD 
Vireo Software
385 Long Hill Road 
Bolton, MA 01740 
508-779-8352
vireo@vireo.com
Figure 1: The main dialog of QuickVxD.
Example 1: Expanding the declaration of the device-descriptor block.
DDB The_DDB = {
 0, /* used by VMM */
 DDK_VERSION, /* version of the DDK used */
 SPELLX_DeviceID, /* device ID entered by QuickVxD */
 SPELLX_Major, /* major version number */
 SPELLX_Minor, /* minor version number */
 0, /* flags */
 {' ', ' ', ' ', ' ', ' ', ' ', ' ', ' '},
 /* eight spaces where VToolsD prologue code fills in VxD name */
 SPELLX_Init_Order, /* order in which device should be
    loaded at Windows launch time */
 (DWORD) LocalControlDispatcher, /* points to entry point in VToolsD
    prologue code linked to VxD. This is what VMM calls with
    control events. Prologue calls VxD control dispatcher */
 (DWORD) LocalV86handler, /* entry point in VToolsD prologue
    code which VMM calls to implement V86 services.
    Prologue calls VxD V86 service entry point */
 (DWORD) LocalPMhandler, /* entry point in VToolsD prologue
    code which VMM calls to implement protected-mode services.
    Prologue calls the VxD protected-mode service entry point */
 0, /* used by VMM */
 0, /* used by VMM */
 0, /* used by VMM */
 (DWORD) VXD_SERVICE_TABLE, /* points to VxD service table */
 0 /* size of service table */
};
Example 2: VMM message traps that Spellx requires.
START_CONTROL_DISPATCH
 ON_DEVICE_INIT(OnDeviceInit);
 ON_VM_INIT(OnVmInit);
 ON_VM_TERMINATE(OnVmTerminate);
 ON_SYS_VM_INIT(OnSysVmInit);
 ON_SYS_VM_TERMINATE(OnSysVmTerminate);
END_CONTROL_DISPATCH
Example 3: The statement in (a) expands to (b).
(a)
ON_DEVICE_INIT(OnDeviceInit)

(b)
case DEVICE_INIT: return OnDeviceInit((VMHANDLE) EBX, (PCHAR) ESI);
Example 4: AddVMContext calls calloc to allocate a new context node.
#define MAXVMKEYS 5
typedef struct tag_VMContext {
 VMHANDLE hVM;
 BYTE btKeyHistory[MAXVMKEYS];
 DWORD dwDShift;
 DWORD dwNShift;
 DWORD dwSShift;
 WORD wNextKey;
 struct tag_VMContext *pNext;
} VMContext;
VMContext *AddVMContext (VMHANDLE hVM)
{
 VMContext *NewVM = calloc (1, sizeof(VMContext));
 if (!NewVM)
  return NULL;
 NewVM->hVM = hVM;
 NewVM->pNext = VMListHead;
 VMListHead = NewVM;
 return NewVM;
}
Example 5: DeleteVMContext calls free to free the context node associated with
a VM.
VOID DeleteVMContext (VMHANDLE hVM)
{
 VMContext **PrevVM = NULL;
 VMContext *ThisVM = ThisVMContext (hVM, PrevVM);
 if (ThisVM)
 {
  *PrevVM = ThisVM->pNext;
  free (ThisVM);
 }
}
Example 6: (a) Syntax of the wrapper function; (b) trapping the A key.
(a)
HOTKEYHANDLE VKD_Define_Hot_Key (BYTE ScanCode, BYTE ScanType,
 DWORD ShiftState, DWORD Flags, PVXDHotkey_HANDLER Callback,
 CONST VOID *Refdata, DWORD MaxDelay, PVKD_Hotkey_THUNK pThunk);

(b)
#define CHAR_A 0x1E
VKD_Hotkey_THUNK HKThunk;
 ...
hCHAR_A = VKD_Define_Hot_Key (CHAR_A, SCAN_NORMAL, 0,
 CallOnPress | CallOnRepeat, HotKeyHandler, 0, 0, &HKThunk);

Listing One
// SPELLX.C - main module for VxD SPELLX
#define DEVICE_MAIN

#include "spellx.h"
#undef DEVICE_MAIN
Declare_Virtual_Device(SPELLX)
DefineControlHandler(DEVICE_INIT, OnDeviceInit);
DefineControlHandler(SYS_VM_INIT, OnSysVmInit);
DefineControlHandler(SYS_VM_TERMINATE, OnSysVmTerminate);
DefineControlHandler(VM_INIT, OnVmInit);
DefineControlHandler(VM_TERMINATE, OnVmTerminate);
BOOL ControlDispatcher(
 DWORD dwControlMessage, DWORD EBX, DWORD EDX, DWORD ESI, DWORD EDI)
{
 START_CONTROL_DISPATCH
 ON_DEVICE_INIT(OnDeviceInit);
 ON_SYS_VM_INIT(OnSysVmInit);
 ON_SYS_VM_TERMINATE(OnSysVmTerminate);
 ON_VM_INIT(OnVmInit);
 ON_VM_TERMINATE(OnVmTerminate);
 END_CONTROL_DISPATCH
 return TRUE;
}
BOOL OnDeviceInit(VMHANDLE hVM, PCHAR CommandTail)
{
 return TRUE;
}
BOOL OnSysVmInit(VMHANDLE hVM)
{
 return TRUE;
}
VOID OnSysVmTerminate(VMHANDLE hVM)
{
}
BOOL OnVmInit(VMHANDLE hVM)
{
 return TRUE;
}
VOID OnVmTerminate(VMHANDLE hVM)
{
}

Listing Two
// SPELLXP.CPP - main module for VxD SPELLXP
#define DEVICE_MAIN
#include "spellxp.h"
Declare_Virtual_Device(SPELLXP)
#undef DEVICE_MAIN
SpellxVM::SpellxVM(VMHANDLE hVM) : VVirtualMachine(hVM) {}
BOOL SpellxDevice::OnDeviceInit(VMHANDLE hSysVM, PCHAR pszCmdTail)
{
 return TRUE;
}
BOOL SpellxVM::OnSysVMInit()
{
 return TRUE;
}
VOID SpellxVM::OnSysVMTerminate()
{
}
BOOL SpellxVM::OnVMInit()
{

 return TRUE;
}
VOID SpellxVM::OnVMTerminate()
{
}

Listing Three
// SPELLX.H - include file for VxD SPELLX
#include <vtoolsc.h>
#define SPELLX_Major 1
#define SPELLX_Minor 0
#define SPELLX_DeviceID UNDEFINED_DEVICE_ID
#define SPELLX_Init_Order VKD_INIT_ORDER+1

















































Visual Programming with Reusable Objects


Construction-from-parts can simplify development




Carol Jones and Morgan Kinne


Carol and Morgan are software engineers for IBM. Morgan can be contacted at
kinnem@carvm3.vnet.ibm.com, and Carol, at cjones@carvm3.vnet.ibm.com. 


Although well-known in other industries, the idea of construction from parts
is just now catching on in the software industry. To build a house, for
example, you wouldn't consider designing and manufacturing every piece from
scratch. Likewise, rather than designing and developing every bit of software
for an application, it's much easier to assemble standard, prebuilt parts. 
Parts are software objects that support a simple, standard interface protocol.
Parts can vary widely in capabilities, ranging from simple (buttons and
arrays, for example) to complex (forms or even entire applications). Complex
parts are typically built by combining a number of simple parts into one.
IBM's VisualAge is an object-oriented development environment that uses this
construction-from-parts paradigm. With this environment, you build
applications by visually arranging and connecting the prefabricated parts
(objects) found on the VisualAge parts palette. Furthermore, you can extend
the environment by adding any Smalltalk object to the palette.
VisualAge provides a set of primitive, visual parts that are basic building
blocks for user interfaces--a push button or list box, for example. Any of
these primitive visual parts can be combined to form reusable composite parts
such as a form or window. These composite parts can then be added to the parts
palette, so that they are available for other development projects.
VisualAge also provides primitive nonvisual parts, such as arrays and digital
audio players, which work the same as visual parts. You can write your own
nonvisual parts in Smalltalk, such as a customer object (or rule) that sends a
reminder notice whenever a customer's payment is overdue. These nonvisual
parts can also be combined to form composite parts that can be added to the
palette. Combining parts into new, composite reusable parts is what
differentiates construction-from-parts technology from concepts such as
software modularity.


Public Interfaces


The basic principle of visual programming in VisualAge is defining an
application's behavior by connecting the actions, events, and attributes of
the parts. These elements are the part's public interface. Actions are the
functions or methods that a part knows how to perform. A part can notify other
parts of the occurrence of an event. The simplest example is a push button,
which notifies other parts when it is pressed by signaling its clicked event.
Attributes are the data stored by a part, such as a balance for a bank-account
part, or a result for a database-query part.
When you build parts for VisualAge, you write Smalltalk code defining these
actions, events, and attributes. For example, actions are simply methods that
VisualAge calls. Attributes are usually stored in instance variables, and you
write Smalltalk methods that return, compute, or set the values of attributes.


Building a Timer Part


To illustrate how construction-from-parts works, we'll design and implement a
timer, which is a nonvisual part that runs for a certain length of time, then
notifies other parts when the time limit has expired. The timer is able to
automatically restart after each expiration. 
First, we'll use the VisualAge tools to create a new application that contains
a nonvisual part named Timer. The result is an empty part that we enhance
using the appropriate VisualAge editors. The next, and most important, step is
to construct Timer's public interface.
You define a part's public interface using VisualAge's Public Interface
Editor. The editor, which uses a notebook metaphor (see Figure 1), includes
separate pages for attributes, actions, and events.
The public interface of Timer has an integer-length attribute that contains
the number of milliseconds that will elapse before the timer expires. It also
has a Boolean repeat attribute indicating whether the timer automatically
restarts. The actions needed are start and stop. A timerFired event is needed
to notify other parts when the timer has expired.
Once the definition is complete, VisualAge generates much of the Smalltalk
code for you; see Listing One. You can generate code as often as you wish to
accommodate changes in a part's public interface, although retaining any code
changes involves some manual steps.
The timer part's definition causes the generation of both get and set
selectors for each attribute. A generated set selector stores an input value
in the attribute variable and notifies VisualAge that the value has changed.
VisualAge can then notify other parts of the new value. Typically, a generated
set selector needs no modifications. Get selectors, on the other hand, are
normally modified to use lazy initialization to establish a default value for
the attribute. Action methods are always modified and frequently notify other
parts of events. You can also add other variables and methods as required to
complete the part's logic (Listing Two).
At this point, the timer can be used as is. However, you can refine it by
implementing class methods in the part that integrate it more tightly into the
VisualAge development environment. (If you don't implement any of these
methods, VisualAge uses appropriate default values.)
VisualAge parts typically provide a settings view, which is a visual part that
allows a user to specify attribute values for a particular instance of the
part. For Timer, it would allow you to set the length and repeat values. You
create the view by adding a variable part from the palette to a new visual
part named TimerSettingsView. You change the variable part to indicate it is a
placeholder for Timer. You tear off a "quick form" for Timer and drop it in
the window part that VisualAge automatically provides. You can add two push
buttons and connect the clicked event of one of them to the execute action of
the variable part. The clicked event of the other push button is connected to
the window's closeWidget action. You tell VisualAge to use this settings view
by implementing the customSettingsView class method in Timer. Figure 2 shows
the implementation of the custom settings view.
For Timer, we'll also implement class methods that add the part to the parts
palette (addPartToCatalog), provide a custom icon for the part
(abtInstanceGraphicsDescriptor), provide a descriptive name for the part
(displayName), and customize its connection pop-up menu
(preferredConnectionFeatures). See Listing Three for the implementation of
these class methods.
The fully implemented Timer is now available for reuse in any VisualAge
application.


Building the Image Viewer


To illustrate how to use Timer in a typical application, we'll build a
slide-show application that's based on the part. This application, however,
requires another reusable part--a form that displays bitmapped images, with an
attribute to hold a collection of bitmap filenames, and actions to step
forward or backward through the files. You start by creating a new visual part
called SlideCarousel. Unlike the timer part, this has a user interface.
Another difference is that it is a composite part, which means it has other
parts contained inside it.
When you create a composite, visual part like this one, VisualAge
automatically adds a main window. Since this visual part is intended for use
inside other windows, delete the main window and replace it with a form part.
This is where the image will be displayed.
For the public interface, define one attribute, images, and two actions:
displayNext and displayPrevious (Listing Four). As with the timer part, define
a suitable icon and descriptive name for this new part, and add it to the
parts palette.
Now that you have some useful parts, you assemble them into an application.
Begin by creating another visual part named SlideShow. Note that this is also
a reusable part: Just like any other VisualAge part, you can add it to the
parts palette.
Inside the window, place a SlideCarousel part and size it to fill most of the
window. Below this, add two buttons for moving forward and backward through
the slides. Next, add Timer from the palette. You use Timer to advance the
slides after five seconds, if no button is pressed.
Since visual programming is all about connecting parts, you visually add
connections that start the timer when the window opens and stop it when the
window closes. Make connections between the timer and each button, to stop and
restart the timer when the button is clicked. To advance the slides, connect
the timer's timerFired event to the carousel's displayNext action. To make the
buttons work, connect their clicked events to the carousel's displayNext
action and displayPrevious action.
To complete the application, you need to set initial values for certain
attributes. One of them is the carousel's images attribute. To set this, write
a Smalltalk script that runs when the window first opens, and adds several
filenames to the list of images (see Listing Five). 
The last step is setting the attributes for the timer. Set the length to 5000
milliseconds and the repeat attribute to True. Figure 3 shows the finished
application and its connections. Figure 4 shows the running application.


Summary



Your success with construction from parts depends on a sufficient supply of
standard parts to draw upon. As more parts are developed, the part libraries
become increasingly useful and applications require less new code. Reusing
code saves time and reduces the chance of introducing new errors.
Figure 1: Constructing the Timer part's public interface.
Figure 2: Timer custom settings view with connections.
Figure 3: Visually constructed slide-show application.
Figure 4: Executing the slide-show application.

Listing One
"Generated IBM Smalltalk code for the Timer part"
AbtAppBldrPart subclass: #Timer
 instanceVariableNames: 'length repeat'
 classVariableNames: ''
 poolDictionaries: ''
eventTimerFired: anObject
 "Notify other parts that the timer has expired."
 self signalEvent: #timerFired.
length
 "Return the value of length."
 ^length
length: anInteger
 "Save the value of length."
 length := anInteger.
 self signalEvent: #length with: anInteger.
repeat
 "Return the value of repeat."
 ^repeat
repeat: aBoolean
 "Save the value of repeat."
 repeat := aBoolean.
 self signalEvent: #repeat with: aBoolean.
start
 "Perform the start action."
stop
 "Perform the stop action."

Listing Two
"Modified and added code for the Timer part"
AbtAppBldrPart subclass: #Timer
 instanceVariableNames: 'length repeat timer'
 classVariableNames: ''
 poolDictionaries: ''
eventTimerFired: anObject
 "Notify other parts that the timer has expired."
 self signalEvent: #timerFired.
 timer := nil.
 self repeat ifTrue: [self start].
length
 "Return the value of length."
 length isNil ifTrue: [self length: 0].
 ^length
length: anInteger
 "Save the value of length."
 length := anInteger.
 self signalEvent: #length with: anInteger.
repeat
 "Return the value of repeat."
 repeat isNil ifTrue: [self repeat: false].
 ^repeat

repeat: aBoolean
 "Save the value of repeat."
 repeat := aBoolean.
 self signalEvent: #repeat with: aBoolean.
start
 "Perform the start action."
 timer := CwAppContext default
 addTimeout: self length
 receiver: self 
 selector: #eventTimerFired:
 clientData: nil.
stop
 "Perform the stop action."
 timer notNil
 ifTrue: [CwAppContext default removeTimeout: timer].

Listing Three
"Timer class methods"
addPartToCatalog
 "Adds the part to the VisualAge parts palette"
 | category |
 category := AbtPartsCatalog current categoryNamed: 'Models'.
 category isNil
 ifFalse: [category addPart: (self symbol)].
customSettingsView
 "Answer an instance of the custom settings view"
 ^TimerSettingsView newPart.
displayName
 "Answers the default name for this kind of part"
 ^'Timer'
preferredConnectionFeatures
 "Answer an array of items to place on the connection pop-up menu"
 ^#(length repeat start stop timerFired)
abtInstanceGraphicsDescriptor
 "Answer the icon for the part on the palette"
 ^(AbtIconDescriptor new
 moduleName: 'abticons';
 id: 277)

Listing Four
"Code for the SlideCarousel part"
AbtAppBldrView subclass: #SlideCarousel
 instanceVariableNames: 'images index '
 classVariableNames: ''
 poolDictionaries: ''
displayNext
 "Perform the displayNext action."
 | i |
 i := (self index + 1).
 (i > self images size) ifFalse: [
 self index: i.
 (self subpartNamed: 'Label1') graphicsDescriptor: (self images at: i)
 ].
displayPrevious
 "Perform the displayPrevious action."
 | i |
 i := (self index - 1).
 (i <= 0) ifFalse: [
 self index: i.

 (self subpartNamed: 'Label1') graphicsDescriptor: (self images at: i)
 ].
images
 "Return the value of images."
 images isNil ifTrue: [images := OrderedCollection new].
 ^images
images: anOrderedCollection
 "Save the value of images."
 images := anOrderedCollection.
 self signalEvent: #images
 with: anOrderedCollection. 
index
 "Return the value of index"
 index isNil ifTrue: [index := 0].
 ^index
index: aNum
 "Save the value of index"
 index := aNum

Listing Five
"Code for the SlideShow application"
AbtAppBldrView subclass: #SlideShow
 instanceVariableNames: ''
 classVariableNames: ''
 poolDictionaries: ''
initializeBitmaps
 "set up the collection of images"
 (self subpartNamed: 'Slide Carousel') images
 add: (AbtBitmapDescriptor new
 moduleName: 'ABTBMP30'; id: 416);
 add: (AbtBitmapDescriptor new
 moduleName: 'ABTBMP30'; id: 417);
 add: (AbtBitmapDescriptor new
 moduleName: 'ABTBMP30'; id: 418).
 (self subpartNamed: 'Slide Carousel') displayNext
DDJ



























PROGRAMMING PARADIGMS


Cheapo Bizarre Languages




Michael Swaine


There are way too many Mikes in this world. There are even too many Mikes
writing about the computer and electronics industries. I don't like to
complain, but Jeez.
And in almost every case, I note that I was using the name first.
Mike Malone is known for his eponymous Public Television interview program
("Malone," I'm happy to say, not "Mike"), his book The Big Score (a history of
Silicon Valley), and his pioneering investigative work on high-tech stories
like the black market in chips and espionage in the electronics industry. His
latest book, The Microprocessor: A Biography, dropped over the transom here at
Stately Swaine Manor recently, and I couldn't put it down.
Not, at least, until I had enough material for the first third of this column.



The Microprocessor: A Biography


I don't know about you, but I read history for the stories. Fortunately for
me, Malone writes history as a bunch of stories. All the memorable stories of
the birth and twenty-something life of the microprocessor are here, usually
gleaned from Malone's many interviews with all the movers and most of the
shakers, but a few no doubt overheard at the legendary Wagon Wheel restaurant,
which often played the role of the Ryder van in the moving and shaking in
Silicon Valley's wild early years.
The book provides some interesting wrinkles on the founding of Silicon Valley.
Most of that story is familiar: How William Shockley took his Nobel Prize,
left Bell Labs, and went home to Palo Alto. How he founded Shockley
Laboratories to make himself rich and attracted the best scientists in the
field. How he quickly drove out his eight best scientists, including Gordon
Moore and Bob Noyce, all of whom went out and founded Fairchild. Or were these
scientists at fault? Were they "the Traitorous Eight," as they have been
called? Malone more than hints at the answer when he speaks of Shockley's
paranoia and labels him a "rotten boss."
Malone wades right into the credit thing: the recurring debates over who
actually created the microprocessor. I needed some reminding, at least, as to
the role of Federico Faggin vs. Ted Hoff at Intel. Faggin, who left Intel in
1974 to start a competing company (Zilog), more or less faded out of Intel's
official history of the microprocessor. Malone sets the record straight,
telling who did what and concluding diplomatically by labeling Faggin the
creator, and Hoff the inventor, of the microprocessor.
That, of course, is only the intra-Intel credit controversy. The fact remains
that two Texas Instruments scientists, Michael (another one) Cochran and Gary
Boone, were granted the first "microcomputer" patent. Why Malone, and most of
the world, gives credit for the invention of the microprocessor to Intel's
effort is something Malone covers.
And suddenly in 1990, from out of nowhere, comes one Gilbert Hyatt, a virtual
unknown, who claims to have invented the microprocessor over twenty years
earlier, and has the patent to prove it. As it turns out, he was not unknown
to Bob Noyce and Gordon Moore. Malone covers it all.
Malone writes knowledgeably of the crazy period when the bottom fell out of
the calculator market. Although Hewlett-Packard doesn't normally break out
revenues or profits for particular product lines, Malone was working for HP's
calculator group from 1977 to 1979, so he is able to detail exactly how HP
managed to ride out the storm and face down TI's "scorched-earth policy." That
was the same scorched-earth policy, incidentally, that drove MITS out of the
calculator business and inspired Ed Roberts to build the Altair computer that
launched the personal-computer revolution.
You'll find Moore's Law in The Microprocessor: A Biography in several forms,
and neatly graphed. It looks good to me, but I'll defer to DDJ contributor Hal
Hardenbergh on the question of whether or not Malone got it right. Hal, who
has documented many misinterpretations of Moore's Law over the years, may or
may not be the absolute final authority on the issue, but I wouldn't dare
contradict him.
Malone's book tells its stories with a proper appreciation for their drama.
Probably the most dramatic moment in the book comes, believe it or not, during
a slide presentation by an HP exec at a trade show. The speaker started his
presentation slowly, then dropped the bomb. In Malone's words, "Based on its
testing of U.S. semiconductor deliveries versus those from the Japanese, the
latter were not just superior in quality, but shockingly so." The difference
in failure rates was on the order of 5 percent from U.S. vendors versus no
failures at all from Japanese vendors.
The period that followed this revelation was not pleasant. Former
antiprotectionists in the U.S. ran crying to the government for help. Japanese
executives began speaking of the U.S. as a dying and decadent society. Ugly.
How the American semiconductor companies responded is a fascinating story, one
of many good stories Malone tells. Not only did I enjoy this and the other
stories in this book, I also found that I learned a lot that I didn't know
from Malone. Say, brother, you can't beat that with a stick.
Although Malone occasionally mentions his own history (like his HP job) in a
footnote, the book is never in danger of becoming The Microprocessor: An
Autobiography. Don't you just hate authors who are always inserting themselves
into their writing? I know I do.
Here are the vital statistics: The Microprocessor: A Biography, by Michael S.
Malone, 1995, Springer-Verlag (TELOS, Santa Clara, CA), 333 pages, $29.95.


Cheapo Bizarre Languages


What follows is the first installment of an ongoing survey of cheapo bizarre
languages. Okay, some of these are cheap, some are free, and some are less
bizarre than others. Basic, for example, can hardly be called bizarre,
although the implementation described here may qualify. Forth is bizarre
enough, but I want you Forth fanatics to know that I mean that in the nicest
possible way.
This installment might be called more accurately "The State of Programming
Paradigms: Part I: Public Domain and Shareware Languages for the Macintosh
that Embody Alternative Programming Models; Section A: Basic and Forth;
Including Some Historical Background."
In any event, this is a very selective overview. It won't deal with, say, the
special needs of embedded-systems programmers. It will present evidence that
there's more to programming than C, more than C++ even. The languages I am
examining in this installment all exist on the Mac, though some that I'll
discuss in later installments are available on other platforms as well.


Cheapo Basic


Before the Mac was released in 1984, it was clear to any right-thinking person
that the first high-level programming language for the Mac would be Basic. In
the early 1980s, it was a foregone conclusion that you supplied a Basic with
any new machine. The only questions were, interpreted or compiled, disk-based
or in ROM, Microsoft's or your own or somebody else's.
Apple decided to pursue two routes: It encouraged Microsoft to develop a Basic
for the Mac, and it put Donn Denman to work on Apple's own MacBasic.
Microsoft's Basic was delivered on time, but it was godawful. MacBasic, which
was due for delivery within a few months of the Mac's release, was clearly
going to kill Microsoft's Basic dead.
Microsoft took immediate action: Using the leverage of its licensing agreement
over AppleSoft Basic for the Apple II, Microsoft demanded that Apple kill
MacBasic. This may not have been a very bright move for Microsoft, since it
could have renegotiated its AppleSoft license for almost any amount at that
point, a fact that Bill Gates has since acknowledged. But instead it pressured
Apple to kill MacBasic, and Apple caved.
It's easy to paint Microsoft as the heavy in this (although Apple management
looks worse for knuckling under), but the story of Microsoft's influence on
the Mac is too complex for such simple characterizations. One example: A key
feature of the original Mac file system was apparently actually taken, with
Bill's blessing, from his file-allocation scheme for Standalone DOS.
Let the dead past stay buried, you say? Ah, but like a rodent from the grave
rises Chipmunk Basic, described by its author, Ronald Nicholson
(rhn@netcom.com), as "a simple Basic interpreter...similar to the line-number
based MumbleSoft BASIC interpreters of circa 1980." Yes, friends, here's that
Basic interpreter for the Mac that you've been wanting all these years,
accelerated for the PowerMac.
Understand, this is not a language for producing fast, bullet-proof,
commercial-quality code. It began life in humble circumstances, out of the
need for a Pascal program to test a Pascal-to-C translator called "p2c." The
Pascal program, basic.p, was part of the test input suite to p2c.
But while it's not what you'd call fast, Chipmunk Basic runs 150 to 200 times
faster on a PowerMac 7100/80 than Microsoft Basic 1.0 did on the original,
128K Mac.
If you need a relatively traditional Basic, this one is fairly solid and it's
free. It has some features that make sense in a 1980-vintage Basic, like
sprite graphics, an old-fashioned line-number-based editor, a predictable
collection of numeric and string operators and functions, and file I/O
statements. Its two variable types are long float and string, the latter with
a maximum length of 254 characters.
But seriously, why would you ever want to use Chipmunk Basic?
Well, it also has some features you wouldn't expect. It supports AppleScript's
DoScript command, for example, so you can drive your Chipmunk Basic programs
with scripts. And you can do an open "SFGetFile" for input as #2 to use
Apple's sfGetFile; ditto for SFPutFile for output. Or you can do an open
"COM1:" for input as #3 or an open f$ for data input as #4 where f$ references
a data file. 
There's a say command that speaks strings if you have the Speech Manager
installed, and you can save and restore variables with push(x,y,a$) : gosub
200 : pop 3. There's also a construct similar to awk fields: a$ = field$("aa
bb cc dd", 2, " "), which suggests one purpose for which you might use any of
these cheapo bizarre languages--as a so-called little language.
In his "Programming Pearls" columns and books, Jon Bentley has popularized
both the term and the concept of little languages, those small programming
systems that you use for just-bigger-than-back-of-the-envelope calculations,
quick tests of ideas, aids to thought. Bentley's little language of choice,
is, of course, awk. Bentley evidently thinks in awk. But if you're one of us
who were once warped by Basic and never altogether recovered, maybe, in
unguarded moments, you still think in Basic. If so, maybe Basic should be one
of your little languages of choice.



Cheapo Forth


Although there was no Basic on the first Mac, this was no great handicap,
since Apple didn't intend for anyone to write programs for the Mac on the Mac
anyway. The development platform for the early 128K Mac was a Lisa.
There was, however, in the early months of Mac availability in 1984, a
programming language that allowed developers to write programs for the Mac on
the Mac. It produced fast code and gave access to nearly every routine in the
then-64K ROM. The language was MacForth, from Creative Solutions.
Forth lives on today on the Mac in various implementations. Two free versions
are MacQForth and Pocket Forth.
MacQForth is a voice from the apparently-not-quite-dead past. Its author,
Ronald T. Kneusel (rkneusel@post.its.mcw.edu or kneusel@msupa.pa.msu.edu or
rkneusel@carroll1.cc.edu, take your pick), claims that its chief use is as a
tool for teaching Forth, and he has included some excellent tutorial
materials.
It has another use, however, due to its heritage: It's also a decent tool for
learning 6502 assembler. QForth itself was written for the Apple IIe by
Toshiyasu Morita. MacQForth is basically QForth plus a 65C02-microprocessor
simulator. For the most part, the code in MacQForth is identical to the code
in the Apple II version. MacQForth programs think they're running on an Apple
II. 
Running QForth on the simulator has the advantages that the system is small
and easy to learn, and not only is all of Forth available, but an entire
microcomputer environment is available as well, including a system monitor and
6502 assembler.
An aside: The 6502, Chuck Peddle's invention and Steve Wozniak's obsession,
lives on. MOS Technologies passed it on to Synertek, which passed it on to
Rockwell, which passed it on to Western Design Center, which lists among its
current customers AT&T, ITT, Sony, and Siemens. Who'da thunk the 6502 would be
among the survivors of the chip wars? This data courtesy of Mr. Malone's
aforementioned book.
Return from aside: Pocket Forth is a Forth implementation that requires the
MDS assembler. MDS, Macintosh Development System, is Apple's now-discontinued
68000 assembler. Consulair apparently also sold an assembler that is
compatible with MDS.
Pocket Forth is interesting because it supports the required suite of Apple
Events and can define new Apple Events. This means that Pocket Forth can be
scripted by scripts running in Userland Frontier, HyperCard, and AppleScript.
Its author, Chris Heilman (heilman@pc.maricopa.edu), sees this as important.
To show off Pocket Forth scripting, he has supplied a text file to define
three new events, a HyperCard stack to demonstrate the events, and a Frontier
install script that facilitates writing Frontier scripts to control Pocket
Forth.


Cheapo Neon


One of my all-time favorite languages was Chuck Duff's Neon, an
object-oriented language based on a kernel written in Forth and assembly
language, sold by Kriya Systems between 1985 and 1989. Chuck learned a lot in
writing Neon, and brought that experience to the more sophisticated language
Actor some time later. Despite some rough edges, though, Neon was a fun
language, combining the speed and extensibility of Forth with (some of) the
virtues of Smalltalk.
Neon lives on, too, since Kriya Systems released all the source code to the
public domain. Two free descendants of Neon are Mops and Yerk.
Mops is an object-oriented programming system derived from Neon. The latest
Mops release includes various ex-Neon classes that author Michael Hore
(mikeh@zeta.org.au) converted, as well as a number of other classes he wrote
over the years. He has released all original Mops material into the public
domain. If you want to use it commercially, that's fine, although he'd like
you to let him know about it.
Yerk is a Neon derivative, too. It has been maintained by several programmers
at The University of Chicago since the demise of Neon as a product. The name
Yerk is not an acronym for anything, but rather stands for Yerkes Observatory,
part of the Department of Astronomy and Astrophysics at U of C. Bob
Loewenstein at Yerkes Observatory (rfl@yerkes.uchicago.edu) distributes Yerk
under Kriya's release statement:
Kriya Systems, Inc. gives you [me] the permission to freely distribute for
scientific and educational purposes the programming language formerly known as
Neon, including the distribution of the source which has been released to you.
You do not have the right to use the name Neon, as it apparently had prior use
by another company and is not a valid trademark of Kriya Systems. All
commercial distribution rights are reserved by Kriya Systems, Inc.
He takes this to mean that Yerk can't be sold.
Appealing features of Yerk, some of which are shared by Mops, include:
- Early binding by default, with the ability to late bind in almost any
circumstance.
- Dynamic instantiation of objects on the heap.
- Single inheritance.
- Floating point (SANE).
- Many system classes and objects for Mac interfacing.
- Module (overlay) creation; a module is loaded only when necessary and may be
purged from the application's heap memory.
Loewenstein has a number of general classes for color Quickdraw interfaces,
MacTCP classes, and the like, that he's willing to make available to people
who contact him.
All current languages discussed in this column are available on various online
services and on the Apprentice CD series from Celestin Company (Port Townsend,
WA).
Next month, even more bizarre languages.




























C PROGRAMMING


Windows: Casting Rays and Developer Days




Al Stevens


I recently attended one of 15 simultaneous, around-the-country sessions of
Microsoft's Developer Days, small regional conferences hosted by local
Microsoft-devoted software-development companies and attended by area
programmers. I went to the Fort Lauderdale session, about three hours from
home. Attendees paid $50 in advance or $75 at the door, for which they got one
day chock-full of presentations about Microsoft development tools. As a bonus,
there were a few vendor booths around the perimeter of the room, but nothing
impressive to a veteran Software Development '95 or Borland conference
attendee.
The program started with a general session wherein the local hosts introduced
themselves and some token Microsoft participants. We burned some time that way
while waiting for the West Coast to get out of bed. Then we spent the rest of
the morning watching announcements and presentations of Microsoft products via
live satellite video from Redmond. This pitch took until lunch time and was
little more than a high-hype Microsoft infomercial. It even had the glossy
look of those late-night promotional TV programs that sell
real-estate-investment courses, exercise equipment, diet programs, and so on.
The tone was set by an oh-so-sincere Microsoft veep, who waved his arms and
proclaimed in a high-pitched, shrill voice how wonderful our lives will be as
soon as we buy all these new Microsoft offerings. As he went on and on, saying
nothing of substance, you could close your eyes and substitute the face of
that obnoxious infomercial kid pitching his fast-money-making course,
promising you "...four ways to make money: running ads, buying and selling,
getting a 1-900 number..." and I forgot the fourth one. The veep and that kid
must have the same public-speaking coach. This sure wasn't worth fifty bucks,
not to mention getting out of bed at 5 am and driving three hours. (Just to be
fair, I should tell you that my press credentials, not my fifty bucks, got me
in the door.)
The infomercial continued with four carefully rehearsed product announcements.
The scripts emulated those Tony Robbins/Fran Tarkenton smart-guy/dumb-guy
roles. The smart guy would demonstrate some cool feature and the dumb guy
would ask a programmed question designed to elicit an appropriate positive
response from the smart guy. Remember, this was a live telecast, and in the
Visual Basic 4.0 segment, the smart guy was demonstrating OLE servers across a
network. The dumb guy either improvised or couldn't see the Teleprompter and
asked a really dumb question. Something like, "Wow, sending objects across a
network is really very revolutionary, isn't it?" The audience laughed. The
smart guy stifled a giggle and then shot the dumb guy a look which seemed to
say, "You really are a dumb guy, aren't you?" I didn't know Visual Basic
supported type casting.
After lunch, which was both provided and excellent, we broke into separate
sessions, based on our interests. This is where the trip paid for itself. If
you attend one of these sessions in the future, skip the general session and
show up in time for lunch. I doubt that you'll miss anything important.
Of the four announcements, the one that interested me was Visual C++ 4.0.
There were two VC++ sessions conducted by area developers with the assistance
of a member of Microsoft's VC++ development team. The first addressed the new
features of Visual C++ 4.0, and the second was about MFC 4.0. Microsoft
skipped version 3.x for VC++ to align the version numbers between the compiler
and the class library. Henceforth, they'll be released together.
I've been using a beta of VC++ 4.0 for a while now, have become a fan, and was
looking forward to seeing it in the hands of experts. One prominent feature is
the ability to define custom App Wizards. You can build application templates
and distribute them as App Wizards with which other programmers can build
their applications. Expect to see some third-party application-specific
frameworks distributed this way. I'm investigating whether the Windows game
engine we're developing for a new book will fit into that model.
Microsoft is marketing a source-code version-control package called "Source
Safe," and, of course, it integrated the package into the visual components of
its language tools. It uses a Windows 95 Explorer interface and seems to be
intuitive. When you view your list of source files in Visual C++, a check mark
indicates whether you have the file checked out for modification or if it is
still sacrosanct in the baseline copy of the developing project. Several
developers complained that they still cannot do anything to manage a resource
file when several programmers are working on the same project. Resource (.RC)
files contain resource-language source code that defines menus, bitmaps,
dialog boxes, strings, and so on. There is only one ASCII resource file for a
program; no way exists to automatically manage its configuration when several
programmers and screen designers are simultaneously messing with its contents.
Someone asked how Microsoft deals with the problem, given that they build
enormous projects with many programmers. The answer reminded me of Carol
Burnett's response to the question, "How do you dance here at Camp Sunny
nudist colony?" "Very carefully," she said.
The VC++ presenter knew her stuff and showed the tool to advantage. She was
best with her prepared pitch, but some questions stumped her, and the
Microsoft guy jumped in and filled in the blanks. They both seemed to
understand the new Developer Studio and MFC really well, but they weren't
particularly current on C++ language developments. When asked if VC++ 4.0
supports the ANSI mutable keyword and bool type specifier, their blank stares
revealed that neither of them even knew about these language features. A
search in the help file turned up no such references, so we assumed for the
moment that VC++ 4.0 does not support them. The Microsoft guy tried to slough
it off by saying something like, "Well, there is the accepted standard, and
then there are the proposed features." Horse hockey. There is the standard as
published by ANSI for public review. Period. It represents what the committee
has approved and codified. I assume that Microsoft has a copy. They ought to
let their compiler developers read it. Both mutable and bool (and true and
false) are prominent in the keyword list of that document--on page 2-4, for
those of you who do have a copy.
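Both keywords are easy to demonstrate. Here's a minimal sketch (standard C++, not tied to any particular compiler) of the classic use of mutable, a logically const accessor that updates a cached value, with bool flagging whether the cache is valid:

```cpp
#include <cassert>

// 'mutable' lets a const member function modify bookkeeping members;
// 'bool' is the standard true/false type, here flagging a valid cache.
class Angle {
    double degrees;
    mutable bool cached;       // modifiable even through a const object
    mutable double radians;
public:
    Angle(double d) : degrees(d), cached(false), radians(0.0) {}
    double Radians() const {   // logically const: callers see no change
        if (!cached) {
            radians = degrees * 3.14159265358979 / 180.0;
            cached = true;
        }
        return radians;
    }
};

bool is_acute(const Angle& a) { return a.Radians() < 1.57079632679; }
```

A compiler that lacks these keywords rejects the mutable declarations outright, which makes this a quick conformance test.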
Borland C++ has supported a subset of the new ANSI features for some time now,
but it does not include mutable or bool. Borland's chief compiler architect
bolted to Microsoft a while back. Now Microsoft C++, which was always way
behind in the new feature department, has close to that same subset. Probably
just a coincidence.
I tried both keywords on my beta of VC++ 4.0, and they are indeed not there.
There's still time, however. The product release is not due for another six
weeks.
The session devoted to MFC was a delight, due largely to the knowledge and
humor of the presenter, Mike LaRue of Decision Consultants (Clearwater, FL).
His whimsical "feature list" guided him through some revealing demonstrations
in which he built programs and OLE controls from scratch on the podium with
everyone watching and listening. I gained a lot of insight into the
improvements that MFC 4.0 brings to Windows programming, particularly in the
areas of OLE controls, OLE automation, and something called "message
reflection." In his demonstration, Mike designed a button control that
provided its own behavior. Mike stored the new control in the Component
Gallery. Then he showed how he could include the new button in any other
project without having to port the behavior-defining button-click code. Neat
stuff.


The Raycaster Project


Last month I introduced the Raycaster project, a DOS raycasting game engine in
C++. This month sees some additions and changes to the project and a port to
Windows. The DOS version is virtually complete. It contains almost everything
I need to develop the simulation that the program was designed to support. The
Windows version works, but not as well as I'd like. More about that later.
First, I'll discuss the changes to the engine.


GFX Files


In its first version, the raycasting engine used .PCX files for the texture
maps of wall tiles and props. This month, I add animated sprites to the
engine. It occurred to me that distributing games--particularly shareware
games--in file formats that users can change could be dangerous. Someone could
add graffiti to a wall tile or features to a sprite and re-upload your game
with their modifications. Unsuspecting downloaders might think that they were
getting the original.
Consequently, I created a generic bitmap-library format that stores one
palette and some number of bitmaps. The game engine now uses that format
rather than .PCX files. I wrote a utility program to create the libraries from
listed collections of .PCX files. Listing One is gfxmake.cpp and Listing Two
is cmdline.cpp. These files, along with the pcx and bitmap classes from the
engine, constitute the program that builds the library. The cmdline.cpp file
implements a generic command-line parser to read filenames from either the
command line or a response file and execute a callback function for each one.
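On the engine side, reading such a library back is the mirror image of gfxmake's writes. This sketch assumes the layout Listing One emits (signature, shared palette, count, then name/height/width/pixels per bitmap); the values of palettelength and namelength here are stand-ins for the engine's own constants:

```cpp
#include <cstdio>
#include <cstring>
#include <vector>

// Sizes assumed to match the engine's header constants; adjust to fit.
const int palettelength = 768;   // 256 RGB triples
const int namelength    = 13;    // 8.3 filename plus terminator

struct GfxBitmap {
    char name[namelength];
    unsigned short height, width;
    std::vector<unsigned char> pixels;
};

// Returns false if the file is missing or lacks the "GFX" signature.
bool read_gfxlib(const char* path, std::vector<GfxBitmap>& out,
                 unsigned char palette[palettelength])
{
    FILE* f = std::fopen(path, "rb");
    if (!f) return false;
    char sig[3];
    if (std::fread(sig, 1, 3, f) != 3 || std::memcmp(sig, "GFX", 3) != 0) {
        std::fclose(f);
        return false;
    }
    std::fread(palette, 1, palettelength, f);    // common palette
    short count = 0;
    std::fread(&count, sizeof count, 1, f);      // number of bitmaps
    for (short i = 0; i < count; i++) {
        GfxBitmap bm;
        std::fread(bm.name, 1, namelength, f);
        std::fread(&bm.height, sizeof bm.height, 1, f);
        std::fread(&bm.width, sizeof bm.width, 1, f);
        bm.pixels.resize((size_t)bm.height * bm.width);
        std::fread(&bm.pixels[0], 1, bm.pixels.size(), f);
        out.push_back(bm);
    }
    std::fclose(f);
    return true;
}
```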


Compressing the Maze


The game's maze was previously defined in an ASCII text file described last
month. To hide its contents from interlopers, I wrote a program that
compresses the file with a simple run-length encoding algorithm. The engine
now loads the maze data file by using a matching decompression algorithm.
Listing Three is bldmaze.cpp, the program that builds the compressed maze-data
file.
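The matching decompression is short. This sketch inverts the encoding in Listing Three: a byte with the high bit set carries a repeat count (the character occurred count + 1 times) and is followed by the character itself; bytes with the high bit clear are literals, which is unambiguous because the maze source is plain ASCII:

```cpp
#include <sstream>
#include <string>

// Expands the run-length scheme produced by bldmaze.cpp. A marker
// byte (high bit set) holds n, meaning the next byte occurred n + 1
// times in the original; any other byte passes through unchanged.
std::string expand_maze(std::istream& in)
{
    std::string out;
    char ch;
    while (in.get(ch)) {
        unsigned char b = (unsigned char)ch;
        int repeat = 1;
        if (b & 0x80) {                 // run marker: count, then char
            repeat = (b & 0x7F) + 1;
            if (!in.get(ch))
                break;                  // truncated input: stop
        }
        out.append(repeat, ch);
    }
    return out;
}
```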


Sprites


The first incarnation of the Raycaster project did not support animated
sprites, although it did support immovable props. The prop logic was flawed,
and putting props in certain locations in the maze revealed bugs in the way
their positions were computed and their hidden slices were suppressed. When I
added sprites to the engine, I overhauled the prop logic as well, and those
bugs went away.
A sprite is represented by a set of 24 32x64 bitmaps: three sets of eight
frames each. The three sets represent the three frames of an animated walk:
one set has the left foot forward, the second has the right foot forward, and
the third has the two feet together. The third set also represents the sprite
standing still. The eight frames in each set represent the sprite as viewed
from the front, back, either side, and any of the four quartering views. From
these bitmaps a sprite is rendered based on its current mode with respect to
walking or standing, the direction the sprite is facing, and the direction
from which the game player is viewing the sprite.
The game program instantiates a sprite object, and provides its position and
orientation in the maze, and its current stepping mode. As the game
progresses, the game program changes those values to reflect sprite movements
and turns. If a sprite is in view, the raycaster renders it, using all of
those factors to select the appropriate frame.
Eventually, sprites will need to do more than just walk and stand. In the
traditional 3-D maze game, the sprites fire at the player and fall down when
fired upon. I have not worked those sequences out yet, but their
implementations should be a minor extension of the walking sequence. The hard
work is rendering the frames from several views.


Floors and Ceilings


Last month I indicated that I did not want texture-mapped floors and ceilings
in the maze game. Since then, I've changed my mind. In Gardens of Imagination
(The Waite Group, 1994), Christopher Lampton implements floor and ceiling
texture mapping in a demo of his raycaster. Using his example as a guide, I
added floors and ceilings to my raycaster, only to find two insidious problems:
As I moved around the maze, the floors and ceilings shifted under the walls;
and as I changed viewport sizes, the floors and ceilings did not change
appropriately. Thinking that I probably misinterpreted the code, I returned to
Lampton's demo and discovered that his program has the shifting problem, too.
He does not provide for changing viewport sizes, so I don't know about that
part of his algorithm. By cobbling and tweaking away at the math, I was able
to make everything work properly.
Texture-mapped floors and ceilings are a mixed blessing. The screens are lovely
to look at. Tiled ceilings and floors add an impressive dimension of reality
to any view from within the maze. But the processing cycles to render them are
extensive. In a big room with an expanse of floor and ceiling to render, the
frame refresh rate goes way down. I determined that texture-mapped floors and
ceilings should be used only in small rooms in the maze to minimize their
overhead.

The increased overhead stems from the way the program renders floors and
ceilings with perspective. A raycaster makes one horizontal pass across the
viewport for each frame of walls. The number of passes depends on the number
of horizontal pixels in the viewport. All the calculations for a slice are
based on the distance of the wall slice from the viewer, and no vertical
casting is necessary. Casting slices for floors and ceilings, however,
involves an additional trace up the floor from the bottom of the viewport to
the bottom of the nearest wall in the current slice. For each vertical point
in the slice, the algorithm chooses the correct pixel from the texture map,
and that's where the overhead comes in. On slow machines, it is better to
leave out the textured floors and ceilings, using solid colors instead. Or,
you could take a whack at optimizing the algorithm with assembly language.
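The per-row floor trace looks roughly like this. The sketch uses the usual similar-triangles projection to find the floor point each screen row shows, then wraps it into a 64x64 tile; names such as projdist are illustrative, not the engine's own:

```cpp
#include <cmath>

// For one vertical slice, computes which texel of a 64x64 floor tile
// a given screen row (below the horizon) displays. 'viewheight' is
// the eye height above the floor and 'projdist' the distance to the
// projection plane; rows nearer the bottom of the screen map to floor
// points closer to the player.
void cast_floor_point(double px, double py,        // player position
                      double rayangle,             // this slice's ray
                      int screenrow, int horizon,  // screenrow > horizon
                      double viewheight, double projdist,
                      int& texx, int& texy)
{
    // Similar triangles: floor distance shrinks as the row approaches
    // the bottom of the viewport.
    double rowdist = viewheight * projdist / (screenrow - horizon);
    double fx = px + std::cos(rayangle) * rowdist;
    double fy = py + std::sin(rayangle) * rowdist;
    texx = (int)fx & 63;    // wrap into the 64x64 tile
    texy = (int)fy & 63;
}
```

Repeating this for every row under every slice is exactly the per-pixel cost the text describes.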


Tubas of Terror Demo


Listing Four is main.cpp, the program that uses the RayCaster and Sprite
classes to implement a demo of a DOS game called Tubas of Terror. The game
demo is incomplete. You can wander around in the maze, and the sprite, a tuba
player, paces back and forth. Other than that, nothing happens, but the
primitives are in place for a more-complete game, which I plan to finish soon.
To implement the demo, I derived a TubasOfTerror class from the RayCaster
class and a TubaPlayer class from the Sprite class. The TubasOfTerror class
object instantiates an object of the TubaPlayer class, which contains member
functions that step the sprite forward and do an about-face. The main function
instantiates a TubasOfTerror object and uses it to display the maze rooms. A
keyboard object provides game controls to move the game player around in the
maze. The game program walks the sprite up and down in its path. Presumably,
the player would be able to blow a blast at the sprite to knock it down, and
the sprite would be able to do likewise. A real game would have the sprite(s)
wander around in the maze looking or waiting for the player.


Raycasting in Windows


I developed all this code with Borland C++ 4.5 to run under DOS. If you
download the program and use the makefile, it should build the DOS utility
programs, the game data files, and the DOS demo game program. A port to
Windows of the raycasting engine and demo is included. I developed that port
with a beta of Visual C++ 4.0, and the download includes the attendant source
code for that port, along with the project makefile. The entire zipped project
file unzips to the directory structure that you need for both versions.
The main.cpp file is replaced by source-code files that implement MFC CWinApp
and CFrameWnd derivative classes. The DOS vga.h and vga.cpp files are replaced
by source-code files that implement the VGA class to operate in the Windows
Win32 Dib environment. Files that implement the keyboard and an ASSERT macro
are eliminated. 
I learned a lot during this Windows port. First, the program, which uses Win32
Dib logic to write to the frame window, is not as fast, by a small margin, as
its DOS counterpart, which writes directly to video memory. Second, a 320x200
screen in DOS uses the full screen. The same image in a Windows Dib uses
320x200 pixels in whatever resolution Windows is using. In standard 640x480
mode, the raycaster casts an image that occupies half the screen area. In
larger resolutions, the image is proportionately smaller. The DOS raycaster
runs a lot better in a lower resolution. 240x160 is more efficient and quite
comfortable on a DOS screen, assuming that some of the screen's 320x200 real
estate would be used for scoreboards and so on. That size on an 800x600
Windows display presents barely discernible walls, props, and sprites. You can
try it out for yourself. Both versions of the demo use the plus and minus keys
to change the viewport size. Watch the performance change as the viewport gets
bigger and smaller. Observe the difference as you enter rooms with
texture-mapped floors and ceilings. The DOS demo does not trigger frames on a
timer tick. Therefore, when you go into the large room where the sprite is
pacing, it seems to be running (depending on the speed of your processor). But
when you follow the sprite into the room with a tiled floor and ceiling, the
sprite slows down considerably. That difference illustrates the overhead
needed to cast those floor and ceiling slices.
The Windows performance problem is an indication of why there have not been
any great animation games developed for Windows. We've all heard about
WinDoom--has anyone seen it? The problems, real as they are, are addressed by
something new called the Windows 95 Game SDK, still in beta. I have a copy of
the beta, which arrived by surprise in the mail one day. I have subsequently
tried unsuccessfully for four months to become a certified beta tester so that
I could tap into the secret meetings on CompuServe, get advice, and download
drivers. No luck. I even signed and mailed in an unsolicited NDA, which was
apparently ignored. Therefore I feel no qualms about discussing the product,
which the rumor mill tells me is soon to be released anyway. Wonder if they'll
let me buy one.
The Game SDK consists of DirectDraw, DirectSound, DirectInput, and DirectPlay,
which handle real-time demands for video, sound, input devices, and network
access, respectively, under Windows 95. I've looked only at the video part and
should soon have the raycaster ported to use it. Applications programmed to
use the Game SDK assume the presence of accelerated drivers and
high-performance hardware. They work with my 32-bit ATI Mach32, but not very
well. The same programs scream on a machine with a 64-bit card. As usual,
software development advances the mainstream of hardware. I'll report more on
my progress with the Game SDK when I've made enough progress to report.


Source Code


The source-code files for the Raycaster project are free. You can download
them from the DDJ Forum on CompuServe, the Internet by anonymous ftp, and
elsewhere; see "Availability," page 3.
If you cannot get to one of the online sources, send a 3.5-inch diskette and
an addressed, stamped mailer to me at Dr. Dobb's Journal, 411 Borel Avenue,
San Mateo, CA 94402, and I'll send you the source code. Make sure that you
include a note that says which project you want. The code is free, but if you
care to support my Careware charity, include a dollar for the Brevard County
Food Bank. 

Listing One
#include <fstream.h>
#include <dir.h>
#include "pcx.h"
#include "utils.h"
static void create_gfxlib(int argc, char *argv[]);
static PCXBitmap* bitmaps[bitmapcount];
static SHORT count;
int main(int argc, char *argv[])
{
 if (argc<3) {
 cout <<
 "\nUSAGE : GFXMAKE <gfxlib> <file1, ...> @<listfile>\n"
 " Builds graphics library file <gfxfile>\n"
 " from the files listed on the command line or in\n"
 " <listfile>. GFXMAKE constructs the GFX library\n"
 " with the entries in the order in which they appear\n"
 " on the command file and/or in <listfile>.\n";
 return -1;
 }
 create_gfxlib(argc, argv);
 return 0;
}
static void bld_gfxlib(const char *file);
static void create_gfxlib(int argc, char *argv[])
{
 ofstream ofile(argv[1],ios::binary);
 if (ofile.fail()) {
 cout << "Cannot open " << argv[1] << endl;
 return;
 }
 count = 0;
 parse_cmdline(argc-2, argv+2, bld_gfxlib);

 // ----- write the signature
 ofile.write("GFX", 3);
 // --- record the common palette
 ofile.write(bitmaps[0]->GetPalette(), palettelength);
 // --- record number of bitmaps in GFX
 ofile.write((char*)&count,sizeof(count));
 // --- record the bitmaps
 for (SHORT i = 0; i < count; i++) {
 Bitmap& bm = *bitmaps[i];
 WORD height = bm.Height();
 WORD width = bm.Width();
 const char* buf = bm.GetPixelmap();
 const char* name = bm.Name();
 ofile.write(name, namelength);
 ofile.write((char*)&height, sizeof(height));
 ofile.write((char*)&width, sizeof(width));
 ofile.write(buf, height * width);
 delete bitmaps[i];
 }
}
static void bld_gfxlib(const char *file)
{
 if (count < bitmapcount)
 bitmaps[count++] = new PCXBitmap(file);
}

Listing Two
// ------------ cmdline.cpp
// Parse command lines for file names (no wildcards allowed
// in order to preserve file sequence control)
// Recognize a response file by @filename
// Call a callback function for each name parsed
#include <iostream.h>
#include <fstream.h>
#include <stdlib.h>
#include <dir.h>
#include <io.h>
#include <string.h>
int checkfile(const char *fname)
{
 if (access(fname, 0) != 0) {
 cerr << "\nNo such file as " << fname;
 return 0;
 }
 return 1;
}
int process_responsefile(const char *rfname,
 void (*func)(const char*))
{
 if (!checkfile(rfname))
 return 0;
 ifstream list(rfname);
 char fname[MAXPATH];
 while (!list.eof()) {
 list.getline(fname,MAXPATH);
 if (*fname) {
 if (!checkfile(fname))
 return 0;
 (*func)(fname);

 }
 }
 return 1;
}
int parse_cmdline(int argc, char *argv[],
 void (*func)(const char*))
{
 int n = 0;
 while (argc--) {
 if (*argv[n] == '@') {
 if (!process_responsefile(argv[n]+1, func))
 return 0;
 }
 else {
 char path[MAXPATH];
 char drive[MAXDRIVE];
 char dir[MAXDIR];
 _splitpath(argv[n], drive, dir, 0, 0);
 _makepath(path, drive, dir, 0, 0);
 char *cp = path+strlen(path);
 ffblk ff;
 int ax = findfirst(argv[n], &ff, 0);
 if (ax == -1)
 return 0;
 do {
 strcpy(cp, ff.ff_name);
 (*func)(path);
 ax = findnext(&ff);
 } while (ax != -1);
 }
 n++;
 }
 return 1;
}

Listing Three
// -------- bldmaze.cpp
#include <fstream.h>
int main(int argc, char* argv[])
{
 if (argc > 2) {
 ifstream ifile(argv[1]);
 if (ifile.fail())
 return 1;
 ofstream ofile(argv[2], ios::binary);
 if (ofile.fail())
 return 1;
 unsigned char ch, holdch;
 ifile.get(holdch);
 while (!ifile.eof()) {
 char rlectr = 0;
 for (;;) {
 ifile.get(ch);
 if (ch != holdch)
 break;
 rlectr++;
 }
 if (!ifile.eof()) {
 if (rlectr)
 ofile.put((char)(rlectr | 0x80));
 ofile.put(holdch);
 }
 holdch = ch;
 }
 }
 return 0;
}

Listing Four
// ---------- main.cpp
#include <time.h>
#include <iostream.h>
#include <stdlib.h>
#include <stdio.h>
#include <except.h>
#include "raycast.h"
#include "keyboard.h"
#include "sprite.h"
const int stepincrement = 4; // maze coordinates per step
// ------ sprite class
class TubaPlayer : public Sprite {
 SHORT xincr;
 SHORT yincr;
 SHORT stepctr;
public:
 TubaPlayer(SHORT tx, SHORT ty);
 void StepForward();
 void AboutFace();
 SHORT CurrXIncrement() const
 { return xincr; }
};
TubaPlayer::TubaPlayer(SHORT tx, SHORT ty) :
 Sprite('Z', "sprites.gfx")
{
 SetPosition(tx, ty);
 SetPose(Sprite::walking);
 xincr = stepincrement;
 yincr = 0;
 stepctr = 0;
}
inline void TubaPlayer::StepForward()
{
 if (stepctr & 1)
 Step();
 stepctr++;
}
inline void TubaPlayer::AboutFace()
{
 RotateRight();
 RotateRight();
 RotateRight();
 RotateRight();
 xincr = -xincr;
}
// ----------- game class
class TubasOfTerror : public RayCaster {
 TubaPlayer* tp;
 SHORT spriteno;

 SHORT stepctr;
public:
 TubasOfTerror(SHORT px, SHORT py, SHORT pangle,
 ViewPort vp);
 ~TubasOfTerror();
 void StepSpriteForward();
 void SpriteAboutFace()
 { tp->AboutFace(); }
};
// ----- initial tuba player sprite position within maze
static SHORT tx1 = 14*64, ty1 = 28*64-16;
TubasOfTerror::TubasOfTerror(SHORT px, SHORT py, SHORT pangle,
 ViewPort vp) :
 RayCaster("maze.dat", "tiles.gfx", px, py, pangle, vp)
{
 if (!isLoaded())
 throw("TILES.GFX load failure");
 tp = new TubaPlayer(tx1, ty1);
 if (!tp->isLoaded()) {
 VGA::ResetVideo();
 throw("SPRITES.GFX load failure");
 }
 spriteno = AddSprite(tp);
 stepctr = 0;
}
TubasOfTerror::~TubasOfTerror()
{
 delete tp;
 VGA::ResetVideo();
}
// ----- walk the sprite through its path
void TubasOfTerror::StepSpriteForward()
{
 if (stepctr == 20 * (tilewidth / stepincrement))
 tp->AboutFace();
 if (stepctr == 40 * (tilewidth / stepincrement)) {
 tp->AboutFace();
 stepctr = 0;
 }
 tp->StepForward();
 MoveSpriteRelative(spriteno, tp->CurrXIncrement(), 0);
 stepctr++;
}
// ---- view ports: changed by pressing + and -
static ViewPort vps[] = {
// x y ht wd (position and size)
// --- -- --- --- 
 { 120, 75, 80, 50 },
 { 110, 69, 100, 62 },
 { 100, 62, 120, 74 },
 { 90, 57, 140, 86 },
 { 80, 50, 160, 100 },
 { 70, 52, 180, 112 },
 { 60, 45, 200, 124 },
 { 50, 37, 220, 136 },
 { 40, 30, 240, 150 },
 { 30, 23, 260, 164 },
 { 20, 15, 280, 174 },
 { 40, 30, 240, 120 }, // typical

 { 0, 0, 320, 200 }, // full screen
};
const int nbrvps = sizeof vps / sizeof(ViewPort);
int vpctr = nbrvps - 2; // viewport subscript
int main()
{
 TubasOfTerror* ttp = 0;
 // ----- player's starting position and view angle
 SHORT x = 2163;
 SHORT y = 1730;
 SHORT angle = 180;
 // ---- for computing frames/per/second
 long framect = 0;
 clock_t start = clock();
 // ---- error message to catch
 char* errcatch = 0;
 try {
 // ----- ray caster object
 ttp = new TubasOfTerror(x, y, angle, vps[vpctr]);
 // ---- keyboard object
 Keyboard kb;
 while (!kb.wasPressed(esckey)) {
 // ----- draw a frame
 ttp->DrawFrame();
 framect++;
 // ----- test for player movement commands
 if (kb.isKeyDown(uparrow))
 ttp->MoveForward();
 if (kb.isKeyDown(dnarrow))
 ttp->MoveBackward();
 if (kb.isKeyDown(rtarrow)) {
 if (kb.isKeyDown(altkey))
 ttp->MoveRightward();
 else
 ttp->RotateRight();
 }
 if (kb.isKeyDown(lfarrow)) {
 if (kb.isKeyDown(altkey))
 ttp->MoveLeftward();
 else
 ttp->RotateLeft();
 }
 // -------- open and close door commands
 if (kb.wasPressed(' '))
 ttp->OpenCloseDoor();
 // ----- command to turn the map on and off
 if (kb.wasPressed(inskey))
 ttp->ToggleMap();
 // ----- commands to change player movement speed
 if (kb.wasPressed('f'))
 ttp->Faster();
 if (kb.wasPressed('s'))
 ttp->Slower();
 // ----- commands to change the size of the viewport
 if (kb.wasPressed(pluskey)) {
 if (vpctr < nbrvps-1) {
 ttp->GetPosition(x, y, angle);
 delete ttp;
 ttp = new TubasOfTerror(x, y, angle,
 vps[++vpctr]);
 }
 }
 if (kb.wasPressed(minuskey)) {
 if (vpctr > 0) {
 ttp->GetPosition(x, y, angle);
 delete ttp;
 ttp = new TubasOfTerror(x, y, angle,
 vps[--vpctr]);
 }
 }
 // ----- walk the sprite
 ttp->StepSpriteForward();
 }
 }
 catch (char* errmsg) {
 errcatch = errmsg;
 }
 catch (xalloc xa) {
 static char msg[50];
 sprintf(msg, "Out of memory (%d)", xa.requested());
 errcatch = msg;
 }
 // ---- report the player's final position
 if (ttp)
 ttp->GetPosition(x, y, angle);
 // --------- get current time to compute frame rate
 clock_t stop = clock();
 delete ttp;
 if (errcatch)
 cerr << "\aRuntime error: " << errcatch << endl;
 else {
 cout << "-------------------------------" << endl;
 cout << "Frames/sec: "
 << (int) ((CLK_TCK*framect)/(stop-start)) << endl;
 cout << "-------------------------------" << endl;
 cout << "Position (x, y, angle) :"
 << x << ' ' << y << ' ' << angle << endl;
 }
 return 0;
}






















ALGORITHM ALLEY


Generating Sequential Keys in an Arbitrary Radix




Gene Callahan


Gene is president of St. George Technologies and the developer of software
such as Managing Your Money and the HyperWriter Autolinker. He can be
contacted at ecallah@ibm.net.


Recently I had to expand the range of 4-byte keys that could be sent to a
client database, as we were on the verge of running out of keys. Our existing
internal library functions dealt adequately with things like incrementing a
key to generate the next key in sequence. But these functions were hardwired
to operate in base-36, using all 10 digits and 26 lowercase letters to
represent the key. Obviously, I could have "changed the wiring" to use base-62
keys (by including uppercase letters, which the client system could accept).
But I hesitated to recast an algorithm hardcoded to handle a radix of 36 into
one hardcoded to handle a radix of 62.
I had seen a similar problem many years before. Working on a dBase III system,
I had to generate as many sequential MS-DOS filenames as possible, varying
only the last four characters of the name. (The first four indicated the type
of file, and all of the extensions would be ".db3".) Certainly a problem like
this, which I had seen occur in several contexts, was worthy of a general
solution!
I set about looking for one. I consulted my algorithm books and found nothing.
I asked half a dozen other programmers if they had ever needed to write code
to handle this situation. Five said that they had. When asked what they had
done, it turned out that each had written an ad hoc solution suitable only for
the radix and character representation used in that specific situation. I
wanted one that could handle any reasonable radix, any length key, and any
ordering of characters within the base representation.
The solution I present here achieves these goals and also illustrates a couple
of important programming principles, to which I was first alerted by Jon
Bentley's book, Programming Pearls (Addison-Wesley, 1986):
- Often, solving a general problem is easier than solving a specific instance of that problem. The algorithm employed here is easier to understand than a hardwired one, and shouldn't take any longer to get right.
- The key to solving many problems is getting the right data structures. Here, given the data structures, the code follows naturally.


The Data


First, I looked at the data I had to deal with. Many languages have built-in
routines for processing strings into numbers. C, however, has no routine that,
when fed a string in an arbitrary base, will convert it to an internal
representation suitable for numeric processing. Clearly, such a facility would
make the problem as trivial as it would be if we were using octal, decimal, or
hexadecimal numbers in C. In such cases, we simply use standard library
routines such as sprintf(), atoi(), and sscanf().
Because the programs in the system I was working on had to perform these
conversions thousands of times a day, they had to be fast. Space, however, was
not at a premium--our machine had 512 megabytes of RAM. This immediately
suggested two table structures, one mapping from characters in a string to
their numeric value in a particular base, and the other, back again, by simple
array lookup.
In creating the data structure for the conversion from string to number, I
took advantage of the happy circumstance that C considers characters valid
array indexes. What could be simpler than to use the character itself as the
index to find its value in a base? Since the characters are merely signals in
an arbitrary code (the key), I didn't have to worry about such niceties as
whether some user will want a different, larger alphabet appearing on screen.
My strings were for internal consumption only!
To convert from native numeric representation back to a string representation
in base-X, a string laying out the ordering of characters in base-X is the
natural mapping mechanism. The base-X digits of the native number itself,
"peeled off" by repeated modulus and division by the radix X, can serve as
indexes into the mapping string. This takes advantage of the relationship between a base and
the modulus and division operations--for any base X and number N, repeated
modulus and division operations on N by X will produce the value of each
base-X digit of N from right to left. The structures I came up with, along
with a representative instance, appear in Listing One.
The base representation in the listing has long runs of consecutive values,
but this need not be the case. If the application involved something like
encryption, the values assigned to different characters could just as well be
randomly generated.


The Functions


To map from string to number, BaseXToLong() (see Listing Two) moves through
the string representing the number from right to left, indexing by the
character at each position into the structure NumValues. That character's
value in the current base is stored in NumValues and then multiplied by the
power of the radix equivalent to that position in the string. In other words,
the numeric value of the last character in the string is multiplied by the 0th
power of the radix; the next to last, by the first; the next, by the square of
the radix; and so on.
LongToBaseX() maps in the other direction, using the structure CharValues.
Again, the function operates from right to left, filling the last character of
the string first. To determine the character in the last place, I take the
long argument (named "number") modulo the base, and index CharValues with the
result. For the next character, I divide number by the radix, essentially
dropping off the last digit, and repeat the modulus step; see Listing Three.
Finally, I had to implement the increment function itself. I wrote it as a
single statement using the primitives discussed earlier. The code has now
gained a level of abstraction. The caller of LongToBaseX() has to worry about
details like the length of the return value and where to store it. But at the
level of increment abstraction, I knew the answers to these questions--the
length is the length of the original key, and it is stored back in the
original buffer. I left the low-level routines flexible about the answer to
these questions, and built another layer of abstraction to hide these
implementation details from the layers above it; see Example 1. 
I wrote this function as a single statement because it concisely states my
algorithm. The calls to BaseXToLong() and strlen() are made only for their
return values, not for any side effect such as I/O or assignment; hence their
position inside the call to LongToBaseX(). Since I won't otherwise use any
intermediate results, storing them would be deceptive. Also, side effects
complicate the proof of a function's correctness and impede operations such as
shipping the calculation of each of a function's arguments off to separate
processors--one function may depend on the side effect of another. By making
explicit the precedence of calls to BaseXToLong(), strlen(), and
LongToBaseX(), this form suggests a macro (or template, in C++) implementation
of a family of BaseXIncr functions. These functions add one to a macro
argument, rather than the return of BaseXToLong(). Example 2 is a function
that allows key comparison, even of keys stored in different bases.
This basic idea can be implemented in several different ways. For instance,
the radix could be passed as an argument instead of being in the BaseRep
structure; thus, one BaseRep structure could work for many different radices.
The base representation could be read in from a file for further flexibility.
C++ users should easily be able to turn this into a class, packaging together
the BaseRep pointer and the functions operating on it. You might write the
class so that the radix could be reset, and perhaps BaseXToLong() and
LongToBaseX() would be protected virtual members, so that you could override
them in descendant classes.


Conclusion


In working out this problem, I found that once I discovered adequate data
structures, I was nearly done. The rest of the implementation consists of only
a few lines of code--it is much shorter, in fact, than the version hardcoded
to base-36, and it expresses the solution to the problem in a more
straightforward way. Changing the radix or the character representation of the
base requires only filling in tables, not error-prone recoding. And by
isolating general-purpose primitives, I was able to keep all "messy" details
out of my application-level functions.
Example 1: Hiding details from the above layers.
char* BaseXIncr(char* number, BaseRep* br)
{
    return LongToBaseX(BaseXToLong(number, br) + 1,
                       number, strlen(number), br);
}
Example 2: Function that allows key comparison, even of keys stored in
different bases.
int LessThan(char* num1, BaseRep* br1,
             char* num2, BaseRep* br2)
{
    return BaseXToLong(num1, br1)
         < BaseXToLong(num2, br2);
}

Listing One
/* given an ASCII character, what is its value in the base? */
#define MAX_BASE 256

typedef struct baserep
{
    int  Radix;
    int  NumValues[MAX_BASE];
    char CharValues[MAX_BASE + 1];
} BaseRep;

BaseRep Base36DigitsLower = {
    36,
    {
    /* 0 - 47 */
    -1, -1, -1, /* ...and so on, until 48 negative ones in all */
    /* 48 ('0') - 57 ('9') */
    0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
    /* 58 - 96: another run of -1 */
    /* 97 ('a') - 122 ('z') */
    10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22,
    23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35
    /* 123 - 255: whole bunch o' negative ones go here */
    },
    "0123456789abcdefghijklmnopqrstuvwxyz"
};

Listing Two
long BaseXToLong(char* number, BaseRep* br)
{
    long PowerOfRadix;
    long ret = 0;
    int i;
    int digits = strlen(number);

    for (i = digits - 1, PowerOfRadix = 1;
         i >= 0;
         i--, PowerOfRadix *= br->Radix)
        /* unsigned char keeps the index nonnegative if plain
           char is signed and a character above 127 appears */
        ret += (br->NumValues[(unsigned char)number[i]] * PowerOfRadix);
    return ret;
}

Listing Three
char* LongToBaseX(long number, char* buf,
                  int buf_len, BaseRep* br)
{
    int i;

    for (i = buf_len - 1; i >= 0; i--)
    {
        buf[i] = br->CharValues[number % br->Radix];
        number /= br->Radix;
    }
    if (number) return 0;   /* null: error return */
    return buf;             /* so function will generate rvalue */
}





































































PROGRAMMER'S BOOKSHELF


The Network as a Half-Empty Cup




Ray Duncan


Ray is a DDJ contributing editor and the author of several computer books. He
can be reached through the DDJ offices.


The web was a neighborhood more efficiently lonely than the one it replaced.
Its solitude was bigger and faster. When relentless intelligence finally
completed its program, when the terminal drop box brought the last barefoot,
abused child on line and everyone could at last say anything instantly to
everyone else in existence, it seemed to me we'd still have nothing to say to
each other and many more ways not to say it.
-Galatea 2.2, by Richard Powers
Publishers have left no stone unturned in their mad rush to exploit the
public's fascination with the Internet. There's an unending deluge of books
and articles on every conceivable subtopic, including (for the technologically
challenged) vague but glorious speculations about the Internet's impact on
society authored by New Age Pollyannas ranging from Howard Rheingold to Timothy
Leary. But Sir Isaac Newton told us that for every lash there is an equal and
opposite backlash, and perhaps it was inevitable that some publishers would
choose to go after the doom-and-gloom market niche instead.
Silicon Snake Oil, by Clifford Stoll, and The Future Does Not Compute, by
Stephen Talbott, present us with an interesting study in contrasts. Talbott is
a former presidential scholar and currently a senior editor for O'Reilly &
Associates, a premier hard-core technical publishing house. Stoll is, of
course, the astronomer turned network hacker turned Internet-security expert
and purveyor of cookie recipes, a celebrity of sorts as a result of his
previous book, The Cuckoo's Egg.
Talbott's book is the philosophical descendant of Joseph Weizenbaum's landmark
work Computer Power and Human Reason (W.H. Freeman, 1976). It is thoughtful,
learned, and provocative. The Internet, while an important focus of the book,
is not by any means the only focus; Talbott addresses a broad range of issues
centered on the inexorable mechanization, depersonalization, and derealization
of the world by increasingly pervasive computer and communications technology,
and the replacement of value-oriented, experience-based human judgments by
rule-based bureaucracies and "corporate information systems." I will say
before anything else that I strongly urge all of you to buy and read this
book.
I was especially impressed with Talbott's analysis of computer-based education
in general, and Seymour Papert in particular. Many of us have deep-seated
doubts and fears about the trend toward the replacement of teacher-child
interactions with computer-based tutorials and games, the preoccupation with
"computer literacy," and the introduction of small children to control of a
fantasy world via Logo and Basic programming. Talbott has articulated the
dangers of this trend in a few pithy chapters that should be force-fed to
every elementary-school administrator, teacher, and well-meaning PTA hell-bent
on "computer lab" fund-raising.
Talbott occasionally strays onto shakier ground as the issues get closer to
home. For example, I found his warnings about the insidious dangers of
computer-based word-processing rather laughable. Talbott feels that the ease
with which words can be set down with a computer leads willy-nilly to
undisciplined, automatic writing:
I sit at my keyboard and produce all letters of the alphabet with the same
undifferentiated, inexpressive, purely percussive strokes. Words, phrases,
endless streams of thought flow effortlessly from me in all directions, with
so little inner participation that I have reached the opposite extreme from
the ancient word--self unity. I spew out my words easily, unthinkingly, at no
psychic cost to myself, and launch them into a world already drowning in its
own babble.... And as I produce my own words, so I will likely judge those of
others, discounting them as the superficial disjecta membra they too often
really are.
No doubt Gutenberg, and later the manufacturers of the first typewriters, were
similarly taxed with complaints by the scribes of their eras. However, I must
admit that the structure of Talbott's book, when compared to a classic like
Weizenbaum's, lends some unwitting support to this particular argument. The
traditional painstaking, tightly reasoned development of a thesis over the
course of a chapter has been replaced by collections of subsections that are
essentially extended thoughts of 500-800 words each, the literary counterpart
to TV "sound bites." It is almost as though the author wrote his musings on
index cards, sorted them by keyword, and divided the whole stack into chapters
at arbitrary boundaries of several thousand words. Perhaps this is the style
of the future, but I don't feel entirely comfortable with it.
Turning our attention from the sublime to the ridiculous, as it were, it is
time to say a few words about Stoll's Silicon Snake Oil. Sadly, a far better
title for this book would have been "Publishing Snake Oil"--it represents a
cold-blooded, cynical attempt to capitalize on Internet hysteria and Stoll's
good name with a book that has literally almost nothing useful or original to
say. The meat in this book would barely suffice for an "Op-Ed" column in
Infoworld, but Stoll rambles on with vaguely formed opinions, half-baked
musings, unsubstantiated prophecies of doom, and outright whining for nearly
250 pages. The Cuckoo's Egg was written from the heart and was vivid and
entertaining, but this book is the Heaven's Gate of computer trade-book
publishing--avoid it.
The Future Does Not Compute: Transcending the Machines in Our Midst
Stephen L. Talbott
O'Reilly & Associates, 1995; 502 pp., $22.95
ISBN 1-56592-085-6 
Silicon Snake Oil: Second Thoughts on the Information Highway
Clifford Stoll
Doubleday Publishing, 1995; 247 pp., $22.00
ISBN 0-385-41993-7





























SWAINE'S FLAMES


Dirty Web Tricks


The minute I heard, on "Larry King Live," that Ross Perot was starting a new
political party, I went online looking for its Web site. My favorite search
engine (Open Text Web Index - Power Search at
http://www.opentext.com:8080/omw-comp.html) rarely lets me down, but this time
it came up short.
I found sites for Ross Perot's United We Standers (http://www.uwsa.org/), I
found a Draft Colin Powell site (http://206.65.84.96/), and I found various
third-party sites (http://web.kaleida.com/). But no Independence/Reform Party
site.
By now, this new party probably has a home page, but I note for the historians
that as of late 1995, we have not yet reached the point where a new political
party puts up a Web page before it holds a press conference.
Since I was already online and an election year is looming, an election year
that promises to be more interesting than some recent ones, I decided to check
out other political sites on the Web.
Bob Dole's people had just announced that they were putting up a Web site that
they guaranteed would not be too busy. Shoot, I could have told them that.
Still, in the spirit of research, I decided to check it out and fired up the
Open Text search engine again.
Moments later, I was looking at the Bob Dole for President page at
http://www.dole96.org:80/dole/.
I soon noticed that there was something odd about this page.
For one thing, it didn't seem properly respectful of the leader of the Senate.
It said that Dole was "Against War (except when it's only sort of a war, like
the Gulf Not-A-War, which he was for, even though he thinks Congress should
have instigated it rather than Bush.)"
For another thing, a link purporting to give Dole's views on crime turned out
to be a link to a Microsoft site. And for yet another thing, the subtle but
attractive background image, I now realized, was a bunch of Dole Pineapple
labels.
Checking more closely, I discovered that every use of the word "weenie," and
there were several, was a link to a Pete Wilson for President page.
Following more links, I discovered that I was in a substantial web of bogus
Presidential candidate Web pages, all very official looking, all pretty funny,
and all without any explicit admission of authorship.
Wasn't this the kind of thing Nixon's dirty-tricks squad, the Committee to
Re-elect the President, aka CREEP, specialized in? Are we now seeing dirty
tricks on the Web?
Well, not at these sites anyway. Nobody but a Dittohead is going to be fooled
into thinking that the bogus Bob Dole site is the real thing. This is good
satire, but it is obviously satire.
The real dirty tricks on the Web are those sites that violate all principles
of user-interface design, waste their visitors' time, waste Internet
bandwidth, and put an apostrophe in "its" when it's not a contraction.
Okay, that last one is just one of my pet peeves.
But there are some painfully bad Web pages out there, and a few
public-spirited souls have taken on the responsibility of exposing the most
egregious violators of taste and netiquette on the World Wide Web. They don't
actually do anything to reduce such bad pages, but they provide links to them
for those of us who enjoy the pain of looking at bad Web pages. If you're into
that sort of thing, check out: 
This is the Worst, at http://turnpike.net/metro/mirsky/Worst.html.
The Enhanced for Netscape Hall of Shame, at
http://www.europa.com/~yyz/netbin/netscape_hos.html.
Stupid Netscape Tricks, at http://agent2.lycos.com:8001/tools/nutscape/.
Useless WWW Pages, at http://www.primus.com/staff/paulp/useless.html.
Every so often, too, you come across an isolated satirical comment on
Netscape's extensions to HTML, like the joke at the Dead People Server at
http://web.syr.edu/~rsholmes/dead/index.html. Look under "S".
Pat Paulsen, by the way, is alive and running, according to the Dead People
Server. His campaign page is at http://www.amdest.com/Pat/pat.html.
Michael Swaine, editor-at-large
MikeSwaine@eworld.com
































OF INTEREST
Stingray Software has announced a pair of Microsoft Foundation Class (MFC)
extension class libraries--the MFC++ 1.0 (Mantaray Foundation Classes++) and
Objective Grid 1.0. Both tools provide a variety of new MFC-based C++ classes
that Microsoft Visual C++ users can start using immediately for Windows 95
application development.
MFC++ 1.0 is a set of over 20 general-purpose MFC extension classes that fall
into a wide range of categories, including: document/view enhancements; image
classes for reading, writing, and manipulating popular image formats; Win32
classes; MDI alternative classes; and MFC-based control classes.
Objective Grid 1.0 is a grid control that supports a variety of cell types and
can be used in different contexts. For example, Objective Grid can be used as
a child window, as a pop-up window, and even in a dialog. Full ODBC support is
provided via a set of CRecordSet derivatives. You can easily attach Objective
Grid to any data source by overriding one C++ virtual function. Also,
printing, print-preview, find/replace, and cut/copy/paste are supported and
completely integrated with existing MFC classes. 
MFC++ sells for $495.00; Objective Grid, for $395.00; or bundled together, for
$795.00. Both packages include full source code.
Stingray Software
1201-F Raleigh Road, Suite 140
Chapel Hill, NC 27514
800-924-4223
http://www.unx.com/~stingray
ProtoView's DataTable 3.0 is a grid-control component that supports Visual
Basic 4.0, Visual C++ 4.0, Borland C++ compilers, and any front-end
development tool that can use DLLs, VBXs, or OCXs. DataTable 3.0 offers
advanced data-caching schemes and virtual memory for database applications.
Additional features include: European formatting for date, time, and numbers;
3-D effects; vertical and horizontal splitter windows; bitmaps placed in
cells; numeric column totaling; column searches; cell overwriting; improved
keyboard handling; automatic row insert; region selection; and
autoconfiguration of the grid from a database. DataTable 3.0 (16-bit DLL, VBX,
OCX) is priced at $149.00. DataTable 32 3.0 (16- and 32-bit DLL, VBX, OCX) is
priced at $495.00. DataTable source code is available for $1495.00. 
ProtoView 
2540 Route 130
Cranbury, NJ 08512
609-655-5000
Mainsoft has released a version of its MainWin software that supports OLE 2.0
capabilities under UNIX. The first release of Mainsoft's OLE capabilities
includes OLE Document technology (visual editing, drag-and-drop, and so on),
automation, MFC OLE class support, and OCXs. UNIX platforms supported include
Sun, HP, DEC, IBM, SGI, and SCO.
Mainsoft 
12170 Oakmead Parkway, Suite 310
Sunnyvale, CA 94086
408-774-3400
http://www.mainsoft.com
Amzi! has added a Delphi component called "Logic Server" to Release 3.3 of its
development environment. The Logic Server component provides methods for
recording facts, issuing queries, and updating the logic base. It also blends
into the Delphi environment by using existing exception handling and help
facilities. 
The Delphi component can also be used to extend the Logic Server so it can
directly access other components and libraries. This makes it possible for
logic bases to access the host application's data, functions, or other APIs.
Amzi! 3.3 also offers a 32-bit Windows DLL and library that allows the Logic
Server to run in 32-bit mode under Windows 95 as well as NT and 3.x. Logic
bases can be embedded in Delphi, Visual Basic, C/C++, Access, PowerBuilder,
and other tools.
Amzi! Prolog+Logic Server 3.3 (formerly Cogent Prolog) is available in
Professional and Personal Editions. The system includes a full Prolog
development system with a Windows IDE and royalty-free, static and dynamic
Logic Server libraries.
Amzi! 
40 Samuel Prescott Drive 
Stow, MA 01775 
508-897-7332
info@amzi.com
Iona Technologies has announced the release of its Orbix object request broker
(ORB) for the QNX operating system. This version of Orbix includes support for
native QNX high-speed message passing. With Orbix for QNX, you can design and
implement CORBA-compliant, distributed applications across QNX, Windows, UNIX,
OpenVMS, OS/2, Macintosh, and real-time operating systems.
Iona Technologies 
55 Fairbanks Blvd. 
Marlboro, MA 01752 
508-460-6868 
http://www.iona.ie/
Microport has unveiled its NetMark 1000 Internet Server, a World Wide Web
server that offers turnkey operation. Microport's NetMark 1000 is a
self-contained Web server on a Pentium PC, complete with two Ethernet ports,
external SCSI port, CD-ROM drive, and more. It comes loaded with Novell's
UnixWare, providing TCP/IP apps such as PPP and SLIP for dial-up access, SMTP
e-mail services, an FTP server, Gopher, Network News services, Mosaic,
security functions, and more. The NetMark 1000 sells for $6800.00.
Microport 
108 Whispering Pines Drive
Scotts Valley, CA 95066
408-438-8649
http://www.mport.com/
Intercon has announced InterServer Publisher, a Macintosh-based
Internet-server package for TCP/IP communications. InterServer Publisher
provides MacHTTP-compatible CGI support, including AppleScript scripts, CGI
search modules, MacHTTP 2.0 CGI modules, and asynchronous CGI modules. To
restrict access to certain pages, the software provides realm-based
authentication which lets you use the Internet for internal distribution of
documents. The server software sells for $795.00.
Intercon Systems
950 Herndon Parkway, Suite 420
Herndon, VA 22070
703-709-5500
http://www.intercon.com
Object/FX has announced its SpatialWorks family of embeddable
geographic/spatial visualization and analysis tools. These reusable,
object-oriented components are used for viewing and analyzing maps, pictures,
schematics, and tables. The toolsets include the Visual Companion Integrator's
Kit (VCIK), which allows you to embed SpatialWorks components into existing
apps independent of operating system and environment (PowerBuilder, Visual
Basic, SQL Windows, C/C++, Cobol, and others). The Visual Companion Object
Developer's Kit (VCODK) lets you embed components into ParcPlace-Digitalk
Smalltalk apps. 
A VCODK single-user development license costs $2995.00, while the VCIK sells
for $1295.00.
Object/FX
2515 Wabash Ave.
St. Paul, MN 55114
612-644-6064
ofx@millcomm.com
Microsoft has announced a beta version of a Speech SDK that provides native
speech-recognition and text-to-speech capabilities for Windows 95 and Windows
NT apps. The Speech SDK includes a speech-recognition engine, Centigram's
TruVoice text-to-speech-engine, sample source code, the Microsoft Speech API,
and documentation. 
The Speech API allows you to leverage a number of text-to-speech and
speech-recognition technologies, including discrete and continuous speech
recognition for uses ranging from command and control to full dictation. The
final Speech SDK is scheduled for release in early 1996. 

Microsoft
One Microsoft Way
Redmond, WA 98052
206-880-8080
msspeech@microsoft.com 
Open Software Associates has released OpenWeb for developing and deploying
distributed applications on the Internet. OpenWeb extends Open Software's
OpenUI integrated development environment by allowing you to create Web-page
links that point to downloadable client application modules. In short, OpenWeb
provides Internet-based apps with the GUI functionality required for
transaction processing, as well as message-based client/server linkages needed
to run apps across the Internet. OpenUI supports Cobol, C, and C++, running on
Windows 3.1/95/NT, OSF/Motif, OS/2 PM, Macintosh, UNIX/Motif, and VMS/Motif.
Open Software Associates
20 Trafalgar Square
Nashua, NH 03063
603-886-4330
http://www.osa.com
Voysys has announced its voysAccess 2.0 SDK for implementing interactive
voice-response capabilities for Visual Basic 4.0 apps. voysAccess 2.0
provides a 32-bit OLE-control software toolkit that lets you add telephony
capabilities to applications using software components, without prior
telephony experience. 
Interactive voice response (IVR) describes a way to access information from a
PC database using a touch-tone telephone. IVR lets callers do such diverse
things as order a product, obtain instructions, leave information, or receive
specific account information, whenever and wherever they need it.
The voysAccess 2.0 for Visual Basic SDK includes an OLE control to add
telephone functions and the voysAccess Server, which is capable of expanding
the number of telephone lines in a single PC. The SDK will also include a
two-port voice board, along with voysSmith, Voysys's software for
full-featured Windows sound capture and editing that enables IVR applications
to "speak" database information. Sample apps and tutorials are also included.
voysAccess 2.0 for Visual Basic 4.0, which supports two telephone lines, is
available for $595.00. The complete toolkit, which includes software and a
two-line Dialogic voice card, is available for $995.00. Each package includes
a royalty-free license to distribute the voysAccess custom control. 
Voysys Corp. 
48634 Millmont Street
Fremont, CA 94538
510-252-1100 
ImageBasic for Delphi from Diamond Head Software is an integrated suite of
components for creating document-imaging applications. The ImageBasic
components support OCR-based, automatic image-indexing options such as
handprinted text, bar codes, check boxes, and photographs, from within a
visual interface. The core system includes scan, display, and print
functionality, and supports third-party imaging engines. The standard edition
sells for $1250.00, with additional modules starting at $295.00.
Diamond Head Software
Ocean View Center, Penthouse 3
707 Richards Street
Honolulu, HI 96813
808-545-2377
VMARK Software has released uniVerse Objects, an OCX for building
client/server applications. uniVerse Objects gives you a view of the data
which reflects the nature of a business object (such as customer order) that
the data represents. 
uniVerse Objects supports application partitioning for creating multitier,
client/server applications and includes exception-handling features. uniVerse
Objects is designed to work with the Windows 95 32-bit environment and
supports Visual Basic 4.0, along with any OLE-enabled application-development
tool.
uniVerse Objects, which is part of the uniVerse SDK, is packaged with the
latest version of uniVerse for Windows NT 1.2 and sells for $395.00.
VMARK Software 
50 Washington Street
Westboro, MA 01581
508-366-3888 
Applied Microsystems has released its CodeTest tool suite, a set of tools for
in-circuit verification of software performance. CodeTest can monitor as many
as 32,000 C/C++ functions with the system running at full speed, while
simultaneously measuring performance, test coverage, and memory allocation. 
CodeTest provides a target-system probe that attaches to the CPU running the
program being tested. A software instrumenter prepares the user's program for
in-circuit verification. The utility reads program-source files and inserts
test-point instructions into the C/C++ code. The suite is available for SunOS,
HP-UX, and Windows 95 systems. Probes are available for MC68060/68360/68340
and i960 processors. Probes sell for $6000.00 each, with the CodeTest software
selling for $3000.00.
Applied Microsystems
P.O. Box 97002
Redmond, WA 98073-9702
206-882-2000
http://www.amc.com
Intel has announced its Indeo Video Interactive, wavelet-based software that
enables real-time interaction and control of video and graphic imagery. The
royalty-free Indeo for Windows 95 and 3.1 SDK is available to software
developers at no charge. It includes video interactive drivers, programming
tools, and documentation. The SDK is available at http://www.intel.com and on
CompuServe in the Intel multimedia library.
Indeo Video Interactive offers transparent support for interactive digital
effects, local window decoding, random key-frame access, password protection,
contrast and brightness control, and scalability.
Intel
P.O. Box 58119
Santa Clara, CA 95052-8119
408-987-8080
http://www.intel.com
G6G Consulting has released The G6G Directory of Intelligent Software: Volume
VI, a directory of over 600 hardware/software products that profess to possess
some level of "intelligence"--the ability to infer, recognize patterns, make
decisions, and so on. Among the products covered in the directory are tools
related to expert systems, fuzzy logic, genetic algorithms, natural language,
neural networks, virtual reality, and voice/speech. The directory sells for
$11.95. 
G6G Consulting Group
1137 6th Street, Suite 104
Santa Monica, CA 90403
310-458-4187









































































EDITORIAL


Knobs and Switches


When I look at the recent C++ compiler offerings from companies such as
Microsoft, Borland, and Symantec, the first thing I notice is the development
environment. Somehow, I feel a bit like the pilot of a Cessna 182 who has
wandered into the cockpit of a Boeing 767. Some things look vaguely familiar,
but most of the console is a blur of knobs and switches. Give me an editor and
a make utility any day. Oh, I need a resource compiler, too. That's it. That's
all I need. And a debugger. A C++ compiler, a make utility, an editor, a
resource compiler, and a debugger. That's all I need. And a profiler. Okay, a
faster linker might be nice, too--and that class browser does look pretty
cool. Hey, wouldn't it be great if I could just drag some knobs and switches
and drop them on the screen? Then I could junk that old editor and get
something with syntax-directed highlighting. 
Still, for all of the sparkle and flash of integrated-development
environments, the real tool is (and will be) the language. That the C++
language is going through a public review should not go unnoticed. Working
group WG-21 of the ISO SC-22 committee and ANSI's X3J16 committee have made
the draft C++ standard generally available for public review. The committees
are soliciting comments on the clarity and completeness of the document in its
description of the standard. This is a necessary step in the adoption process.
However, they are not looking for major rewrites. Rather, the committees are
looking for comments that would polish the description of the language and its
standard libraries.
Strangely, the committees seem to be forging ahead at speeds uncharacteristic
of a standards body. One indication of this is the short review period.
Although the working paper was released on May 1 (actually, "two hours
remained in the month of April in the time zone of the first FTP site," said
Andrew Koenig), the C++ community has until mid-July to make comments. To make
matters worse, the document is some 720 pages, and it makes for anything but
light reading--even Evelyn Wood might find it difficult to get through in a
weekend. 
It's the WG-21 committee's position that you've had five years to make
suggestions on the standard. According to Bjarne Stroustrup, "a public review
is primarily an exercise for language lawyers." At this point, the committee
has no intention of redesigning the language. Items that stand little chance
of being accepted, says Stroustrup, include banning the preprocessor,
requiring automatic garbage collection, and adding a persistence library.
Even if you've missed the opportunity to provide comments, you may still find
the draft interesting. It is available via anonymous ftp at research.att.com
in the dist/C++/WP directory. You'll also find some interesting discussions of
the standard taking place in the comp.std.c++ newsgroup. If you would like to
make an official comment, you can get the procedures for doing so by sending
e-mail to c++std-notify@research.att.com.
Speaking of standard C++ libraries, this issue of Dr. Dobb's Sourcebook
provides some useful techniques for programming with the Standard Template
Library, which was adopted by the standards committees late last year. As
Rogue Wave founder Tom Keffer points out, the STL is not particularly object
oriented. However, what it gives up in encapsulation it gains in flexibility.
This issue
also provides three class libraries you should find useful. For instance, Todd
Esposito and Andrew Johnson present a generalized parsing engine that can be
used to parse anything from command lines to your WIN.INI file. Michael Yam
provides a C++ framework that abstracts DCE's pthreads, and William Hill takes
a close look at associative arrays in C++. These libraries were developed in
the context of real-world projects and are currently being used in the
development of serious applications.
Now you have something to do with all of those knobs and switches.
Michael Floyd
executive editor














































Programming with the Standard Template Library


Sage advice for coping with STL




Thomas Keffer


Tom is the founder and CEO of Rogue Wave Software. He has served on the C++
ANSI/ISO Committee and is actively involved in working to evolve C++ into an
even more practical and elegant object-oriented language.


In terms of algorithms, data structures, internationalization, and language
support, the Standard C++ Library could well be the single most powerful
library ever incorporated into a standardized language. The Standard C++
Library includes an iostream facility, a locale facility, a templatized string
class, a templatized class for representing complex numbers, a class for
numerical arrays, support for memory management (new and delete), and language
support (terminate(), unexpected(), and so on). It also includes one other
important component--the Standard Template Library.
The Standard Template Library (STL) was developed by Alexander Stepanov and
Meng Lee at HP Labs. In July of last year, HP proposed to the ANSI/ISO
Standardization Committee that the STL be included as part of the Standard C++
Library. This addition was approved by the committee, and last August
Hewlett-Packard offered a freely copyable version of the STL (available for
anonymous FTP at butler.hpl.hp.com and mirror sites). Earlier this year, DDJ
named Stepanov as a recipient of the "Excellence in Programming Award" for his
work on the STL; see "Dr. Dobb's Journal Excellence in Programming Awards"
(DDJ, March 1995).
The STL is a large set of templatized classes and has an unusual and elegant
architecture. A key concept of the STL is that it separates data structures
from algorithms. This prevents it from being very object oriented, but it also
gives it unusual flexibility. While this library is currently known informally
as "STL," it will eventually lose its distinct identity as it becomes subsumed
by the rest of the Standard C++ Library. Eventually, it will all be known as
the "Standard C++ Library."


Algorithms and the STL


Algorithms in the STL are written in generic terms, without regard for the
data structure that might be used to hold the elements upon which they will
operate. This allows you to add additional data structures that can
immediately leverage existing algorithms. Conversely, you can add new
algorithms that will work with any data structure. Of course, different data
structures have different time/space characteristics, which, after all, is why
more than one data structure is useful. Hence, for this approach to work, the
time/space characteristics of both data structures and algorithms must be
recorded and standardized. This is what the STL does, allowing structures and
algorithms to be combined in highly predictable ways.
Algorithms can be generalized to work on any data type and any STL-compliant
data structure. Consider, for example, the linear-search algorithm in Figure
1(a), which searches an array of integers for an element with a particular
value. Figure 1(b) shows how you might use such a function to find the integer
value 5 in a given array. Note that if the search fails, the algorithm will
return a pointer to one past the end of the array (address a+20, in this
case). While there is no element there, both C and C++ guarantee that this is
always a valid address. You can't dereference such an address, but you can
build a pointer that contains it.
Hence, this is a completely general algorithm for finding an element within a
C-like array of like-typed elements. You can generalize the algorithm to work
with any data type using templates as in Figure 1(c). This will work not only
with integers, but with linear arrays of any type that supports an inequality
operator; see Figure 1(d). Our templatized algorithm, find2(), requires:
That for a variable a of type T and a variable q of type pointer to T, the
expression *q!=a be convertible to type bool. 
That the second parameter be reachable from the first; that is, that you be
able to continually increment the first pointer and eventually reach the
second pointer.
In return, our algorithm promises to:
Return a pointer pos for which the expression *pos!=value evaluates to false. 
Return the value end if no such pointer can be found. 
Take linear time.
This last promise states that as elements are added, the time taken for the
algorithm to execute will expand linearly (as opposed to quadratically,
logarithmically, and so on).
This is as precise a statement as we can make about the algorithm. Given types
and values that satisfy the input requirements of the algorithm, you can make
very strong statements about the output variables. These strong statements
make it possible to combine types and STL algorithms in new and novel ways,
while making strong guarantees about the outcome.


Generalizing an Algorithm


As general as our algorithm is, it can be made still more general. Right now,
it will work only with linear arrays. But other kinds of linear searches are
possible, through a linked list, for example. How might you accommodate these?
The problem is that our first attempt at an algorithm advances to the next
element by incrementing a pointer. To advance down a linked list, you need to
chase a pointer. 
You can generalize to any kind of linear search by introducing a kind of
abstract pointer called an "iterator." The iterator is incremented by the
familiar p++ notation, but the actual implementation might involve chasing a
pointer or taking even more exotic actions. Figure 2 presents the most general
kind of linear-search find algorithm, as provided in the STL. 
Like the previous algorithm, Figure 2 has been templatized on type T, but it
has also been templatized on the type of pointer. It performs a linear search
between the element pointed to first and the element just before end, looking
for value. If successful, it returns an iterator that points to the matching
element. If unsuccessful, it returns an iterator equal to end. The actual
iterator type, named by the template parameter InputIterator, is left
unspecified. 
I find it useful to think of an iterator as a kind of pointer but, really, all
they have in common with pointers is a similar interface. As a minimal
requirement, you need to be able to apply a dereferencing operator (*p) to an
iterator and be able to increment it (p++). These operators can be either the
native operators for built-in pointer types or overloaded operators that you
have supplied for nonnative types.
Remember, algorithms are written in terms of a "signature" on types. So long
as your iterator offers the correct signature, it will work. For example,
Figure 3(a) presents a linked-list class. Because a simple pointer type will
not suffice for traversing this list, I also supply the list iterator shown in
Figure 3(b).
Note that the iterator supports a dereferencing interface through an
overloaded "*" operator. You can also increment the iterator through an
overloaded "++" operator. Finally, an overloaded "!=" operator tests for
inequality of iterators. With these supported interfaces, the list iterator is
indistinguishable from a built-in pointer, as far as the find() algorithm is
concerned. 
Because the relationship between iterators and the data structure over which
they traverse can be complicated, STL data structures supply simplified
interfaces for the most common iterator values: the start of the structure
(member function begin()) and one past the end of the structure (end()). The
class in Figure 3(a) shows sample implementations of begin() and end(); Figure
3(b) shows the ListIterator used by these functions.
Finally, given a data structure type, it is useful to know the type of its
corresponding iterator. This is supplied by a public typedef, also shown in
Figure 3(a). Figure 4 shows how you use the results. Note how our find
algorithm is completely comfortable using either built-in arrays or linked
lists, of any type. Provided that we have included sensible iterators, it will
also be just as efficient as any algorithm coded with a particular data
structure in mind.
Many other algorithms are offered by the STL. For example, there is a
binary-search algorithm for sorted sequences, as well as algorithms for
sorting, merging, copying, and shuffling sequences. 


STL Data Structures


So far, we have looked at only one STL algorithm, with a hint of how a
data structure might be written. While the focus of the STL is algorithms, it
also includes a modest set of data structures; see Table 1. All data
structures include an interface necessary to support the algorithms. Hence,
the same algorithm can be used on many data structures. Conversely, the same
data structure can be used by many algorithms. But, things cannot be combined
willy-nilly. For example, many algorithms (sort comes to mind) require
iterators more capable than the simple InputIterator examined earlier. For
instance, the iterator may have to support random indexing p[i], which would be
insufferably slow with a linked list. For this reason, the iterator associated
with the STL data-structure list does not support random indexing.
All the information you need for deciding which algorithms can be usefully
combined with which data structures has been carefully recorded as part of the
standardization process. However, it is up to you to look up this information
(perhaps in online help pages) and to be aware of the hazards.


Limitations of the STL



While the STL is powerful, it has its limitations, the biggest being that it
is not very object oriented. Object orientation depends on data and algorithms
being combined under one interface. The whole premise of the STL is at odds
with this: The STL separates data and algorithms, allowing their recombination
in novel and useful ways. This exposes the programmer to a host of programming
errors. For example, consider the code in Figure 5(a). If the ending iterator
is not reachable from the starting iterator, that is, if they refer to
different data structures (perhaps because the programmer accidentally passed
in the wrong iterator), results are unpredictable. This example could lead to
a program crash.
This is a danger of the STL. The algorithms bristle with pointer-like
iterators, none of which have any relationship to each other that the compiler
can check. The result is code that is less object oriented and more prone to
errors that go undetected at compile time. 
In Figure 5(b), we start by filling two list data structures, a and b, with
ten elements, the numbers 0 through 9. At this point, the two lists are
identical. We then use the STL remove algorithm to erase all instances having
the value 5 from list a. We might expect the resultant data structure to have
nine elements, but it still has ten. This is because, in general, the
algorithm is handed only iterators. Given an iterator, there is no way for the
algorithm to figure out into which data structure it points. Hence, the
algorithm is unable to adjust the size of the resultant data structure. You
must do this yourself. The correct answer is obtained by calling the special
and (arguably) more object-oriented member function list<T>::remove(const T&),
as in Figure 5(b), where it has been applied to list b.


Multithreading


The inability of an iterator to be bound to a specific data structure is a
general problem and can lead to other limitations. For example, the STL cannot
be made multithread safe. A multithread-safe data structure ensures proper
synchronization between accessing threads. One way to do this, pioneered by
the C++ Booch Components, is to have each thread, upon entry to a member
function, lock the data structure, preventing its access by another thread.
This lock is then released when the thread returns from the function.
Unfortunately, it is impossible to write a multithreaded STL algorithm that
uses standard STL iterators. For example, suppose you wanted to find an
element in a vector while locking it from access by another thread that could
potentially modify the vector while the search is being performed. The find()
function accepts InputIterators to do its work. InputIterators do not have any
locking concept in their interface. One could lock the data structure to which
an iterator refers, but given an STL iterator, there is no way to get a
reference to the data structure into which it points.
Again, this problem arises because of the STL's approach of separating data
from the algorithm. Because of this separation, there is no way for the
algorithm to control state (a synchronization lock) that is not directly
related to iterators.


Making the Best of the STL


With its flexible design, the STL is a powerful resource. However, for all its
power, it can be quite complicated and error prone. To avoid errors, consider
subclassing the STL data structures (vector<T>, list<T>, and so on) and
putting the algorithms you most commonly use (such as find()) directly in the
subclass interface. For example, you can create the class in Figure 6(a). With
this approach, we put the algorithm right into the interface. The context of
the search is fully known: The search will be done within the data structure.
You don't have to break the vector apart to supply starting and ending
iterators. There is no possibility of the ending iterator being unreachable
from the starting iterator. Hence, the results are simpler and less error
prone, as shown in Figure 6(b).
Because MyVector<T> publicly derives from its corresponding STL class, all of
the iterator facilities will be available. It can be used anywhere an STL data
structure can be used, allowing you the full use of the STL algorithms; see
Figure 7. 
This approach also allows you to offer a multithread-safe version. Note how
this class does not publicly inherit from its STL equivalent (an
implementation could, however, inherit privately). This is because STL
iterators cannot be used safely in a multithread-safe environment. Hence, they
have not been exposed. Instead, access must be gained through member
functions, so they can ensure that a lock has been acquired. For example, the
implementation of sort() would look like Figure 7(d).


Summary


I've presented some of the fundamental ideas behind the Standard Template
Library and shown how its design can lead to great flexibility. The STL
represents a breakthrough in the genericity of data structures and algorithms,
but it can be tedious and tricky to use. 
I believe the STL will eventually be most useful in the hands of component
writers who will use its powerful facilities to build components with larger
granularity. Internally, these components can take advantage of the algorithms
and efficiencies of the STL, while offering a more object-oriented interface
to their clients.
Figure 1: (a) Linear-search algorithm; (b) using find1() to locate the integer
5 in an array; (c) using templates to generalize the linear-search algorithm;
(d) find2() will work for any type that supports an inequality operator.
(a)
int* find1(int* first, int* end, const int& value)
{
 while(first != end && *first != value)
 first++;
 return first;
}

(b)
int a[20];
// ... fill the array "a" ...
int* pos = find1(a, a+20, 5); // Searches a[0] through a[19]
 // Returns &a[20] on failure
assert( pos==a+20 || *pos == 5 );

(c)
template <class T> T* find2(T* first, T* end, const T& value)
{
 while(first != end && *first != value)
 first++;
 return first;
}

(d)
class Foo { ... };
bool operator!=(const Foo&, const Foo&);
Foo fa[20];
// ... fill the array "fa" ...
Foo f;
Foo* pos = find2(fa, fa+20, f);

assert( pos==fa+20 || !(*pos!=f) );
Figure 2: General linear-search algorithm as presented in the STL.
template <class InputIterator, class T> InputIterator
find(InputIterator first, InputIterator end, const T& value)
{
 while(first != end && *first != value)
 first++;
 return first;
}
Figure 3: (a) A template class for linked lists; (b) a list-iterator class.
(a)
// Forward declaration:
template <class T> class ListIterator;
template <class T> struct Link
{
 Link<T>* next_;
 T val_;
 Link(T p) : next_(0), val_(p) {;}
};
template <class T> class List
{
 Link<T>* head_;
public:
 typedef ListIterator<T> iterator;
 List() : head_(0) {;}
 void addLink(Link<T>* link)
 {link->next_=head_; head_=link;}
 ListIterator<T> begin() const
 {return ListIterator<T>(head_);}
 ListIterator<T> end() const
 {return ListIterator<T>(0);}
 friend class ListIterator<T>;
};

(b)
template <class T> class ListIterator
{
 Link<T>* current_;
public:
 ListIterator(Link<T>* link) : current_(link) {;}
 T operator*()
 {return current_->val_;}
 void operator++(int)
 { current_ = current_->next_; }
 int operator!=(const ListIterator<T>& it)
 {return current_!=it.current_;}
};
Figure 4: Using an iterator's type information.
int main()
{
 List<int> list;
 list.addLink(new Link<int>(1));
 list.addLink(new Link<int>(2));
 list.addLink(new Link<int>(3));
 list.addLink(new Link<int>(4));
 List<int>::iterator pos = find(list.begin(), list.end(),2);
 assert(*pos == 2);
 List<int>::iterator pos2 = find(list.begin(), list.end(),6);
 assert(!(pos2 != list.end()));

 return 0;
}
Figure 5: STL's separation of data structures from algorithms exposes the
programmer to a number of potential programming errors.
(a)
vector<int> v(10), x(10);
// ...
// Oops!
vector<int>::const_iterator position =
 find(v.begin(), x.end(), 5);
assert(*position == 5);

(b)
int main()
{
 list<int> a, b;
 for(int i=0; i<10; i++)
 {
 a.push_front(i);
 b.push_front(i);
 }
 remove(a.begin(), a.end(), 5);
 assert(a.size()==9); // fails
 b.remove(5);
 assert(b.size()==9); // OK
}
Figure 6: (a) Subclassing an STL data structure, such as vector <T>, and
encapsulating STL algorithms such as find(); (b) the result is a much simpler
class interface.
(a)
template <class T> class MyVector : public vector<T>
{
public:
size_t index(const T& val) const;
void sort();
};
template <class T> size_t
MyVector<T>::index(const T& val) const
{
 const_iterator pos = ::find(begin(), end(), val);
 return pos==end() ? (size_t)(-1) : pos-begin();
}
template <class T> void MyVector<T>::sort()
{
 ::sort(begin(), end()); // global STL sort, not a recursive call
}

(b)
MyVector<int> v(10);
size_t position = v.index(5); // Much simpler interface
assert(v[position] == 5);
Figure 7: (a) Because MyVector<T> publicly derives from its corresponding STL
class, it can be used anywhere an STL data structure can be used; (b) this
includes being passed by reference where an STL structure is expected; (c)
this approach also allows a multithread-safe version; (d) implementing a
multithread-safe sort().
(a)
MyVector<int> v(10);
// Sort only the first 7 elements, using the STL sort
// algorithm:
sort(v.begin(), v.begin()+7);

(b)
void foo(const vector<int>&);
int main() {
 MyVector<int> v(10);

 foo(v); // OK
}

(c)
template <class T, class Lock> class MyVector_MT {
private:
 Lock lock_;
 MyVector<T> vec_;
public:
 ...
size_t index(const T&) const;
void sort();
};

(d)
template <class T, class Lock> void MyVector_MT<T, Lock>::sort()
{
 lock_.lock(); // Acquire lock
 vec_.sort(); // Call the non-MT-safe version
 lock_.unlock(); // Release lock
}
Table 1: STL data structures.
Data Structure   Description
vector           Linear sequence.
list             Doubly linked list.
deque            Linear sequence, with constant-time prepend and append.
set              Associative array of unique keys.
multiset         Associative array of (possibly) nonunique keys.
map              Associative array of unique keys and associated values.
multimap         Associative array of (possibly) nonunique keys and values.
































Associative Arrays in C++


Coding for speed and efficiency




David Weber


David has been programming since 1971. He can be reached on CompuServe at
75267,1632.


Although at first glance they may seem simple, associative arrays are nothing
less than a database indexed by a single unique text key. In fact, a
considerable number of database applications require no more than that. And
like a well-written B-tree database, associative arrays are both fast and
space efficient. As I migrated into the C++ world, I found the old C module I
was using for associative arrays distant and dated. Consequently, I wrote a
complete C++ template implementation that handles objects, pointers to
objects, and pointers to functions. The implementation also deals with deep
copying the arrays and automatically extends itself when it runs out of room.
In this article, I'll explain these ideas and present the C++ code. Along the
way I'll also detail general ideas for efficient template design, hazards for
the C++ unwary, and what the C++ version offers that the C version missed.
An associative array looks like a regular C array indexed by a string. In AWK,
for example, you can perform the operations in Figure 1(a). The first
statement retrieves a dollar value, which is an integer, indexed by year and
name of automobile. However, you aren't limited to integer arrays. Any
arbitrarily complex structure is a candidate for an array. With AWK, the
structure consists of text fields in a string. With C or C++, the array can
contain structs, classes, or pointers to either. Thus, any form of
consistently organized data can be stuffed into an associative array. Our C++
template has the same basic capability as the AWK example and tries, within
the limitations of its grammar, to mimic the syntax.
Listing One is the header file for the associative array. Note that the
capability splits across two classes, ASSOCIATION_BASE and ASSOCIATION. The
first is a standard C++ class, while the second is a class template.
ASSOCIATION derives from ASSOCIATION_BASE, which has no public interface--it
is a "hidden" class. 
You may, at this point, wonder why the capability is not simply placed in the
ASSOCIATION class. The reason is efficiency. The first time I saw examples of
header files with long C++ template listings, I was struck by déjà vu: Here
was macro madness all over again. In the past, it was common practice to fill
header files with long macros that, when invoked in a module, gave legibility
to the code and also granted speed. Unfortunately, a side effect was code
bloat. Every invocation replicated the macro code. Another problem was large,
unreadable headers. Templates attempt with a scalpel what macros tried with a
chain saw, but they have the same inherent failures. Every template you
instance for a new object type replicates all the code of the template.
Further, the template code must be visible at the point that an instance is
made. This means stuffing the entire template into the header. When the
template is small and simple, these problems are insignificant. Unfortunately,
most things in software are neither small nor simple.
My solution is to split the template by creating a functional core using a
standard C++ class (but having no public interface), then designing a template
that has only the interface and derives its operation from the core. I then
use a universal format for passing the information between the template and
the core. In this case, the universal media are chars and pointers to chars.
The template casts types between the universal format and the template
instance type. Even though casts are usually considered dangerous, these casts
are contained in a class and hidden from user access. Most important, the cast
object is not reinterpreted--the functional core is simply a byte bag that
does not interpret or presume upon the bytes it stores. The interpretation
occurs in the template layer, where strong typing keeps users from getting
their hands caught in the casting machinery.
You make an associative array using a template constructor that defines the
data type and, optionally, takes a parameter that is a best guess of the
maximum size. If that maximum is exceeded, the array will automatically double
in size. Once created, you can stuff associations into the array with the
insert method. You can access keys with the find function. find returns a
pointer to a data object, not the data object itself, and can therefore be
used as an existence operator. If the key does not exist in the array, find
will return 0. The remove function deletes an entry from an array, but it does
not shrink the array. Instead, it opens up space in the array that is reusable
by another insert. The first and next methods iterate over the objects in the
array. Since there is no standardization of iterators, I picked yet another
neutral format. You can use these to make your preferred iterator, whether it
be a Windows-style enumerator function or a Standard Template Library (STL)
iterator object. Notice in Figure 1(b) that I do not use AWK-style syntax. The
reason will become apparent later, when I overload C++ operators to give the
class a quasi-AWK look.


References and Other Riddles


The definition for the insert function in Listing One takes template data
objects as parameters rather than as references to an object. This is
unconventional. When I first wrote the class, I used the typical approach and
inserted the data by reference, but this had undesirable effects. I wanted the
array to handle three data formats--objects, pointers to objects, and pointers
to functions--without using function overloading or custom variants of the
template interface. One compiler had problems referencing a pointer to a
function. This is understandable. A function pointer exists only as a pointer;
there is no tangible data object behind it. Taking references of function
pointers, although perfectly legal, seems to be treading near the edge of the
compiler's abilities. 
Another problem with references is overhead. For example, Figure 1(c) stuffs a
simple array of integers with constants. With references, the compiler builds
a temporary image of the numeric constant and then pushes a pointer to that
temporary onto the stack. Without references, just the constant is pushed,
which is much faster. Don't get me wrong: References are a valuable addition
to the language, and I will need them to implement the array operator. Their
intended use is avoiding the overhead of copying large objects on the stack.
Passing an object with a hidden reference pointer is easier than passing the
entire data area. However, when I look over my use of associative arrays I
find that big objects, being dynamically allocated on the heap, already exist
as pointers. It is simpler to use the pointer-based array in Figure 1(d).
Operator overloading allows an AWK-like syntax. The array operator [] gives
both rvalue and lvalue access to associative arrays. Since it is impossible to
add keywords to the C++ language through the overloading mechanism, the AWK
keyword "in" cannot be duplicated. Instead, I use the function operator for
existence checking; see Figure 2(a). Similarly, object removal and iteration
cannot have an AWK syntax. I stick with the named methods remove, first, and
next. The hazard with this scheme is that the array operator cannot
distinguish between lvalues and rvalues, so it creates storage for new keys,
even when they are used only as rvalues. With named methods, insert and find,
the lvalue/rvalue dichotomy is kept separate. With the array operator, they
merge. Figure 2(b) will create a zero-filled object if the key does not exist,
so the existence operator is essential.
A class containing pointers needs special handling when it is copied or
assigned. By default, the compiler makes an exact, byte-for-byte copy of the
class, known as a "shallow copy." However, this can create aliases. Consider
class instance A, which has an internal pointer, and class instance B, which
is a shallow copy of A. B's internal pointer is the same as A. Delete class A,
and you delete what it points to. Now, the pointer in B points to nothing, so
using the B pointer will bring the walls tumbling down. The correct approach
is to make a new copy of the internal object and direct B's pointer towards
it. This is known as "deep copying." ASSOCIATION_BASE handles this properly.
Examine the header in Listing One and the engine in Listing Two, following the
code for the copy constructor and the assignment operator. Note how the copy
constructor piggybacks on the assignment operator rather than replicating its
code.
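The piggyback pattern can be sketched apart from the listings. This miniature
Buf class is my own illustration, not part of the article's code: it deep
copies a string member, and its copy constructor simply clears its pointer
and delegates to the assignment operator.

```cpp
#include <cassert>
#include <cstring>

// Illustrative class (not ASSOCIATION_BASE): deep copy via piggybacking.
class Buf
 {
 public:
 Buf(const char *s) { text = dup(s); }
 Buf(const Buf &o) { text = 0; *this = o; } // piggyback on operator=
 Buf &operator=(const Buf &o)
  {
  if (this == &o) return *this; // guard against self-assignment
  delete [] text; // drop old storage
  text = dup(o.text); // deep copy, no aliasing
  return *this;
  }
 ~Buf() { delete [] text; }
 const char *get() const { return text; }
 private:
 static char *dup(const char *s)
  { char *p = new char[strlen(s)+1]; strcpy(p,s); return p; }
 char *text;
 };
```

Destroying the original now leaves the copy's pointer valid, because the two
instances never share storage.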
ASSOCIATION_BASE uses pointer-independent linkage to streamline the copying of
an array. This underscores a C++ design rule--minimize the number of pointers
in a class. The internal data structure for the associative array is a hash
table with collisions resolved by chaining. The linked list for the chain
typically looks like Figure 3(a). The next pointer in this structure is a
speed trap. When you deep copy the array into a new data area you must walk
every chain and reconnect the links. Not too swift. But if you store the links
as offsets from some base pointer, you can bulk copy the linkage (memcpy) into
the newly allocated area; see Figure 3(b).
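As a toy illustration of the offset trick (hypothetical names, not the
listing's types), a chain linked by array indices survives a bulk memcpy with
no per-link fix-up, because an index means the same thing relative to any
base pointer.

```cpp
#include <cassert>
#include <cstring>

const unsigned int NIL = ~0u; // end-of-chain marker, as in ASSOC_NULL
struct Elem { int value; unsigned int next; }; // next is an index, not a pointer

// Walk a chain starting at head, summing values; base may be any copy.
int chain_sum(const Elem *base, unsigned int head)
 {
 int sum = 0;
 for (unsigned int k = head; k != NIL; k = base[k].next)
  sum += base[k].value;
 return sum;
 }
```

Had Elem held a `struct Elem *next`, the copied links would still point into
the old array, and every chain would need rewalking.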
The ASSOCIATION class forces the programmer to maintain storage for the keys.
That isn't a problem if the keys are static. However, if the keys are dynamic,
built in a for loop, or read from a stream, then the overhead is an immense
hassle. ASSOC_STORED is a sister class that does everything ASSOCIATION does
while managing heap space for copies of the keys. Use the class that best fits
your intended purpose.


Wrap Up


Keeping up with the fast-moving changes in the C++ language means more than
continuously spending money for compiler upgrades. It also means writing
software that is portable across a range of language variants. Listing One
uses preprocessor #defines to control exception handling and new-style casts.
Set them to 0 for older compilers, and turn them on as new features appear in
your favorite compiler.
The proposed string class for the C++ standard library dynamically adjusts the
case sensitivity of its comparison function using the set_case_sensitive
method. The ASSOCIATION class uses statically controlled case sensitivity that
you can modify by editing the ASSOC_CASE_SENSITIVE preprocessor directive in
the header file, then recompiling. It would have been easy to make it dynamic
(just use a function pointer for the comparison operation and toggle it
between strcmp and stricmp), but I am always haunted by the ghosts of
performance lost.
By statically hardwiring the comparison function, the compiler can elect to
optimize the code with an intrinsic inline replacement that a function pointer
would disallow.
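For contrast, the rejected dynamic design might look roughly like this sketch
(my names, not code from the listings). The indirection through the assoc_cmp
pointer is exactly what prevents the compiler from expanding strcmp inline.

```cpp
#include <cassert>
#include <cctype>
#include <cstring>

// Portable case-insensitive compare (stricmp is not standard C).
static int nocase_cmp(const char *a, const char *b)
 {
 while (*a != 0 && toupper((unsigned char)*a) == toupper((unsigned char)*b))
  { a++; b++; }
 return toupper((unsigned char)*a) - toupper((unsigned char)*b);
 }

// Run-time dispatch: every comparison pays for an indirect call.
static int (*assoc_cmp)(const char *,const char *) = strcmp;
void set_case_sensitive(int on) { assoc_cmp = on ? strcmp : nocase_cmp; }
```

With the #define approach in Listing One, the choice is burned in at compile
time and the call can be replaced by an intrinsic.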
What is gained by moving the software from C to C++? Both do their job equally
well. However, there is a payoff for the translation effort. On a simplistic
level, C++ allows a more-natural, AWK-like syntax, which C could never
achieve. Beyond this, the real gain is safety. My C version used untyped
pointers, which, if you didn't keep your eye on them, could run amok. The C++
version is strongly typed. You can't accidentally shove the wrong thing into
the array. The C++ version is also easier to modify, with shades of
reusability. The C++ ASSOC_STORED class, as a variant of the ASSOCIATION
class, required only 90 lines of code. A similar variation in C would be much
more expensive. The C++ version feels more comfortable. It has the right heft
and balance. These intangible features are precisely the characteristics that
make a tool useful in a craftsman's hands.
Figure 1: (a) Associative-array operations as used in the AWK language; (b)
associative-array operations in C++; (c) filling an array with integers; (d)
using pointer-based arrays. 
(a)
price = bluebook["1967 VW"];         # access rvalue
bluebook["1967 VW"] -= valuedecay;   # set lvalue
if ("1928 Durant" in bluebook)       # check existence
delete bluebook["Stanley Steamer"];  # remove entry
for (pv in bluebook)                 # iterate over array

(b)
ASSOCIATION<myclass> test(100);      // creation
test.insert("key1",instance1);       // lvalue access
test.find("key1");                   // rvalue and existence
test.remove("key1");                 // remove entry
test.first(), test.next();           // iterate

(c)
ASSOCIATION<int> test2;
test2.insert("one",1);               // inserting constants
test2.insert("two",2);

(d)
ASSOCIATION<bigclass *> test3;       // note: pointer
bigclass *instance1 = new bigclass(construction parameters);
test3.insert("bigkey1",instance1);
Figure 2: (a) Using a function operator for existence checking; (b) using an
existence test to prevent the array operator from unconditionally
instantiating rvalues. 
(a)
ASSOCIATION<maillist> test4;
dial(test4["Bob Smith"].phone_number); // rvalue
test4["Jane Thomas"].state = "CO";     // lvalue
if (test4("John Doe"))                 // existence

(b)
// Creates "Mary Lamb" if she did not previously exist
cout << test4["Mary Lamb"].city;     // even though an rvalue
// Use this instead
if (test4("Mary Lamb")) cout << test4["Mary Lamb"].city;

Figure 3: (a) Linked-list struct; (b) creating an offset from the base
pointer.
(a)
struct ELEMENT { some data stuff; struct ELEMENT *next; };

(b)
struct ELEMENT { some data stuff; unsigned int next; /* offset from "base" */ };
struct ELEMENT *base;

Listing One
/***************************************************************
 * file: ASSOC.HPP
 * purpose: template class for associative arrays
 * contains:
 * ASSOCIATION_BASE - core routines for the ASSOCIATION template
 * ASSOCIATION - association between strings and data objects
 * copyright: 1994 by David Weber. Unlimited use granted in EXE, OBJ,
 * or LIB formats. Do not sell as source without asking first.
 * environment: tested Borland C++ 4.01 and Zortech C++ 3.1r2
 * history: 10-02-94 - initial code, based on an earlier C module
 **************************************************************/
#ifndef _ASSOC
#define _ASSOC
// needed goodies
#include <string.h>
// Feature controls
#define ASSOC_CASE_SENSITIVE 1 // string case sensitivity (1 or 0)
#define ASSOC_EXCEPTIONS 0 // environment supports C++ exceptions
#define ASSOC_NEW_CASTS 0 // environment supports new C++ casts
// case sensitivity - This could be done dynamically with a function ptr in the
// class, but that would prevent the compiler from optimizing via intrinsic
// function expansion.
#if ASSOC_CASE_SENSITIVE
#define ASSOC_STRCMP strcmp
#define ASSOC_MAP(c) (c)
#else
#include <ctype.h>
#define ASSOC_STRCMP stricmp
#define ASSOC_MAP(c) toupper(c)
#endif
// The only place exceptions occur is resource failure with the "new" calls.
// If the environment supports C++ exceptions and a failure occurs, a handler
// somewhere, even if it's terminate(), will take care of it. Without
// exception handling we just shut down the farm via assert() and abort()
#if ASSOC_EXCEPTIONS
#define ASSOC_MEM_CHECK(p)
#else
#include <stdlib.h>
#include <assert.h>
#define ASSOC_MEM_CHECK(p) { if ((p) == 0) { assert((p) != 0); abort(); } }
#endif
// old versus new casting, not much gained here except you can search for casts
#if ASSOC_NEW_CASTS
#define ASSOC_CAST(cast,item) reinterpret_cast<cast>(item)
#else
#define ASSOC_CAST(cast,item) (cast)(item)
#endif
// defines
const int DEFAULT_ASSOC_SIZE = 64; // default estimated size of array
const unsigned int ASSOC_NULL = ~0; // end of chain
// The base class for associative arrays. You should NOT make an instance
// of this class. Instead use the ASSOCIATION template below
class ASSOCIATION_BASE
 {
 protected: // protect everything
 ASSOCIATION_BASE(unsigned int data_size,unsigned int estimated_size);
 ~ASSOCIATION_BASE(); // destructor - make virtual if extending
 // inheritance and/or pointing along the chain
 ASSOCIATION_BASE(const ASSOCIATION_BASE &); // copy constructor
 ASSOCIATION_BASE &operator=(const ASSOCIATION_BASE &);// assignment
 ASSOCIATION_BASE(const char **keys,const char *data, // static initializer
 unsigned int data_size,unsigned int count);
 unsigned int size(void) // how many data elements in array
 { return fill_level; }
 int insert(const char *key,const char *data); // add an element
 int remove(const char *key); // remove an element
 char *find(const char *key); // find an element
 const char *first(void); // traversal functions
 const char *next(void);
 int operator!() // test array for problems
 { return hash_list==0 || key_list==0 || link_list==0 || data_list==0; }
 int operator()(const char *key) // existence operator
 { return find(key) != 0; }
 char &operator[](const char *key) // access operator
 { return *reference(key); }
 char *reference(const char *key); // get a reference to data or insert
 unsigned int hash(const char *key);// hash function
 void allocate_array(void); // get enough space for the array
 int expand_array(void); // resize array
 void clear_pointers(void) // clear all pointers
 { hash_list = 0; key_list = 0; link_list = 0; data_list = 0; }
 void delete_pointers(void) // delete all pointers
 { delete [] hash_list; delete [] key_list;
 delete [] link_list; delete [] data_list; }
 unsigned int array_size; // full size of array
 unsigned int fill_level; // data entries currently in array
 unsigned int *hash_list; // hash indexed array of chains
 const char **key_list; // storage for key pointers
 unsigned int *link_list; // storage for key linkages
 char *data_list; // storage for data expressed as char
 unsigned int sizeofdata; // size of data objects in bytes
 unsigned int iteration; // current iteration position in data
 };
// ASSOCIATION - associative array template
// Use this class template for creating instances when you will be
// storing associations between character strings and data objects.
// There are three ways to use this template:
// direct storage - data are stored directly in the associative array.
// good for small classes or native types
// ASSOCIATION<myclass> direct_array(estimated_size);
// value = *direct_array.find("key");
// indirect storage - pointers to data are stored in the array.
// good for large classes or pointers to things that vary in size
// ASSOCIATION<myclass *> indirect_array(estimated_size);
// ptr_to_value = *indirect_array.find("key");
// value = **indirect_array.find("key");
// function pointer storage - pointers to functions are stored.
// ASSOCIATION<int (*)(int)> func_ptr_array(estimated_size);
// int (**fptr)(int);
// if ((fptr = func_ptr_array.find("key")) != 0) // always test
// function_return = (**fptr)(parameter);
// You are responsible for the string storage. In the case of indirect
// storage you are also responsible for storing the data.
// example:
// ASSOCIATION<myclass> assoc_array(estimated_size); // declaration
// if (!assoc_array) { class is unusable; } // validity
// assoc_array.insert("xray",myclass_instance1); // insert,
// assoc_array.insert("yodel",myclass_instance2); // returns 0 on failure
// assoc_array.insert("zorro",myclass_instance3);
// assoc_array.remove("yodel"); // delete, returns 0 if not there
// cout << "Size is " << assoc_array.size() << "\n"; // size is 2
// cout << "zorro is " << *assoc_array.find("zorro") << "\n"; // find
// assert(assoc_array.find("garbage") == 0); // failed find returns 0
// const char *p;
// if ((p = assoc_array.first()) != 0) // iterate over set
// { // do not insert or remove while iterating
// do
// {
// cout << p << "\t\t" << *assoc_array.find(p) << "\n";
// } while ((p = assoc_array.next()) != 0);
// }
// other uses:
// copy constructor
// ASSOCIATION<int> x; // fill x with stuff
// ASSOCIATION<int> y(x);
// assignment
// ASSOCIATION<int> x; // fill x with stuff
// ASSOCIATION<int> y;
// y = x;
// static initialization
// static const char *keys[] = { "key1","key2", "key3" }; // note: const
// static int data[] = { 1,2,3 };
// ASSOCIATION<int> x(keys,data,sizeof(keys)/sizeof(keys[0]));
template <class T> class ASSOCIATION : public ASSOCIATION_BASE
 {
 public:
 ASSOCIATION(unsigned int estimated_size = DEFAULT_ASSOC_SIZE) :
 ASSOCIATION_BASE(sizeof(T),estimated_size) { } // default constructor
 // destructor and copy constructor found in ASSOCIATION_BASE
 ASSOCIATION<T> &operator=(const ASSOCIATION<T> &original) // assignment
 { ASSOCIATION_BASE::operator=(original); return *this; }
 // static initializer
 ASSOCIATION(const char **keys,T *data,unsigned int count) :
 ASSOCIATION_BASE(keys,ASSOC_CAST(const char *,data),sizeof(T),count) { }
 unsigned int size(void) // how many data elements in array
 { return fill_level; }
 int insert(const char *key,T data) // add an element
 { return ASSOCIATION_BASE::insert(key,ASSOC_CAST(const char *,&data)); }
 int remove(const char *key) // remove an element
 { return ASSOCIATION_BASE::remove(key); }
 T *find(const char *key) // find an element
 { return ASSOC_CAST(T *,ASSOCIATION_BASE::find(key)); }
 const char *first(void) // traversal functions
 { return ASSOCIATION_BASE::first(); }
 const char *next(void)
 { return ASSOCIATION_BASE::next(); }
 int operator!() // test array for problems
 { return ASSOCIATION_BASE::operator!(); }
 int operator()(const char *key) // existence operator
 { return ASSOCIATION_BASE::find(key) != 0; }

 T &operator[](const char *key) // access operator
 { return *(ASSOC_CAST(T *,ASSOCIATION_BASE::reference(key))); }
 };
// ASSOC_STORED - ASSOCIATION class with storage for keys
// This class is almost identical to ASSOCIATION except that it maintains the
// storage for the keys. The interface is the same but the static initializer
// is left out. If it is static, the keys are already stored. So why bother?
template <class T> class ASSOC_STORED : public ASSOCIATION_BASE
 {
 public:
 ASSOC_STORED(unsigned int estimated_size = DEFAULT_ASSOC_SIZE) :
 ASSOCIATION_BASE(sizeof(T),estimated_size) { } // default constructor
 ~ASSOC_STORED(); // destructor
 ASSOC_STORED(const ASSOC_STORED<T> &original) : ASSOCIATION_BASE(original)
 { dup_keys(); } // copy constructor
 ASSOC_STORED<T> &operator=(const ASSOC_STORED<T> &original)
 { // assignment
 ASSOCIATION_BASE::operator=(original);
 dup_keys();
 return *this;
 }
 unsigned int size(void) // how many data elements in array
 { return fill_level; }
 int insert(const char *key,T data) // add an element
 {
 if (key == 0)
 return 0;
 char *p = new char[strlen(key)+1];
 ASSOC_MEM_CHECK(p);
 strcpy(p,key);
 return ASSOCIATION_BASE::insert(p,ASSOC_CAST(const char *,&data));
 }
 int remove(const char *key); // remove an element
 T *find(const char *key) // find an element
 { return ASSOC_CAST(T *,ASSOCIATION_BASE::find(key)); }
 const char *first(void) // traversal functions
 { return ASSOCIATION_BASE::first(); }
 const char *next(void)
 { return ASSOCIATION_BASE::next(); }
 int operator!() // test array for problems
 { return ASSOCIATION_BASE::operator!(); }
 int operator()(const char *key) // existence operator
 { return ASSOCIATION_BASE::find(key) != 0; }
 T &operator[](const char *key) // access operator
 {
 char *p = ASSOCIATION_BASE::find(key);
 if (p == 0)
 {
 if (key == 0)
 return *(ASSOC_CAST(T *,0)); // garbage in, garbage out
 char *p = new char[strlen(key)+1];
 ASSOC_MEM_CHECK(p);
 strcpy(p,key);
 return *(ASSOC_CAST(T *,ASSOCIATION_BASE::reference(p)));
 }
 return *(ASSOC_CAST(T *,p));
 }
 private:
 void dup_keys(void);

 };
// functions out of line cuz looped, large and/or called infrequently
// destructor
template <class T> ASSOC_STORED<T>::~ASSOC_STORED()
 {
 for (unsigned int i = 0 ; i < fill_level ; i++)
 delete [] ASSOC_CAST(char *,key_list[i]);
 }
// remove
template <class T> int ASSOC_STORED<T>::remove(const char *key)
 {
 char *data = ASSOCIATION_BASE::find(key);
 if (data != 0)
 {
 data = ASSOC_CAST(char *,key_list[(data-data_list)/sizeofdata]);
 if (ASSOCIATION_BASE::remove(key))
 {
 delete [] data;
 return 1;
 }
 }
 return 0;
 }
// duplicate the full set of keys
template <class T> void ASSOC_STORED<T>::dup_keys(void)
 {
 char *newkey;
 const char *oldkey;
 for (unsigned int i = 0 ; i < fill_level ; i++)
 { // duplicate keys
 oldkey = key_list[i];
 newkey = new char[strlen(oldkey)+1];
 ASSOC_MEM_CHECK(newkey);
 strcpy(newkey,oldkey);
 key_list[i] = newkey;
 }
 }
#endif // _ASSOC

Listing Two
/***************************************************************
 * file: ASSOC.CPP
 * purpose: template class for associative arrays
 * contains:
 * ASSOCIATION_BASE - core routines for the ASSOCIATION template
 * ASSOCIATION - association between strings and data objects
 * copyright: 1994 by David Weber. Unlimited use granted in EXE, OBJ,
 * or LIB formats. Do not sell as source without asking first.
 * environment: tested Borland C++ 4.01 and Zortech C++ 3.1r2
 * history: 10-02-94 - initial code, based on an earlier C module
 **************************************************************/
#include "assoc.hpp"
/************************************************
 * function: ASSOCIATION_BASE(unsigned int data_size,unsigned int estimated_size)
* Constructor for an associative array
* parameters: byte size of a data element and the estimated size of array
* returns: nothing
************************************************/
ASSOCIATION_BASE::ASSOCIATION_BASE(unsigned int data_size,
 unsigned int estimated_size)
 {
 clear_pointers(); // preset as invalid
 sizeofdata = data_size; // set up sizes
 array_size = estimated_size;
 fill_level = 0; // empty to start
 allocate_array(); // allocate space
 }
/************************************************
 * function: ~ASSOCIATION_BASE()
 * Destructor for an associative array
 * parameters: none
 * returns: nothing
 ************************************************/
ASSOCIATION_BASE::~ASSOCIATION_BASE()
 {
 delete_pointers(); // delete storage
 clear_pointers(); // just for tidiness
 }
/************************************************
 * function: ASSOCIATION_BASE(const ASSOCIATION_BASE &original)
 * copy constructor for class
 * parameters: previous associative array to copy
 * returns: nothing
 ************************************************/
ASSOCIATION_BASE::ASSOCIATION_BASE(const ASSOCIATION_BASE &original)
 {
 clear_pointers(); // no heap to start
 *this = original; // piggyback on assignment operator
 }
/************************************************
 * function: ASSOCIATION_BASE & operator=(const ASSOCIATION_BASE &original)
 * assignment operator for class
 * parameters: previous associative array to copy
 * returns: *this
 ************************************************/
ASSOCIATION_BASE & ASSOCIATION_BASE::operator=(const ASSOCIATION_BASE
&original)
 {
 delete_pointers(); // remove old storage
 clear_pointers(); // no heap to start
 array_size = original.array_size; // essential data
 fill_level = original.fill_level;
 sizeofdata = original.sizeofdata;
 allocate_array(); // allocate heap
 if (!*this) // valid?
 return *this;
 // copy hash, keys, links and data; this only works cuz linkage is via offsets
 memcpy(hash_list,original.hash_list,array_size * sizeof(unsigned int));
 memcpy(key_list,original.key_list,fill_level * sizeof(const char *));
 memcpy(link_list,original.link_list,fill_level * sizeof(unsigned int));
 memcpy(data_list,original.data_list,array_size * sizeofdata);
 return *this;
 }
/************************************************
 * function: ASSOCIATION_BASE(const char **keys,const char *data,
 * unsigned int data_size,unsigned int count)
 * static initialization constructor, build an associative array with
 * pre-existing key and data arrays.
 * parameters: pointer to an array of keys, pointer to an array of data
 * expressed as chars, size of array in elements
 * returns: nothing
 ************************************************/
ASSOCIATION_BASE::ASSOCIATION_BASE(const char **keys,const char *data,
 unsigned int data_size,unsigned int count)
 {
 unsigned int i;
 clear_pointers(); // mark storage as invalid
 sizeofdata = data_size; // set up sizes
 array_size = count;
 fill_level = 0; // empty to start
 allocate_array(); // allocate space
 if (!*this) // valid?
 return;
 for (i = 0 ; i < count ; i++, keys++, data += sizeofdata)
 insert(*keys,data); // add info
 }
/************************************************
 * function: int insert(const char *key,const char *data)
 * Insert an entry into the associative array. The caller is responsible for
 * storage of the key.
 * parameters: const pointer to key string and const/non const pointer to data
 * returns: 1 if inserted OK or 0 if a bad key
 ************************************************/
int ASSOCIATION_BASE::insert(const char *key,const char *data)
 {
 unsigned int index,k;
 if (key == 0) // no null keys allowed
 return 0;
 if (fill_level >= array_size)
 if (!expand_array())
 return 0;
 index = hash(key); // find start in array
 for (k = hash_list[index] ; k != ASSOC_NULL ; k = link_list[k])
 { // see if already exists
 if (ASSOC_STRCMP(key,key_list[k]) == 0)
 { // replace current data
 memcpy(data_list+k*sizeofdata,data,sizeofdata);
 return 1;
 }
 }
 key_list[fill_level] = key; // put in new data
 link_list[fill_level] = hash_list[index];
 hash_list[index] = fill_level;
 memcpy(data_list+fill_level*sizeofdata,data,sizeofdata);
 fill_level++;
 return 1;
 }
/************************************************
 * function: int remove(const char *key)
 * Remove an entry from the associative array
 * parameters: const pointer to the key string
 * returns: 1 if removed or 0 if not found in the array
 ************************************************/
int ASSOCIATION_BASE::remove(const char *key)
 {
 unsigned int k,*l;
 if (key == 0) // no null keys allowed
 return 0;

 for (l=hash_list+hash(key), k=*l ; k != ASSOC_NULL ; l=link_list+k, k=*l)
 { // find entry
 if (ASSOC_STRCMP(key,key_list[k]) == 0)
 {
 *l = link_list[k]; // unlink
 fill_level--; // level shrinks
 key_list[k] = key_list[fill_level]; // move top of pile into "removed"
 link_list[k] = link_list[fill_level];
 memcpy(data_list+k*sizeofdata,data_list+fill_level*sizeofdata,sizeofdata);
 for (l=hash_list+hash(key_list[k]) ; *l != ASSOC_NULL ; l=link_list + *l)
 if (*l == fill_level)
 { // fix link to what was top of pile
 *l = k;
 break;
 }
 return 1;
 }
 }
 return 0; // doesn't exist
 }
/************************************************
 * function: char *find(const char *key)
 * Lookup an entry in the associative array and return a non const pointer
 * parameters: const pointer to the key string
 * returns:non const pointer to the data associated with the key
 ************************************************/
char *ASSOCIATION_BASE::find(const char *key)
 {
 unsigned int k;
 if (key == 0) // no null keys allowed
 return 0;
 for (k = hash_list[hash(key)] ; k != ASSOC_NULL ; k = link_list[k])
 { // follow chain in array
 if (ASSOC_STRCMP(key,key_list[k]) == 0)
 return data_list + k * sizeofdata;
 }
 return 0; // not found
 }
/************************************************
 * function: const char *first(void)
 * Find the first key string in the array. Follow this by
 * next() calls until a 0 is returned. Inserting or Removing
 * from an array while you are iterating will invalidate the
 * iteration sequence (but doesn't mess up the array).
 * parameters: none
 * returns: first key string encountered or 0 if none
 ************************************************/
const char *ASSOCIATION_BASE::first(void)
 {
 iteration = 0; // start from beginning
 return next(); // and search
 }
/************************************************
 * function: const char *next(void)
 * Find the next key string in the array, call first()
 * to start iteration.
 * parameters: nothing
 * returns: next key string encountered or 0 if all done
 ************************************************/

const char *ASSOCIATION_BASE::next(void)
 {
 while (iteration < fill_level) // until end of data
 return key_list[iteration++]; // return key
 return 0;
 }
/************************************************
 * function: char *reference(const char *key)
 * find a key and return a reference to its data if it is there
 * otherwise insert a place for it and return a reference to the
 * zeroed out hole
 * parameters: const pointer to the key string
 * returns: pointer to associated data (which may be zeroed out)
 ************************************************/
char *ASSOCIATION_BASE::reference(const char *key)
 {
 unsigned int k,index;
 if (key == 0) // no null keys allowed
 return 0;
 index = hash(key);
 for (k = hash_list[index] ; k != ASSOC_NULL ; k = link_list[k])
 { // follow chain in array
 if (ASSOC_STRCMP(key,key_list[k]) == 0)
 return data_list + k * sizeofdata; // found it
 }
 if (fill_level >= array_size) // expand if necessary
 if (!expand_array())
 return 0;
 key_list[fill_level] = key; // put in hole for new data
 link_list[fill_level] = hash_list[index];
 hash_list[index] = fill_level;
 memset(data_list+sizeofdata*fill_level,0,sizeofdata);
 return data_list + sizeofdata*fill_level++; // return pointer to hole
 }
/************************************************
 * function: unsigned int hash(const char *key)
 * local function calculates a hash. Designed to minimize clustering.
 * parameters: key string
 * returns: hash value clipped to array size
 ************************************************/
unsigned int ASSOCIATION_BASE::hash(const char *key)
 {
 unsigned int index;
 const unsigned char *k;
 for (index=0x5555, k=ASSOC_CAST(const unsigned char *,key) ; *k != 0 ; k++)
 index = (index << 1) ^ ASSOC_MAP(*k); // hash key
 return index % array_size; // fit in array
 }
/************************************************
 * function: void allocate_array(void)
 * local function allocates and initializes the array
 * parameters: none
 * returns: nothing
 ************************************************/
void ASSOCIATION_BASE::allocate_array(void)
 {
 unsigned int i;
 hash_list = new unsigned int[array_size]; // allocate hash array
 key_list = new const char *[array_size]; // allocate key pointers

 link_list = new unsigned int[array_size]; // allocate key linkage
 data_list = new char[array_size*sizeofdata]; // allocate storage for data
 ASSOC_MEM_CHECK(hash_list); // validate resources
 ASSOC_MEM_CHECK(key_list);
 ASSOC_MEM_CHECK(link_list);
 ASSOC_MEM_CHECK(data_list);
 for (i = 0 ; i < array_size ; i++)
 hash_list[i] = ASSOC_NULL; // preset with nothing
 }
/************************************************
 * function: int expand_array(void)
 * double the size of the array
 * parameters: none
 * returns: 1 if expanded OK or 0 if failed
 ************************************************/
int ASSOCIATION_BASE::expand_array(void)
 { // if array full, increase size
 const char **old_key;
 char *old_data;
 unsigned int i,index;
 old_key = key_list; // save old data
 old_data = data_list;
 delete [] hash_list; // remove pointer storage
 hash_list = 0;
 delete [] link_list;
 link_list = 0;
 array_size <<= 1; // new size
 allocate_array(); // new array
 if (!*this) // valid?
 return 0;
 memcpy(key_list,old_key,fill_level*sizeof(const char *));
 memcpy(data_list,old_data,sizeofdata*fill_level);
 for (i = 0 ; i < fill_level ; i++)
 { // rehash old data into new array
 index = hash(old_key[i]);
 link_list[i] = hash_list[index];
 hash_list[index] = i;
 }
 delete [] old_key; // blow away old storage
 delete [] old_data;
 return 1;
 }
End Listings




















A Portable C++ String Class


A framework for cross-platform data-file management




William Hill


William is information-systems manager at Eurdit SA, a Paris firm which
publishes the Europages business-to-business telephone directory on paper,
CD-ROM, European online services, and the World Wide Web. William can be
contacted at bhill@dialup.francenet.fr.


The company I work for, Eurdit SA, produces a Europe-wide yellow-pages
directory which includes listings and publicity for 150,000 European companies
in over 30 countries. With over 60 partner companies worldwide collecting data
and selling publicity, communication and file exchange can be a Tower of
Babel, confusing enough to intimidate even the most hardened of
data-processing personnel. We currently accept two formats for submission of
editorial data: Paradox tables generated by a Windows application we
distribute to interested partners, and a fixed-length EBCDIC file format used
by our partners working out of mainframe production shops. All of our
publication data is processed for markup from the second format. In addition
to photocomposition markup, our data-file format has been used in a
Europe-wide online system, a Windows application on CD-ROM, and a Europe-wide
fax server. Needless to say, this EBCDIC file format, though aging and
cryptic, is mission critical for our products. For the purposes of this
article, I will refer to this file format as the "Europages file format." 
Since our Europages file format is just a specialized, fixed-format file, we
developed a fixed-format library and inherited our Europages file-format
classes from it. A small set of foundation classes was then put together in
order to guarantee portable development using string classes, linked lists,
and arrays. The foundation classes immediately paid for themselves in terms of
generic, reusable code, and served as a solid underpinning for the port of the
entire package to other platforms. Here, I'll share some of the design
decisions we made in developing the foundation classes, describe some of the
portability gotchas we encountered, and present a lean-but-mean portable
string class.


Europages File Format


All Europages file records are of equal length, but each record has a variable
number of fields depending on the information it contains. Before an
individual record can be used, its record type must be known. Once the record
type is known, its individual fields may be accessed. The format has often
been used by dense, hard-to-debug applications that break once the data-file
definition is modified (every year to varying degrees). We needed to make the
format transparent by creating a software tool for our production chain that
could be shared with our editorial partners.
Our design priorities were:
- Use of C++ as the project language.
- Portability across platforms, as well as across compilers on the same
platform.
- Use of general classes as building blocks for more-specialized,
task-specific classes.
- Manipulation of all data in the native character set of the client
programmer.
- Portability across all platforms of the on-disk data files used to support
the package.
- A resulting class library that would be perceived as an API by client
programmers, facilitating the cross-platform exchange of programs.


Portability Requirements


We did not want to develop our package on one platform or compiler and then
port it to the next one. Code was implemented and tested simultaneously on DOS
and SunOS compilers. This allowed us to identify the particularities of each
compiler early on. Code written later in development benefited from this
cross-platform approach. The result is a substantial body of C++ code that does
not contain any direct references to a single platform or compiler. DOS
compilers included Borland C++ 3.1 and 4.0, Visual C++ 1.5, and GNU 2.3. The
SunOS compilers used were Sun C++ 3.0 and GNU 2.3.
We avoided those C++ features that, due to lack of an adopted language
standard, are not universally supported: templates, exception handling, and
run-time type identification. As soon as these language features are generally
available, they will be integrated into the package.
Programmers using our classes don't need to know if the data being manipulated
is represented in EBCDIC or not. A character string in memory can be compared
automatically to a character string in our Europages file. All read operations
convert from EBCDIC to the native character set, correctly translating all
accented characters. Likewise, all write operations handle the proper
translation towards EBCDIC. These operations are carried out by a
straightforward table lookup. The Europages file-format classes guarantee that
all data is properly represented in memory.
Our package requires several reference files, which are accessed on disk
during processing. One of these files contains the list of yellow-pages
product references under which advertisements may be purchased in our
directory. Each reference, and its associated text heading, occupies 116
bytes. The entire table contains over 7000 entries (812 KB). We chose not to
access this in RAM in order to avoid complications when using the package
under DOS. We did not place this table in an indexed file, to avoid problems
with file compatibility and byte order (endianness) in our UNIX
implementations. To obtain acceptable access times and portable behavior, the
table was placed in a fixed-format file and sorted by product code. The
entries are retrieved with a simple binary search. Although elementary, this
technique provides satisfactory performance and instant code portability.
Example 1 shows the technique we used within our class library.


Portability Gotchas


PC software tools are extremely rich in features and functionality when
compared to programming environments on more powerful platforms. This richness
sometimes comes back to haunt PC programmers when functions that they have
routinely used turn out to be PC specific.
One such function was memmove() in the <string.h> library. Simple code
like the function call in Example 2 is not portable across platforms:
memmove() is missing from the libraries of our UNIX compilers, although
memcpy() does the job quite adequately in our case. (Strictly speaking,
memcpy()'s behavior is undefined for overlapping buffers, while memmove()'s
is not, so the substitution is only safe when source and destination never
overlap.) Since our earliest code relied heavily on function calls like
this, #define macros are used to transform memmove() calls to memcpy() calls
for our SunOS implementation.
We needed case-insensitive string comparison from our string class. Under
DOS/Windows, functions like stricmp() have always been available for most C
and C++ compilers. Programmers are so used to them that they forget that
stricmp() is not part of the C Standard Library. When a small portion of code
is ported to another platform (such as a Sun workstation) to be compiled with
a strictly conforming ANSI C or C++ compiler, it instantly breaks at link
time. Programmers usually view truly portable code as a pipe dream. Listings
One and Two contain our portable version of stricmp(), which is conditionally
compiled if we are not working on a DOS/Windows platform. Listing Three is a
program that tests the string library.
Designing generic classes to be used in all types of applications is
difficult. The temptation to add that one last member function is always
there. Trying to design a "kitchen-sink" class that does all things for all
applications is always a danger. Our string class contains exactly what we
need, across all of our applications, and no more. Generic class design is a
lot like application-interface design in that the public member functions of a
generic class constitute an API. The more pertinent the proposed functions,
the more useful the API. Unused functions in a class interface are better
implemented through inheritance, when they are truly needed.


A Generic String Class


The generic string class presented here contains the most basic
string-manipulation functions that we needed. It has been immensely useful for
encapsulating traditionally problem-prone code inside an intuitive interface.
From this base class, we have since inherited an enhanced class with full
support for the comparison of accented character strings. The enhanced class
includes a more-sophisticated search function based on the Boyer-Moore
algorithm. The class now provides excellent performance on extremely large
character buffers. We are currently working on a Unicode implementation of a
derived class for Windows NT. The beauty of these solutions is that the
original base code keeps ticking away, providing constant service and a clean
springboard to more complex solutions.
All of the classes in our data-format framework use the orthodox canonical
form for their class declarations and definitions. A fine illustration of this
form is presented in Advanced C++ Programming Styles and Idioms, by James O.
Coplien (Addison-Wesley, 1992). The canonical form ensures that instances of
declared classes will do exactly what you expect them to do when they are
created, copied, passed by value as function arguments, used on the left side
of an assignment operator, and destroyed. This canonical form requires a
default constructor, a copy constructor, an assignment operator, and a
destructor (almost always a virtual destructor).
The string class has four different constructors:
 The default takes no arguments and creates an empty string. This constructor
is called if a string instance is created without any arguments or if an array
of strings is created.
 The copy constructor creates an exact replica of another string instance.

 A copy constructor accepting regular C strings is also available as part of
the interface.
 The final constructor takes an unsigned integer argument used to determine
the size of the buffer used by the sprintf() member function. (This buffer has
a default size of 1024 bytes.)
The destructor is declared as virtual. This guarantees that a derived
destructor will be called if an instance of a derived class is deleted through
a base-class pointer. Two assignment operators are supplied with the class:
one for assigning string instances and one for assigning C strings. Note in
Listings One and Two that the assignment operators delete the internal
storage of their instance only if the right-hand value's string length is
strictly greater than their own.


Selectors


Selectors are functions used for getting inside a C++ class and having a look
around without producing any side effects. In other words, selector functions
access class variables without changing their values. Selector member
functions are usually declared as const member functions. The compiler then
guarantees that the function cannot modify the instance for which it is
called. The length() function, which returns the string length (not including
the binary zero terminator, as in C), is the most frequently used selector.
isEmpty() and the ! operator both return TRUE if the string has a zero length.
The ! operator is really an inlined call to isEmpty() and is provided for
notational convenience.
The majority of selectors for the string class are comparison and
concatenation operators that share the same arithmetical notation. Strings may
be compared with the mathematical operators >, <, >=, <= , and the traditional
C operators == and !=. To maximize code reuse, the string class uses its own
operators at every opportunity. Listing Two shows that the operator += uses
the operator + inside itself. We avoid rewriting existing functionality by
building powerful operators from more basic ones; the class seems to
bootstrap itself. Using the operators internally also illustrates the
usefulness of particular member functions. The operators >=, <=, and != are
simple calls to the <, >, and == operators, respectively. In Listing Two, the
>= operator returns TRUE exactly when the < operator returns FALSE. In this
way, symmetric behavior for opposing operators is guaranteed.
The comparison operators all return Boolean values. For processing that
requires the exact value returned by functions like strcmp(), the member
function cmp() is supplied. It simply calls strcmp() from the standard library
and passes back the value. The BASIC-style functions left(), mid(), and right() are
supplied as part of the public interface. They all rely on the private-member
workhorse function ncpy(), which encapsulates the actual extraction code.
ncpy() returns any substring within the string instance for which it is
called. The [] operator has been overloaded to allow access to individual
characters. Note that it returns a reference to the selected character, not a
copy of the character on the stack. In this way, the character may be used on
the left side of the assignment operator. The locate() functions find an
occurrence of a character or a substring within the instance for which they
are called. They return a zero-based offset to the occurrence.


Manipulators


Manipulators are member functions that actually change the inner state of the
class instance for which they are called. toUpper() and toLower() transform
the string instance to all upper- or lowercase characters. The derived
"enhanced" class redefines these member functions to do special processing for
accented characters. The insert() and erase() functions are used for inserting
and deleting substrings from a string instance. The fill() member function
floods the buffer with a given character value and tags on a trailing binary
zero. The iostreams classes are a great improvement over <stdio.h>, but
sometimes nothing works like a call to sprintf(). Listing Two shows how we
recovered sprintf() functionality using the <stdarg.h> facilities.
Numerous applications and techniques have been simplified using the string
class. In Example 3, a character buffer is scanned and a given substring is
continuously located and replaced by a different character sequence. The
function is tight and clear, the algorithm stands out, and all code dealing
with allocating and deallocating character buffers and copying and
concatenating character strings is neatly encapsulated within the string
class. This search-and-replace function was put together quickly and is easily
maintained.


Supporting New Character Sets


Political and economic changes in Europe have brought new partners to the
directory, and we must now support character sets for more languages,
including Slavic languages such as Polish and Slovenian. If all possible accented characters
are to be published, we must manage many more character values than the 256
available in eight bits. Don't forget that all these characters will end up in
an EBCDIC file and will have to be translated somewhere along the line. We are
extremely interested in the possibilities of 16-bit character sets such as
Unicode, which, I am sure, will profoundly change the way editorial data
processing is carried out. Unfortunately, Unicode is only implemented on
Windows NT for the time being, and we are still trying to clear out our
legacy-application cobwebs.
Our C++ package uses a completely portable, albeit inelegant, mechanism for
transmitting accented characters. Specific byte values are set aside as
"floating-accent" values. These values can only be used as accents of other
characters. For example, the hexadecimal value 0x06 is used for the circumflex
accent "^". Floating-accent values are placed immediately before the character
to be accented. A Europages file-format record will use the hexadecimal
sequence 0x06, 0x85 (floating circumflex, then EBCDIC "e") to transmit the
character "ê". On a DOS platform, this
sequence will be read into memory as the ASCII hexadecimal value 0x88. Under
Windows, it will be read into memory as the ANSI hexadecimal value 0xEA.
Despite the work currently aimed at establishing a Unicode standard and the
premature announcement of the death of ASCII, the truly portable
text-representation systems are all based on 7-bit ASCII codes: PostScript,
Acrobat, SGML (and by extension, HTML).
Our character controls include run-time font-measurement calculations. This
means that character strings too long to be published in one column are
detected at file-composition time (not photocomposition time), generating
important savings for us in terms of delay and overdue costs. Each editorial
field in the package knows about the typeface used to compose it on a printed
page. All accepted characters are assigned their corresponding typeface width
in 200ths of Didot points. If a given field value overshoots the width of a
page column, an error is returned.


Conclusion


The Europages file-format classes have been used extensively since the
beginning of 1994. They serve as a 10,000-line C++ code repository for all
programs used to verify and process Europages data files for the publication
of our paper directory and electronic products. Results show that editorial
applications put together using these classes have been developed up to ten
times faster than previous applications, which were developed in C.
The true test of usefulness and durability is software maintenance. Our
previous software tools were difficult to maintain when our product changed.
This meant either an exhausting application rewrite to match the new product
specifications, or a hasty, unsatisfying maintenance job that left the edifice
as shaky as our nerves.
The Europages directory undergoes product modifications every year. Sometimes
these changes are incremental. This year the changes ran deep and profoundly
affected the structure of our data files. The C++ classes were easily
maintained, and publishing programs were ready in October of 1994, seven
months in advance of next edition's publishing deadline. Using previous
methods and tools, the publishing software was never available more than three
months in advance. This software-maintenance success has validated our design
decisions and proven that medium-to-large-scale project portability is
possible if design goals and considerations are clearly defined and understood
at the outset.
Example 1: Retrieving entries using a binary search.
BOOL F2HCode::verify(const char *src)
{
 const int nRecLen = 5; // length of an individual field
 FFCursor *cursHead = new FFCursor("head13.dat", FALSE, nRecLen);
 char szBuf[nRecLen + 1];
 BOOL retval = FALSE;
 long nLow = 1; // record numbers start at 1
 // return the number of records in the fixed format file:
 long nHigh = cursHead -> numRecs();
 while(nLow <= nHigh) {
 long nMid = (nLow + nHigh) / 2;
 cursHead -> gotoRecord(nMid);
 cursHead -> getRecord(szBuf);
 szBuf[sizeof(szBuf) - 1] = '\0';
 if(strcmp(src, szBuf) < 0)
 nHigh = nMid - 1;
 else if(strcmp(src, szBuf) > 0)
 nLow = nMid + 1;
 else {
 retval = TRUE;
 break;
 }
 }

 delete cursHead;
 return retval;
}
Example 2: Nonportable function call between UNIX and DOS.
// fixed-format field write
// function
BOOL FField::put(char *src)
{
 memmove(szBuf + nOffset, src,
nLength);
 return TRUE;
}
Example 3: Using a string class to continuously replace substrings in a
character buffer by another sequence of characters.
// replaces one substring by another for an entire string
// char *s : source character buffer
// const char *x : substring to find
// const char *y : substring to replace x with
// int len : string length of s
void replaceXbyY(char *s, const char *x, const char *y, int len)
{
 if(strcmp(x, y) == 0) return;
 MXString strSrc = s;
 MXString strX = x;
 MXString strY = y;
 int pos; // position returned by locate() function
 // if pos == MXSTRING_LOCATENOTFOUND,
 // the substring was not found
 // find the offset of strX in strSrc
 while((pos = strSrc.locate(strX)) != MXSTRING_LOCATENOTFOUND)
 {
 strSrc.erase(pos, strX.length()); // erase this copy of StrX
 strSrc.insert(pos, strY); // insert StrY in its place
 }
 strncpy(s, strSrc, len);
}

Listing One
// Class: MXString -- class for managing zero-terminated C-style strings
// Author: W Hill
#ifndef MXSTRING_HPP
#define MXSTRING_HPP
#include <string.h>
#include <assert.h>
#include <ctype.h>
enum BOOL { FALSE, TRUE };
typedef const char *CSTR;
const int MAX_VARGS_BUFLEN = 1024;
const int MXSTRING_LOCATENOTFOUND = -1;
class MXString {
//
public:
 // orthodox canonical form
 // see Advanced C++ Programming Styles and Idioms, James O. Coplien
 MXString();
 MXString(const MXString&); // copy constructor
 MXString(const char *);
 MXString(unsigned int nSprintfSize); // resize buffer for sprintf()
 virtual ~MXString();
 virtual MXString& operator=(const MXString&); // assignment operator

 virtual MXString& operator=(CSTR);
 virtual MXString operator+(const MXString&) const;
 virtual MXString& operator+=(const MXString&);
 // type conversion
 operator CSTR() const;
 // duplication
 // user is responsible for deleting the returned pointer
 // just like ANSI C strdup() function
 char *strDup() const;
 // substring member functions/operators
 char& operator[](unsigned int index);
 // remember BASIC?
 MXString left(unsigned int len) const; // return first len characters
 MXString mid(unsigned int start, unsigned int len) const; 
 // return len characters from offset start
 MXString right(unsigned int len) const; // return last len characters
 // substring/character functions return MXSTRING_LOCATENOTFOUND for 
 // error. offset is 0 based
 virtual int locate(const MXString&) const;
 virtual int locate(const char c) const;
 // comparison operators
 // > and < are used for alphabetical sorting operators
 virtual BOOL operator>(const MXString&) const;
 virtual BOOL operator>(CSTR) const;
 virtual BOOL operator>=(const MXString&) const;
 virtual BOOL operator>=(CSTR) const;
 virtual BOOL operator<(const MXString&) const;
 virtual BOOL operator<(CSTR) const;
 virtual BOOL operator<=(const MXString&) const;
 virtual BOOL operator<=(CSTR) const;
 virtual BOOL operator==(const MXString&) const;
 virtual BOOL operator==(CSTR) const;
 virtual BOOL operator!=(const MXString&) const;
 virtual BOOL operator!=(CSTR) const;
 virtual int cmp(const MXString&) const;
 // case conversion member functions
 virtual void toUpper(); // converts instance to uppercase
 virtual void toLower(); // converts instance to lowercase
 // check/toggle sensitivity setting for all MXStrings
 static void sensitivity(BOOL b);
 static BOOL sensitivity();
 // insertion and deletion member functions
 MXString& insert(unsigned int start, MXString&);
 MXString& erase(unsigned int start, unsigned int len);
 // handy printf formatting-type function;
 MXString& sprintf(CSTR fmt, ...);
 // returns length of zero terminated string
 // not length of allocated buffer
 unsigned int length() const;
 
 // return TRUE if string is empty 
 BOOL isEmpty() const;
 BOOL operator!() const;
 // fills string with single character 
 void fill(unsigned int len, const char c = ' ');
//
private:
 static BOOL bSensitive; // compares/searches are case sensitive ?
 char *rep;

 int nSprintfBufSize;
 MXString ncpy(unsigned int start, unsigned int len) const;
 };
inline MXString::operator CSTR() const
{
 return rep;
}
inline MXString MXString::left(unsigned int len) const
{
 return ncpy(0, len);
}
inline MXString MXString::mid(unsigned int start, unsigned int len) const
{
 return ncpy(start, len);
}
inline unsigned int MXString::length() const
{
 return strlen(rep);
}
inline MXString MXString::right(unsigned int len) const
{
 return ncpy(length() - len, len);
}
inline BOOL MXString::isEmpty() const
{
 return (*rep == '\0') ? TRUE : FALSE;
}
inline BOOL MXString::operator !() const
{
 return isEmpty();
}
inline void MXString::sensitivity(BOOL b)
{
 MXString::bSensitive = b;
}
inline BOOL MXString::sensitivity()
{
 return MXString::bSensitive;
}
class ostream;
ostream& operator<<(ostream& s, MXString& m);
#endif // MXSTRING_HPP

Listing Two
#include "mxstring.hpp"
BOOL MXString::bSensitive;
#ifdef sunos
int stricmp(const char *s1, const char *s2);
#endif // portable stricmp()
#ifdef sunos
int stricmp(const char *s1, const char *s2)
{
 while(toupper(*s1) == toupper(*s2)) {
 if(*s1 == '\0')
 return 0;
 s1++;
 s2++;
 }
 if(toupper(*s1) < toupper(*s2))
 return -1;
 else
 return 1;
}
#endif // portable stricmp()
MXString::MXString()
{
 rep = new char[1];
 assert(rep);
 rep[0] = '\0';
 nSprintfBufSize = MAX_VARGS_BUFLEN;
}
MXString::MXString(unsigned int nSprintfSize)
{
 rep = new char[1];
 assert(rep);
 rep[0] = '\0';
 nSprintfBufSize = (nSprintfSize > 0) ? nSprintfSize : MAX_VARGS_BUFLEN;
}
MXString::MXString(const MXString& s)
{
 rep = new char[s.length() + 1];
 assert(rep);
 strcpy(rep, s.rep);
 nSprintfBufSize = MAX_VARGS_BUFLEN;
}
MXString::MXString(const char *s)
{
 rep = new char[strlen(s) + 1];
 assert(rep);
 strcpy(rep, s);
 nSprintfBufSize = MAX_VARGS_BUFLEN;
}
MXString::~MXString()
{
 delete[] rep;
}
// As for all operators and functions that require possible reassigning to the
// *rep pointer, a test is first made to verify that the existing string 
// buffer is larger than the incoming string. If so, make a straightforward
// copy. Buffer space is freed only if incoming string requires extra space.
MXString& MXString::operator=(const MXString& s)
{
 if(rep != s.rep) {
 if(s.length() > length()) {
 delete[] rep;
 rep = new char[s.length() + 1];
 assert(rep);
 }
 strcpy(rep, s.rep);
 }
 return *this;
}
MXString& MXString::operator=(const char *s)
{
 if(rep != s) {
 if(strlen(s) > length()) {
 delete[] rep;
 rep = new char[strlen(s) + 1];
 assert(rep);
 }
 strcpy(rep, s);

 }
 return *this;
}
MXString MXString::operator+(const MXString& s) const
{
 char *tmp = new char[length() + s.length() + 1];
 assert(tmp);
 strcpy(tmp, rep);
 strcat(tmp, s.rep);
 MXString retval = tmp;
 delete[] tmp;
 return retval;
}
MXString& MXString::operator+=(const MXString& s)
{
 *this = *this + s;
 return *this;
}
char *MXString::strDup() const
{
 char *tmp = new char[length() + 1];
 assert(tmp);
 strcpy(tmp, rep);
 return tmp;
}
MXString MXString::ncpy(unsigned int start, unsigned int len) const
{
 if(start >= length()) { // note: length() is unsigned, so avoid length() - 1
 MXString emptyString;
 return emptyString;
 }
 if(len > strlen(&rep[start]))
 len = strlen(&rep[start]);
 char *tmp = new char[len + 1];
 assert(tmp);
 strncpy(tmp, &rep[start], len);
 tmp[len] = '\0';
 MXString retval = tmp;
 delete[] tmp;
 return retval;
}
char& MXString::operator[](unsigned int index)
{
 // return the terminating '\0' if the index value is out of bounds
 // (index is unsigned, so only the upper bound needs checking)
 if(index > length())
 return rep[length()];
 return rep[index];
}
int MXString::locate(const MXString& s) const
{
 char *p;
 int off;
 if(MXString::sensitivity()) {
 p = strstr(rep, s.rep);
 off = p ? (int)(p - rep) : MXSTRING_LOCATENOTFOUND;
 }
 else {
 MXString src = *this;
 src.toUpper();

 MXString tmp(s);
 tmp.toUpper();
 p = strstr(src.rep, tmp.rep);
 off = p ? (int)(p - src.rep) : MXSTRING_LOCATENOTFOUND;
 }
 return off;
}
int MXString::locate(const char c) const
{
 char *p;
 int off;
 if(MXString::sensitivity()) {
 p = strchr(rep, c);
 off = p ? (int)(p - rep) : MXSTRING_LOCATENOTFOUND;
 }
 else {
 MXString src = *this;
 src.toUpper();
 char tmp = toupper(c);
 p = strchr(src.rep, tmp);
 off = p ? (int)(p - src.rep) : MXSTRING_LOCATENOTFOUND;
 }
 return off;
}
BOOL MXString::operator>(const MXString& s) const
{
 if(MXString::sensitivity())
 return strcmp(rep, s.rep) > 0 ? TRUE : FALSE;
 else
 return stricmp(rep, s.rep) > 0 ? TRUE : FALSE;
}
BOOL MXString::operator>(CSTR s) const
{
 MXString str = s;
 return (*this > str);
}
BOOL MXString::operator>=(const MXString& s) const
{
 return (*this < s) ? FALSE : TRUE;
}
BOOL MXString::operator>=(CSTR s) const
{
 MXString str = s;
 return (*this >= str);
}
BOOL MXString::operator<(const MXString& s) const
{
 if(MXString::sensitivity())
 return strcmp(rep, s.rep) < 0 ? TRUE : FALSE;
 else
 return stricmp(rep, s.rep) < 0 ? TRUE : FALSE;
}
BOOL MXString::operator<(CSTR s) const
{
 MXString str = s;
 return (*this < str);
}
BOOL MXString::operator<=(const MXString& s) const
{
 return (s < *this) ? FALSE : TRUE;
}
BOOL MXString::operator<=(CSTR s) const
{
 MXString str = s;
 return (*this <= str);
}
BOOL MXString::operator==(const MXString& s) const
{
 if(MXString::sensitivity())
 return strcmp(rep, s.rep) == 0 ? TRUE : FALSE;
 else
 return stricmp(rep, s.rep) == 0 ? TRUE : FALSE;
}
BOOL MXString::operator==(CSTR s) const
{
 MXString str = s;
 return (*this == str);
}
BOOL MXString::operator!=(const MXString& s) const
{
 return (*this == s) ? FALSE : TRUE;
}
BOOL MXString::operator!=(CSTR s) const
{
 return (*this == s) ? FALSE : TRUE;
}
int MXString::cmp(const MXString& s) const
{
 if(MXString::sensitivity())
 return strcmp(rep, s.rep);
 else
 return stricmp(rep, s.rep);
}
void MXString::toUpper()
{
 for(unsigned int i = 0; i < length(); i++)
 rep[i] = toupper(rep[i]);
}
void MXString::toLower()
{
 for(unsigned int i = 0; i < length(); i++)
 rep[i] = tolower(rep[i]);
}
MXString& MXString::insert(unsigned int start, MXString& s)
{
 if(start < (length() + 1)) {
 MXString strStart = ncpy(0, start);
 MXString strEnd = ncpy(start, length() - start);
 *this = strStart + s + strEnd;
 }
 return *this;
}
MXString& MXString::erase(unsigned int start, unsigned int len)
{
 if(start < (length() + 1) && len <= strlen(&rep[start])) {
 MXString strStart = ncpy(0, start);
 MXString strEnd = ncpy(start + len, length() - (start + len));
 *this = strStart + strEnd;

 }
 return *this;
}
void MXString::fill(unsigned int len, const char c)
{
 if(len > length()) { 
 delete[] rep;
 rep = new char[len + 1];
 assert(rep);
 }
 memset(rep, c, len);
 *(rep + len) = '\0';
}
#include <stdio.h>
#include <stdarg.h>
MXString& MXString::sprintf(const char *fmt, ...)
{
 char *szBuf = new char[nSprintfBufSize];
 assert(szBuf);
 va_list args;
 va_start(args, fmt);
 int val = ::vsprintf(szBuf, fmt, args);
 va_end(args);
 // if val >= nSprintfBufSize then
 // we have written past the end of the buffer
 // memory is probably trashed; an exception should be thrown here
 assert(val < nSprintfBufSize);
 *this = szBuf;
 delete[] szBuf;
 return *this;
}
#include <iostream.h>
ostream& operator<<(ostream& s, MXString& m)
{
 s << (CSTR)m;
 return s;
}

Listing Three
#include <iostream.h>
#include <fstream.h>
#include "mxstring.hpp"
int main(int argc, char *argv[])
{
 MXString str("Hello, world!");
 cout << "instance [str] == " << str << "\n";
 cout << "MXString instances are " << sizeof(MXString) <<
 " bytes in size" << "\n";
 cout << "instance [str] is " << sizeof(str) <<
 " bytes in size" << "\n";
 cout << "instance [str] contains string representation of " <<
 str.length() << " bytes in length" << "\n\n";
 MXString strUp = "STRING";
 MXString strLow = "string";
 cout << "strUp == " << strUp << "\t" << "strLow == " << strLow << "\n";
 MXString::sensitivity(FALSE);
 cout << "Sensitivity == " << (int)MXString::sensitivity() << "\n";
 cout << strUp << " == " << strLow << " : " <<
 (int)(strUp == strLow) << "\n";

 MXString::sensitivity(TRUE);
 cout << "Sensitivity == " << (int)MXString::sensitivity() << "\n";
 cout << strUp << " == " << strLow << " : " <<
 (int)(strUp == strLow) << "\n";
 MXString::sensitivity(FALSE);
 cout << "Sensitivity == " << (int)MXString::sensitivity() << "\n";
 cout << strUp << " != " << strLow << " : " <<
 (int)(strUp != strLow) << "\n";
 MXString::sensitivity(TRUE);
 cout << "Sensitivity == " << (int)MXString::sensitivity() << "\n";
 cout << strUp << " != " << strLow << " : " <<
 (int)(strUp != strLow) << "\n\n";
 strUp = "UP";
 strLow = "low";
 cout << "strUp == " << strUp << "\t" << "strLow == " << strLow << "\n";

 MXString::sensitivity(FALSE);
 cout << "Sensitivity == " << (int)MXString::sensitivity() << "\n";
 cout << strUp << " < " << strLow << " : " <<
 (int)(strUp < strLow) << "\n";
 MXString::sensitivity(TRUE);
 cout << "Sensitivity == " << (int)MXString::sensitivity() << "\n";
 cout << strUp << " < " << strLow << " : " <<
 (int)(strUp < strLow) << "\n";
 MXString::sensitivity(FALSE);
 cout << "Sensitivity == " << (int)MXString::sensitivity() << "\n";
 cout << strUp << " > " << strLow << " : " <<
 (int)(strUp > strLow) << "\n";
 MXString::sensitivity(TRUE);
 cout << "Sensitivity == " << (int)MXString::sensitivity() << "\n";
 cout << strUp << " > " << strLow << " : " <<
 (int)(strUp > strLow) << "\n\n";
 MXString::sensitivity(FALSE);
 MXString strSrc("This string is for searching inside");
 MXString strLocate("SEARCH");
 char chLocate = 'G';
 int rc = strSrc.locate(strLocate);
 cout << "Using source string : " << strSrc << "\n";
 cout << "Sensitivity == " << (int)MXString::sensitivity() << "\n";
 cout << strLocate << " subchain search result : " << rc << "\n";
 rc = strSrc.locate(chLocate);
 cout << "Using source string : " << strSrc << "\n";
 cout << "Sensitivity == " << (int)MXString::sensitivity() << "\n";
 cout << chLocate << " subchain search result : " << rc << "\n\n";

 MXString::sensitivity(TRUE);
 rc = strSrc.locate(strLocate);
 cout << "Using source string : " << strSrc << "\n";
 cout << "Sensitivity == " << (int)MXString::sensitivity() << "\n";
 cout << strLocate << " subchain search result : " << rc << "\n";
 rc = strSrc.locate(chLocate);
 cout << "Using source string : " << strSrc << "\n";
 cout << "Sensitivity == " << (int)MXString::sensitivity() << "\n";
 cout << chLocate << " subchain search result : " << rc << "\n\n";
 MXString strBegin = "Beginning ";
 MXString strMiddle = "Middle ";
 MXString strEnd = "End";
 cout << "strBegin == " << strBegin << " strMiddle == " <<
 strMiddle << " strEnd == " << strEnd << "\n";

 MXString strCat;
 strCat = strBegin + strMiddle + strEnd;
 cout << "strCat == " << strCat << "\n";
 cout << "strCat.left(9) == " << (MXString)strCat.left(9) << "\n";
 cout << "strCat.mid(10, 6) == " << (MXString)strCat.mid(10, 6) << "\n";
 cout << "strCat.right(3) == " << (MXString)strCat.right(3) << "\n\n";

 strCat.erase(0, 10);
 cout << "strCat.erase(0, 10) == " << strCat << "\n";
 strCat.insert(0, (MXString)"Start ");
 cout << "strCat.insert(0, \"Start \") == " << strCat << "\n\n";
 MXString strBig;
 if(argc > 1) {
 ifstream fin(argv[1]);
 if(fin.good()) {
 int nCount = 0;
 char szBuf[1024];
 while(fin.getline(szBuf, sizeof(szBuf))) {
 strBig.sprintf("Line [%05d] : %s", ++nCount, szBuf);
 cout << strBig << "\n";
 }
 }
 cout << "\n";
 }
#if defined(_CONSOLE) || defined(sunos)
 strBig.fill(1000000, '#'); // this works under WinNT and SunOS
 cout << strBig << "\n\n";
#endif // _CONSOLE || sunos
 // write over string used at beginning of program
 str = "Goodbye, world!";
 cout << str << endl;
 return 0;
}
End Listings





























A C++ Framework for DCE Threads


Multitasking with pthreads




Michael Yam


Michael is the founder of Y Technology Inc., a consulting company that serves
New York's financial district. He can be reached on CompuServe at 76367,3040.


If you've done any serious programming under UNIX, you probably appreciate its
multitasking capabilities and may want to add such capabilities to your
application. Using threads to multitask pieces of your program, you can divide
and conquer a problem, while improving a program's throughput. Threads are
often used to build servers that handle multiple clients; their real-time
nature also makes them effective for simulations and real-time data delivery. 
The OSF Distributed Computing Environment (DCE) supports a powerful,
multithreaded programming facility through a set of API calls sometimes called
"pthreads." In a recent project, I created a C++ framework under HP-UX to cope
with multithreaded programming using the DCE facilities. My pthread framework
(PTF) consists of five classes and one template; see Table 1. PTF makes no
effort to encapsulate the entire DCE pthread API. For example, PTF creates
only threads of equal priorities with round-robin scheduling. However, the
framework can be extended to accommodate varied priorities and different
schedules. (For more information on OSF DCE, see "Distributed Computing and
the OSF/DCE," by John Bloomer, DDJ, February 1995.)


pthreads Explained


DCE threads is a facility that supports multithreaded programming. On HP-UX,
it is a set of nonkernel libraries to which you link your programs. The API
consists of some 50-plus calls. It is based on a draft of the POSIX threads
standard (1003.4a), which is why DCE threads are sometimes referred to as
"pthreads."
Much as UNIX can multitask several processes, threads allow a program to
multitask pieces of itself. Unlike processes, however, threads share the same
address space and can communicate with one another without complex
interprocess communications. Because threads are intraprocess, they can "see"
one another's global and static variables while maintaining private automatic
variables of
their own. This is possible because every thread gets its own stack.
Threads allow your program to divide and conquer a problem. For example, one
thread can calculate the area under a curve between 0 and 1 while another
thread does the same between 0 and -1. When the two threads complete their
respective calculations, they join up and total their results for the area
under the curve between -1 and +1. While nonthreaded algorithms exist to solve
for an area under a curve, a threaded solution can transparently take
advantage of a platform with multiple CPUs.
Threads can be used to improve a program's throughput. At one client site,
we've designed a multithreaded server to support up to 100 users concurrently.
One thread manages connections and dynamically creates threads to service
users, thus providing one thread per user. Helper threads within the server
monitor connections, message statistics, and the general health of the server.
A nonthreaded solution could not handle the same number of users as cleanly;
see the accompanying text box entitled "Using Threads Effectively" for details.


Constructing Threads


To create a thread under DCE, you call pthread_create(), which accepts a
pointer to a function as one of its arguments. This function becomes the entry
point of your thread. To create a thread using PTF, on the other hand, you
must derive from the abstract base class, PTObject; instead of using a pointer
to a function, you supply code for the virtual function runPThread(). This has
three advantages. First, a thread as an object can store related data such as
the thread ID, the thread handle, mutexes, condition variables, and any data
that might be shared with other threads. Second, what a pointer to a function
does in C maps naturally onto a virtual function in C++. Third, deriving from
PTObject gives you the start(), tid(), and join() member functions in addition
to the constructor and destructor.
The start() member function starts the thread object running. A thread can
have four states: running, ready to run, waiting, and terminated. A running
thread is one that is executing CPU instructions. Don't assume that a newly
created thread starts in the running state; that will depend on the thread
priority and scheduling policy, as well as CPU availability. Ideally, the
constructor should start the thread, but if a CPU were available during thread
creation, the thread could start running even before the constructor had
completed! To prevent this, the thread object must be instantiated first, then
followed with a call to start(). However, start() has its limitations. All it
does is invoke DCE's pthread_create(). It cannot truly force a thread to
start--that remains the responsibility of the scheduler. Still, start() gives
the constructor a chance to complete its initializations and thus prevents the
premature starting of a thread. Because start() invokes pthread_create(), you
should not call start() more than once per PTObject.
You might also be tempted to modify start() to accept arguments. As an
example, for a thread to service a particular socket, it may seem desirable to
pass the file descriptor into the start() member. However, it's better to
avoid overloading start(). Instead, when you derive a class from PTObject,
write your constructor to accept a file descriptor and store the value in your
class (perhaps in the private section). The member function tid() returns the
thread ID as a unique integer. I assign the ID during thread creation using
the undocumented, nonportable DCE call, pthread_getunique_np(). If you're
uncomfortable using nonportable calls, modify the class to generate and assign
your own unique numbers. The idea behind tid() is to manage threads based on
ID and to identify threads in debugging statements.
The join() function, a wrapper for DCE's pthread_join(), allows the current
thread to wait for the completion of another thread. In terms of DCE calls,
pthread_create() and pthread_join() are opposites: pthread_create() splits
single-threaded code for parallel processing, and pthread_join() serializes
any parallel code. If the current thread calls pthread_join() on itself, the
thread will deadlock. PTF's join(), however, checks for this condition.
Instead of deadlocking, the thread will simply continue. 
Once a thread has completed running, it is in a terminated state. Normally,
you would free resources by calling C++'s delete on your thread object. You
can also have a thread self-destruct by coding "delete this" at the end of the
runPThread() routine. Having a thread invoke its own destructor avoids storing
and deleting the pointers to your thread objects. Finally, you can terminate
running threads by just "deleting" the thread object. I don't recommend this,
however, because your thread might be in a critical state. To interrupt your
thread, set a flag in your thread object to mark it for termination. The
thread itself can then poll this flag and self-destruct when it is not
critical.


Good Class Habits


Good, safe, robust frameworks require other member functions in PTObject and
all the PTF classes: a copy constructor, an assignment operator, isValid(),
and className(). For simplicity and clarity, I have omitted the copy
constructor and assignment operators from the framework, so the compiler
supplies its own. This will probably work, assuming the class doesn't contain
pointers. PTObject doesn't contain pointers today, but may in the future.
Thus, to prevent a class user from accidentally copying or assigning a thread
object, I've declared those members in the header file (see Listing One), but
haven't defined them in the .cpp file (Listing Two). Should the class user try
to copy or assign a thread object, an error would occur at link time.
The isValid() function determines if the object was constructed properly.
Recall that constructors do not return a value. Although isValid() usually
acts like a bool, returning 1 or 0 for True or False, respectively, I've
declared it to return an integer. I could have implemented a Boolean type, but
there are endless variations that might clash with libraries from different
vendors. Fortunately, a C++ bool type using "true" and "false" literals has
been accepted by the ANSI/ISO committee. It's just a matter of time before
compiler vendors implement the bool type, but until then, I prefer returning
an integer. Checking for a bad constructor therefore involves ensuring that
the object pointer is non-null and then invoking the isValid() member: 1 means
the constructor completed successfully, and 0 indicates failure.
Some favor exception handling for detecting construction problems, and in
fact, exception handling is the preferred method in C++. However, I've used
flags for simplicity, and because HP exception handling doesn't work properly
with threads as of this writing. className() returns the class name like so:
const char *className() { return "PTObject"; }. A typical use for this function
might be to display the class name in a title bar or a message log.


Thread Synchronization


Two classes, PTMutex and PTCondVar, manage thread synchronization. DCE mutexes
come in three flavors: fast, recursive, and nonrecursive. A fast mutex is
synchronous: A thread will wait indefinitely until it gets the lock. This is
the most commonly used type of mutex. A recursive mutex allows a thread that
has already locked a mutex to lock it again (it doesn't block itself). Since
the mutex is recursive, it will take as many unlocks as there were locks to
clear the mutex. A nonrecursive mutex, like a fast mutex, can only be locked
once, but unlike a fast mutex, if the thread tries to lock it again, it will
receive an error instead of waiting indefinitely.
PTF provides the PTMutex class to encapsulate the DCE fast mutex. The
constructor creates the mutex using DCE's pthread_mutex_init(), and the
destructor deletes the mutex using pthread_mutex_destroy(). Member functions
lock(), unlock(), and tryLock() wrap the DCE functions pthread_mutex_lock(),
pthread_mutex_unlock(), and pthread_mutex_trylock(), all returning the same
respective error codes. Admittedly, PTMutex does not add much to DCE's mutex
functionality, but it is important because it serves as a base class for
PTCondVar. I've derived PTCondVar from PTMutex because a condition variable
"is a kind of" mutex. Whereas a mutex synchronizes threads with data, a
condition variable synchronizes threads with other threads. For a condition
variable to be useful, it must also be paired with a mutex. A condition
variable has Boolean properties and acts much like a traffic light,
controlling whether the waiting thread(s) should go or continue to wait. The
state of the condition variable is controlled by other threads with the help
of the PTCondVar class members signal(), broadcast(), and timedWait().
The signal() function allows one thread to notify a waiting thread to
continue. Threads that have a producer/consumer relationship often use this
mechanism. In such a relationship, a producer thread would generate data while
a consumer thread would process the data. When data is available, the producer
thread would signal() the waiting thread, telling it that there is data to
process. The broadcast() function is similar to signal(), except that signal()
wakes the first waiting thread, and broadcast() wakes them all.
A consumer thread waits for a signal or broadcast by calling the member
timedWait(). A waiting thread will resume execution only when it receives a
signal or broadcast, or when a specified number of seconds have elapsed,
whichever comes first. I've written timedWait() to accept an integer
representing the number of seconds to wait; passing in a zero will cause the
thread to wait indefinitely for a signal or broadcast.


Thread-Safe Objects



PTMutex and PTCondVar, while handy, are still primitive. An entirely
thread-safe class would be more convenient. For a class to be thread safe,
operations on data must be atomic: Another thread must not be able to
interrupt the current one during an update operation. A thread-safe object can
be achieved with the classes PTSafeObject and PTLock. Derive your class from
PTSafeObject. Then, for every member function that reads or writes data,
instantiate PTLock on the stack, as in Figure 1. Upon leaving the function,
PTLock will automatically unlock and delete itself. If you design your
application to access data through one class, this mechanism will ensure safe
updates across multiple threads. Creating a mutex on the stack also prevents
you from accidentally leaving the mutex in a locked state when returning early
from a function (to handle an error).
PTSafeObject and PTLock operate as a pair. Actually, PTLock is nested within
PTSafeObject and is also a friend of PTSafeObject. When you derive from
PTSafeObject, your class inherits a mutex that PTLock locks in its constructor
and unlocks in its destructor. When you delete your object, the mutex also
gets deleted. This allows only one mutex per PTSafeObject, thus precluding
designs that dedicate a mutex for reading and another mutex for writing.
A natural extension to PTSafeObject is to create a template version:
PTTSafeType (the extra "T" denotes template); see Listings Three and Four. A
template allows built-in types, such as an int, to be thread safe. I found
this especially useful for managing sequence numbers, because several threads
could simultaneously update the sequence number. The template provides
thread-safe get() and set() functions and overloads the prefix and postfix
increment and decrement operators, making them thread safe as well; see Figure
2.


Wrapping Up


To test the thread framework, a sample program (available electronically; see
"Availability" on page 3) creates a "boss" thread, which in turn creates ten
"worker" threads. The pointers to the worker threads are stored in an array so
that they can be deleted later. The worker threads wait on a condition
variable for either a broadcast() from the boss thread or for 15 seconds to
elapse, whichever comes first. To make the program a little more interesting,
the boss thread issues a signal() rather than a broadcast(), thus waking up
only one worker thread. The remaining worker threads sleep for 15 seconds
before starting. All worker threads print "Hello World" to stdout along with
their thread IDs. The worker threads then join the boss thread and are
deleted.
Under HP-UX, I compiled PTF with the following definitions: _REENTRANT,
_POSIX_SOURCE, and _CMA_NOWRAPPERS. I had to use _REENTRANT because threaded
code must be reentrant, and I opted for _POSIX_SOURCE because pthreads are
based on a POSIX standard. As for _CMA_NOWRAPPERS, the threads package OSF
provides to vendors was originally called CMA. HP offers CMA compatibility by
providing wrappers in header files that redefine standard library calls with
calls to cma_ routines. For example, read() is replaced with cma_read().
Unfortunately, enabling CMA wrappers can lead to name collisions, especially
in C++, where classes commonly have a member function named read(). When
linking the test program, you'll need the following two libraries (shared or
archive): libdce and libc_r. libdce provides DCE support. libc_r is HP's
thread-safe library of system calls.
Finally, developing PTF was hardly a one-man job. It takes support, criticism,
and testing to produce reusable code. I'd like to thank David Potochnak,
Chanakya Ganguly, Dan McCartney, and Anne Jata for helping make this framework
possible.


References


Becker, Pete. "Writing Multi-threaded Applications in C++." C++ Report
(February 1994).
Lockhart, Harold W. OSF DCE. New York, NY: McGraw-Hill, 1994.
Open Software Foundation. OSF DCE Application Development Reference. Englewood
Cliffs, NJ: Prentice-Hall, 1993.
Open Software Foundation. OSF DCE Application Development Guide. Englewood
Cliffs, NJ: Prentice-Hall, 1993.


Using Threads Effectively


Because a typical program can have more than 100 threads, using threads
properly requires synchronization. Two mechanisms are available: mutual
exclusion locks (mutexes) and condition variables. A mutex is used to protect
data from simultaneous access by multiple threads (a race condition). A thread
would lock a mutex associated with a piece of data, update the data, then
unlock the mutex. You'll need to guard against deadlocks. Deadlocks usually
occur when working with more than one mutex. Imagine a problem where both
threads A and B need to lock both mutexes 1 and 2. Thread A grabs mutex 1 and
thread B grabs mutex 2. Both threads will wait indefinitely for the other's
mutex to become available. You can also deadlock on one mutex if you attempt
to lock the same mutex a second time, say in a recursive routine. DCE,
however, supports a special recursive mutex for just such a situation.
A condition variable can be considered a kind of mutex, except that instead
of synchronizing threads with data, it synchronizes threads with other
threads. For example, if thread A needs an intermediate value calculated by
thread B, thread A can wait for thread B to complete its computations. Thread
B, when ready, can then signal thread A. This signal is not to be confused
with UNIX signals. 
By default, DCE threads are scheduled with a round-robin (RR) policy at medium
priority. If all your threads run at the same priority, the RR policy ensures
that all threads get serviced eventually. Don't assume, however, that
time-slicing occurs at a fine level (millisecond or smaller). If your platform
has multiple CPUs, threads can run simultaneously, but on a single-CPU
platform, time slicing can occur on the order of a minute or more, depending
on your system configuration. This seemingly large time slice is efficient;
context switching between threads can be expensive. 
DCE threads also support a first-in, first-out (FIFO) policy, whereby a thread
will run uninterrupted by any other thread of equal priority. The thread will
yield only when it's completed or when it blocks for I/O. FIFO is the most
efficient policy, because it minimizes context switching. It is also the least
fair, because short-running threads will have to wait for long-running
threads. Long-running threads can, however, give up control with intelligently
placed calls to pthread_yield(). Realize, though, that if the long-running
thread has locked a mutex needed by other threads, pthread_yield() will
accomplish nothing and cost a context switch. Other scheduling policies are
available, but RR and FIFO are the most commonly used ones.
While the nature of an application will determine the proper scheduling
policy, when it comes to thread priorities, simple is better. Use threads of
identical priority or one high-priority thread and many with medium priority.
Writing a program with clever scheduling priorities generally leads to
unexpected performance issues. One such case is priority inversion, where a
lower-priority thread effectively runs ahead of a higher-priority thread. For
example, given three threads of different priorities, suppose low-priority
thread C locks a mutex needed by high-priority thread A. Thread A blocks on
the mutex, but C cannot release it because medium-priority thread B preempts
C; thread A is thus forced to wait on thread B.
Often, a low-priority thread can be replaced with a medium-priority thread
that sleeps a lot. For example, a low-priority thread might make sense for
such tasks as monitoring the state of a file, but a better solution would be
to create the thread with the same priority as all the other threads (medium)
and have it periodically "sleep," or wait on a condition variable, for n
seconds before checking the file. In addition, if the state of the file were
to be changed by another thread, the thread making the change could wake
(signal) the sleeping thread. Putting a thread to sleep avoids polling in a
tight loop looking for work to do, thus saving CPU cycles and simulating a
lower-priority schedule.
Working with threads isn't easy. Their simultaneous execution and interaction
often lead to conditions that are difficult to foresee. Troubleshooting a
threaded application can be arduous if you don't have a debugger that can
trace through threads. As of this writing, HP has a debugger for threads in
beta. For the most part, I relied on old-fashioned debugging statements that
also displayed the thread ID. I also developed and depended on PTF, a
framework for pthreads, to reduce my chances for error. It's evident that
threads require a fair amount of work, but the results are well worth the
effort. 
--M.Y.
Table 1: PTF classes and the PTTSafeType template.

  Class         Purpose
  PTObject      Threads are derived from this class.
  PTMutex       Encapsulates DCE mutexes.
  PTCondVar     Derived from PTMutex. Encapsulates DCE condition variables.
  PTSafeObject  A thread-safe class can be derived from this class.
  PTLock        Works with PTSafeObject to lock and unlock a mutex.
  PTTSafeType   Template derived from PTSafeObject; makes built-in types
                thread safe.
Figure 1: Code fragment using PTLock to make your derived class thread safe.
void derivedClass::updateData()
{
 // Class is derived from PTSafeObject
 PTLock lock(this); // mutex locked
 // safe to update data here.
 ...
 // return invokes PTLock's destructor
 // which unlocks the mutex.
}
Figure 2: Code fragment illustrating thread-safe template.
PTTSafeType <int> sequenceNumber;
 ...
sequenceNumber.set (0); // sets number to 0 safely.
++sequenceNumber; // inc number safely.
// or sequenceNumber++;
 ...

Listing One
/***** PTF.H -- Header file describes classes for pthread framework. *****/
#ifndef PTF_H
#define PTF_H

extern "C"
{
#include <pthread.h>
}
#define TRUE 1
#define FALSE 0
/*--- CLASS: PTMutex. Description: Create, destroy, lock, unlock a mutex. ---*/
class PTMutex 
{
public:
 PTMutex ();
 virtual ~PTMutex();
 int lock();
 int unlock();
 int tryLock();
 // dummy declarations to prevent copying by value. classes not having 
 // pointer members may remove these declarations. Note: operator=() 
 // should be chained in case of inheritance.
 PTMutex (const PTMutex &);
 PTMutex& operator=(PTMutex&);
 virtual const char *className() { return "PTMutex"; }
 virtual int isValid(); 
protected:
 pthread_mutex_t _hMutex; // handle to mutex
private:
 int _validFlag;
};
/*--- CLASS: PTCondVar. 
 *--- Description: Manages condition variables and associated mutexes. ----*/
class PTCondVar : public PTMutex
{
public:
 PTCondVar (); // pthread_cond_init()
 virtual ~PTCondVar (); // pthread_cond_destroy()
 int signal ();
 int broadcast ();
 int timedWait (int seconds=0);
 pthread_cond_t hCondVar() { return _hCondVar; }
 // dummy declarations to prevent copying by value.
 // classes not having pointer members may remove these declarations.
 // Note: operator=() should be chained in case of inheritance.
 PTCondVar (const PTCondVar&);
 PTCondVar& operator=(PTCondVar&);
 const char *className() { return "PTCondVar"; }
 virtual int isValid(); 
protected:
 pthread_cond_t _hCondVar; // handle to condition variable
private:
 int _validFlag; 
};
/*-- CLASS: PTObject. Description: Abstract class. Use to build a pthread. --*/
class PTObject 
{
public:
 PTObject (); // use default schedule & policy
 virtual ~PTObject ();
 int start ();
 virtual int runPThread() = 0; // this gets called by start_routine()
 int tid () {return _tid;}

 int join ();
 // dummy declarations to prevent copying by value.
 // classes not having pointer members may remove these declarations.
 // Note: operator=() should be chained in case of inheritance.
 PTObject (const PTObject &);
 PTObject& operator=(const PTObject&);
 
 const char *className() { return "PTObject"; }
 virtual int isValid(); 
 pthread_addr_t exitStatus; // thread's exit code (used by join()).
protected:
 pthread_t _hPThread; // handle to thread
 // this static routine gets passed to pthread_create()
 static pthread_addr_t start_routine(void *obj);
private:
 int _validFlag;
 int _tid;
};
/*--- CLASS: PTSafeObject. Description: Derive from this to create a 
 *--- thread-safe class. ----*/
class PTSafeObject 
{
public:
 PTSafeObject ();
 ~PTSafeObject ();
 PTMutex *pPTMutex() const;
 // dummy declarations to prevent copying by value.
 // classes not having pointer members may remove these declarations.
 // Note: operator=() should be chained in case of inheritance.
 PTSafeObject (const PTSafeObject &);
 PTSafeObject& operator=(const PTSafeObject&);
 
 const char *className() { return "PTSafeObject"; }
 virtual int isValid(); 
protected:
 class PTLock
 {
 public:
 PTLock (PTSafeObject *ThreadSafe);
 ~PTLock ();
 private:
 PTMutex *_pPTLockMutex;
 };
private:
 // friend declaration needs to be here for nested classes. 
 // might be an HP compiler bug.
 friend class PTLock;
 PTMutex *_pPTMutex;
 int _validFlag;
};
#endif

Listing Two
/***** PTF.CPP -- Encapsulation of DCE PThreads. Classes include: PTObject, 
 derive from this to create your threads; PTMutex, creates a mutex;
 PTCondVar, derived from PTMutex. Creates a condition variable and 
 and an associated mutex; PTSafeObject, derive from this for classes 
 which update shared resources; PTLock, locks a mutex. Works with 
 PTSafeObject. Currently supports default thread creation: round-robin 
 scheduling and medium priority. Currently supports default mutex 
 creation: fast locks (as opposed to recursive and non-recursive locks). 
***************************************************************/
#include "PTF.H"
#include <assert.h>
#include <sys/errno.h>
#ifndef NDEBUG
#include <stdio.h>
#endif
extern int errno;
/*--- Function Name: PTObject::PTObject. Description: Constructor using 
 *--- default thread scheduling (Round-robin) and priority (medium). 
 *--- Returns: None ----*/
PTObject::PTObject()
{
 _validFlag = FALSE;
 _tid = 0; // id assigned when thread is created
 exitStatus = 0; // initial thread return code
 _validFlag = TRUE;
}
/*--- Function Name: PTObject::~PTObject. Description: Destructor. Free 
 *--- resources allocated to PThread. Returns: None ---*/
PTObject::~PTObject()
{
 pthread_cancel (_hPThread); // issue a cancel message to thread
 pthread_detach (&_hPThread); // free resources of cancelled thread
}
/*---- Function Name: PTObject::isValid. Description: Return private variable
 *---- _validFlag. The variable indicates the state of the object, whether it 
 *---- is valid or not. Returns: TRUE or FALSE ---*/
int
PTObject::isValid()
{
 return _validFlag;
}
/*--- Function Name: PTObject::join. Description: join() causes the calling 
 *--- thread to wait for the thread object to complete. See pthread_join() in 
 *--- DCE Dev. Ref. When the thread is complete, the thread's return code is 
 *--- stored in a public variable: exitStatus. Returns: 0, success; -1, Error.
 *--- Check errno. ---*/ 
int
PTObject::join ()
{ 
 pthread_t threadID = pthread_self();
 int uniqueID = pthread_getunique_np (&threadID);
 if (uniqueID == tid())
 {
 printf ("TID %d: Can't join thread to itself.\n", uniqueID);
 return -1;
 }
 return pthread_join (_hPThread, &exitStatus);
}
/*--- Function Name: PTObject::start. Description: Explicitly starts the 
 *--- thread. Actually, thread creation is performed here as well. If thread 
 *--- were created in the constructor, thread may start before a derived class
 *--- had a chance to complete its constructor which would lead to 
 *--- initialization problems. Returns: 0, success; -1, fail (errno = 
 *--- EAGAIN ENOMEM) ---*/ 
int
PTObject::start()
{
 // Create a thread using default schedule & priority. Also, pass in *this 
 // ptr for argument to associate thread with an instance of this object.
 int status = pthread_create (&_hPThread, pthread_attr_default,
 (pthread_startroutine_t)&PTObject::start_routine,
 (void *)this);
 if (status == 0)
 _tid = pthread_getunique_np (&_hPThread);
 return status;
}
/*--- Function Name: PTObject::start_routine. Description: Static function is 
 *--- passed into pthread_create. It is the start of the thread routine. In 
 *--- turn, it calls the virtual function runPThread() which is written by the 
 *--- user of this class. Returns: None ---*/ 
pthread_addr_t 
PTObject::start_routine (void *obj)
{
 // get object instance
 PTObject *threadObj = (PTObject *)obj;
 int status = threadObj->runPThread();
 return (pthread_addr_t)status;
}
/*--- Function Name: PTMutex::PTMutex. Description: Constructor creates a 
 *--- mutex with a default attribute (fast mutex). Returns: None ---*/
PTMutex::PTMutex()
{
 _validFlag = FALSE;
 int status = pthread_mutex_init (&_hMutex, pthread_mutexattr_default);
 if (status == -1)
 return;
 _validFlag = TRUE;
}
/*--- Function Name: PTMutex::~PTMutex. Description: Destructor destroys mutex.
 *--- Assumes mutex is unlocked. DCE doesn't provide a direct way to determine
 *--- the state of a mutex. In case of failure, use assert() macro. 
 *--- Returns: None ---*/ 
PTMutex::~PTMutex()
{
 // assumes mutex is unlocked. DCE doesn't provide a direct way to determine
 // the state of a lock so I'll just try to destroy it without any checks.
 // I'm using a long name for the return value so that
 // "assert" macro is self documenting.
 int ipthread_mutex_destroy = pthread_mutex_destroy (&_hMutex);
#ifndef NDEBUG
 pthread_t threadID = pthread_self();
 int tid = pthread_getunique_np (&threadID);
 if (ipthread_mutex_destroy == -1)
 printf ("TID %d: Could not destroy mutex. errno=%d\n", tid, errno);
 assert (ipthread_mutex_destroy == 0);
#endif 
}
/*--- Function Name: PTMutex::isValid. Description: Used to determine if 
 *--- object has been constructed successfully. Returns: TRUE or FALSE ---*/ 
int
PTMutex::isValid()
{
 return _validFlag;
}

/*--- Function Name: PTMutex::lock. Description: Lock this mutex. If mutex is 
 *--- already locked, wait for it to become available. Returns: 0, success;
 *--- -1, fail (errno = EINVAL or EDEADLK) ---*/ 
int
PTMutex::lock()
{
 return pthread_mutex_lock (&_hMutex); 
}
/*--- Function Name: PTMutex::trylock. Description: Try and lock this mutex. 
 *--- If mutex is already locked, do not wait for it to become available. 
 *--- Just return. Returns: 1, success; 0, mutex already locked; -1, fail, 
 *--- mutex handle invalid ---*/ 
int
PTMutex::tryLock()
{
 return pthread_mutex_trylock (&_hMutex);
}
/*--- Function Name: PTMutex::unlock. Description: Unlock this mutex.
 *--- Returns: 0, success; -1, fail, invalid mutex handle ---*/ 
int
PTMutex::unlock()
{
 return pthread_mutex_unlock (&_hMutex); 
}
/*--- Function Name: PTCondVar::PTCondVar. Description: Constructor creates a 
 *--- condition variable. Returns: None ---*/ 
PTCondVar::PTCondVar()
{
 _validFlag = FALSE;
 int status = pthread_cond_init (&_hCondVar, pthread_condattr_default);
 if (status == -1)
 return; // errno = EAGAIN or ENOMEM
 _validFlag = TRUE;
 return;
}
/*--- Function Name: PTCondVar::~PTCondVar. Description: Destructor destroy a 
 *--- condition variable. It can fail if the condition variable is busy. 
 *--- Returns: None --*/
PTCondVar::~PTCondVar()
{
 int ipthread_cond_destroy = pthread_cond_destroy (&_hCondVar);
#ifndef NDEBUG
 pthread_t threadID = pthread_self();
 int tid = pthread_getunique_np (&threadID);
 if (ipthread_cond_destroy == -1)
 printf ("TID %d: Could not destroy condition variable. errno=%d\n", 
 tid, errno);
 assert (ipthread_cond_destroy == 0);
#endif 
}
/*--- Function Name: PTCondVar::broadcast. Description: Wakes all threads 
 *--- waiting on a condition variable object. Calling this routine means that 
 *--- data is ready for a thread to work on. A broadcast lets one or more 
 *--- threads proceed. Returns: 0, success; -1, fail (errno = EINVAL) ---*/ 
int
PTCondVar::broadcast()
{
 return pthread_cond_broadcast (&_hCondVar);
}

/*--- Function Name: PTCondVar::isValid. Description: Used to determine if 
 *--- constructor succeeded or not. Returns: TRUE or FALSE ---*/ 
int
PTCondVar::isValid()
{
 return _validFlag;
}
/*--- Function Name: PTCondVar::signal. Description: Wakes one thread waiting 
 *--- on a condition variable. Thread to wake is determined by its scheduling 
 *--- policy. Returns: 0, on success; -1, on failure (errno = EINVAL) ---*/ 
int
PTCondVar::signal()
{
 return pthread_cond_signal (&_hCondVar);
}
/*--- Function Name: PTCondVar::timedWait. Description: Will wait on a 
 *--- mutex/condition variable until thread receives a signal or broadcast, or
 *--- until specified number of seconds have elapsed, whichever comes first.
 *--- 0 seconds means wait forever. (default) Returns: 0, success. Thread was 
 *--- signalled; 1, wait timed out. No signal; -1, wait failed. See errno ---*/
int
PTCondVar::timedWait (int seconds)
{
 int status;
 lock(); // lock this condition vars' mutex
 if (seconds <= 0)
 {
 // thread will wait here until it gets a signal
 // 0, default value, means wait forever.
 status = pthread_cond_wait (&_hCondVar, &_hMutex);
 }
 else
 {
 // wait for specified number of seconds
 // use non-portable dce routine to get absolute time from seconds.
 struct timespec delta;
 struct timespec abstime;
 
 delta.tv_sec = seconds;
 delta.tv_nsec = 0;
 
 // I'm using a long name for the return value so if "assert"
 // aborts, message is self-documenting.
 int ipthread_get_expiration_np = pthread_get_expiration_np (&delta, &abstime);
 assert (ipthread_get_expiration_np == 0); 
 // thread will wait here until it gets a signal or
 // until abstime value is reached by system clock.
 status = pthread_cond_timedwait (&_hCondVar, &_hMutex, &abstime);
 if (status == -1 && errno == EAGAIN)
 status = 1; // lock timed-out
 }
 unlock(); // unlock internal mutex
 return status;
}
/*--- Function Name: PTSafeObject::PTSafeObject. Description: Used to make a 
 *--- class thread-safe. Derive from this class, then add PTLock (this); to 
 *--- the first line of each member function in your derived class that 
 *--- accesses data. This class forms the outer part of a thread-safe class. 
 *--- It creates and deletes a PTMutex object. The inner class (nested
 *--- class -- PTLock) locks and unlocks a PTMutex object. Returns: None ---*/
PTSafeObject::PTSafeObject()
{
 _validFlag = FALSE;
 _pPTMutex = new PTMutex;
 if (!_pPTMutex->isValid())
 return;
 _validFlag = TRUE;
 return;
}
/*--- Function Name: PTSafeObject::~PTSafeObject. Description: Delete internal
 *--- PTMutex object. Returns: None ---*/ 
PTSafeObject::~PTSafeObject()
{
 delete _pPTMutex;
}
/*--- Function Name: PTSafeObject::pPTMutex. Description: Retrieve a pointer to
 *--- the internal PTMutex object. Returns: PTMutex pointer ---*/ 
PTMutex *
PTSafeObject::pPTMutex() const
{
 return _pPTMutex;
}
/*--- Function Name: PTSafeObject::isValid. Description: Determine if 
 *--- constructor succeeded or not. Returns: TRUE or FALSE. ---*/ 
int
PTSafeObject::isValid()
{
 return _validFlag;
}
/*--- Function Name: PTSafeObject::PTLock::~PTLock. Description:
 *--- Destructor for class nested within PTSafeObject. It just unlocks the 
 *--- PTMutex object. The outer class, PTSafeObject, deletes it. 
 *--- Returns: None ---*/ 
PTSafeObject::PTLock::~PTLock()
{
 (void)_pPTLockMutex->unlock();
}
/*--- Function Name: PTSafeObject::PTLock::PTLock. Description: This class 
 *--- forms inner (nested) class of a thread-safe object. The object is 
 *--- responsible for locking and unlocking a PTMutex object. This constructor
 *--- locks it. A user should not instantiate this object explicitly. Pass
 *--- a "this" pointer to this function to give access to outer class' private
 *--- variables. The outer part (PTSafeObject) creates and deletes a PTMutex 
 *--- object. Returns: None ---*/ 
PTSafeObject::PTLock::PTLock(PTSafeObject *ThreadSafe)
{
 _pPTLockMutex = ThreadSafe->_pPTMutex;
 (void)_pPTLockMutex->lock();
}

Listing Three
/***** PTTF.H -- Template to create thread-safe types. *****/
/*--- TEMPLATE: PTTSafeType. Description: This inherits from class PTSafeObject.
 *--- Implemented as template, it can make a variety of types threadsafe.---*/

#ifndef PTTF_H
#define PTTF_H
#include <PTF.H>
template <class T>

class PTTSafeType : public PTSafeObject
{
public:
 void operator=(T threadSafeType) {set(threadSafeType);}
 operator T () {return get();}
 T operator ++();
 T operator --();
 T operator ++(T threadSafeType);
 T operator --(T threadSafeType);
 T get();
 T set(T threadSafeType);
private:
 T _threadSafeType;
};
#ifdef RW_COMPILE_INSTANTIATE
#include "PTTF.CC"
#endif
#endif

Listing Four
/**** PTTF.CC -- Template definition for PTTSafeType ****/
#ifndef PTTF_CC
#define PTTF_CC
/*--- Function Name: get. Description: retrieves a value safely in a threaded 
 *--- environment. Note that the value is invalid if a set() has never been 
 *--- called. Returns: value T. ---*/ 
template <class T> T PTTSafeType<T>::get()
{
 PTLock Lock(this);
 return _threadSafeType;
}
/*--- Function Name: set. Description: sets a value safely in a threaded 
 *--- environment. Returns: previous value of T. ---*/ 
template <class T> T PTTSafeType<T>::set (T threadSafeType)
{
 PTLock Lock(this);
 T previous;
 previous = _threadSafeType;
 _threadSafeType = threadSafeType;
 return previous;
}
/*--- Function Name: prefix and postfix operators: ++ and --. Description: 
 *--- Increment and decrement a value safely in a threaded environment
 *--- Returns: value of T after incrementing or decrementing. ---*/ 
template <class T> T PTTSafeType<T>::operator ++()
{
 PTLock Lock(this);
 return ++_threadSafeType;
}
template <class T> T PTTSafeType<T>::operator --()
{
 PTLock Lock(this);
 return --_threadSafeType;
}
template <class T> T PTTSafeType<T>::operator ++(T threadSafeType)
{
 PTLock Lock(this);
 return _threadSafeType++;
}

template <class T> T PTTSafeType<T>::operator --(T threadSafeType)
{
 PTLock Lock(this);
 return _threadSafeType--;
}
#endif
End Listings



A Generic Parsing Engine in C++


A portable engine for parsing using regular expressions




Todd D. Esposito and Andrew K. Johnson


Todd is a systems architect with Prodis Inc., a systems integrator and
custom-software development firm in Carol Stream, Illinois. Andrew is a
development engineer with Prodis. They can be reached at 708-462-8600 or on
CompuServe at 70661,2717.


In building applications and user/management-level tools for our clients, we
have found a recurring need for parsing technology. Whether for updating
WIN.INI or running complicated macros within a custom application, parsing
keeps rearing its head. Our first few parsers were fitted to the content being
parsed, with little or no room for deviation, and each new project started
almost completely from scratch. 
After a few such projects, we found a better way. The result is the parsing
engine presented in this article: a generic parser, requiring no specific
knowledge of the source language, and with no ties to the underlying
application. The engine is a collection of five objects. The interface is
almost exclusively contained in two objects, and then only in a few member
functions, so it is easily integrated into any project. In this article, we
will examine the engine's design and operation, and use it to implement a
Basic-like macro language.


A Structural Overview of the Parsing Engine


The parsing engine is generic in that it contains no application- or
language-related code. The parser's five classes are each fitted to a
particular aspect of the parsing strategy. The relationships between these
objects are outlined in Figure 1. The application using the engine typically
will create a gpFlowControl object and feed it the parameters needed to
construct Syntax objects, which encapsulate the language constructs. This
initializes the engine. The gpFlowControl object creates all the other objects
and converts the supplied parameters to the appropriate object type. The
application then sends input lines to the engine. The engine returns tokens,
telling the application the meaning of the input, and extracts any parameters
from the input.
Starting from the top, we encounter the gpFlowControl class, which acts as
traffic cop, ensuring that the correct code is executed in the proper order.
Most of your application's calls will be to this object, which then passes
control to its embedded gpParser. On the way back to your application, the
gpFlowControl object takes a peek at the token the gpParser returned and
decides if it needs to intercede. This will happen in the case of an
If/Then/Else or other control structure.
Each possible command in your macro language must be paired with a "token,"
which is just a more-convenient representation of the command. The gpParser
class substitutes tokens for the associated input patterns it receives.
gpParser "learns" your language by way of the AddSyntax() member function,
which takes as parameters either a string and a token, or a Syntax object.
(gpFlowControl's AddSyntax() function simply passes its parameters to the
embedded gpParser.) Syntax objects encapsulate the pairing between syntax and
token. Note the syntax/token relationship is not necessarily one-to-one: One
token can represent many different syntactic constructs.
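The AddSyntax()/Parse() pairing can be sketched with simplified stand-ins for the engine's classes. The sketch below is ours, not the article's code: it substitutes plain substring matching for gpRegExp patterns, but the syntax-table idea is the same.

```cpp
#include <string>
#include <vector>

// Simplified stand-ins for the Syntax/gpParser pair: each table entry
// maps a pattern string to an integer token. (The real engine matches
// gpRegExp patterns; plain substring search is used here for brevity.)
struct Syntax {
    std::string pattern;
    int token;
};

class MiniParser {
    std::vector<Syntax> table;
public:
    // AddSyntax() pairs a pattern with its token; one token may be
    // registered under several different patterns.
    void AddSyntax(const std::string &pattern, int token) {
        table.push_back(Syntax{pattern, token});
    }
    // Parse() returns the token of the first pattern found in the line,
    // or -1 (unrecognized) when nothing matches.
    int Parse(const std::string &line) const {
        for (const Syntax &s : table)
            if (line.find(s.pattern) != std::string::npos)
                return s.token;
        return -1;
    }
};
```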
The gpRegExp class implements regular expressions (string-matching templates),
and is derived from gpString. gpRegExp overloads two key functions: the
constructor and the equality operator (==).
The gpString class handles character-string manipulation and provides
overloaded operators to make using strings easier (see Listing One). We built
this class before the ANSI string class was sanctioned, and we still prefer
ours to the ANSI version. It should be a simple matter to retrofit ANSI
strings into the system. However, since we use our gpString class in all of
our projects (rather than the ANSI class), this has not been a priority. 


The Macro Language


Microsoft has popularized the use of Basic as a macro language for
applications. While we won't comment on the wisdom of that practice, the Basic
syntax structure provides a reasonable example of how to use our parsing
engine. The result is the shell of a full-fledged macro language, ready to be
embedded into an application. Due to space constraints, our discussion will
focus on the operation of the parsing engine, leaving the implementation of
the Basic constructs (such as variable typing and storage and retrieval) for
you to explore. 
Before we begin, we need to design and analyze our Basic syntax. The simplest
macro selects a menu or invokes an application process; see Example 1(a).
Example 1(b) shows how to call a function in Basic. Example 1(c) shows a more
complex macro demonstrating conditional execution, and Example 1(d) uses a
loop to extend Example 1(c), allowing the user to try again.


Regular Expressions Unveiled


Regular expressions are tremendously powerful. They help propel the UNIX
shells (sh, csh, ksh) and text-processing utilities (sed, grep, awk, perl, and
so on) to an unparalleled level of sophistication. Regular expressions, as
anyone who has worked with UNIX knows, are string templates. A regular
expression matches a string if the string's content fits into the regular
expression's template. The DOS command line, DIR *.BAT, for example, uses a
limited form of regular expressions. In this case, "*.BAT" is the regular
expression, and the names of the files are the content being matched.
Normally, a regular expression is not position sensitive and will match a line
if any substring of the line matches. For example, the regular expressions
"Goodbye," "Cruel" and "World!" will each match "Goodbye Cruel World!" This is
useful, since we don't have to know a string's entire content to "find" it.
Table 1 provides a description of various wildcards.
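The anchoring rules can be illustrated with a small helper of our own (not part of the engine). It handles literal patterns only, treating a leading "^" and a trailing "$" as described above; everything else matches anywhere in the line.

```cpp
#include <string>

// Hypothetical helper showing position-insensitive matching: a plain
// pattern matches anywhere in the line, a leading '^' pins it to the
// start, and a trailing '$' pins it to the end. Literal patterns only.
bool literalMatch(const std::string &pat, const std::string &line) {
    if (!pat.empty() && pat.front() == '^') {
        std::string body = pat.substr(1);
        return line.compare(0, body.size(), body) == 0;
    }
    if (!pat.empty() && pat.back() == '$') {
        std::string body = pat.substr(0, pat.size() - 1);
        return line.size() >= body.size() &&
               line.compare(line.size() - body.size(), body.size(), body) == 0;
    }
    return line.find(pat) != std::string::npos;   // anywhere in the line
}
```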
The parsing engine makes constant use of regular expressions. In fact, most of
its execution time is spent inside a member function of the gpRegExp class. We
implemented regular-expression matching in two steps: decomposition in the
gpRegExp class's constructor, and pattern matching in the overloaded equality
operator. When you create a gpRegExp object, it breaks down each regular
expression into several smaller atomic expressions, each of which is
classified. Figure 2 depicts how a sample regular expression is decomposed.
This process creates a gpRegExp object for each atom in the expression,
arranged in a typical linked list. A private member function, ParseAtoms(),
serves as a common initializer for several different overloads of the
constructor; see Listing Two. ParseAtoms() performs the actual decomposition
of the expression in an almost recursive manner.
ParseAtoms() removes characters from its input and stores them in its internal
string. It does this one character at a time, until the end of the string or
some special character is reached (with two exceptions, "^" and "&," which are
explained later). Once ParseAtoms() reaches a special character, it cuts the
input string into two pieces: One, it retains and classifies; the other, it
uses to construct the next atom, which will in turn go through the same
process.
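A compressed sketch of this decomposition, using our own names and only the wildcard atoms (the real ParseAtoms() also handles ranges, optionals, "&," "^," and "$," and builds a linked list of gpRegExp objects rather than a vector):

```cpp
#include <string>
#include <vector>

// Each atom is a classified fragment of the pattern. The type names
// mirror the article's ExpType_T values.
enum AtomType { Literal, Wild, MultiWild0, MultiWild1 };

struct Atom {
    AtomType type;
    std::string text;   // the literal text; empty for wildcard atoms
};

// Cut the pattern at each special character and classify the pieces.
std::vector<Atom> parseAtoms(const std::string &pattern) {
    std::vector<Atom> atoms;
    std::string lit;
    for (size_t i = 0; i < pattern.size(); ++i) {
        if (pattern[i] == '.') {
            // A special character ends the literal collected so far.
            if (!lit.empty()) { atoms.push_back({Literal, lit}); lit.clear(); }
            if (i + 1 < pattern.size() && pattern[i + 1] == '*') {
                atoms.push_back({MultiWild0, ""}); ++i;   // ".*"
            } else if (i + 1 < pattern.size() && pattern[i + 1] == '+') {
                atoms.push_back({MultiWild1, ""}); ++i;   // ".+"
            } else {
                atoms.push_back({Wild, ""});              // "."
            }
        } else {
            lit += pattern[i];   // ordinary characters accumulate
        }
    }
    if (!lit.empty()) atoms.push_back({Literal, lit});
    return atoms;
}
```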
Classification of the atom is all-important, because it drives the
pattern-matching engine. ParseAtoms() classifies each atom for content. These
classifications are defined by the enumerated type ExpType_T. For example,
Literal is a string literal that must match exactly, Wild is a wildcard
matching any single character, and so on. Additionally, ParseAtoms() will
classify an atom in terms of positional requirement, based on the "^" and "$"
special characters, and will mark gpRegExp as meaningful if the "&" character
precedes it. (Meaningful, in this sense, indicates the matched portion should
be saved for later retrieval, because the calling application will need to
know exactly what was matched.)
The second parameter, called "top," is a curious detail. Top controls whether
the gpRegExp object is marked as a top-level (or root) atom in the linked
list, and determines whether or not to handle the "^" special character.
Normally, this parameter is not supplied: It defaults to 1 (yes), so the
programmer need not think about it, and ParseAtoms() itself always sets it to
0 (no) when constructing the next atom. In fact, it is important not to supply
this parameter yourself.
Once ParseAtoms() has completed decomposition, the gpRegExp object is ready.
Its primary purpose is to match things with the overloaded equality operator.
The process starts at the root and travels down the list of atoms. At each
level, the gpRegExp object tries to match its input. If it succeeds, it passes
what remains unmatched to the next atom. If that atom returns failure, so does
the caller. Thus, the root can only succeed if all of the atoms in the list
succeed.
Matching is based on the atom's ExpType_T, as determined at construction time.
The rules for the types Literal, Wild, Range, and Optional are simple; the
Multi-0 and Multi-1 variants, however, bear investigation. Both of these
attempt to match as much of the input as possible. If the next atom returns
failure, the object will step back one character in the match, and try again.
This process ends when the next atom returns success, or the Multi atom can no
longer back up because too few characters were matched. Note that the Multi-0
types, which specify zero or more occurrences of a given pattern, will match
an empty string, whereas the Multi-1 types consume at least one character.
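The greedy match-and-fall-back rule can be sketched as a small recursive matcher. This is our own illustration, not the gpRegExp code: it is anchored at the start of the input for brevity and supports only literals, ".", ".*", and ".+".

```cpp
#include <string>

// Match pattern pat (from index pi) against input in (from index ii).
// A Multi atom (".*" or ".+") first consumes as much input as it can,
// then gives back one character at a time whenever the rest of the
// atom chain fails -- the fallback behaviour described above.
bool matchHere(const std::string &pat, size_t pi,
               const std::string &in, size_t ii) {
    if (pi == pat.size()) return true;                 // all atoms matched
    if (pi + 1 < pat.size() && pat[pi] == '.' &&
        (pat[pi + 1] == '*' || pat[pi + 1] == '+')) {
        size_t minTake = (pat[pi + 1] == '+') ? 1 : 0; // Multi-1 needs one
        // Greedy: try the longest possible take first, then back up.
        for (size_t take = in.size() - ii + 1; take-- > minTake; )
            if (matchHere(pat, pi + 2, in, ii + take)) return true;
        return false;
    }
    if (ii < in.size() && (pat[pi] == '.' || pat[pi] == in[ii]))
        return matchHere(pat, pi + 1, in, ii + 1);     // literal or Wild
    return false;
}

bool match(const std::string &pat, const std::string &in) {
    return matchHere(pat, 0, in, 0);
}
```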


The Parser Itself


The parser learns about the target language by means of a syntax table, which
is a list of Syntax objects. These Syntax objects each contain a gpRegExp and
a token. The gpRegExp defines the syntax of the command. The token indicates
which command it represents and is usually put through a switch construct by
the application to dispatch the proper handler, much like a WM_ message in the
Windows WndProc function.
The gpParser class takes all of the credit without doing any of the work. The
constructor merely stores the SyntaxList that the caller passed in. The
Parse() function does little more than pass its parameter to the SyntaxList's
Seek() function; see Listing Three. Seek() simply walks through the list of
Syntax pattern objects, asking each if it is equal to the parameter. The
Seek() function doesn't know that regular-expression matching is going on,
since it just uses the equality operator. If Seek() returns success, Parse()
returns the token associated with the matching syntax pattern. This portion of
the code shows the true power of overloaded operators in C++.
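In outline, the division of labor looks like this (our stand-ins, with substring matching standing in for gpRegExp):

```cpp
#include <string>
#include <vector>

// Stand-in for the article's Syntax class: the overloaded == hides all
// of the pattern matching from the caller.
struct Pattern {
    std::string text;
    int token;
    // "Equality" here means the pattern occurs somewhere in the line.
    bool operator==(const std::string &line) const {
        return line.find(text) != std::string::npos;
    }
};

// Seek() knows nothing about pattern matching -- it just walks the
// list comparing each entry against the line.
const Pattern *Seek(const std::vector<Pattern> &list,
                    const std::string &line) {
    for (const Pattern &p : list)
        if (p == line) return &p;
    return nullptr;
}

// Parse() returns the token of the matching pattern, or -1 on failure.
int Parse(const std::vector<Pattern> &list, const std::string &line) {
    const Pattern *hit = Seek(list, line);
    return hit ? hit->token : -1;
}
```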


Constructing the Language



So, how do we use this to construct a macro language? First, we must construct
our syntax table. In our example, the DoCmd command takes only one form; see
Example 2.
This regular expression indicates we're looking for a line beginning with the
literal "DoCmd DoMenuItem" and we want to capture the three possible
comma-separated parameters. We also have to assign this command a token, so we
choose to define this to be TK_DOMENUITEM, and give it an arbitrary value of
(TK_USERDEF + 1) with a #define. We use TK_USERDEF as a base point, so we
don't conflict with any predefined tokens.
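The token definitions might look like the following. The values are hypothetical; any values will do as long as they sit above TK_USERDEF and do not collide with the predefined tokens.

```cpp
// Hypothetical application tokens following the TK_USERDEF convention
// described above. TK_USERDEF's own value is assumed here for the sake
// of the example.
#define TK_USERDEF    1000                // base for application tokens
#define TK_DOMENUITEM (TK_USERDEF + 1)    // DoCmd DoMenuItem ...
#define TK_CALL       (TK_USERDEF + 2)    // Call <function> (...)
#define TK_LET        (TK_USERDEF + 3)    // Let <var> = <value>
```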
We continue this process for each command available in our language. When
finished, we have produced the code in Listings Four and Five. One word of
caution: The most-specific syntax patterns must precede the more-general
patterns so that a general pattern (such as Call +&.*(&.*)) does not get
matched when a more-specific one (such as Call +MessageBox *(&.*)) was called
for.
Listing Six, syntoken.h, contains definitions for tokens such as TK_IF and
TK_USERDEF. In most cases, your application will have to handle predefined and
user-defined tokens. Table 2 lists the predefined tokens and what they
represent.
After calling the BuildMacroEngine() function in Listing Five, read lines from
the input file and pass them to the gpFlowControl object that
BuildMacroEngine() returned. Do this by calling its Parse() function, sending
in the line to parse and an empty StringList, which Parse() will fill with the
command's parameter before returning. Parse() will return the token associated
with this command.
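The application's main loop then reduces to a read/parse/dispatch cycle. The sketch below uses our own parseLine() in place of gpFlowControl::Parse() and a trivial token set; only the shape of the switch dispatch is the point.

```cpp
#include <string>
#include <vector>

// A minimal token set for the sketch (stand-ins, not syntoken.h).
enum { TK_UNRECOGNIZED = 0, TK_NOOP, TK_DOMENUITEM };

// Stand-in for gpFlowControl::Parse(): classify one line and fill the
// parameter list. Comments (lines starting with ') become no-ops.
int parseLine(const std::string &line, std::vector<std::string> &params) {
    params.clear();
    if (line.rfind("DoCmd DoMenuItem", 0) == 0) {
        // Crude parameter capture: everything after the verb.
        params.push_back(line.size() > 17 ? line.substr(17) : "");
        return TK_DOMENUITEM;
    }
    if (!line.empty() && line[0] == '\'') return TK_NOOP;
    return TK_UNRECOGNIZED;
}

// The dispatch loop: switch on the returned token, much like a WndProc.
// Returns the number of commands executed, or -1 on a syntax error.
int runMacro(const std::vector<std::string> &lines) {
    int executed = 0;
    std::vector<std::string> params;
    for (const std::string &line : lines) {
        switch (parseLine(line, params)) {
        case TK_DOMENUITEM: ++executed; break;  // invoke the menu item
        case TK_NOOP:       break;              // comment; nothing to do
        default:            return -1;          // TK_UNRECOGNIZED
        }
    }
    return executed;
}
```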
Handling expressions, such as those between If and Then, is a simple matter of
constructing a gpParser object that knows the rules of the expression's
grammar. This is demonstrated in the BuildExpEngine() and EvalExp() functions
in Listing Five. The gpFlowControl::Parse() function returns TK_IF, TK_WHILE,
or a similar token when an expression has to be evaluated before flow control
can be determined. In this case, the expression should be in the first element
of StringList, and can be passed directly to EvalExp(), which passes it into
the gpParser::Parse() function. Breaking down expression syntax adequately can
make this process quite powerful. Once gpParser::Parse() establishes a truth
value for the expression, the application calls the
gpFlowControl::PostExpressionValue() function. This sets the state of the
gpFlowControl object and determines how the flow-control construct will be
handled.
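The PostExpressionValue() protocol can be reduced to a small state machine. Again, this is our sketch of the idea rather than the gpFlowControl implementation, and it handles only a single, unnested If/Then/Else.

```cpp
// Stand-in for the flow-control state described above: after the
// application evaluates the expression, it posts the truth value back,
// which decides whether the lines that follow execute or are no-ops.
struct FlowControl {
    bool skipping = false;
    // Mimics gpFlowControl::PostExpressionValue(): a false expression
    // turns the Then clause into no-ops until Else/Endif.
    void PostExpressionValue(bool truth) { skipping = !truth; }
    bool ShouldExecute() const { return !skipping; }
    void OnElse()  { skipping = !skipping; }  // flip for the Else clause
    void OnEndif() { skipping = false; }      // resume normal execution
};
```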


Conclusion


The parsing engine presented provides a very powerful and generalized
solution. We have used it in products from simple utilities to
embedded-application scripting languages. Its implementation in C++ makes good
use of inheritance and polymorphism. We have attempted to keep the engine pure
so that it will port easily. In fact, we have used it on the DOS, Windows 3.x,
Win32s, Windows NT, and OS/2 platforms (in all cases using the Borland C++
compiler).
However, the engine does have some limitations. The gpRegExp class does not
implement all of the UNIX-style, regular-expression special characters and
functionality. We designed the engine with this extension in mind, but have
not yet completed the work. And since our gpRegExp class makes all comparisons
case
insensitive, operations such as case-sensitive variable naming may not be
supportable. Also, the gpFlowControl object does not handle flow-control
constructs such as Switch/Case and For/Next. These would be relatively easy to
add.
Example 1: (a) Basic statement that invokes a menu command; (b) Basic function
calls; (c) typical If/Then/Else construct; (d) While loop.
(a)
DoCmd DoMenuItem FILE, OPEN, "C:\DIRNAME\FILENAME.EXT"

(b)
Call MessageBox ("Invalid Type Code!", 0, "Error")
Call RecalcBalances ()

(c)
If FileName$ > "" Then
    DoCmd DoMenuItem FILE, OPEN, FileName$
Else
    Call MessageBox ("You must type a file name!", 0, "Error")
Endif

(d)
Let GotFile = 0
While GotFile = 0
    Call GetFileName (FileName$)
    If FileName$ > "" Then
        DoCmd DoMenuItem FILE, OPEN, FileName$
        Call RecalcBalances ()
        Let GotFile = 1
    Else
        Call MessageBox ("You must type a file name!", 0, "Error")
    Endif
WEnd
Example 2: A syntax rule specified via a regular expression, consisting of a
string literal followed by three comma-separated parameters.
DoCmd *DoMenuItem *&[A-Z]+ *, *&[A-Z]+ *, *&.*
Figure 1: Structural overview of the parsing engine.
Figure 2: Decomposing the regular expression ^_*If +&.+ +Then$. This
implements the expression If <<expression>> Then.
Table 1: Wildcards and their use.
Wildcard Description
. Period matches any one character. Using this character in
 the expression "HEL." will match "HELP," "HELL," "HELD,"
 and so on. The ExpType_T value
 Wild represents this wildcard.

* Asterisk matches zero or more of the previous character.
 This is useful if you don't know how many spaces are
 between two words, such as in: "Goodbye Cruel World!" or
 "GoodbyeCruel World!" The regular expression to match
 this would be "Goodbye *Cruel *World!" Asterisk can be
 used to create MultiWild0, MultiChar0,
 and MultiRange0 when paired with a period,
 regular character, or Range operator, respectively.

+ Plus matches one or more of the previous character. This
 is useful if you don't know how many spaces are between
 two words, but know there must be at least one, such as
 in: "Goodbye Cruel World!" or "Goodbye  Cruel  World!"
 The regular expression to match this would be
 "Goodbye +Cruel +World!". Note that this would not match
 "GoodbyeCruel World!". Plus can be used to create
 MultiWild1, MultiChar1, and
 MultiRange1 when paired with a period,
 regular character, or Range operator, respectively.

[x-y] The Range operator pair, "[" and "]" provide for
 matching a specific character or range of characters. For
 example, if you are looking for "ABC" or "BBC" but not
 "CBC" or any other, you would use "[A-B]BC" as your
 regular expression. Range defines the ExpType_T
 for this construct.

{ s } The Optional operator pair, "{" and "}" allow you to
 specify a string that may or may not be there, such as in
 "Goodbye {Cruel} World!", which matches "Goodbye Cruel
 World!" and "Goodbye World!" Braces result in an
 ExpType_T of Optional.

& Ampersand indicates the atom to follow is meaningful, and
 should be retained. A parsing application uses ampersand
 to extract parameters from commands. For example, if we
 need to know the name of the file passed to the File Open
 command, our regular expression is "File Open &.+"
 Whatever is matched by the ".+" atom is saved, and will
 be passed back from gpFlowControl::Parse()
 in the lsParms List object. When using
 the gpParser object directly, you can get
 these parameters with a call to
 gpParser::DumpParameters().

^ Caret matches beginning of line. If this character is the
 first character of the regular expression, it makes the
 expression position sensitive. "^Good" would match
 "Goodbye Cruel World!" but "^Cruel" and "^World" would
 not. If caret occurs anywhere else in the regular
 expression, it is treated as a literal (technically it
 should be quoted, but we're a bit more relaxed than UNIX).

$ Dollar matches end of line. Like caret, a dollar as the
 last character makes the expression position sensitive.
 So, "Good$" and "Cruel$" would not match "Goodbye Cruel
 World!" but "rld!$" would. Unlike caret, it cannot appear
 unquoted anywhere in the regular expression except at the
 end. Dollar effectively ends the regular expression,
 making any characters following it drop into the bit
 bucket.

\ Backslash is the "quote" character. If you need to find a
 special character in a file, such as when looking for
 "$10,000,000 Winner" in your e-mail, you have to quote
 the $ so that gpRegExp doesn't try to
 interpret it; an unquoted dollar would simply
 end the expression, and the match would never be
 found. In this case, you would use
 "\$10,000,000 Winner" as your regular expression.
Table 2: Predefined tokens and their uses.
Token Description
TK_UNRECOGNIZED No match was found, meaning that a syntax
 error has occurred in the input stream.
TK_NOOP No-Operation. This is a comment, or no
 action should be taken because we're inside
 a not-taken Then or Else clause or a failed
 While loop.
TK_REWIND Rewind the input stream (loop back to a 
 predetermined point). The line number is in
 the first parameter.
TK_IF Part of an IF structure was found. Normally,
TK_ELSE only TK_IF requires action: An expression
TK_ENDIF needs to be evaluated. The parameter list
TK_LABEL contains the expression. TK_LABEL is usually
TK_GOTO treated as a NOOP, but your application may
 want to cache this location for faster
 rewinding later. When TK_GOTO is encountered,
 the label should be in the parameter list.
TK_WHILE TK_WHILE begins a While loop. The expression
TK_ENDWHILE needing evaluation is in the parameter list.
 TK_ENDWHILE should be treated like TK_REWIND.
TK_COMMENT A comment was encountered. This token is
 included for convenience.
TK_EQUALS These tokens are used in evaluating
TK_NOT_EQUAL comparative expressions and are included
TK_GREATER_THAN for convenience.
TK_LESS_THAN
TK_GREATER_OR_EQUAL
TK_LESS_OR_EQUAL
TK_AND These tokens are used in evaluating logical
TK_OR expressions and are included for convenience.
TK_NOT
TK_UNMATCHED_ELSE These tokens are returned whenever a
TK_UNMATCHED_ENDIF TK_ELSE, TK_ENDIF or TK_ENDWHILE is
 encountered, and it is not matched
 with a corresponding TK_IF or TK_WHILE.
TK_USERDEF Acts as the base point for application-
 specific verbs.

Listing One
//------------------------------------------------------------------
// gpString.h - Declaration of the gpString class.
// Copyright 1994 Prodis Incorporated.
// Architect: TDE
// Developer: AKJ
//------------------------------------------------------------------
#ifndef GPSTRING_H
#define GPSTRING_H
#include <string.h>
#include <stdlib.h>
#include <ctype.h>
#define NPOS 32000
// StripType is used with ::Strip()
enum StripType { Trailing, Leading, Both, All };
// Direction and Inclusive are used with ::FindString() and ::FindChar()
enum Direction { Forward, Backward };
enum Inclusive { Of, NotOf };
class gpString
 {
 protected :
 char *cText;
 size_t nSize;
 int CharIsOfString(char cChar, char *cBuffer);
 public :
 // Constructors:
 gpString (const char *cString);
 gpString (char cChar);
 gpString (size_t nNewSize, char cFill = ' ');
 gpString ( );
 gpString (gpString &sString);

 // Destructor:
 ~gpString ( );
 // Size-related functions:
 size_t Length ( ) {return strlen(cText);}
 void Resize (size_t nNew);
 size_t Size ( ) {return nSize;}
 // Case Conversion - returns a copy:
 gpString ToUpper ( );
 gpString ToLower ( );
 // Assignment Operators:
 // Return the character in the string at a given offset.
 char &operator[] (int nPos) {return *(cText+nPos);}
 // Type conversion to char *.
 operator char *( );
 // Assign one string to another.
 gpString &operator= (gpString &oString);
 // Append another string to this one.
 gpString &operator+= (gpString &oString);
 // Concatenate two strings.
 gpString operator+ (gpString &oString);
 // Relational operators:
 // Compare two strings for equality.
 virtual int operator== (gpString &oString);
 virtual int operator== (char *cString)
 {return operator==(gpString(cString));}
 // Compare two strings for inequality.
 int operator!= (gpString &oString); 
 int operator!= (char *cString) 
 {return operator!=(gpString(cString));}
 // Compare two strings for exclusive alphanumeric precedence. 
 int operator< (gpString &oString); 
 int operator< (char *cString)
 {return operator<(gpString(cString));}
 // Compare two strings for exclusive alphanumeric antecedence.
 int operator> (gpString &oString);
 int operator> (char *cString) 
 {return operator>(gpString(cString));}
 // Compare two strings for inclusive alphanumeric precedence. 
 int operator<= (gpString &oString);
 int operator<= (char *cString) 
 {return operator<=(gpString(cString));}
 // Compare two strings for inclusive alphanumeric antecedence.
 int operator>= (gpString &oString);
 int operator>= (char *cString) 
 {return operator>=(gpString(cString));}
 // Search Functions:
 // Find the target substring within this string.
 size_t FindSubstring (gpString &sTarget
 , Direction dDirect = Forward, size_t nStart = 0);
 // Find any character of the target string.
 size_t FindChar (Inclusive iIsOf, gpString &sTarget
 , Direction dDirect = Forward, size_t nStart = 0);
 // Edit Routines:
 // Insert the new string at the given position.
 gpString &Insert (gpString &sNew, size_t nPos);
 // Remove nSize characters starting with position nPos.
 gpString &Remove (size_t nPos, size_t nSize = NPOS);
 // Replace nSize characters with the new string.
 gpString &Replace (size_t nPos, size_t nSize, gpString &sNew);

 // Remove the given characters at the given positions.
 // We default to stripping leading and trailing whitespace.
 gpString &Strip (StripType s = Both
 , gpString &sTarget = gpString (" \t"));
 }; 
#endif

Listing Two
//------------------------------------------------------------------
// gpRegExp.cpp - Definition of the gpRegExp class.
// Copyright 1994 Prodis Incorporated.
// Purpose: The General Purpose Regular Expression class handles
// pattern matching duties.
// Architect: TDE
// Developer: AKJ
// Modification History:
// 09/08/94 TDE: Original Code.
// 09/09/94 AKJ: Fixed TDE's stuff, and fleshed out functions.
// 01/17/95 AKJ: Added support for +.
// 02/22/95 AKJ: Minor fix to Optional to allow fallback.
//------------------------------------------------------------------
#include <stdlib.h>
#include <gpregexp\gpregexp.h>
#include <gpstring\gpslist.h>
#define LOWER_BOUND 1
#define UPPER_BOUND 3
// Create a gpRegExp object from a character string.
gpRegExp::gpRegExp (const char *cNewText, int top)
 : gpString (cNewText)
 {
 nDoICount = 0;
 fTopLevel = top;
 NextAtom = 0;
 ParseAtoms ();
 }
gpRegExp::gpRegExp (char cChar, int top)
 : gpString (cChar)
 {
 nDoICount = 0;
 NextAtom = 0; 
 fTopLevel = top;
 ParseAtoms ();
 }
gpRegExp::gpRegExp (int top) : gpString ( )
 {
 nDoICount = 0;
 NextAtom = 0; 
 fTopLevel = top;
 ParseAtoms ();
 }
gpRegExp::gpRegExp (const gpString &s, int top)
 : gpString (s)
 {
 nDoICount = 0;
 NextAtom = 0; 
 fTopLevel = top;
 ParseAtoms ();
 }
gpRegExp::~gpRegExp ( )

 {
 if (NextAtom) delete NextAtom;
 }
gpRegExp &gpRegExp::operator= (gpString &oString)
 {
 if (NextAtom) delete NextAtom;
 NextAtom = 0;
 fTopLevel = 1;
 gpString::operator= (oString); // assign via the base class, not recursively
 ParseAtoms ();
 
 return *this;
 }
gpRegExp &gpRegExp::operator= (char *cString)
 {
 if (NextAtom) delete NextAtom;
 gpString sNew (cString);
 gpString::operator= (sNew); // assign via the base class, not recursively
 NextAtom = 0;
 fTopLevel = 1;
 ParseAtoms ();
 
 return *this;
 }
//-------------------------------------------------------------
void gpRegExp::ParseAtoms ( )
 {
 int nPos = 0;
 int GotToken = 0;
 gpString *copy;
 ExpType = Literal;
 size_t nOffset;
 
 // Release the previous atoms and reset parameters.
 // This is usually only needed when an assignment is done.
 firstOnly = 1;
 lastOnly = 0;
 if (fTopLevel)
 {
 // First, optimize by removing extraneous '.*'s 
 if (FindSubstring ("^.*") == 0)
 {
 firstOnly = 0;
 Remove (0, 3);
 } 
 else if (FindSubstring (".*") == 0)
 {
 firstOnly = 0;
 Remove (0, 2);
 }
 // Next, check for "Beginning-of-Line"
 if ((*this)[0] == '^')
 Remove (0, 1);
 else
 firstOnly = 0;
 }
 // Strip out the first atom in the string.
 copy = new gpString (cText);
 while (nPos < Length() && ! GotToken)
 {

 switch ((*this)[nPos])
 {
 case '\\': // We have to Quote the next
 Remove (nPos, 1); // character, so remove the
 copy->Remove (nPos, 1); // slash and get the char.
 nPos++;
 break;
 
 case '.': // if we get a '.'
 if ((nPos) == 0) // and it's the first char
 if ((*this)[1] == '*') // and it's followed by a '*'
 {
 Remove (2); // Then we have a '.*' token.
 copy->Remove (0, 2);
 GotToken = 1;
 ExpType = MultiWild0;
 }
 else if ((*this)[1] == '+') // if followed by '+'
 {
 Remove (2); // Then we have a '.+' token.
 copy->Remove (0, 2);
 GotToken = 1;
 ExpType = MultiWild1;
 }
 else
 {
 Remove (1); // we have a plain
 copy->Remove (0, 1); // old '.' token.
 GotToken = 1;
 ExpType = Wild;
 }
 else
 {
 Remove (nPos); // we have a literal token.
 copy->Remove (0, nPos);
 GotToken = 1;
 ExpType = Literal;
 }
 break;
 case '*': // if we get a '*'
 if (nPos == 1) // and it's the second character
 {
 Remove (2); // Then we have a <char>* token.
 copy->Remove (0, 2);
 GotToken = 1;
 ExpType = MultiChar0;
 }
 else
 {
 Remove (nPos - 1); // Or, we have a literal token.
 copy->Remove (0, nPos - 1);
 GotToken = 1;
 ExpType = Literal;
 }
 break;
 case '+': // if we get a '+'
 if (nPos == 1) // and it's the second character
 {
 Remove (2); // Then we have a <char>+ token.

 copy->Remove (0, 2);
 GotToken = 1;
 ExpType = MultiChar1;
 }
 else
 {
 Remove (nPos - 1); // Or, we have a literal token.
 copy->Remove (0, nPos - 1);
 GotToken = 1;
 ExpType = Literal;
 }
 break;
 case '$': 
 Remove (nPos); // the buck stops here.
 copy->Remove (0); // And we won't have any kids.
 lastOnly = 1;
 GotToken = 1;
 ExpType = Literal;
 break;
 case '[': // if we get a '['
 if ((nPos) > 0) // and it's NOT the first char
 {
 Remove (nPos); // we have a literal
 copy->Remove (0, nPos);
 ExpType = Literal;
 GotToken = 1;
 } 
 else // or we are beginning a range.
 nPos++;
 break;
 case ']': // when we get ']'
 if ((*this)[nPos + 1] == '*') // we may have [...]*
 {
 Remove (nPos + 2);
 copy->Remove (0, nPos + 2);
 GotToken = 1;
 ExpType = MultiRange0;
 }
 else if ((*this)[nPos + 1] == '+') // we may have [...]+
 {
 Remove (nPos + 2);
 copy->Remove (0, nPos + 2);
 GotToken = 1;
 ExpType = MultiRange1;
 }
 else 
 {
 Remove (nPos + 1); // or just plain old [...]
 copy->Remove (0, nPos + 1);
 GotToken = 1;
 ExpType = Range;
 }
 break;
 case '{': // we have a brace
 if ((nPos) > 0) // and it's NOT the first char
 {
 Remove (nPos); // we have a literal
 copy->Remove (0, nPos);
 ExpType = Literal;

 GotToken = 1;
 } 
 else // or we are beginning an 
 // Optional expression
 {
 nOffset = FindChar (Of, "}");
 copy->Remove (0, nOffset+1);
 Remove (nOffset);
 Remove(0, 1);
 
 while ( (nOffset = FindChar (Of, "")) != NPOS )
 {
 (*this)[nOffset] = '\0';
 lrChildren.AddItem (
 new gpRegExp (*this, 0) );
 Remove (0, nOffset+1); 
 }
 lrChildren.AddItem (
 new gpRegExp (*this, 0) );
 GotToken = 1;
 ExpType = Optional;
 }
 break;
 case '&' : // if we get ampersand
 if (nPos > 0) // and we've already got an atom
 {
 GotToken = 1; // then we stop where we are
 ExpType = Literal;
 Remove (nPos);
 copy->Remove (0, nPos);
 }
 else
 { // otherwise, we are starting
 nDoICount = 1; // a meaningful atom.
 Remove (0,1);
 copy->Remove (0,1);
 }
 break; 
 default: 
 // Just copy the character.
 nPos++;
 }
 }
 // Pass the rest along to the next atom.
 if (GotToken && (*copy != ""))
 NextAtom = new gpRegExp (*copy, 0);
 // Flag this guy as NOT top level.
 
 if (copy)
 delete copy; 
 }
//------------------------------------------------------------------
// Routine : operator== (gpString)
// Function : sees if this regular expression matches the given gpString. 
// Notes : Normally, the gpString may be longer than the regular
// expression; the comparison ends with the last character
// in that expression. However, if the last character of
// the expression is '$', then an exact match is called
// for, and the gpString may not have extra characters.

//------------------------------------------------------------------
int gpRegExp::operator==(gpString &sExpress)
 {
 int nMin = 0;
 int lMatch = 0; // Assume that the match will fail.
 int nPos;
 gpString sBuffer;
 char cBuffer;
 gpString sStringSaver;
 
 if (fTopLevel)
 sStringSaver = sExpress;
 sLastMatch = "";
 switch (ExpType)
 {
 case Literal : 
 nPos = sExpress.FindSubstring (cText);
 if ((firstOnly && !nPos) || (!firstOnly && (nPos != NPOS)))
 {
 sLastMatch = cText;
 sExpress.Remove (0, nPos);
 sExpress.Remove (0, Length () );
 lMatch = match_remainder (sExpress);
 }
 break;
 case MultiChar1 :
 nMin = 1; // set our min chars to 1 and fall thru
 case MultiChar0 :
 nPos = sExpress.FindChar(NotOf,(*this)[0]);
 if (nPos == NPOS)
 nPos = sExpress.Length ();
 lMatch = DecrementingMatch(nMin, nPos, sExpress); 
 break;
 case Wild : 
 if (sExpress.Length () )
 {
 sLastMatch = sExpress[0];
 lMatch = match_remainder(sExpress.Remove (0, 1) );
 } 
 break;
 case MultiWild1 : 
 nMin = 1; // set our min chars to 1 and fall thru 
 case MultiWild0 : 
 nPos = sExpress.Length ();
 lMatch = DecrementingMatch(nMin, nPos, sExpress); 
 break;
 case Range : 
 cBuffer = sExpress[0];
 cBuffer = toupper (cBuffer); 
 if ( (cBuffer >= toupper(cText[LOWER_BOUND])) && 
 (cBuffer <= toupper(cText[UPPER_BOUND])))
 {
 lMatch = match_remainder(sExpress.Remove (0,1));
 sLastMatch = cBuffer;
 } 
 break;
 case MultiRange1 : 
 nMin = 1; // set our min chars to 1 and fall thru
 case MultiRange0 : 

 for (nPos = 0; 
 (toupper(sExpress[nPos]) >= 
 toupper(cText[LOWER_BOUND])) && 
 (toupper(sExpress[nPos]) <= 
 toupper(cText[UPPER_BOUND]));
 nPos++
 );
 lMatch = DecrementingMatch(nMin, nPos, sExpress); 
 break; 
 case Optional :
 { 
 gpString sBuffer(sExpress);
 if (lrChildren.Seek (sExpress))
 sLastMatch = lrChildren.Peek()->LastMatch();
 if ((lMatch = match_remainder (sExpress)) == 0)
 {
 sLastMatch = "";
 lMatch = match_remainder (sBuffer);
 }
 } 
 }
 if (fTopLevel)
 sExpress = sStringSaver;
 return (lMatch); 
 }
// These helpers will keep trying to match a Multi-type atom.
// First, try to match as much as possible, then try to match
// the next atom. If that atom succeeds, good.
// If not, we need to decrement our match string by one character
// and retry. We do this until we have reached our minimum chars.
int gpRegExp::DecrementingMatch(int nMin, int nPos, gpString &sExpress)
 {
 int lMatch = 0;
 gpString sBuffer;
 
 for (; !lMatch && (nPos >= nMin); nPos--)
 { 
 sBuffer = sExpress;
 sBuffer.Remove(0, nPos);
 lMatch = match_remainder(sBuffer);
 } 
 if (lMatch)
 {
 sLastMatch = sExpress;
 sLastMatch.Remove(++nPos);
 }
 return lMatch; 
 }
int gpRegExp::match_remainder (gpString &sExpress)
 {
 int lMatch = 1;
 if (lastOnly)
 {
 if (sExpress.Length () )
 lMatch = 0;
 }
 else
 { 
 if (NextAtom)

 lMatch = ((*NextAtom) == sExpress);
 } 
 return lMatch; 
 }
//------------------------------------------------------------------
// Routine : DumpParameters
// Function : traverses the atom tree, creating a list of meaningful parameters.
//------------------------------------------------------------------
void gpRegExp::DumpParameters (StringList &lsParms)
 {
 if (nDoICount)
 lsParms.AddItem (new gpString (LastMatch () )); 
 if (NextAtom)
 NextAtom->DumpParameters (lsParms);
 }
//------------------------------------------------------------------
// The following code implements a List of gpRegExp's
// We keep it here because the Optional atoms use it.
//------------------------------------------------------------------
RegExpList::RegExpList ( ): List ()
 {
 pFirst = pLast = pCurrent = 0;
 }
RegExpList::~RegExpList ( )
 {
 for (gpRegExp *eRegExp = Reset ();
 eRegExp;
 eRegExp = GetNext ()
 )
 delete eRegExp; 
 }
gpRegExp *RegExpList::Seek (gpString &sName)
 {
 int lFound = 0;
 Reset ();
 while (!lFound && pCurrent)
 {
 if (*(Peek ()) == sName )
 lFound = 1;
 else
 GetNext ();
 }
 return pCurrent;
 }
gpRegExp *RegExpList::Seek (char *cName)
 {
 gpString sName (cName);
 return (Seek (sName) );
 }
gpRegExp *RegExpList::Reset ( ) 
 {
 return (pFirst = pCurrent = (gpRegExp *)List::Reset () );
 }
gpRegExp *RegExpList::GetNext ( ) 
 {
 return (pCurrent = (gpRegExp *)List::GetNext () );
 }
gpRegExp *RegExpList::AddItem (gpRegExp *eNew) 
 {

 pLast = pCurrent = (gpRegExp *)List::AddItem (eNew);
 if (!pFirst)
 pFirst = pLast;
 return (pCurrent); 
 }
gpRegExp *RegExpList::Peek ( ) 
 {
 return pCurrent;
 } 
gpRegExp *RegExpList::Seek (int nSequence)
 {
 if (nListSize < nSequence)
 return 0;
 Reset (); 
 for (int i = 1; i < nSequence; GetNext (), i++);
 return pCurrent; 
 }
gpRegExp &RegExpList::operator[] (int nSequence)
 {
 return (*(Seek (nSequence)));
 }
void RegExpList::Clear ( )
 {
 for (gpRegExp *eRegExp = Reset ();
 eRegExp;
 eRegExp = GetNext ()
 )
 delete eRegExp;
 List::Clear ();
 pFirst = pLast = pCurrent = 0;
 } 

Listing Three
//------------------------------------------------------------------
// Syntax.cpp - Definition of the Syntax and SyntaxList classes.
// Copyright 1994 Prodis Incorporated.
// Purpose: The Syntax class pairs syntactic patterns (REs) with tokens (ints)
// Architect: AKJ
// Developer: AKJ
// Modification History:
//------------------------------------------------------------------
#include <stdlib.h>
#include <parser\syntax.h>
SyntaxList::SyntaxList ( ): List ()
 {
 pFirst = pLast = pCurrent = 0;
 }
SyntaxList::~SyntaxList ( )
 {
 for (Syntax *eSyntax = Reset ();
 eSyntax;
 eSyntax = GetNext ()
 )
 delete eSyntax; 
 }
Syntax *SyntaxList::Seek (gpString &sName)
 {
 int lFound = 0;
 Reset ();

 while (!lFound && pCurrent)
 {
 if ( (*(pCurrent->reSyntax)) == sName)
 lFound = 1;
 else
 GetNext ();
 }
 return pCurrent;
 }
Syntax *SyntaxList::Seek (char *cName)
 {
 gpString sName (cName);
 return (Seek (sName));
 }
Syntax *SyntaxList::Reset ( ) 
 {
 return (pFirst = pCurrent = (Syntax *)List::Reset () );
 }
Syntax *SyntaxList::GetNext ( ) 
 {
 return (pCurrent = (Syntax *)List::GetNext () );
 }
Syntax *SyntaxList::AddItem (Syntax *eNew) 
 {
 pLast = pCurrent = (Syntax *)List::AddItem (eNew);
 if (!pFirst)
 pFirst = pLast;
 return (pCurrent); 
 }
Syntax *SyntaxList::Peek ( ) 
 {
 return pCurrent;
 } 
//------------------------------------------------------------------
// Syntax object starts here:
Syntax::Syntax ()
 {
 reSyntax = 0;
 nToken = 0;
 }
Syntax::Syntax (gpRegExp *reExpress, int nNewToken)
 {
 reSyntax = reExpress;
 nToken = nNewToken;
 }
Syntax::Syntax (char *cExpress, int nNewToken)
 {
 reSyntax = new gpRegExp(cExpress);
 nToken = nNewToken;
 } 

Listing Four
// BasicMac.h - define tokens for the BASIC-Macro language.
// We need only include the FLOW.H file; it includes everything else we need.
#include <parser\flow.h>
// Function prototypes:
int EvaluateExpression (gpString &sExpress);
gpFlowControl *BuildMacroEngine();
gpParser *BuildExpEngine( );

// These functions are prototyped, but not actually coded. These are just for 
// demonstration purposes, and need to be written in a production system.
int GetMacroLine (gpString &sLine);
int RewindMacroSource (int nLine);
void ErrorHandler();
void SetVariableToValue (gpString &p1, gpString &p2);
void DoMenuItem (gpString &p1, gpString &p2, gpString &p3);
void MessageBox (gpString &p1, gpString &p2, gpString &p3);
void CallFunctionWithParms (gpString &p1, gpString &p2);
int CheckEquals (gpString &p1, gpString &p2);
int CheckGreater (gpString &p1, gpString &p2);
int CheckLessThan (gpString &p1, gpString &p2);
// Define our Tokens:
// We base each token on TK_USERDEF to avoid conflicts with predefined tokens.
#define TK_DOMENUITEM (TK_USERDEF + 1)
#define TK_ASSIGN (TK_USERDEF + 2)
#define TK_MBOX (TK_USERDEF + 3)
#define TK_CALL (TK_USERDEF + 4)

Listing Five
// BasicMac.cpp - demonstration of using the Prodis Parsing Engine.
// Notes: We include BasicMac.h for our tokens, and it includes 
// all other necessary files.
#include "basicmac.h"
// Declare global variable, just for ease of demonstration
// flow control object for parsing input lines
gpFlowControl *fcMacEngine;
// parser object for parsing expressions
gpParser *pExpEngine; 
 
int main ( )
 {
 // Build the macro-language flow control object.
 fcMacEngine = BuildMacroEngine();
 // Build the expression parser object.
 pExpEngine = BuildExpEngine();
 gpString sLine; // buffer for input lines
 int nLineNo, nToken; // line number, token
 StringList slParms; // string list for holding parameters
 nLineNo = GetMacroLine(sLine); // Get the first line.
 
 // nLineNo will be zero when there are no more input lines.
 while(nLineNo)
 {
 // Parse the input line. The engine will return a token.
 nToken = fcMacEngine->Parse(sLine, slParms);
 switch(nToken)
 {
 // no match was found for the line
 case TK_UNRECOGNIZED : 
 ErrorHandler(); //Send an error message
 break;
 case TK_ASSIGN :
 // first parm was the variable name
 // second parm was the value to set it to
 SetVariableToValue(slParms[1], slParms[2]);
 break; 
 case TK_IF :
 case TK_WHILE :

 { 
 // slParms[1] has the expression following the 'if'
 // or while. It must be parsed and evaluated.
 int IsTrue = EvaluateExpression(slParms[1]);
 PostExpressionValue(IsTrue);
 break;
 } 
 case TK_DOMENUITEM :
 // The following function - not listed -
 // must implement the macro-language line :
 // DoCmd DoMenuItem [Parm1], [Parms2], [Parm3]
 DoMenuItem (slParms[1], slParms[2], slParms[3]);
 break;
 case TK_MBOX :
 // The following function - not listed -
 // causes a Message Box to be displayed
 MessageBox(slParms[1], slParms[2], slParms[3]);
 break;
 case TK_CALL :
 // The following function - not listed - 
 // Matches the first parameter with a 
 // function call and gets its arguments
 // from the second parameter.
 CallFunctionWithParms(slParms[1], slParms[2]);
 break;
 case TK_ENDWHILE :
 // When an ENDWHILE is returned, the first parameter
 // is the number of the line containing the matching
 // WHILE clause, which will be re-evaluated.
 RewindMacroSource (atoi (slParms[1]));
 break;
 case TK_ELSE :
 case TK_ENDIF :
 // We don't need to do anything here; gpFlowControl
 // will handle it all. However, if we were 
 // interested, we could trap these lines.
 break;
 }
 nLineNo = GetMacroLine(sLine); // get the next line
 }
 return (1); 
 }
gpFlowControl *BuildMacroEngine( )
 {
 gpFlowControl *fcNew = new gpFlowControl;
 
 fcNew->AddSyntax ("^ *If +&.+ +Then *$", TK_IF);
 fcNew->AddSyntax ("^ *Else *$", TK_ELSE);
 fcNew->AddSyntax ("^ *Endif *$", TK_ENDIF);
 fcNew->AddSyntax ("^ *Let &.+ *= *&.+ *$", TK_ASSIGN);
 fcNew->AddSyntax ("^ *While *&.* *$", TK_WHILE);
 fcNew->AddSyntax ("^ *WEnd *$", TK_ENDWHILE);
 fcNew->AddSyntax ("^ *Call *MessageBox *(&.+,&.+,&.+) *$" , TK_MBOX);
 fcNew->AddSyntax ("^ *Call *&.+ *(&.+) *$", TK_CALL);
 fcNew->AddSyntax ("^ *DoCmd +DoMenuItem +&.+,+&.+,+&.+ *$", TK_DOMENUITEM);
 return fcNew;
 } 
gpParser *BuildExpEngine( )
 {

 gpParser *pNew = new gpParser;
 pNew->AddSyntax ("&.+ *= *&.+", TK_EQUALS);
 pNew->AddSyntax ("&.+ *< *&.+", TK_LESS_THAN);
 pNew->AddSyntax ("&.+ *> *&.+", TK_GREATER_THAN);
 return pNew;
 }
int EvaluateExpression (gpString &sExpress)
 {
 int nReturn = 0; //return value
 int nToken; //token
 StringList lsParms; //String list for parameters
 // parse the given line and get the token
 nToken = pExpEngine->Parse(sExpress, lsParms);
 switch (nToken)
 {
 //the line is : expression = expression
 //if the two expressions are equal, return TRUE
 case TK_EQUALS :
 if (CheckEquals(lsParms[1], lsParms[2]))
 nReturn = 1;
 break;
 //the line is : expression > expression
 //if the first expression is greater, return TRUE
 case TK_GREATER_THAN : 
 if (CheckGreater(lsParms[1], lsParms[2]))
 nReturn = 1;
 break;
 //the line is : expression < expression
 //if the second expression is greater, return TRUE
 case TK_LESS_THAN : 
 if (CheckLessThan (lsParms[1], lsParms[2]))
 nReturn = 1;
 }
 return nReturn; 
 }

Listing Six
//------------------------------------------------------------------
// Syntoken.h - "System" Syntax tokens.
// Copyright 1994 Prodis Incorporated.
// Architect: AKJ
// Developer: AKJ
// Modification History:
//------------------------------------------------------------------
// Define "non-language" tokens:
#define TK_UNRECOGNIZED 0
#define TK_NOOP 1
#define TK_REWIND 2
#define TK_COMMENT 3
// Define flow-control tokens:
#define TK_IF 10
#define TK_ELSE 11
#define TK_MISMATCHED_ELSE 12
#define TK_ENDIF 13
#define TK_MISMATCHED_ENDIF 14
#define TK_LABEL 15
#define TK_GOTO 16
#define TK_WHILE 17
#define TK_ENDWHILE 18

#define TK_MISMATCHED_ENDWHILE 19
// Define Expression relational tokens:
// (These are just here because we ALWAYS seem to need them.)
#define TK_EQUALS 901
#define TK_NOT_EQUAL 902
#define TK_GREATER_THAN 903
#define TK_LESS_THAN 904
#define TK_GREATER_OR_EQUAL 905
#define TK_LESS_OR_EQUAL 906
#define TK_AND 907
#define TK_OR 908
#define TK_NOT 909
// Define the base for "user-defined" (app-specific) tokens.
// Additional definitions should be of the form:
// #define TK_SOMETOKEN (TK_USERDEF + n)
#define TK_USERDEF 1000
End Listings



RAMBLINGS IN REAL TIME


Compiling a BSP Tree




Michael Abrash


Michael is the author of Zen of Graphics Programming and Zen of Code
Optimization. He is currently pushing the envelope of real-time 3-D on Quake
at id Software. He can be reached at mikeab@idsoftware.com.


As long-time readers of my columns know, I tend to move my family around the
country. Change doesn't come out of the blue, so there's some interesting
history to every move, but the roots of the latest move go back even farther
than usual. To wit:
In 1986, just after we moved from Pennsylvania to California, I started
writing a column for Programmer's Journal. I was paid peanuts for writing it,
and I doubt if even 5000 people saw some of the first issues the columns
appeared in, but I had a lot of fun exploring fast graphics for the EGA and
VGA.
By 1991, I was in Vermont and writing the "Graphics Programming" column for
DDJ (and having a great time doing it, even though it took all my spare nights
and weekends to stay ahead of the deadlines). In those days I received a lot
of unsolicited evaluation software, including a PC shareware game called
Commander Keen, a side-scrolling game that was every bit as good as the hot
Nintendo games of the day. I loved the way the game looked, and actually
drafted a column opening about how for years I'd been claiming that the PC
could be a great game machine in the hands of the right programmers, and here,
finally, was the proof, in the form of Commander Keen. In the end, though, I
decided that would be too close to a product review, an area that I've
observed to inflame passions in unconstructive ways, so I went with a
different opening.
In 1992, I wrote a series of columns about my X-Sharp 3-D library and hung out
on DDJ's bulletin board. There was another guy who hung out there who knew a
lot about 3-D, a fellow named John Carmack who was surely the only game
programmer I'd ever heard of who developed under NextStep. When I moved to
Redmond, I didn't have time for BBSs any more, though.
In early 1993, I hired Chris Hecker. Later that year, Chris showed me an alpha
copy of DOOM, and I nearly fell out of my chair. About a year later, Chris
forwarded me a newsgroup posting about NextStep, and said, "Isn't this the guy
you used to know on the DDJ bulletin board?" Indeed it was John Carmack;
what's more, it turned out that John was the guy who had written DOOM. I sent
him a congratulatory piece of mail, and he sent back some thoughts about what
he was working on, and somewhere in there I asked if he ever came up my way.
It turned out he had family in Seattle, so he stopped in and visited, and we
had a great time.
Over the next year, we exchanged some fascinating mail, and I became steadily
more impressed with John's company, id Software. Eventually, John asked if I'd
be interested in joining id, and after a good bit of thinking, I decided that
nothing would be as much fun or teach me as much, and the upshot is that here
we all are in Dallas, our fourth move of 2000 miles or more since I
started writing computer articles, and I'm writing some seriously cool 3-D
software.
Now that I'm here, it's an eye-opener to look back and see how the events of
the last decade fit together. You see, when John started doing PC game
programming, he learned fast graphics programming from those early
Programmer's Journal articles of mine. The copy of Commander Keen that
validated my faith in the PC as a game machine was the fruit of those
articles, for that was an id game (although I didn't know that then). When
John was hanging out on the DDJ BBS, he had just done "Wolfenstein 3D,"
the first great indoor 3-D game, and was thinking about how to do DOOM. (If
only I'd known that then!) And had I not hired Chris, or had he not somehow
remembered me talking about that guy who used NextStep, I'd never have gotten
back in touch with John, and things would surely be different. (At the very
least, I wouldn't be hearing jokes about my daughter saying y'all.)
I think there's a worthwhile lesson to be learned from all this: If you do
what you love, and do it as well as you can, good things will eventually come
of it. Not necessarily quickly or easily, but if you stick with it, they will
come. Threads run through our lives, and by the time we've been adults for a
while, practically everything that happens has roots that run far back in
time. The implication should be clear: If you want good things to happen in
your future, stretch yourself and put in extra effort now at whatever you care
passionately about, so those roots will have plenty to work with down the
road.
All this is surprisingly closely related to this month's topic, BSP trees,
because John is the fellow who brought BSP trees into the spotlight by
building DOOM around them. He also got me started with BSP trees by explaining
how DOOM worked and getting me interested enough to want to experiment; the
BSP compiler in this article is the direct result. Finally, John has been an
invaluable help to me as I've learned about BSP trees, as will become evident
when we discuss BSP optimization.
Onward to compiling BSP trees.


Compiling BSP Trees


As you'll recall from last time, a BSP tree is nothing more than a series of
binary subdivisions that partition space into ever-smaller pieces. That's a
simple data structure, and a BSP compiler is a correspondingly simple tool.
First, it groups all the surfaces (lines or polygons) together into a single
subspace that encompasses the entire world of the database. Then it chooses
one of the surfaces as the root node, and uses its line or plane to divide the
remaining surfaces into two subspaces, splitting surfaces into two parts if
they cross the line or plane of the root. Each of the two resultant subspaces
is then processed in the same fashion, and so on, recursively, until all
surfaces have been assigned to nodes, and each leaf surface subdivides a
subspace that is empty except for that surface. Put another way, the root node
carves space into two parts, and the root's children carve each of those parts
into two more parts, and so on, with each surface carving ever-smaller
subspaces, until all surfaces have been used. (Actually, a BSP tree can use
many other lines or planes to carve up space, but this is the approach I'll
use this month.)
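As a rough illustration of what such a compiler produces, here is a minimal sketch of a 2-D node structure in C++. The names (BspNode, CountWalls) are mine, not the compiler's; Listing One's LINESEG structure plays this role in the real code.

```cpp
#include <cstddef>

// Hypothetical 2-D BSP node: the splitting wall plus pointers to the
// two subspaces its line carves out. Illustrative only.
struct BspNode {
    double x0, y0, x1, y1;   // the wall segment, viewed from above
    BspNode *front;          // subspace on one side of the wall's line
    BspNode *back;           // subspace on the other side
};

// Every surface ends up in exactly one node, so counting nodes
// counts walls (including any pieces created by splitting).
int CountWalls(const BspNode *node) {
    if (node == NULL)
        return 0;
    return 1 + CountWalls(node->front) + CountWalls(node->back);
}
```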
If you find any of this confusing (it would be understandable; BSP trees are
not easy to get the hang of), you might want to refer to my previous column
("Ramblings in Real Time," DDJ Sourcebook, May/June 1995). It would also be a
good idea to get hold of the visual BSP compiler I'll discuss shortly; when it
comes to understanding BSP trees, there's nothing quite like seeing one built.
So the two interesting operations in building a BSP tree are: choosing a root
node for the current subspace (a "splitter") and assigning surfaces to one
side or the other of the current root node (splitting any that straddle the
splitter). First, let's look at splitting and assigning, for which you need to
understand parametric lines.


Parametric Lines


While we're all familiar with lines described in slope-intercept form, with y
as a function of x, as in Example 1(a), another sort of line description is
useful for clipping (and for a variety of 3-D purposes, such as curved
surfaces and texture mapping): parametric lines. In parametric lines, x and y
are decoupled from one another, and are instead described as a function of the
parameter t; see Example 1(b). This can be summarized as Example 1(c), where
L=(x,y).
Figure 1 shows how a parametric line works. The t parameter describes how far
along a line segment the current x and y coordinates are. Note that this
description is valid not only for the line segment, but also for the entire
infinite line; however, only points with t values between 0 and 1 are actually
on the line segment.
In our 2-D BSP compiler (as you'll recall from last time, we're working with
2-D trees for simplicity, but the principles generalize to 3-D), we'll
represent walls (all vertical) as line segments viewed from above. The
segments will be stored in parametric form, with the endpoints of the original
line segment and two t values describing the endpoints of the current
(possibly clipped) segment, providing a complete specification for each
segment; see Figure 2.
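Evaluating the parametric form is just a pair of multiply-adds; here's a minimal sketch (the helper names are mine, not the compiler's):

```cpp
// Evaluate a point on a parametric line: L = Lstart + t*(Lend - Lstart).
// t values in [0,1] fall on the segment itself; any other t falls on
// the infinite line through it.
struct Point { double x, y; };

Point ParametricPoint(Point start, Point end, double t) {
    Point p;
    p.x = start.x + t * (end.x - start.x);
    p.y = start.y + t * (end.y - start.y);
    return p;
}
```

With Figure 1's segment from (100,50) to (150,150), t=0.5 yields the midpoint (125,100), and a clipped piece of that wall can be stored as nothing more than a (tstart, tend) pair referencing the original segment.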
What does that do for us? For one thing, it keeps clipping errors from
creeping in, because clipped line segments are always based on the original
line segment, not derived from clipped versions. Also, it's potentially more
compact, because you need to store the endpoints only for the original line
segments; for clipped line segments, you can just store pairs of t values,
along with a pointer to the original line segment. The biggest win, however,
is that it allows us to use parametric line clipping, a very clean form of
clipping.


Parametric Line Clipping


To assign a line segment to one subspace or the other of a splitter, you must
somehow figure out whether the line segment straddles the splitter or falls on
one side or the other. To determine that, you first plug the line segment and
splitter into the parametric line-intersection equation in Example 2, where N
is the normal of the splitter, Sstart is the start point of the splitting line
segment in standard (x,y) form, and Lstart and Lend are the endpoints of the
line segment being split, again in (x,y) form. Figure 3 illustrates the
intersection calculation. Due to space constraints, I'm just going to present
this equation and its implications as fact, rather than deriving them; if you
want to know more, there's an excellent explanation on page 117 of Computer
Graphics: Principles and Practice, by Foley and van Dam (Addison Wesley, ISBN
0-201-12110-7), a book that you should certainly have in your library.
If the denominator is 0, you know that the lines are parallel and don't
intersect, so you don't do the divide, but rather check the sign of the
numerator, which tells you which side of the splitter the line segment is on.
Otherwise, you do the division, and the result is the t value for the
intersection point, as shown in Figure 3. You then simply compare the t value
to the t values of the endpoints of the line segment being split; if it's
between them, that's where you split the line segment; otherwise, you can tell
which side of the splitter the line segment is on according to which side of
the line segment's t range it's on. Simple comparisons do all the work, and
there's no need to generate actual x and y values. If you look closely at
Listing One, the core of the BSP compiler, you'll see that the parametric
clipping code itself is exceedingly short and simple.
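The test can be sketched in a few lines of C++; the names here match Example 2's symbols rather than Listing One's variables.

```cpp
struct Vec2 { double x, y; };

static double Dot(Vec2 a, Vec2 b) { return a.x * b.x + a.y * b.y; }
static Vec2 Sub(Vec2 a, Vec2 b) { Vec2 r = { a.x - b.x, a.y - b.y }; return r; }

// Computes the intersection t of Example 2. Returns false when the
// lines are parallel (denominator 0); the caller then checks the sign
// of Dot(N, Sub(Lstart, Sstart)) to see which side the segment is on.
bool SplitT(Vec2 N, Vec2 Sstart, Vec2 Lstart, Vec2 Lend, double *t_out) {
    double numer = Dot(N, Sub(Lstart, Sstart));
    double denom = -Dot(N, Sub(Lend, Lstart));
    if (denom == 0.0)
        return false;
    *t_out = numer / denom;
    return true;
}
```

If the resulting t falls inside the segment's (tstart, tend) range, the segment is split there; otherwise, comparing t against that range tells which side of the splitter the whole segment lies on.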
One interesting point about Listing One is that it generates normals to
splitting surfaces simply by exchanging the x and y lengths of the splitting
line segment and negating the resultant y value, thereby rotating the line 90
degrees. In 3-D, it's not that simple to come by a normal; you could calculate
the normal as the cross-product of two of the polygon's edges, or precalculate
it when you build the world database.
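In 2-D that rotation trick amounts to two subtractions and a negation. Here's a sketch with my own names; flipping both signs gives the opposite-facing normal, which merely swaps which side counts as "front."

```cpp
struct Normal2 { double nx, ny; };

// Rotate a segment's direction 90 degrees to get an (unnormalized)
// normal: exchange the x and y lengths and negate the resulting y.
// Only the sign of dot products against it matters for side
// classification, so there's no need to normalize.
Normal2 SegmentNormal(double x0, double y0, double x1, double y1) {
    Normal2 n;
    n.nx = y1 - y0;         // x component gets the segment's y length
    n.ny = -(x1 - x0);      // y component gets the negated x length
    return n;
}
```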


The BSP Compiler


Listing One shows the code that actually builds the BSP tree. (Listing One is
from a .CPP file, but it is very close to straight C. It may even compile as a
.C file, though I haven't checked.) The compiler first sets up an empty tree,
then passes that tree and the complete set of line segments from which a BSP
tree is to be generated to SelectBSPTree(). SelectBSPTree() chooses a root
node and calls BuildBSPTree() to add that node to the tree and generate child
trees for each of the node's two subspaces. BuildBSPTree() calls
SelectBSPTree() recursively to select a root node for each of those child
trees, and this continues until all lines have been assigned nodes.
SelectBSPTree() uses parametric clipping to decide on the splitter, and
BuildBSPTree() uses parametric clipping to decide which subspace of the
splitter each line belongs in, and to split lines, if necessary.
Listing One isn't very long or complex, but it's somewhat more complicated
than it could be because it's structured to allow visual display of the
ongoing compilation process. That's because Listing One is actually just a
part of a BSP compiler for Win32 that visually depicts the progressive
subdivision of space as the BSP tree is built. (Listing One might not compile
as printed; I may have missed copying some global variables that it uses.) The
code is too large to present here in its entirety, but you can ftp it from
ddjbsp.zip in the /mikeab directory on ftp.idsoftware.com; see "Availability,"
page 3, as well.



Optimizing the BSP Tree


Last time, I promised I'd discuss how to choose the wall to use as the
splitter at each node in constructing a BSP tree. This is a difficult problem,
but you can't ignore it, because the choice of splitter can make a huge
difference.
Consider, for example, a BSP in which the line or plane of the splitter at the
root node splits every single other surface in the world, doubling the total
number of surfaces to be dealt with. Contrast that with a BSP built from the
same surface set in which the initial splitter doesn't split anything. Both
trees provide a valid ordering, but one is much larger than the other, with
twice as many polygons after the selection of just one node. Apply the same
difference again to each node, and the relative difference in size (and,
correspondingly, in traversal and rendering time) soon balloons
astronomically. So you need to do something to optimize the BSP tree--but
what? Before you try to answer that, you need to know exactly what you'd like
to optimize.
There are several possible optimization objectives in BSP compilation. We
might choose to balance the tree as evenly as possible, thereby reducing the
average depth to which the tree must be traversed. Alternatively, we might try
to approximately balance the area or volume on either side of each splitter;
that way we don't end up with huge chunks of space in some tree branches and
tiny slivers in others, and overall processing time is more consistent. Or we
might choose to select planes aligned with the major axes, because such planes
can help speed up BSP traversal.
The BSP metric that seems most useful to me, however, is the number of
polygons that are split into two polygons in the course of building a BSP
tree. Fewer splits is better; the tree is smaller with fewer polygons, and
drawing will go faster with fewer polygons to draw, due to per-polygon
overhead. There's a problem with the fewest-splits metric, though: There's no
sure way to achieve it.
The obvious approach to minimizing polygon splits would be to try all possible
trees to find the best one. Unfortunately, the order of that particular
problem is N!, as I found to my dismay when I implemented brute-force
optimization in the first version of my BSP compiler. Take a moment to
calculate the number of operations for the 20-polygon set I originally tried
brute-force optimization on. I'll give you a hint: There are 19 digits in 20!,
and if each operation takes only one microsecond, that's over 70,000 years
(or, if you prefer, over 500,000 dog years). Now consider that a single game
level might have 5000 to 10,000 polygons; there aren't anywhere near enough
dog years in the lifetime of the universe to handle that. We're going to have
to give up on optimal compilation and come up with a decent heuristic
approach, no matter what optimization objective we select.
In Listing One I've applied the popular heuristic of choosing as the splitter
at each node the surface that splits the fewest of the other surfaces being
considered for that node. In other words, I choose the wall that splits the
fewest of the walls in the subspace it's subdividing. There's nothing wrong
with this technique, but it's time-consuming to perform all those trial
splits, and this test doesn't guarantee anything about the total number of
resulting splits in the subspace. John reports that when he switched from that
approach to the considerably faster approach of using the splitter that
divided the subspace most evenly, he sometimes got fewer splits overall,
although in other cases the fewest-split heuristic worked better. So, although
the approach used in Listing One will generally improve matters, it's only one
of many techniques, all imperfect but certainly better than nothing. John's
rule of thumb is, "Do something, but make it something fast."
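The fewest-splits selection loop can be sketched as follows, under the simplifying assumption that segments are stored as plain endpoints (the real compiler works in parametric form; Seg, WouldSplit, and BestSplitter are my names, not Listing One's):

```cpp
struct Seg { double x0, y0, x1, y1; };

// Signed side of the infinite line through s, using the rotated
// direction (dy, -dx) as the normal: <0, 0, or >0.
static double SideOfLine(const Seg &s, double px, double py) {
    double nx = s.y1 - s.y0, ny = -(s.x1 - s.x0);
    return nx * (px - s.x0) + ny * (py - s.y0);
}

// True if the line through splitter strictly separates cand's endpoints.
static bool WouldSplit(const Seg &splitter, const Seg &cand) {
    double d0 = SideOfLine(splitter, cand.x0, cand.y0);
    double d1 = SideOfLine(splitter, cand.x1, cand.y1);
    return (d0 > 0 && d1 < 0) || (d0 < 0 && d1 > 0);
}

// Try every candidate splitter and keep the one that cuts the fewest
// of the other segments.
int BestSplitter(const Seg *segs, int n) {
    int best = 0, bestsplits = -1;
    for (int i = 0; i < n; i++) {
        int splits = 0;
        for (int j = 0; j < n; j++)
            if (j != i && WouldSplit(segs[i], segs[j]))
                splits++;
        if (bestsplits < 0 || splits < bestsplits) {
            best = i;
            bestsplits = splits;
        }
    }
    return best;
}
```

Note the quadratic number of trial classifications per node; that cost is exactly what the faster most-even-split heuristic avoids.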
Although BSP trees have been around for at least 15 years now, they're still
only partially understood, and are a ripe area for applied research and
general ingenuity. You might want to try your hand at inventing new BSP
optimization approaches; it's an interesting problem, and you might strike pay
dirt. There are many things that BSP trees can't do well, because it takes so
long to build them--but what they do, they do exceedingly well, so a better
compilation approach that would allow BSP trees to be used for more purposes
would be valuable indeed.
Next time: Drawing from a BSP tree.
Example 1: (a) Describing lines in slope-intercept form, where y is a function
of x; (b) describing lines in parametric form, where x and y are decoupled
from one another and described as a function of the parameter t; (c)
summarizing parametric lines.
(a)
y=mx+b

(b)
x=xstart+t(xend-xstart)
y=ystart+t(yend-ystart)

(c)
L=Lstart+t(Lend-Lstart)
Example 2: The parametric line-intersection equation.
Equation 1: numer = N · (Lstart - Sstart)
Equation 2: denom = -N · (Lend - Lstart)
Equation 3: tintersect = numer / denom
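Pulled together into one helper, Equations 1 through 3 look like this in C. Point2 and intersect_t() are illustrative names of mine; Listing One performs the same computation inline on its LINESEG lists.

```c
typedef struct { double x, y; } Point2;

/* Returns the parametric t at which line L crosses the infinite line
   through splitter S. Sets *parallel to 1 (t is then meaningless) when
   the denominator is zero, i.e., the two lines never intersect. */
double intersect_t(Point2 sstart, Point2 send,
                   Point2 lstart, Point2 lend, int *parallel)
{
    /* N = normal to S */
    double nx = sstart.y - send.y;
    double ny = send.x - sstart.x;
    /* Equation 1: numer = N . (Lstart - Sstart) */
    double numer = nx * (lstart.x - sstart.x) + ny * (lstart.y - sstart.y);
    /* Equation 2: denom = -N . (Lend - Lstart) */
    double denom = -(nx * (lend.x - lstart.x) + ny * (lend.y - lstart.y));
    if (denom == 0.0) {
        *parallel = 1;
        return 0.0;
    }
    *parallel = 0;
    /* Equation 3: tintersect = numer / denom */
    return numer / denom;
}
```

A t strictly between the segment's tstart and tend means the segment really is split by the splitter's infinite line; anything outside that range means the intersection falls beyond the segment's endpoints.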
Figure 1: A sample parametric line; x = 100 + t(150 - 100), y = 50 + t(150 - 50).
Figure 2: Line-segment storage in the BSP compiler. Segments are stored as
tstart and tend values relative to the original, unclipped line segment,
which runs from (100,50) at t=0 to (150,150) at t=1.
Figure 3: The result of applying Equation 3 to two parametric lines; L = the
line being split, S = the splitting line, N = the normal of the splitting line.

Listing One
#define MAX_NUM_LINESEGS 1000
#define MAX_INT 0x7FFFFFFF
#define MATCH_TOLERANCE 0.00001
// A vertex
typedef struct _VERTEX
{
 double x;
 double y;
} VERTEX;
// A potentially split piece of a line segment, as processed from the
// base line in the original list
typedef struct _LINESEG
{
 struct _LINESEG *pnextlineseg;
 int startvertex;
 int endvertex;
 double walltop;
 double wallbottom;
 double tstart;
 double tend;
 int color;
 struct _LINESEG *pfronttree;
 struct _LINESEG *pbacktree;
} LINESEG, *PLINESEG;
static VERTEX *pvertexlist;
static int NumCompiledLinesegs = 0;
static LINESEG *pCompiledLinesegs;
// Forward declaration, so SelectBSPTree() can call BuildBSPTree()
LINESEG * BuildBSPTree(LINESEG * plineseghead, LINESEG * prootline,
 LINESEG * pCurrentTree);
// Builds a BSP tree from the specified line list. List must contain
// at least one entry. If pCurrentTree is NULL, then this is the root
// node; otherwise pCurrentTree is the tree that's been built so far.
// Returns NULL for errors.
LINESEG * SelectBSPTree(LINESEG * plineseghead,
 LINESEG * pCurrentTree, LINESEG ** pParentsChildPointer)
{
 LINESEG *pminsplit;
 int minsplits;
 int tempsplitcount;
 LINESEG *prootline;
 LINESEG *pcurrentline;
 double nx, ny, numer, denom, t;
 // Pick a line as the root, and remove it from the list of lines
 // to be categorized. The line we'll select is the one of those in
 // the list that splits the fewest of the other lines in the list
 minsplits = MAX_INT;
 prootline = plineseghead;
 while (prootline != NULL) {
 pcurrentline = plineseghead;
 tempsplitcount = 0;
 while (pcurrentline != NULL) {
 // See how many other lines the current line splits
 nx = pvertexlist[prootline->startvertex].y -
 pvertexlist[prootline->endvertex].y;
 ny = -(pvertexlist[prootline->startvertex].x -
 pvertexlist[prootline->endvertex].x);
 // Calculate the dot products we'll need for line
 // intersection and spatial relationship
 numer = (nx * (pvertexlist[pcurrentline->startvertex].x -
 pvertexlist[prootline->startvertex].x)) +
 (ny * (pvertexlist[pcurrentline->startvertex].y -
 pvertexlist[prootline->startvertex].y));
 denom = ((-nx) * (pvertexlist[pcurrentline->endvertex].x -
 pvertexlist[pcurrentline->startvertex].x)) +
 ((-ny) * (pvertexlist[pcurrentline->endvertex].y -
 pvertexlist[pcurrentline->startvertex].y));
 // Figure out if the infinite lines of the current line
 // and the root intersect; if so, figure out if the
 // current line segment is actually split, split if so,
 // and add front/back polygons as appropriate
 if (denom == 0.0) {
 // No intersection, because lines are parallel; no
 // split, so nothing to do
 } else {
 // Infinite lines intersect; figure out whether the
 // actual line segment intersects the infinite line
 // of the root, and split if so
 t = numer / denom;
 if ((t > pcurrentline->tstart) &&
 (t < pcurrentline->tend)) {
 // The root splits the current line
 tempsplitcount++;
 } else {
 // Intersection outside segment limits, so no
 // split, nothing to do
 }
 }
 pcurrentline = pcurrentline->pnextlineseg;
 }
 if (tempsplitcount < minsplits) {
 pminsplit = prootline;
 minsplits = tempsplitcount;
 }
 prootline = prootline->pnextlineseg;
 }
 // For now, make this a leaf node so we can traverse the tree
 // as it is at this point. BuildBSPTree() will add children as
 // appropriate
 pminsplit->pfronttree = NULL;
 pminsplit->pbacktree = NULL;
 // Point the parent's child pointer to this node, so we can
 // track the currently-built tree
 *pParentsChildPointer = pminsplit;
 return BuildBSPTree(plineseghead, pminsplit, pCurrentTree);
}
// Builds a BSP tree given the specified root, by creating front and
// back lists from the remaining lines, and calling itself recursively
LINESEG * BuildBSPTree(LINESEG * plineseghead, LINESEG * prootline,
 LINESEG * pCurrentTree)
{
 LINESEG *pfrontlines;
 LINESEG *pbacklines;
 LINESEG *pcurrentline;
 LINESEG *pnextlineseg;
 LINESEG *psplitline;
 double nx, ny, numer, denom, t;
 int Done;
 // Categorize all non-root lines as either in front of the root's
 // infinite line, behind the root's infinite line, or split by the
 // root's infinite line, in which case we split it into two lines
 pfrontlines = NULL;
 pbacklines = NULL;
 pcurrentline = plineseghead;
 while (pcurrentline != NULL)
 {
 // Skip the root line when encountered
 if (pcurrentline == prootline) {
 pcurrentline = pcurrentline->pnextlineseg;
 } else {
 nx = pvertexlist[prootline->startvertex].y -
 pvertexlist[prootline->endvertex].y;
 ny = -(pvertexlist[prootline->startvertex].x -
 pvertexlist[prootline->endvertex].x);
 // Calculate the dot products we'll need for line intersection
 // and spatial relationship
 numer = (nx * (pvertexlist[pcurrentline->startvertex].x -
 pvertexlist[prootline->startvertex].x)) +
 (ny * (pvertexlist[pcurrentline->startvertex].y -
 pvertexlist[prootline->startvertex].y));
 denom = ((-nx) * (pvertexlist[pcurrentline->endvertex].x -
 pvertexlist[pcurrentline->startvertex].x)) +
 (-(ny) * (pvertexlist[pcurrentline->endvertex].y -
 pvertexlist[pcurrentline->startvertex].y));
 // Figure out if the infinite lines of the current line and
 // the root intersect; if so, figure out if the current line
 // segment is actually split, split if so, and add front/back
 // polygons as appropriate
 if (denom == 0.0) {
 // No intersection, because lines are parallel; just add
 // to appropriate list
 pnextlineseg = pcurrentline->pnextlineseg;
 if (numer < 0.0) {
 // Current line is in front of root line; link into
 // front list
 pcurrentline->pnextlineseg = pfrontlines;
 pfrontlines = pcurrentline;
 } else {
 // Current line behind root line; link into back list
 pcurrentline->pnextlineseg = pbacklines;
 pbacklines = pcurrentline;
 }
 pcurrentline = pnextlineseg;
 } else {
 // Infinite lines intersect; figure out whether the actual
 // line segment intersects the infinite line of the root,
 // and split if so
 t = numer / denom;
 if ((t > pcurrentline->tstart) &&
 (t < pcurrentline->tend)) {
 // The line segment must be split; add one split
 // segment to each list
 if (NumCompiledLinesegs > (MAX_NUM_LINESEGS - 1)) {
 DisplayMessageBox("Out of space for line segs; "
 "increase MAX_NUM_LINESEGS");
 return NULL;
 }
 // Make a new line entry for the split part of line
 psplitline = &pCompiledLinesegs[NumCompiledLinesegs];
 NumCompiledLinesegs++;
 *psplitline = *pcurrentline;
 psplitline->tstart = t;
 pcurrentline->tend = t;
 
 pnextlineseg = pcurrentline->pnextlineseg;
 if (numer < 0.0) {
 // Presplit part is in front of root line; link
 // into front list and put postsplit part in back
 // list
 pcurrentline->pnextlineseg = pfrontlines;
 pfrontlines = pcurrentline;
 psplitline->pnextlineseg = pbacklines;
 pbacklines = psplitline;
 } else {
 // Presplit part is in back of root line; link
 // into back list and put postsplit part in front
 // list
 psplitline->pnextlineseg = pfrontlines;
 pfrontlines = psplitline;
 pcurrentline->pnextlineseg = pbacklines;
 pbacklines = pcurrentline;
 }
 pcurrentline = pnextlineseg;
 } else {
 // Intersection outside segment limits, so no need to
 // split; just add to proper list
 pnextlineseg = pcurrentline->pnextlineseg;
 Done = 0;
 while (!Done) {
 if (numer < -MATCH_TOLERANCE) {
 // Current line is in front of root line;
 // link into front list
 pcurrentline->pnextlineseg = pfrontlines;
 pfrontlines = pcurrentline;
 Done = 1;
 } else if (numer > MATCH_TOLERANCE) {
 // Current line is behind root line; link
 // into back list
 pcurrentline->pnextlineseg = pbacklines;
 pbacklines = pcurrentline;
 Done = 1;
 } else {
 // The point on the current line we picked to
 // do front/back evaluation happens to be
 // collinear with the root, so use the other
 // end of the current line and try again
 numer =
 (nx *
 (pvertexlist[pcurrentline->endvertex].x -
 pvertexlist[prootline->startvertex].x))+
 (ny *
 (pvertexlist[pcurrentline->endvertex].y -
 pvertexlist[prootline->startvertex].y));
 if ((numer >= -MATCH_TOLERANCE) &&
 (numer <= MATCH_TOLERANCE)) {
 // Both endpoints are collinear with the root,
 // so the segment is coincident with it; put it
 // in the front list rather than looping forever
 pcurrentline->pnextlineseg = pfrontlines;
 pfrontlines = pcurrentline;
 Done = 1;
 }
 }
 }
 pcurrentline = pnextlineseg;
 }
 }
 }
 }
 // Make a node out of the root line, with the front and back trees
 // attached
 if (pfrontlines == NULL) {
 prootline->pfronttree = NULL;
 } else {
 if (!SelectBSPTree(pfrontlines, pCurrentTree,
 &prootline->pfronttree)) {
 return NULL;
 }
 }
 if (pbacklines == NULL) {
 prootline->pbacktree = NULL;
 } else {
 if (!SelectBSPTree(pbacklines, pCurrentTree,
 &prootline->pbacktree)) {
 return NULL;
 }
 }
 return(prootline);
}
End Listing


DTACK REVISITED


This Stuff is Hard




 Hal W. Hardenbergh


Hal is a hardware engineer who sometimes programs. He is the former editor of
DTACK Grounded and can be contacted through the DDJ offices.


I've just finished reading the article "Disruptive Technologies: Catching the
Wave," by Joseph L. Bower and Clayton M. Christensen (Harvard Business Review,
January/February 1995), which discusses the history of companies making
hard-disk drives and the choices and mistakes these companies made. It's a
fascinating article.
For instance, Bower and Christensen discuss the reasons new companies arise to
eventually challenge (and sometimes displace) entrenched companies. An example
is Seagate. Founded in 1980, the company had by 1986 grown to be a $700
million company, providing 5.25-inch hard drives for AT-compatible PCs.
Seagate developed 80(!) models of 3.5-inch drives, but its principal customers
wanted to stick with higher-capacity 5.25-inch drives. So Seagate continued to
concentrate on the larger format, allowing upstarts Conner and Quantum to
create a new market for 3.5-inch drives that could fit into small desktop
cases and Compaq's portable PCs.
As the authors point out, the capacity provided by 3.5-inch drives was
increasing faster than that demanded by PC users. Suddenly, a 3.5-inch drive
was adequate for the needs of even power users with large desktop cases, and
Seagate's principal customers left to find a volume source of the smaller,
less expensive drives.
Seagate survived and has even prospered of late, with 1994 sales of $3.5
billion. But Conner and Quantum had combined 1994 sales of $4.5 billion, and a
lot of that $4.5 billion came out of Seagate's hide.
The article introduces "the concept of performance trajectories--the rate at
which the performance of a product has improved, and is expected to improve,
over time...." (I've always called that a "performance trend.") The article
also differentiates between "sustaining" and "disruptive" technologies.
Transistors disrupted the vacuum-tube industry, for instance.
Bower and Christensen believe that the switchover from 5.25- to 3.5-inch
drives was a disruptive technological change. Well, I don't agree with their
every assertion; they seem to believe the 1.8-inch drive market is going to
grow dramatically, and that storage capacity is still growing only 50 percent
per year.
Their article doesn't include a whole lot of hard data. (It is important to
differentiate between the data the authors had available when preparing the
article, and what they chose to include in an article aimed at a business
audience.)


Collecting Hard Data on Hard Drives


I live in Santa Clara, the heart of Silicon Valley, just a long stone's throw
from Intel headquarters. It won't surprise you that the local rag, the San
Jose Mercury News, has a "Computing" section in its Sunday edition. It may
surprise you, however, that this section is devoted to personal computing,
written for the consumer. No corporate puff pieces that I can recall. This
must be frustrating to certain local industry leaders who like to see
themselves in print.
Since the section is read mostly by personal-computer consumers, it attracts a
lot of ads from retail outlets, many of which are low-overhead outfits selling
no-name PC clones at very attractive prices. Others are superstores, the best
known of which is Fry's Electronics, which used to be a supermarket (as in
food).
On a whim, in February 1993 I started saving the "Computing" sections. After a
year I thought of a use for all that stacked-up newsprint: a database that
would allow me to track the prices of various PC products over time.
The local superstores run full-page ads which feature two- or three-day sales
on just a few items. Because of the limited selection, these ads are not a
reliable way to track prices. Many of the low-overhead outlets run small ads
that list lots more items than the full-page ads of the superstores. One of
these, Hi-Tech USA (Milpitas, CA), has advertised consistently since I started
saving the "Computing" section, missing only one Sunday in over two years. The
prices in these ads (by the no-name clone outlets) are consistently low, if
not always as low as particular items in the superstores' full-page ads. I
decided to plot hard-disk prices as advertised by Hi-Tech.
I wrote a QuickBasic program that's a collection of disk-price data
statements, one line per week. Every Sunday I take a minute to update this
program. I wrote other QB programs to plot that data, using PCL5 to drive my
LaserJet 3P.
When I plotted all those disk prices, I wound up with an interesting but
confusing graph that had too much raw data. What I really wanted to know was
the price of hard-disk storage, measured in dollars/MB versus time and the
disk size that provided the lowest price/MB at a given time. I wanted this
information for my own buying strategy, and I figured that anybody else who
bought hard drives from time to time might be interested, too.


The Old, Bogus Buying Strategy


Everybody knows the old recommendation for choosing a hard drive: Buy the
largest one you can afford. I don't know if that ever was a good buying
strategy, but it certainly isn't now. Why?
Figure 1 is a plot of five time slices of disk price/MB versus disk capacity.
This plot shows a few things: First, at a given time there's an optimum disk
size, or "sweet spot," that provides the lowest cost/MB of storage. Buying a
smaller or larger drive increases the cost per megabyte. Second, the optimum
disk capacity increases with time. Third, the downward price trend is less
consistent for smaller drives.
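Finding that sweet spot in a given week's ads is a one-pass reduction. The sketch below is my own illustration: the DiskAd type is hypothetical, and the capacities and prices in any example data are illustrative, not Hi-Tech's actual ads.

```c
/* One week's advertised drives: capacity in megabytes and price. */
typedef struct { int megabytes; double dollars; } DiskAd;

/* Return the advertised capacity with the lowest dollars per megabyte,
   i.e., this week's sweet spot. */
int sweet_spot(const DiskAd *ads, int n)
{
    int best = 0;
    for (int i = 1; i < n; i++) {
        double permb     = ads[i].dollars / ads[i].megabytes;
        double bestpermb = ads[best].dollars / ads[best].megabytes;
        if (permb < bestpermb)
            best = i;
    }
    return ads[best].megabytes;
}
```

Run over each week's price list in turn, this yields exactly the optimum-size-versus-time track plotted in Figure 2.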


A Physical Interpretation


A hard-disk drive contains h platters, where h=1,2,3,.... IDE drives for PC
clones invariably have either one or two platters. Since disk manufacturers
can't use, say, 1.5 platters to produce a variety of disk capacities, they use
higher or lower disk read/write technology. The "sweet spot" is for drives
with two platters and the most cost-effective R/W technology. This technology
moves with time.
Why two platters? It doubles the capacity to move from one platter to two, but
it obviously doesn't double the cost. Why not three or more platters? I dunno.
Maybe the physical size of the drive increases. Maybe the absolute price is
too high (as opposed to dollars/MB). In any event, the cost penalty for buying
away from the sweet spot is lower if you move up in size than if you move
down.
The lowest-cost/MB, one-platter drives are known by the marketeers as
"entry-level" drives. As of May 1, 1995, the entry level was 540 MB.


Refining the Data


We want to closely track that sweet spot, both in terms of cost/MB and optimum
disk size; see Figure 2. A glance shows that storage per buck is doubling
every year (at least). This is one reason I think the old paradigm of buying
the largest disk you can afford is dead.
I don't like coincidences in my data because it makes the data appear to be
cooked. Regrettably, there are two coincidences in this article. The first is
that I happened to start saving the Sunday "Computing" sections at the time
that downward disk-price trends suddenly doubled from their historical (1986
to early 1993) rate. It used to take two years for storage per buck to double.
Could the sudden increase in the areal-density trend that occurred early in
1993 have been anticipated? Well, it was anticipated! Robert Scranton,
director of storage systems and technology at IBM's San Jose Almaden Research
Center is quoted in the article "Disks Make Capacity Drive," by Terry Costlow
(Electronic Engineering Times, March 8, 1993), saying, "We think areal density
will increase by about 60 percent over the next few years. That means
disk-drive capacity will double every 14 to 16 months, instead of every two
years." So he was a tad conservative; capacity per dollar has in fact been
doubling every 12 months since he made that statement at the 1993 Technology
Forum.
Let's take a close look at Figure 2. The first thing to notice is the price
increase toward the end of 1993. Since there was no discontinuity (technology
shift) in the optimum size, this would appear to be a supply/demand problem.
In fact, a lot of PCs were sold in the fourth quarter of '93.
Next, you see the sudden jump in optimum disk size at the start of this year
from 540 to 1260 MB! This suggests a couple of things: First, the IDE 540-meg
addressing limitation threw a wet blanket over sales of IDE drives larger than
540 megs because of the slow consumer shift to EIDE drives. Second, Hi-Tech
happened to start selling 1.2-GB drives four weeks before offering 850-MB
drives. As soon as 850-MB drives were offered, they immediately became the new
optimum size. But 1.2-GB (actually about 1.26 GB) drive prices were dropping
steeply, and by March 26, 1.26 GB was once more the optimum size. Both 850-MB
and 1.2-GB drives then cost about $.27/MB, giving the consumer two price
points: $236 at 850 MB or $368 at 1.26 GB.

I did not overlook those large ads that feature only a few items; Access in
Santa Clara, for instance, advertised a Maxtor 853-MB drive for $209 ($.24/MB)
on April 30. Such occasional specials, however, are not included in my data.
Besides the discontinuity when the market moved from IDE to EIDE, some disk
sizes just never became cost effective. An example is the 1.08-GB IDE drive. I
dunno why, but your money is better spent on 850-MB or 1.26-GB drives. Figure
1 also shows this.
If your personal strategy is to bite the bullet and buy as big a drive as is
reasonable, then your best buy is a disk whose size has just become the new
sweet spot; for example, $338 for 1.26 GB. If you're a cheapskate like me,
you'll want the best cheap choice ($209, 850 MB). These are April 30 prices;
by the time you read this, the market will have moved on. And that's another
point: Delay buying a new hard disk until the last possible moment, because
prices continue to nosedive. Surely you can offload some inactive files?
(You do back up your hard disk, don't you? I have a colleague who has three
1-GB drives at home, and he does not back them up. Yikes!)


How Many Megabytes?


Picture a male high-school student with a brand-new, completely empty, 1.2-GB
drive, a color scanner, and a copy of Playboy magazine. How long do you think
it'll take him to completely fill that drive?
We PC users are divided into those who already work with images and those who
will be working with images. As soon as you start dealing with images, no hard
disk in the world is anywhere near as big as you need. I don't even want to
think about video, but Quantum has just introduced a 9-GB, 3.5-inch drive for
about $2500. Seagate and Micropolis have been shipping 9-GB drives for a while
now using 5.25-inch technology.


A Need for Speed?


850-MB drives come with 32-256 KB of on-disk buffer memory, have average
access times of 9-15 msec, and average latencies of 8.3-5.56 msec (3600-5400
RPM). The fastest of these drives commands about a 10 percent price premium
over the slowest.


The Historical Context


Figure 3 includes the data from Figure 2 and the plots from "Disk
Architectures for High Performance Computing," by Randy Katz, Garth A. Gibson,
and David A. Patterson (Proceedings of the IEEE, December 1989). The 1982 to
1988 plots are (presumably) based on hard data. Katz predicted trends for
another four years, up to 1992. Here is the second coincidence: When I
extended Katz's 3.5-inch trend to February 1993, it met my Hi-Tech data
perfectly. I didn't cook this data, honest! I predict that the new, steeper
trend will extend at least until 1998, when, at $.04 a megabyte, a 9-GB drive
will set you back $360. Let's hope we have an E2IDE standard by then, because 9
GB exceeds the 8.4-GB EIDE address limit!
Figure 1: Sweet spot versus time.
Figure 2: IDE disk price/MB and optimum size versus time.
Figure 3: Disk price/MB versus time.



PROGRAMMER'S BOOKSHELF


Books for Software Engineers




Robin Rowe


Robin, the editor of The C++ Newsletter, can be contacted at cpp@netcom.com.


The idea of Patterns is that the class is at too fine a level of abstraction
to completely explain object-oriented design. Patterns are simply patterns of
classes, ways that classes can be used together to solve common problems in
software design. Patterns has been heavily hyped in some magazines, and after
seeing some really dreadful articles about Patterns I felt quite skeptical
that there was any substance to what otherwise seemed like a good idea. 
I was pleasantly surprised when I began reading Design Patterns: Elements of
Reusable Object-Oriented Software, by Erich Gamma, Richard Helm, Ralph
Johnson, and John Vlissides. Having followed the group running the Patterns
e-mail reflector out of the University of Illinois, I was already familiar
with the concept of Patterns. Since Johnson runs that reflector, it isn't
surprising that Design Patterns covers much of the same information, although
the book is more coherent than the e-mail reflector itself. Style-wise, the
book reminds me of Grady Booch's Object-Oriented Analysis and Design, which is
to say it is very well written. (Booch wrote the foreword for Design Patterns.)

Design Patterns has a good amount of C++ code, so it isn't all dry theory.
Chapter Two is a case study in designing a document editor. It gives some
interesting insights into the design process and builds confidence that the
authors' methods can actually be applied. Chapters Three through Five present
a "Design Pattern Catalog" that lists 23 different types of patterns with
example C++ code. The appendices contain a guide to the book's object notation
(which is based on OMT and is quite simple to understand) and their foundation
classes (simple containers, mostly) that are used throughout the book. The
book includes a glossary, bibliography, and index.
Even if you have reservations about Patterns (I do), Design Patterns is a book
that belongs on your OOD reading list. To me, being thought-provoking, clear,
and free of technical mistakes is more valuable in a book than being 100
percent in agreement with my own design beliefs. Design Patterns makes good
arguments and is pleasant reading.
Source code for the book is available electronically by sending the message
"send design pattern source" to design-patterns-source@cs.uiuc.edu.
All software engineers try to predict how big a program will be, how long it
will take to build, and how many defects it will likely contain. Software
metrics is the branch of software engineering that attempts to put some
science behind this estimation process.
Metrics and Models in Software Quality Engineering, by Stephen Kan, makes a
good introduction to the software-metrics field. Kan relates not only the use
of metrics at his own employer (IBM Rochester), but gives examples from NEC
Switching Systems Division, HP, Motorola, NASA, and IBM Federal Systems. If
you intend to work on software with any of these companies, it would be useful
to know their techniques.
Don't expect any object-oriented or C++ engineering here. Metrics and Models
contains no code and is not object-oriented in its thinking. Kan says, "the
waterfall process is very valuable," and that "there is very little
information available about object-oriented development processes." Although
Kan clearly states a preference for the waterfall process as "time-proven,"
little time is taken up with the waterfall process itself, and its mention
should not be an impediment to OO readers. It does, however, leave some
question in my mind as to how well these metrics will work in conjunction with
OO techniques.
The use of metrics is geared toward big software projects, since so many of
its methods are statistical. Even so, Kan admits in the small print that many
of the metrics methods don't sample enough data to be considered statistically
sound. The current state of the art in metrics, then, is essentially empirical.
Perhaps, just as early astronomers could make decent calendars but couldn't
understand the workings of the solar system, software metrics makes
predictions without knowing exactly why they should be right.
Kan writes well and clearly, if a bit drily, and the book is generally
enjoyable to read. One small, persistent annoyance is the author's overuse of
the word "formal," which has a specific meaning to a mathematician or software
engineer. "Formalization is the process by which mathematics is adapted for
mechanical processing. A computer program is an example of a formalized text."
This quote is not from Kan's book, but from The Mathematical Experience, by
Davis and Hersh (ISBN 3-7643-3018-X). "Formal" can variously mean: 
Proven mechanically by mathematical logic.
Rigorous.
A social ritual. 
Only by context can the reader divine which meaning Kan intended in Metrics
and Models. You can even find "formal" used with two totally different
meanings in the same sentence. Kan never does define the word, so it may even
be that he intended some other meaning.
Metrics and Models contains standard practices in the metrics field. The
concise and clear explanation of function point counting is a jewel. If you
are looking for just one book on metrics, this is a good choice. Good use of
graphs and highly descriptive text keep the book moving. Although the book
doesn't contain code, it does have lots of equations. For speaking
intelligently to software metrics practitioners, or even performing metrics
yourself, Metrics and Models should be on your bookshelf.
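For flavor, the core arithmetic of an unadjusted function-point count fits in a few lines of C. The weights below are the standard IFPUG average-complexity weights, not numbers taken from Kan's book; a full count also grades each component low/average/high and applies a value-adjustment factor, both omitted here.

```c
/* The five function-point component types. */
enum { EI, EO, EQ, ILF, EIF, NTYPES };

/* Standard IFPUG average-complexity weights for each component type. */
static const int avg_weight[NTYPES] = {
    4,   /* external inputs */
    5,   /* external outputs */
    4,   /* external inquiries */
    10,  /* internal logical files */
    7    /* external interface files */
};

/* Sum counts-times-weights to get the unadjusted function-point total. */
int unadjusted_fp(const int counts[NTYPES])
{
    int total = 0;
    for (int i = 0; i < NTYPES; i++)
        total += counts[i] * avg_weight[i];
    return total;
}
```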
Software safety and reliability should interest not only the software
engineer, but everyone in our society. Even if you choose not to take the risk
of boarding a fly-by-wire jetliner, you still face the risk of being struck by
falling airplane parts. In fact, those odds are considerably higher
than your chances of winning a state-lottery jackpot. This is less a comment
on the dangers of computer-controlled systems than on how society as a whole
shares the risks of our increasingly automated civilization.
Peter Neumann is the moderator of the RISKS forum on the Internet; comp.risks
is the place to go to find out about the latest happenings and concerns in
software safety and security.
Neumann's Computer Related Risks can serve as an almanac or history of
software disasters. If you have been frustrated by the lack of organization on
the somewhat free-wheeling RISKS forum, you will be glad to see the
information well organized, with deeper insight and more details. Just on the
sheer breadth of available information, you have to look at this book. Want to
know of a software failure caused by a dead cow? It's right there on page 17:
"On May 4, 1991, four of the FAA's 20 major air-traffic control centers shut
down for 5 hours and 22 minutes. The cause: 'Fiber cable was cut by a farmer
burying a dead cow.'" 
Neumann provides many similar anecdotes: submarine sinks trawler; Dutch
chemical plant explodes due to typing error; Michigan factory worker killed by
robot; robot camera runs away from Connie Chung; raccoons cripple the JPL;
NASA rockets, set to explore thunderstorms, launched by accident when hit by
lightning; interference from McDonald's toasters increases employee
paychecks; and so on. Don't get the idea that this is not a serious book.
Neumann goes deeper, into the causes of software glitches, not just in the
specific cases, but in general. He looks into weak links, single-point failure
causes, multiple causes, and malicious acts. System security is a major focus
of the book.
Nancy Leveson, a University of Washington professor, is a well-known software
safety expert. Her IEEE paper on the Therac-25 medical-accelerator software
accidents is one of the best-known papers in the field of software safety.
Leveson's Safeware: System Safety and Computers has much less to say about
software itself, despite the title. It instead focuses on the bigger picture
of accidents in general and how the software development process interacts
with them. What Leveson is looking for are the root causes of accidents in
general and how those apply to software specifically. After dispelling some
"Software Myths" in Chapter Two, the focus is on people, not the machine or
code. While Neumann's book focuses on how machines can go berserk, Leveson
studies how people foul things up. Although Safeware is presented in a sedate
textbook format, some may find it hard to read without getting good and mad
that so many people recklessly endanger (and kill) others just because they
don't want to believe an accident could happen.
Leveson looks at the history of engineering safety, particularly in aviation,
to see what has worked and what hasn't. Several chapters are devoted to hazard
analysis techniques, such as fault-tree analysis or state-machine hazard
analysis. A chapter addresses applying hazard and requirements analysis to
software. The chapter on hazard elimination provides many good insights but
requires some further effort on the part of the reader to consider how it all
applies to software. (It does.) Naturally, a book that is concerned with human
factors looks into the design of human/machine interfaces.
Leveson has some outstanding case studies in the substantial appendices,
divided into medical, aerospace, chemical, and nuclear categories. She examines
the Therac-25, Apollo 13, the DC-10 cargo door, the Challenger, Seveso,
Flixborough, Bhopal, Windscale, Three Mile Island, and Chernobyl. Leveson
chose to focus on accidents for which significant information was available.
She seeks the truth through deep insight. Neumann's book tries to cover the
spectrum of software accidents and incidents, including those that are
security related. He seeks the truth through a broad understanding.
Although both books reach many of the same conclusions, they are
complementary. If you care about software safety you really need both. It's a
bit surprising there isn't more overlap between the books in the software
safety field. Both of these books are in some ways better than "Digital Woes"
or "The Day the Telephones Stopped," but they don't supplant them. Safeware is
obviously intended as a college text. Computer Related Risks is more of a
crossover; it is breezier in tone and less rigorously organized, but it has
student exercises at the end of each chapter, a feature Safeware lacks. Both
books cover a number of software-related incidents, but Safeware goes for
depth while Risks goes for breadth. However, they don't cover everything. The
well-publicized but poorly researched Denver Airport baggage-handling-system
fiasco is missing from both.
Both authors write well. Neumann is more fun, while Leveson is more scholarly.
The bottom line is Computer Related Risks will have more appeal to programmers
and even the general public. It is also a good "think" text for undergrads in
computer science and other fields. Safeware is required reading for systems
analysts and others concerned about the problems of engineering management. I
enjoyed both and look forward to reading them again more leisurely. Both are
excellent reference material.
Many software engineers are interested in the "Deming Method." Dr. Deming was
responsible for quality and efficiency during urgent production-improvement
efforts in the U.S. during World War II. His methods helped win the war but
were seen as unnecessary in the post-war economic boom.
Post-war Japan, however, was not in such good shape and invited Dr. Deming to
teach his methods there. As a result Japan changed its production techniques
so that "Made in Japan" was transformed from a synonym for shoddy to an
indication of high quality. It would be 40 years before America rediscovered
Deming.
Four Days with Dr. Deming is a summary of his management lectures. The book is
too light, in my opinion, to serve as an introduction to Deming for engineers.
(See The Deming Management Method, by Mary Walton, ISBN 0-399-55000-3, or Dr.
Deming, by Rafael Aguayo, ISBN 0-671-74621-9 for popular paperbacks that
present Deming in a more narrative form.) However, if you have been looking
for a more approachable book on Deming to drop on a manager's desk, this is
it. Moreover, if you are a Deming aficionado or want to train in his methods,
you should spend a few days reading this book. It is the closest thing to a
Deming cookbook. It's paperback with lots of illustrations and amusing
anecdotes.
Design Patterns: Elements of Reusable Object-Oriented Software
by Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides
Addison-Wesley Publishing, 1994, 416 pp., $37.75,
ISBN 0-201-63361-2
Metrics and Models in Software Quality Engineering
by Steven H. Kan 
Addison-Wesley Publishing, 1994, 344 pp., $39.75, 
ISBN 0-201-63339-6
Computer Related Risks
by Peter G. Neumann
Addison-Wesley Publishing, 1995, 367 pp., $22.95, 
ISBN 0-201-55805-X
Safeware, System Safety and Computers
by Nancy G. Leveson
Addison-Wesley Publishing, 1995, 680 pp., $45.95,
ISBN 0-201-11972-2

Four Days with Dr. Deming
by William J. Latzko and David M. Saunders 
Addison-Wesley Publishing, 1994, 344 pp., $25.95,
ISBN 0-201-63366-3



























































SOFTWARE AND THE LAW


Competing with Your Former Employer




Marc E. Brown


Marc is a patent attorney and shareholder of the intellectual-property law
firm of Poms, Smith, Lande & Rose in Los Angeles, CA. Marc specializes in
computer law and can be contacted at meb@delphi.com.


Competing with your former employer is generally considered part of the
American Way. Unfortunately, it also leads to strong feelings of betrayal and
sometimes lawsuits. Employers are often angry and want to punish disloyal
employees to set an example for others. Win or lose, the lawsuit often drains
the employee of the money needed to successfully compete. But pursuing a
lawsuit can also be risky for employers, because employees sometimes counter
with successful claims of their own.
Consequently, it is best to avoid disputes from the beginning. Let's begin by
looking at the legal rights of each party. 


Restrictions on Competition


Most of the legal restrictions on competition by a former employee are
traceable to one of four areas: noncompetition promises, trade secrets,
transfers of ownership, and duties of loyalty. 
Noncompetition promises. At the beginning of employment, many employees sign
written agreements in which they promise not to compete with their employer
after termination of their employment. The agreements typically prohibit the
employee from working for a competitor, consulting for a competitor, or owning
or operating a competing business.
In some states, the courts won't enforce these promises at all. Perhaps of
greatest importance to the software industry is the state of California, where
such agreements are usually unenforceable. Other states enforce the clause,
but only if it's "reasonable." The court considers numerous factors, including
the length of time during which the employee is prohibited from competing, the
geographic areas in which he may not compete, and the scope of the
technologies in which he may not compete.
Among states that enforce reasonable restraints on competition, some enforce
an unreasonable clause, but only to a reasonable extent. Others enforce the
unreasonable promise if it can be made reasonable by striking out words, but
not by adding them (the "blue-pencil" rule). 
Trade secrets. Trade-secret law does not bar competition; it bars
misappropriation of trade secrets owned by a former employer. In practice,
this often prevents effective competition.
A vast array of information can be protected by trade-secret law. In addition
to technological information, such as flowcharts and algorithms, trade-secret
law can protect the identity of customers, suppliers, financing sources, and
outside consultants. It can even protect information about the skills and
professional interests of other employees in the company--recruiting
information a competitor would find useful. To be protected, the trade secret
must constitute valuable business information, must be the subject of
reasonable efforts to maintain its secrecy, and must, of course, be a secret.
A common obstacle to asserting a trade-secret claim is failure to make
reasonable efforts to protect secrecy. Physical security measures are usually
necessary. Everyone who comes in contact with the trade secret should be told
that it is a trade secret and must be protected. Of particular importance is
the need to specifically itemize the trade secrets that must be protected:
flowcharts, customer lists, and the like. It is normally insufficient to
simply state that all information an employee may come in contact with is a
trade secret. 
A common approach is to include language in the employment agreement that
advises the new employee that he may come in contact with trade-secret
information and specifies the categories of that information. The agreement
includes a promise to protect the confidentiality of that information and not
to use or disclose it, except in connection with his employment. Such a
promise substantially increases the chance that a court will find a
protectable trade-secret interest. Furthermore, a former employee can be
barred from using trade-secret information under certain circumstances, even
if he never promised not to.
Transfer of ownership. It's common to require employees to transfer all
ownership rights to technology developed during the course of employment to
the employer. A typical agreement will require you to assign to your employer
all the copyrights, patents, and trade secrets which may arise from any
software you develop for your employer. So long as the software relates to the
employer's business or your job, this can be required without regard to where
or on whose time the software was conceived or developed. It is also
reasonable to require you to assign any software conceived or developed using
the time or facilities of the employer, without regard to whether the effort
is part of your job or relevant to the employer's business.
These clauses are usually enforced. Although they obviously do not bar
competition, they often can effectively prevent it by barring you from using
the technology you need to compete.
In some cases, the employer will own copyright, patent, and trade-secret
rights, even when you did not expressly promise to transfer these rights. The
legal theories that support this generally reflect the common-sense concept
that the employer should own that for which he paid.
Duty of loyalty. Employees usually owe a "duty of loyalty" to their employers.
Although such a duty rarely bars competition after termination of employment,
it often restricts your ability to make plans to compete while you are
employed. The duty and associated restrictions are particularly great when you
are an officer or director (not merely a shareholder) of the employer.
Officers and directors additionally owe employers a "fiduciary duty"
(involving confidence or trust). 


Employment Agreements


Carefully review any agreement your employer asks you to sign. If the
obligations seem oppressive, unfair, or unacceptable, discuss your concerns
with your prospective employer with the goal of making changes.
One effective technique for handling this obviously sensitive problem is to
provide the prospective employer with a friendly letter. The letter would
thank the employer for the employment offer, specifically identify the
problematic clause(s) in the agreement, set forth alternative, acceptable
language, and explain very diplomatically why a change is justified. The
employment agreement may then be signed, as presented. "Subject to
modification(s) set forth in attached letter" should be written above the
signature line, and the letter should be attached to the agreement. Keep a
copy of all of the documents.
To minimize friction, you might first advise the employer verbally that you
have a few concerns with the employment agreement, that you'll provide an
outline of them, and that you welcome the opportunity to discuss the matter
further.
The employer is usually in a much better position to dictate the terms of the
employment agreement. All employers would be wise to have every employee sign
an employment agreement containing the types of clauses just discussed, as
well as others in clear, specific, reasonable language. However, many
employers use form agreements that do not focus on matters important to the
employer's business and are unclear and/or oppressive.
Timing is important. An agreement is most likely to be enforced when you are
asked to sign it before you terminate your prior position. But many companies
present the agreement to you on the very first day of employment, after you've
already quit your old job and, in many instances, moved your family. At this
point, your "consent" is often no longer voluntary and might be set aside on
this ground. 
Sometimes, an employer will ask you to sign a new employment agreement in the
middle of the employment. Typically, this occurs because the employer has just
experienced a problem (perhaps with a departing employee) that he hopes to
alleviate in the future. Unless handled properly, such an interim change may
not be enforced.
The best way to ensure enforceability of a revised agreement is to include it
in a package along with a raise, an increase in benefits, a promotion or some
other type of additional benefit. You should be given the option of not
signing the agreement and retaining your present position. If you accept the
added benefit and sign the agreement, the likelihood of it being enforced is
enhanced substantially. 
One last suggestion: If the employee requests changes in an employment
agreement before signing it, the employer should proceed cautiously. Although
the change may seem minor, the employer should carefully consider its effect
if the employee's commitment to his new job turns out to be less than the
employer expects.


Conduct During Employment


It is dangerous to plan to compete with an employer while employed by him,
particularly if you are an officer or director--an employee who owes the
employer a fiduciary duty. 
During the course of employment, you should not design software which you
intend to use in competition with your employer. Even designing it at home is
not always safe. You also should not be recruiting other employees, customers,
or suppliers. 
The employer should give prompt, serious attention to news of an employee
looking for another job or making plans to compete. If such actions are likely
to be harmful to the employer, the employer would be wise to immediately
consult a lawyer. The legal intricacies of the situation might make it
dangerous for the employer to fashion and execute a protective plan without
the guidance of counsel. 


Termination of Employment



Employers risk losing the protection of their employment agreement if they
fire employees and if the act of firing constitutes a breach of the employment
agreement. If there is any doubt, an attorney should be consulted.
If employment is going to be terminated, the employer should consider removing
the employee from all further contact with sensitive business information.
This does not mean evicting the employee from the premises, but merely
transferring his responsibilities to areas that do not involve sensitive
business information. The employer should retrieve any sensitive information
provided to the employee.
An exit interview should be conducted. If possible and with the utmost
diplomacy, the employer should confirm the employee's understanding, if true,
that he was not fired and is not quitting because of any breach by the
employer of an obligation or because of any type of duress. It would also be
beneficial to obtain the name of any new employer. The employee should also be
reminded of his obligations under his employment agreement after termination
of employment and provided with a copy of his employment agreement with all
post-termination obligations underlined.
It might be useful to have the employee sign a document stating that he has
been told all of the foregoing and that he has not been fired and is not
quitting because of any employer breach or duress.
Obviously, this may not be possible; if not, let it go. Under no circumstances
should employees be threatened or otherwise coerced into signing any document.
It is of the utmost importance that the interview be as friendly as possible.
The employee should be thanked for the efforts made for the employer, and
regrets should be expressed about the departure. Any wages owed should be paid
immediately.
Departing employees should be extremely careful not to remove any drawings,
listings, equipment, data, and so on. To reduce doubt, the employee might ask
his supervisor to examine the materials he is removing from his office to
ensure that he has not mistakenly taken something he should not.
The employee should politely decline to sign any documents during an exit
interview. One diplomatic way to handle this is to explain to the interviewer
that this is a very emotional day for the employee and that he would prefer to
take the papers home to consider during a more thoughtful and reflective
moment. As with the employer, it is also very much in the employee's interest
to maintain a good relationship with the former employer. 
If the employee has reason to believe that he may be sued if he works for a
particular new employer, the employee should consider asking the new employer
to indemnify him against any such lawsuit. Without such a promise in writing,
the employee might be wise not to accept that new employment.
If the employee is planning to begin a competing business, he should at least
purchase a Commercial General Liability insurance policy. In the event of a
lawsuit, that policy might fund all or some of the expense of a defense to the
lawsuit. Be advised, however, that the standard Commercial General Liability
policy often will not cover a lawsuit by a former employer. Again, a lawyer
should be consulted when in doubt.
The new employer also needs to act cautiously. Although he is certainly
entitled to interview job applicants, it is dangerous to directly solicit
employees from a competitor. After hiring a competitor's employee, it is
particularly dangerous to allow that new employee to solicit his former
associates. 
The new employer should also question whether the prospective employee is
contractually restricted from accepting the new employment, whether he has
been exposed to what the former employer might regard as trade secrets, and
what promises he has made in connection with these trade secrets. As a
condition of employment, the new employee should be asked to sign a statement
indicating that his current employment is not a breach of any prior agreement
and that his new employer has not objected to the new employment. It should
also state that he has not had access to any trade-secret information or, if
he has, that he has not told and will not tell the new employer about it and
will not use any of this information in his new employment.


After Termination


Needless to say, you should not use any information your former employer might
regard as a trade secret. Failure to adhere to this rule is one of the most
common causes of a lawsuit. 
The former employer should send a follow-up letter to the former employee,
particularly if the employee did not sign an exit statement. The letter should
enclose a copy of any agreement the former employee signed, with important
language underlined. The letter should remind the employee of his continuing
obligations in a polite, professional manner devoid of any threatening
language.
The former employer may also wish to communicate directly with the new
employer. This should be considered only if the new employer is a competitor
and if the former employee was exposed to information which the former
employer has a credible basis for claiming is protectable as a trade secret.
Such a communication must be carefully crafted to avoid any allegation of
wrongdoing. It should politely point out to the new employer the fact that the
former employee was exposed to trade-secret information, and it should
specifically describe the categories of that information. The new employer
should be thanked for his anticipated cooperation in making sure that the new
employer does not obtain or use this information in any way. This
communication should only be in writing.
The potential benefit of this communication should also be carefully weighed
against its risk. Although the former employer may believe that much of the
information discussed in his letter is legally entitled to trade-secret
protection, he may be wrong. If he is, he may be charged with having unfairly
interfered with the new employment relationship.


Litigation


A lawsuit against a former employee is dangerous for the former employer. As a
practical matter, such a lawsuit usually achieves its goal--to prevent
competition--without regard to its merit. For a former employee who is
competing on his own, the lawsuit often drains the former employee of the
money he needs for his new business. If the former employee is working for a
competitor, the lawsuit often strains his relationship with the new employer
and makes the new employer more cautious about promoting him. 
As a consequence of this foreseeable (and often intended) damage, a former
employer who pursues a lawsuit found to lack merit is exposed to a broad
variety of counterclaims: malicious prosecution, abuse of process,
interference with prospective business relationships, and violation of the
antitrust laws. 
The former employer should also be cautious about the lawsuit and any
statements he publishes about his former employee. To avoid claims of libel,
slander, and disparagement, the former employer should circulate a memo to all
personnel directing them not to discuss the lawsuit or the former employee
with anyone, including customers and competitors.
In framing the lawsuit, the former employer can usually assert a broad variety
of claims, some of which are more likely to be covered by insurance than
others. To avoid fighting an insurance company in addition to the former
employee, the former employer may prefer to omit those claims likely to be
covered by insurance.
Employees should also understand that competing with a former employer is
inherently dangerous from a legal perspective. As a practical matter, a
lawsuit by a former employer can severely damage a new business. Not only does
it drain needed capital, but it can sour efforts to develop customer
confidence. Even when the employee feels confident that he can weather the
storm, the outcome of such lawsuits is often difficult to predict.
The new employer is also subject to a lawsuit. Among the claims the former
employer can assert against the new employer are: inducing the former employee
to breach his employment contract with the former employer, misappropriation
of trade secrets, and unfair competition.


Conclusion


Leaving a job and competing with a former employer is like getting a divorce
to marry another--it often creates an emotionally charged situation that can
lead to explosive results. The most important goal when terminating employment
should be to leave a positive relationship behind. To do this, both the
employer and employee must sacrifice and compromise. Neither should ever try
to obtain every conceivable benefit to which they feel entitled.
Former employers should not overreact. Often, dangerous lawsuits against
former employees are filed based merely on suspicion and rumor. Immediate and
decisive action may be needed under certain circumstances; however, care,
caution, and deliberation are the best ways to avoid a serious lawsuit in
retaliation.
Each side may feel confident in the merit of its position, but the appearance
of unreasonableness often turns out to be more important than any legal
principle or right. Don't be the party who "looks bad." Act carefully and only
after reflection. When in doubt, seek legal advice first.






















EDITORIAL


Rebel Alliance


Believing they needed a competitive edge to combat a common foe, a small band
of large companies formed an alliance more than four years ago. The mission of
the alliance was to create a new computer capable of running each vendor's
software, while providing advantages over that of the "Wintel" (Windows
running on Intel-based platforms) empire. The alliance decided that, if it
could base its approach on a RISC architecture that offered price and
performance advantages over the empire, it might have a fighting chance. And
while Intel was protecting its chip's architecture through patents, the
alliance agreed to create a common instruction set architecture (ISA),
allowing other chip makers to create processors that will run the same code. 
Four years later, both Apple and IBM have launched hardware platforms based on
the Motorola/IBM PowerPC chip. Now, several operating systems are either in
development or already on the market, including AIX, Solaris, Linux, OS/2,
Windows NT, and Apple's Copland. Additionally, developer tools are beginning
to show up in numbers. The big question now is whether corporate developers
will embrace the new technology.
Developers must consider a dizzying array of options. First, there was the
original PowerPC 601 microprocessor, manufactured by IBM but sold by IBM and
Motorola. The 601 was designed to bridge the gap between PowerPC and the POWER
chip used in IBM's RS/6000 workstations. Thus, it uses the older POWER
instruction set, which has since been eliminated from the specification. The
MPC603 and MPC603e included power-management functionality for notebook
computers. The 64-bit MPC620 chip is still in development. However, features
and options such as instruction and data caches all vary with the different
implementations. For instance, the MPC602 (which is targeted at consumer
electronics and embedded applications) has dual 4-KB instruction and data
caches. The MPC604, on the other hand, contains dual 16-KB caches, while the
MPC620 will have separate 32-KB instruction and data caches. 
The alliance had originally intended to deliver on its performance promises by
now. Although Motorola contends that its MPC604 chip outperforms the Pentium,
the estimated 15-30 percent improvement in performance still falls short of
the 2:1 increase originally promised. Moreover, Intel's upcoming P6
microprocessor is expected to show performance comparable to that of the
MPC604 chip.
IBM finally began shipping its PowerPC-based machines in June of this year.
However, the entry-level models running Windows NT cost consumers some $3700,
well above the $2500 for similar Intel-based machines running the same
operating system. Further, IBM announced nearly a year ago that it was
delaying the launch of its PowerPC computers so that it could make ready its
OS/2 for PowerPC. Apparently, the port of the operating system is taking
longer than expected, and the company could wait no longer to deliver its
Power Series computers.
Meanwhile, the first Linux kernel for PowerPC is up and running on Motorola's
PowerPC VME 1603. But when asked about a similar port to PowerMac, project
coordinator Joseph Brothers indicates that Apple cannot come up with the
necessary programming specifications for the PowerMac's NuBus, nor can it
provide necessary information on devices, memory maps, or interrupt hardware.
Motorola has tried for more than a year to obtain the necessary specifications
from Apple. (Incidentally, the Linux kernel is available via anonymous ftp
from liber.stanford.edu/pub/linuxppc.)
Despite the short-term glitches in cost, performance, and roll-outs, the
PowerPC is impressive, and most major operating systems will likely run on
PowerPC-based platforms in the future.
Still, at this stage, the fate of the PowerPC alliance is in the hands of
developers such as yourself. May the source be with you.
Michael Floyd
Executive Editor













































Porting to the PowerMac


A tale of two operating systems




Paul Kaplan


Paul is a staff engineer with Symantec's Development Tools Group and works on
Macintosh and Windows development tools. He can be contacted at
pkaplan@symantec.com.


Apple's new generation of Macs is based on the Motorola PowerPC RISC
processor. The PowerMac offers extremely high performance for applications
that are compiled and linked for it. However, to preserve the investment users
may have in existing software, the PowerMac supports legacy 68K applications.
This support is accomplished through software emulation of the 68K instruction
set and operating-system support for the 68K run-time model. In addition, new
and old code (as well as run-time architectures) can be mixed within an
application. Apple developed the PowerMac OS this way--some of System 7.5 is
still 68K code.
In this article, I'll describe the main similarities and differences between
the old and new OSs and the process of porting Macintosh 68K applications to
the PowerMac. I'll also present an application that illustrates using code
resources to mix new and old code within an application.


68K versus PowerMac


On a PowerMac, two operating systems coexist in parallel--the original 68K
system and the new PowerMac system. They run on top of a "nanokernel," which
provides the lowest-level services such as memory management and interrupt
handling. The magic of coexisting 68K and PowerPC software is worked by the
Mixed Mode Manager.
When an application is launched, the PowerMac OS looks for the special Code
Fragment Resource, type cfrg, which specifies a PowerMac application. If a
valid cfrg resource exists, the application is handed to the Code Fragment
Manager (CFM). This subsystem manages the loading and execution of
applications and shared libraries. In addition to handling the default load
format, the CFM allows the use of custom loaders. A 68K application has no
cfrg resource and is therefore handed to the 68K Segment Manager.
After an application has been launched as either 68K or PowerMac, it can
switch modes while running. To switch modes and run unmodified, 68K
applications call the Mixed Mode Manager implicitly; PowerMac applications can
call it implicitly or explicitly.
In order to run 68K applications, the PowerMac OS has retained a number of
components of the 68K OS. In fact, the PowerMac toolbox calls are a superset
of the 68K system. The file system is the same, so "well-behaved" applications
can be ported with little more than recompiling and linking--the development
system will take care of run-time details. The System 7 MacOS, on the other
hand, retains a single address space for all running applications, and the
multitasking model is still cooperative and non-preemptive. Future releases of
the MacOS, beginning with System 8 (code-named "Copland"), will provide
multiple virtual address spaces, preemptive multitasking, memory-mapped I/O,
and object-oriented user-interface components.
The run-time model for PowerMac applications is completely new. Now, only one
code and one data segment are required, and the segment manager is no longer
used.
The code segment has no relocations, which makes it sharable, and all the
relocations are in the data segment. Each application has a Table Of Contents
(TOC) that serves the same function as the 68K "A5 world" and greatly
simplifies access to global data. The TOC is created by your development
system and is transparent to C or C++ code. Also, the new OS supports, and
depends heavily on, shared libraries. In fact, the PowerMac toolbox is a
shared library. Finally, the application file format has been completely
reorganized.


Porting Your Application


Porting to the PowerPC can be as simple as recompiling if your source code
meets the requirements listed in the next few paragraphs. For example,
Symantec C++ 8.0 automatically converts your existing 7.0 (68K) project;
recompile and link it, and it's ready to go. On the other hand, legacy
applications that take shortcuts to system features will need some porting
work.
The first step in porting any application is to ensure that your code runs
under 68K System 7. Such an application should use only "32-bit clean"
addresses. Older Mac applications sometimes used the high byte of an address
for purposes other than the address. PowerPC addresses use all 32 bits. In
compiling your code, use ANSI C or C++, which will force stronger type
checking and function prototypes. Also, compile with Apple's Universal
Headers, which are shipped with your development system. Universal Headers are
appropriate for both 68K and PowerPC applications and will make your code
portable between them. In addition, either rewrite inline assembly in C, or
place the inline code in separate assembler files. If you insist on keeping
the 68K code, it should be isolated in a separate code resource.
Don't make assumptions about registers, especially passing parameters, as they
are all different. And try to use data types with 4-byte alignment. Although
the PowerPC processor allows alignment anywhere, 4-byte alignment produces
more-efficient code. However, if you're writing structures to a file, using
4-byte alignment can waste disk space.
Beyond these steps, you can use #pragmas to force 68K alignment where it is
necessary for toolbox routines. Check that the alignment is correct when
reading data from an existing disk file. Also, use int and long data types. On
the PowerPC, int and long are 32 bits, and short is 16 bits. The 32-bit
integer is the most efficient data type.
Use the double data type for floating-point variables. The PowerPC FPU
supports only the IEEE 4-byte (float) and 8-byte (double) floating-point
formats. Double is more efficient. The 10- and 12-byte doubles used on 68K are
not supported by the processor. Long doubles are implemented as two doubles.
(Note that the Symantec compiler does not support long doubles.) Check all
#pragmas and dependencies on #defines to ensure they still have meaning in the
new environment. Do not put data in code. This would affect
pipelined-instruction performance. And if you have Pascal code, convert it to
C either by hand or with the MPW p2c Pascal-to-C converter (available on
Apple's ETO #17 CD-ROM).
When porting the system interface portion of your application, you should
generally use system calls instead of accessing the hardware directly. In
addition, convert callbacks to universal procedure pointers. These are
available in the Universal Headers. If you're passing a callback procedure's
address to the operating system, you must create a UniversalProcPtr with the
NewRoutineDescriptor function (the actual data structure that describes the
function is called a "routine descriptor"). You need to use UniversalProcPtrs
because the OS makes no assumption about the callback's architecture. Strictly
speaking, routine descriptors are not required for 68K builds (they are
compiled into addresses), but using them will make your code completely
portable between the two environments.
Another thing to watch for is direct access of low memory. Don't do it!
Rather, use the LMSetxxx and LMGetxxx calls in LowMem.h. Finally, don't
explicitly use the 68K run-time model. The 68K run-time-specific calls are not
supported. For example, a call to the Segment Manager would return with no
action.


Linking Your PowerMac Application


Your linker will create a "fragment," which is the atomic load unit and
contains code and static data. Fragments are managed by the CFM. Most PowerMac
applications and shared libraries use the Preferred Executable Format (PEF) to
house fragments. PEF specifies the file header, segments for code and data,
import- and export-symbol tables, and relocations. Normally, the application
resides in the data fork of its file, although fragments can be resources as
well. The linker in your development system will handle the details of
fragments and the PEF.
Your linker should support the XCOFF format, an extension of the COFF
format found in UNIX. This is important because the only stub libraries Apple
supplies to link to the toolbox and shared-library extensions such as the Drag
Manager are in xcoff format. The stubs are supplied with your development
system. The XCOFF format can also be used to link third-party static libraries
and object modules from a single translation unit. Normally, the Symantec
development environment skips the step of writing object files; the compiler
passes them directly to the linker in memory.
Dividing applications into shared libraries will make your code reusable and
smaller by eliminating redundantly loaded code. Your development environment
will help you create and manage shared libraries.


Under the Application Hood


As mentioned, the run-time model of an application running on the PowerMac OS
is quite different from that of the 68K. The PowerMac run-time model has one
code and one data segment, which are normally loaded in memory. The code
segment is read only, which makes it suitable to run in ROM, but unsuitable to
store writable data. Code and data elements may be exported from the fragment,
which means their symbols are made public and may be linked dynamically. With
the Symantec environment, symbols are exported with a #pragma.
Within the data segment resides the TOC, which is like a personal address
book. It provides linkage to symbols inside and outside the fragment. The TOC
has linkage to imported routines, imported data, global variables, and the
pool (or pools) of static variables. When loading the application and its
shared libraries, CFM resolves imported symbols and fills in the appropriate
TOC entries. The TOC is 64 KB, so there is a maximum of 16K TOC entries. Your
development system will warn you of a TOC overflow.
Applications should have a main() entry point and may additionally have
user-initialization and termination routines. CFM will call the main() entry
point of an application after it is loaded. CFM may also call an
initialization routine as part of loading the fragment, and it may call a
termination routine when it unloads the fragment. Your development system will
help you define these entry points.



Shared-Library Details


Although common in UNIX, shared libraries are probably best known as DLLs in
Microsoft Windows. Originally, shared libraries were available as an add-on to
older MacOS versions with Apple Shared Library Manager (ASLM), but they are
now a standard feature and are in common use on the PowerMac. Shared libraries
are similar to applications. The main differences are that the file type for a
shared library is 'shlb', not 'APPL', and that there is no main() entry point.
Initialization and termination routines are allowed.
When the PowerMac system starts up, its shared libraries are registered with
CFM and made available to all calling applications. Other shared libraries can
be loaded and called at application startup if specified in the PEF file, or
loaded on request by the application. Shared libraries can be loaded
automatically by specifying them as import libraries to your development
environment. Your linker will resolve external symbols to a shared library as
though they were part of a statically linked library. However, the linker
knows they've been imported and will put them in the import list for the
appropriate library. As CFM loads your application, it will also attempt to
load shared libraries specified by the application.
Shared libraries can also be loaded explicitly with the toolbox call
GetDiskFragment. In this case, imports should be specified as "weak" so the
linker won't be unhappy with the unresolved references. If you load a shared
library explicitly, your code should be able to handle a failed load or an
unresolved import (which will have a null address at run time). Shared
libraries also have version capabilities. CFM checks the version number of a
shared library against the version number required by the application and
fails on load if it is not compatible. Version numbers can be specified by
your development system.


Code Resource Examples


MacOS System 7 code resources such as CDEF and MDEF do not need to be
immediately ported to PowerPC. However, there are performance penalties for
mixed-mode switching and for running 68K-emulated code. If the performance of
a code resource is critical, you should convert that resource to a native, or
"accelerated," resource.
To illustrate the process of gradually porting to the PowerMac, I've included
a sample project and the required modifications. Listings One and Two show the
project source files from a 68K program that calls a 68K resource. This
project is a simple application that creates a window, has a standard event
loop, and calls the main() routine in the code resource to handle the Update
Event. The examples don't use any C++ features, although they were compiled
with the Symantec C++ compiler. The InitToolboxStuff() and MouseDownProc()
routines are standard Mac idioms and aren't shown. Also, the error checking
that would be in commercial-grade code is omitted.
The first modification is the same project ported to the PowerMac. Note that
the code-resource routine is still 68K and therefore unchanged. The
main-project routine (see Listing Three) requires a few changes to call the
code resource through the Mixed Mode Manager. Note the use of the Toolbox
routine CallUniversalProc(), which has a varargs parameter list, and the two
required parameters, ProcInfoType and UniversalProcPtr. ProcInfoType has been
initialized to describe the interface of the routine so that
CallUniversalProc() will use the parameters correctly.
The second modification illustrates the changes required to port the resource
to the PowerMac. This time, the main project routine has not changed because
it was ported in the first modification. Listing Four illustrates the
accelerated resource code. There are new calls to __cplusrsrcinit() and
__cplusrsrcterm(); the calls to RememberA0(), SetUpA4(), and RestoreA4() have
been deleted.
Normal, nonresource applications always follow the main(argc, argv)
convention. The standard run-time library contains hidden code to set up any
arguments to main(), and initialize static constructors and destructors. Code
resources, by tradition, do not necessarily conform to an entry-point
standard.
The Symantec solution for code resources in C++ requires explicit calls to the
run-time routines __cplusrsrcinit() and __cplusrsrcterm() within the main()
routine of the code resource. The run-time routines call any static
constructors and destructors, and make the QuickDraw globals available to the
code resource. Code resources also require routine descriptors, which play a
similar role to the ProcInfoType parameter used in CallUniversalProc.
Another feature of the PowerMac is support for a "fat application"--a single
Mac app that contains a 68K version in the resource fork and a PowerMac
version in the data fork. Many of the resources, such as menus and icons, can
be directly shared. With a little work, code resources can be shared as well.
A fat application is backward compatible with 68K System 7 machines. Although
they take up more disk space, fat applications neatly solve the packaging
problem for some vendors.


Conclusion


Porting standard applications from 68K to PowerPC is relatively simple. The
tools you have to work with--the Mixed Mode Manager, CFM, and your development
system--will allow you to gradually port your application, develop an
application that will run on both the PowerMac and 68K systems, and create an
application exclusively for the PowerMac.


Acknowledgments


I'd like to thank Jim Laskey, Yuen Li, John Micco, and Susan Rona, all from
Symantec, for their help with this article.

Listing One
// Macintosh application to create a very simple window and do basic 
// event handling. Paul Kaplan - Symantec Corporation
#include "InitToolboxStuff.h"
#include "MouseDownProc.h"
#include "UpdateWinProc.h"
#define WIN_RESID 128
#define CODE_RESID 128
void main()
 {
 static WindowPtr theWindow, foundWindow;
 static EventRecord theEvent;
 Handle UpdateWinProcHandle;
 InitToolboxStuff();
 // Setup Window and mouse tracking region
 theWindow = GetNewWindow(WIN_RESID, nil, (GrafPtr)-1);
 RgnHandle mouseRgn = NewRgn();
 // Get code resource and lock its handle
 UpdateWinProcHandle = GetResource('CODE', CODE_RESID);
 HLock(UpdateWinProcHandle);
 Boolean more2do = TRUE;
 while (more2do) // Standard event loop processing
 {
 if (WaitNextEvent(everyEvent, &theEvent, 0xffffffff, mouseRgn))
 {
 switch(theEvent.what)
 {
 case updateEvt: // Call the code resource !!
 (*(UpdateWinProcPtr)(*UpdateWinProcHandle))(theWindow);
 break;
 case mouseDown: // Standard Mac Toolbox handling of Mouse Down
 more2do = MouseDownProc(&theEvent, &foundWindow);
 default:
 break;
 }
 } 
 }
 // Free all allocated memory
 HUnlock(UpdateWinProcHandle);
 ReleaseResource(UpdateWinProcHandle);
 DisposeRgn(mouseRgn);
 DisposeWindow(theWindow);
}

Listing Two
// Code resource procedure to draw text in a window
#include <SetUpA4.h>
#define HORIZ 65
#define VERT 95
void main(WindowPtr myWin)
 {
 static char msg[] = "68K Code Resource";
 GrafPtr savedPort;
 RememberA0(); // Save value of A0 for next macro
 SetUpA4(); // Set up A4 for resource globals
 GetPort(&savedPort); // Save current GrafPort
 SetPort(myWin); // Make mine the current GrafPort
 BeginUpdate(myWin); 
 MoveTo(HORIZ, VERT); // Move cursor to position
 DrawText(msg, 0, sizeof(msg)); // Draw the string
 EndUpdate(myWin);
 SetPort(savedPort); // restore current GrafPort
 RestoreA4(); // Restore A4
 }

Listing Three
// Macintosh application to create a simple window and do basic 
// event handling. Paul Kaplan - Symantec Corporation
#include "InitToolboxStuff.h"
#include "MouseDownProc.h"
#include "UpdateWinProc.h"
#define WIN_RESID 128
#define CODE_RESID 128
void main()
 {
 static WindowPtr theWindow, foundWindow;
 static EventRecord theEvent;
 Handle UpdateWinProcHandle;
 // Variable to hold Universal Proc Pointer
 UniversalProcPtr theUPP;
 // Proc Info Type - describes the called procedure's interface
 ProcInfoType theProcInfo = kCStackBased |
 STACK_ROUTINE_PARAMETER(1, kFourByteCode);
 InitToolboxStuff();
 // Setup Window and mouse tracking region
 theWindow = GetNewWindow(WIN_RESID, nil, (GrafPtr)-1);
 RgnHandle mouseRgn = NewRgn();
 // Get code resource and lock its handle
 UpdateWinProcHandle = GetResource('CODE', CODE_RESID);
 HLock(UpdateWinProcHandle);
 Boolean more2do = TRUE;
 while (more2do) // Standard event loop processing
 {
 if (WaitNextEvent(everyEvent, &theEvent, 0xffffffff, mouseRgn))
 {
 switch(theEvent.what)
 {
 case updateEvt: // Call the code resource using 
 // CallUniversalProc instead of 
 // calling routine directly
 theUPP = (UniversalProcPtr)*UpdateWinProcHandle;
 // Convert dereferenced handle to UPP
 CallUniversalProc(theUPP, theProcInfo, theWindow);
 // Call MixedMode Manager
 break;
 case mouseDown: // Standard Mac Toolbox handling of Mouse Down
 more2do = MouseDownProc(&theEvent, &foundWindow);
 default:
 break;
 }
 } 
 }
 // Free all allocated memory
 HUnlock(UpdateWinProcHandle);
 ReleaseResource(UpdateWinProcHandle);
 DisposeRgn(mouseRgn);
 DisposeWindow(theWindow);
}

Listing Four
// Code resource procedure to draw text in a window
#include <new.h>
#define HORIZ 65
#define VERT 95
void main(WindowPtr myWin)
 {
 static char msg[] = "PPC Code Resource";
 static GrafPtr savedPort;
// Call any static constructors in this link unit. Also make 
// QDGlobals available
 __cplusrsrcinit();
 GetPort(&savedPort); // Save current GrafPort
 SetPort(myWin); // Make mine the current GrafPort
 BeginUpdate(myWin); 
 MoveTo(HORIZ, VERT); // Move cursor to position
 DrawText(msg, 0, sizeof(msg)); // Draw the string
 EndUpdate(myWin);
 SetPort(savedPort); // restore current GrafPort
 __cplusrsrcterm(); // Call any destructors in this link unit
 }
End Listings



Optimizing for the PowerPC


Strategies for greater performance




Michael Ross


Michael, a software engineer for MetaWare, can be contacted at
miker@metaware.com.


With its great speed, low power requirements, and flexible programming model,
the PowerPC represents a jump in microprocessor technology. Consequently, many
developers are beginning to port their applications from the Intel
architecture to the PowerPC. The PowerPC and the Pentium, however, represent
two very different approaches to computer architecture, and moving
applications from one platform to the other with a minimum of fuss is not
always easy. You may need to change development platforms, targets, or tool
vendors. 
At MetaWare, we've ported our C/C++ compilers to the PowerPC. In doing so,
we've learned a few tricks that I'll share with you in this article. I'll also
describe some of the techniques we use to improve the code our compilers
generate for the PowerPC. For the basis of my discussion I'll use the 601
model.


PowerPC Quick Tour


One interesting feature of the PowerPC is the branch processing unit (BPU);
see Figure 1. The branch processor doesn't depend on either the integer or
floating-point units; it works in concert with the instruction unit to keep
instructions flowing. The BPU can look ahead in the instruction queue for a
branch instruction and use static branch prediction on unresolved conditional
branches to permit fetching instructions from the predicted target instruction
stream. When prediction is correct, a branch can be performed in zero clock
cycles. This feature of the PowerPC architecture is similar to the branch
table of the Pentium or the branch target buffer and the reorder buffer of
Intel's upcoming processor, the P6. The Pentium's 256-element Branch Table for
dynamic branch prediction does not boast the same success rate as the
PowerPC's BPU, and causes a 3- to 20-cycle penalty if the prediction fails.
This is also true on the P6. The PowerPC BPU has more built-in capability that
doesn't rely on the integer-processing unit. Mispredicted branches, on
average, incur less penalty than in the P6 and Pentium. This is because the
instruction queue is only eight instructions long, and a flush of the queue is
likely to incur only a 1- or 2-cycle penalty. 
The BPU has three special-purpose registers that are not part of the usual
general-purpose registers:
Link register (LR).
Count register (CTR).
Condition register (CR). 
The BPU calculates and saves the return pointer for subroutine calls in the
LR. The CTR contains the target address for some conditional branch
instructions. The LR and CTR can be easily copied to or from any
general-purpose integer register. Because the BPU has these special-purpose
registers, all branching except for synchronization can be carried out
independent of the integer and floating-point units. Unlike the Pentium, which
has many special conditions that must be fulfilled before instructions can
execute in parallel (often producing stalls), the PowerPC's BPU helps
prevent stalls from occurring. Since the PowerPC architecture reserves these
registers (LR, CTR, CR) for the branch processing unit, compilers have more
integer registers available for allocation of important variables. 
From the compiler's point of view, the number of fast, general-purpose
registers available on a processor is a key factor in the execution speed of
applications. In this respect, the PowerPC has the Pentium beat. The Pentium
has only eight 32-bit integer registers: ESP and EBP are dedicated to special
purposes, and other registers are implicitly destroyed by certain
instructions, creating a headache for register allocation. The P6 processor
(with 40 general registers and a hardware-register-renaming scheme) helps, but
it doesn't solve the problem, due to the need for backward compatibility.
Under the PowerPC application binary interface (ABI), the compiler has 15
general-integer registers that may be used just for local variables. On the
Pentium, some optimizations such as strength reduction for array accesses
simply cause too much competition for registers, and rarely pay off. The
PowerPC allows you to take advantage of all the possible optimizations at your
disposal.
The instruction unit actually contains the instruction queue and the BPU,
providing a central control over the instructions issued to the execution
units, the integer unit, and the floating-point unit. The instruction unit
determines the next instruction to be fetched from the 32-byte cache and
controls pipeline interlocks. It allows out-of-order execution when
instructions do not depend on the result of an instruction currently
executing.
The PowerPC's integer unit (IU) performs all loads, stores, and integer
arithmetic. It contains an ALU and an integer exception register (XER). Most
integer instructions execute in one clock cycle. Loads and stores are issued
and translated in sequential order, but actual memory access can occur out of
order. Synchronizing instructions are available for times when order is
critical. The IU contains 32 integer registers, each 32 bits wide. 64-bit
versions of the PowerPC are coming soon. 
The floating-point unit (FPU) is fully IEEE 754 compliant and contains 32
floating-point registers, each of which can hold either a single- or
double-precision operand. The FPU can look ahead in the queue for
instructions that do not depend on unexecuted instructions and process them
early. The FPU is pipelined, so that most instructions can be issued back to
back, without stalling. Unlike the stack-style access to floating operands of
the Intel chips, the floating-point registers allow true random access, so
complex scheduling and swapping algorithms are not necessary to achieve good
performance. 


Measuring Performance


The difference in philosophies between the PowerPC and the Pentium becomes
more evident as you begin to analyze code and performance. The Pentium chip
places the burden of good performance squarely on the compiler writer's
shoulders. The compiler writer has to be aware of all the special conditions
that allow parallel execution of integer instructions in the U and V pipes on
the Pentium, and the many conditions where instruction pairing isn't possible.
The PowerPC lacks all these constraints and provides a lot of hardware
assistance to make the job easier. Compounding the performance problem for
application vendors is the fact that the same sequence of instructions won't
run universally well across the family of Intel processors. For example, to
gain performance on the Pentium, you should replace integer multiply
instructions with a sequence of less complex shifts and adds where possible.
However, on other Intel processors such as the P6, you should do just the
opposite--use the integer multiply instruction rather than adds and shifts. On
the 80486, you need to be concerned with whether an instruction is aligned so
that it crosses a 32-byte boundary, while on the Pentium this makes little
difference. It's much easier to get an application that has uniform
performance across the PowerPC family. A spreadsheet vendor might have to
compile for the lowest common denominator on the Intel family (probably the
80486), so in many cases, customers would not get to use the power of their
Pentium processors. The main thing Pentium has going for it is a huge
installed base, and a lot of shrink-wrapped software for Windows/NT, DOS, and
OS/2.
Although no benchmark is a real indication of how well or poorly your
application will run on a given platform, the following two programs are more
than just benchmarks, because people really use them in their work.
The first, Espresso, is an almost exclusively integer benchmark. Several "hot
spots" dominate its execution time, one of which is the routine massive_count
in cofactor.c. This routine is mainly a large sequence of If-Then-Else
constructs that should be a natural for branch-prediction and cross-jumping
optimizations. The C source code for Espresso is shown in Listing One. Notice
that in the first two loops, the bulk of the operations are of the form: if
(val & <constant> ) cnt[<constant subscript>]++;. This form shows that each
operation is independent of successive ones. A good compiler will keep val in
a register, along with the base address of the array cnt, and the PowerPC will
fill the integer pipeline so that instructions keep executing at a
one-per-cycle rate without stalls. Because the value in each condition is a
constant, the BPU should be able to easily predict the need to branch or fall
through. The compiler should hoist the load of the base address of cnt out of
each If-Then-Else construct. 
In assembly-language format, most PowerPC instructions use the rightmost two
registers as operands and the leftmost as the destination. The code in Listing
Two was generated by the MetaWare PowerPC C/C++ compiler for Solaris on
PowerPC. The only surprise in Listing Two is that the test for the next If is
scheduled ahead of the increment for the last one. The reason is simple: The
601 BPU is highly independent of the integer-execution unit, and the add and
store instructions do not affect the condition codes in the BPU. Moving the
test up avoids a stall due to a dependency on %r10 from one instruction to the
next. The distance between branch instructions is small, allowing the BPU to
look ahead in the queue and easily direct the instruction stream fetch to the
next set of instructions for execution. Note that the only memory references
are those that are absolutely necessary. On our 60-MHz PowerPC 601 with 32 MB
of RAM, Espresso executes in 48 seconds (cumulative time for all of the SPEC
data input sets from the input.ref directory). It has been estimated that one
in five instructions is a branch, so the importance of the BPU in the
architecture really shines here. Naturally, newer versions of the PowerPC,
such as the 604, would be even faster.
For comparison, Listing Three is output from version 2.6d of the MetaWare
C/C++ compiler for UNIX SVR4/386, on UnixWare 1.1. We compiled on a 60-MHz
Pentium with the -586 switch for Pentium optimization. The ability to
increment the array element to memory without loading it reduces the number of
instructions needed to perform the loop. However, the real reason for this is
the paucity of registers into which to load the array element. Pentium and P6
optimization lore would suggest that better instruction overlap might occur if
this were more like the RISC load/store model. With the Pentium,
unfortunately, this would generate an unacceptable number of spills. The P6,
with its dual-integer instruction units and register renaming, may make this
more feasible. Surprisingly, the Pentium is only a hair slower executing this
code, managing to do all of the SPEC data sets in 50 seconds, a difference of
4 percent. The difference here appears to be in the time needed to make memory
references, and the fact that the tests and branches are not independent of
the integer unit. Note, however, that while this code would run the same or
faster on all PowerPC variants, the same is not true of the 80x86 family, not
because of clock speed, but because of differences in architecture. A 100-MHz
486 could not expect to do as well on this code, since each branch to the top
of the loop would incur a 2-cycle penalty. This shows the value of branch
prediction. The Pentium will suffer on integer code from its inability to pair
shift and rotate instructions with variable shift counts, mul and div
instructions, and some floating-point instructions. In spite of Pentium's
parallel U and V pipes, some instructions don't execute in anything but the U
pipe. This forces the compiler to schedule instructions so that two U pipe
instructions don't occur consecutively.


Floating-Point Operations


Though less common in mainstream applications, floating point is certainly
important for scientific programmers. As an exercise in portability and
because I'm the author of MetaWare's Fortran compiler, I decided to do the
floating-point comparison for this article using Fortran. To do this, I had to
port the Fortran front end and library to Solaris PowerPC. To my surprise, the
entire process took under four hours, most of which was actual compiling and
linking, and the result was a compiler that passed all but three of the
Fortran 77 validation suite tests on the first try. (It's been passing
entirely on other platforms for years.) Solaris made the process painless.
With the compiler ported, I was able to run the familiar LINPACK benchmark.
LINPACK is showing its age, but it is still used in a wide variety of real
applications. One routine, SAXPY, contains a loop that is the main time
consumer in the benchmark. The PowerPC version of this loop is in Listing
Four. Here you can see another thoughtful aspect of the PowerPC architecture.
Not many RISC architectures include instructions like floating multiply and
add (or subtract). The HP PA architecture has such an instruction, but its
restrictions make it unusable for LINPACK. The Pentium, for all its CISCness,
includes no such instruction. The compiler has reduced the addressing
computations in this loop by doing strength reduction. And of course, there's
no lack of general floating-point registers to work with. 
Listing Five shows how the Pentium compares. In spite of best efforts and some
good optimization techniques such as loop reversal and loop unrolling, Pentium
doesn't come off too well here. The major problem again is lack of registers
and the rather odd floating-point stack architecture. Pentium turns in a
performance of 7.616 Mflops, compared to the PowerPC's 8.58 Mflops.


Optimization Techniques


The compiler uses a number of techniques to make your code run more
efficiently: common-subexpression elimination, dead-store elimination,
register allocation, and global-constant propagation. The impact of these
optimizations depends upon the architecture and the application code itself.
For example, Listing Six shows the assembly-language output on the Pentium
using the SAXPY loop without any optimization turned on.
Listing Six also shows the useless overhead in loading the address of the
array element and performing the loop. The net effect is a loss in performance
down to 5.317 Mflops. 
The general optimizations pay off. If you compare the optimized and
unoptimized code, you'll mainly see the effects of register allocation, loop
unrolling, and induction elimination. But for fast code, each code generator
needs to pay attention to the specific quirks of the target machine. For each
architecture, the MetaWare compiler uses two phases, massage and expand, that
work on the intermediate language form of the program. Here, the compiler must
consider whether the machine has transcendental floating-point instructions
and scaled-indexing addressing modes, and whether multiply or a series of
shifts and adds is faster for that particular processor. On the PowerPC, for
example, the compiler does not replace constant multiply with shifts and adds
unless the final instruction sequence is no more than two instructions longer.
The reason is that multiplies are fairly cheap on a PowerPC. On the Pentium,
you look for things like block moves that might tie up too many registers and
decrease the register pressure by combining base and index registers. On most
of the architectures, these phases try to eliminate the use of a special
frame-pointer register where possible and use the stack pointer instead,
freeing up another general register for other purposes. With the Pentium, this
is particularly important, since registers are so scarce.

A third phase for code improvement combines peephole optimization and
scheduling. Each code generator has the option of both a high-level scheduling
pass on the intermediate language and a low-level scheduling pass on the
actual machine instructions. Because this is table driven, the code that
actually does the scheduling can be the same in both cases. The tables take
into account the vagaries of U and V pipe pairing on the Pentium and the
pipeline stages of results on the PowerPC.


Programming for Performance


As is evident from the PowerPC design, hardware dynamic-branch prediction is
beginning to supplant the older branch-delay slot design, where an instruction
was moved after a branch to execute while waiting for the branch to take
effect. Most chips have a finite cache or instruction queue that they can
examine in order to predict the branch. If you keep the distance between a
conditional branch and its target small and use expressions that are easily
dynamically evaluated by something like the PowerPC's BPU, you'll increase the
chance that the BPU will effectively predict whether the branch will be taken.
Also, if the target and the branch fall within the cache size, your code will
likely execute faster. You should separate expressions and their consumers as
widely as possible, within cache limits. For example, given the expressions in
Figure 2(a), consider reordering the statements to Figure 2(b). This makes it
possible to complete all the operations of the first statement while
processing some of the independent operations of the second, thus avoiding any
possible stalls waiting for the completion of the first statement.
For processors like the Pentium, look through the critical regions of your
code after profiling your application. If your critical loops have more than
four or five heavily used variables, try to reduce that number. Since the
Pentium only has a few general registers available for local variables, you
can give the compiler a boost in optimizing your code if you confine the
number of heavily used variables or constants to a small number in loops. For
floating-point code, think about ways to break down transcendentals or
floating-point divide instructions into different expressions. For example,
the MetaWare compiler tries to replace ARCSIN with
ARCTAN2(X,SQRT((1-X)*(1+X))), which can be done with inline code. 
By breaking down complex operations into simpler ones, you'll give most
processors the chance to schedule your code for faster execution. You can't
always depend on your compiler to make transformations for you. Keep your
floating-point expressions smaller than the size of the Pentium's
floating-point stack. If you write a huge expression with more than seven or
eight intermediate results, you are forcing the compiler to spill from the
stack to some temporary location, or to use memory-reference instructions to
complete the calculation, resulting in slower code. A little extra care in the
critical regions of your program, keeping the characteristics of current
processors in mind, can pay large dividends in performance.


Conclusion


With the PowerPC's price-to-performance ratio closing in on that of the
Pentium, users are finding this architecture more attractive. In addition to
the Macintosh, there will soon be opportunities to field applications on
Solaris and OS/2 for the PowerPC. The Mac already has fairly successful DOS
80x86 emulation, enabling you to run a lot of shrink-wrapped software without
sacrificing your software investment. New applications can benefit from the
scalability of the PowerPC family and the new speed it offers. However, you'll
need to rethink strategies in order to wring out that last ounce of
performance.
Figure 1: PowerPC 601 block diagram.
Figure 2: Reordering statements to improve performance.
(a)
x = y*z + 2.0
z = sqrt(x)
c = pi*(r*r)

(b)
x = y*z + 2.0
c = pi*(r*r)
z = sqrt(x)

Listing One
#include "espresso.h"
void massive_count(pcube *T){
 int *count = cdata.part_zeros;
 pcube *T1;
 /* Clear the column counts (count of # zeros in each column) */
 { 
 register int i;
 for(i = cube.size - 1; i >= 0; i--)
 count[i] = 0;
 }
 /* Count the number of zeros in each column */
 { 
 register int i, *cnt;
 register unsigned int val;
 register pcube p, cof = T[0], full = cube.fullset;
 for(T1 = T+2; (p = *T1++) != NULL; )
 for(i = LOOP(p); i > 0; i--)
 if (val = full[i] & ~ (p[i] | cof[i])) {
 cnt = count + ((i-1) << LOGBPI);
#if BPI == 32
 if (val & 0xFF000000) {
 if (val & 0x80000000) cnt[31]++;
 if (val & 0x40000000) cnt[30]++;
 if (val & 0x20000000) cnt[29]++;
 if (val & 0x10000000) cnt[28]++;
 if (val & 0x08000000) cnt[27]++;
 if (val & 0x04000000) cnt[26]++;
 if (val & 0x02000000) cnt[25]++;
 if (val & 0x01000000) cnt[24]++;
 }
 if (val & 0x00FF0000) {
 if (val & 0x00800000) cnt[23]++;
 if (val & 0x00400000) cnt[22]++;
 if (val & 0x00200000) cnt[21]++;
 if (val & 0x00100000) cnt[20]++;

 if (val & 0x00080000) cnt[19]++;
 if (val & 0x00040000) cnt[18]++;
 if (val & 0x00020000) cnt[17]++;
 if (val & 0x00010000) cnt[16]++;
 }
#endif
 if (val & 0xFF00) {
 if (val & 0x8000) cnt[15]++;
 if (val & 0x4000) cnt[14]++;
 if (val & 0x2000) cnt[13]++;
 if (val & 0x1000) cnt[12]++;
 if (val & 0x0800) cnt[11]++;
 if (val & 0x0400) cnt[10]++;
 if (val & 0x0200) cnt[ 9]++;
 if (val & 0x0100) cnt[ 8]++;
 }
 if (val & 0x00FF) {
 if (val & 0x0080) cnt[ 7]++;
 if (val & 0x0040) cnt[ 6]++;
 if (val & 0x0020) cnt[ 5]++;
 if (val & 0x0010) cnt[ 4]++;
 if (val & 0x0008) cnt[ 3]++;
 if (val & 0x0004) cnt[ 2]++;
 if (val & 0x0002) cnt[ 1]++;
 if (val & 0x0001) cnt[ 0]++;
 }
 }
 }
 /*
 * Perform counts for each variable:
 * cdata.var_zeros[var] = number of zeros in the variable
 * cdata.parts_active[var] = number of active parts for each variable
 * cdata.vars_active = number of variables which are active
 * cdata.vars_unate = number of variables which are active and unate
 *
 * best -- the variable which is best for splitting based on:
 * mostactive -- most # active parts in any variable
 * mostzero -- most # zeros in any variable
 * mostbalanced -- minimum over the maximum # zeros / part / variable
 */
 { 
 register int var, i, lastbit, active, maxactive;
 int best = -1, mostactive = 0, mostzero = 0, mostbalanced = 32000;
 cdata.vars_unate = cdata.vars_active = 0;
 for(var = 0; var < cube.num_vars; var++) {
 if (var < cube.num_binary_vars) { /* special hack for binary vars */
 i = count[var*2];
 lastbit = count[var*2 + 1];
 active = (i > 0) + (lastbit > 0);
 cdata.var_zeros[var] = i + lastbit;
 maxactive = MAX(i, lastbit);
 } 
 else{
 maxactive = active = cdata.var_zeros[var] = 0;
 lastbit = cube.last_part[var];
 for(i = cube.first_part[var]; i <= lastbit; i++) {
 cdata.var_zeros[var] += count[i];
 active += (count[i] > 0);
 if (active > maxactive) maxactive = active;

 }
 }
 /* first priority is to maximize the number of active parts */
 /* for binary case, this will usually select the output first */
 if (active > mostactive)
 best = var, mostactive = active, mostzero = cdata.var_zeros[best],
 mostbalanced = maxactive;
 else if (active == mostactive)
 /* secondary condition is to maximize the number zeros */
 /* for binary variables, this is the same as minimum # of 2's */
 if (cdata.var_zeros[var] > mostzero)
 best = var, mostzero = cdata.var_zeros[best],
 mostbalanced = maxactive;
 else if (cdata.var_zeros[var] == mostzero)
 /* third condition is to pick a balanced variable */
 /* for binary vars, this means roughly equal # 0's and 1's */
 if (maxactive < mostbalanced)
 best = var, mostbalanced = maxactive;
 cdata.parts_active[var] = active;
 cdata.is_unate[var] = (active == 1);
 cdata.vars_active += (active > 0);
 cdata.vars_unate += (active == 1);
 }
 cdata.best = best;
 }
}

Listing Two
 !137 if (val = full[i] & ~ (p[i] | cof[i])) {
 sli %r8,%r11,2 !+a4 Shift i, which is %r11, left 2
 lwzx %r10,%r8,%r12 !+a8 Add %r8(p) and %r12 to form address, load
 lwzx %r7,%r8,%r4 !+ac Add %r4 (cof) and %r8 (i<<2), load
 nor %r10,%r7,%r10 !+b0 or not operation
 lwzx %r8,%r8,%r5 !+b4 Add %r5 (full), %r8(i), load
 and. %r8,%r8,%r10 !+b8 and of full[i] & ~(p[i] | cof[i])
 ! val ends up in %r8
 beq ..LL63 !+bc
!138 cnt = count + ((i-1) << LOGBPI);
 sli %r10,%r11,7 !+c0
 addi %r6,%r10,-128 !+c4
 add %r7,%r29,%r6 !+c8 Get base address assigned to cnt in %r7
!139 #if BPI == 32
!140 if (val & 0xFF000000) {
 andis. %r10,%r8,65280 !+cc Test bits in val
 beq ..LL64 !+d0 Branch if bits not set
!141 if (val & 0x80000000) cnt[31]++;
 andis. %r10,%r8,32768 !+d4 Test bits in val
 beq ..LL65 !+d8 Branch if bits not set
 lwz %r10,+124(%r7) !+dc Base address already in %r7, load element
 andis. %r31,%r8,16384 !+e0 Scheduling places test of val ahead
 ! to avoid integer pipeline stall
 addi %r10,%r10,1 !+e4 add 1 to cnt[31]
 stw %r10,+124(%r7) !+e8 Store element to memory
!142 if (val & 0x40000000) cnt[30]++;
 beq ..LL66 !+ec Branch on pretested condition
 b ..LL67 !+f0
 andis. %r10,%r8,16384 !+f4 Test next bits in val
 beq ..LL66 !+f8 Branch if not set
 lwz %r10,+120(%r7) !+fc Load cnt[30]

 andis. %r31,%r8,8192 !+100 Test next bits
 addi %r10,%r10,1 !+104 Increment cnt[30]
 stw %r10,+120(%r7) !+108 Store back to memory 

Listing Three
/136 : for(i = LOOP(p); i > 0; i--)
 movl (%edx),%eax
 andl $1023,%eax / 0x3ff
 testl %eax,%eax / initialize i, get it into %eax, test for zero
 jle .L43
.L44:
/137 : if (val = full[i] & ~ (p[i] | cof[i])) {
 movl (%edx,%eax,4),%ebx / Load cof[i] into %ebx
 movl 68(%esp),%esi / Load index into %esi
 orl (%esi,%eax,4),%ebx / or together, from memory
 movl 64(%esp),%esi / load full[i] index
 notl %ebx / not of expr
 andl (%esi,%eax,4),%ebx / and from memory
 testl %ebx,%ebx / val now in %ebx
 je .L45
/138 : cnt = count + ((i-1) << LOGBPI);
 movl %eax,%edi / Copy i
 shll $7,%edi / Shift left
 testl $-16777216,%ebx / 0xff000000 / Test bits in val
 lea -128(%edi,%ecx),%esi / Get base address into %esi
/139 : #if BPI == 32
/140 : if (val & 0xFF000000) {
 je .L46 / Branch on bits not set
/141 : if (val & 0x80000000) cnt[31]++;
 testl $-2147483648,%ebx / Test bits in val
 je .L47 / branch on bits not set
 incl -4(%edi,%ecx) / Increment cnt[31]
.L47:
/142 : if (val & 0x40000000) cnt[30]++;
 testl $1073741824,%ebx / 0x40000000
 je .L48
 incl 120(%esi)
.L48:
/143 : if (val & 0x20000000) cnt[29]++;
 testl $536870912,%ebx / 0x20000000
 je .L49
 incl 116(%esi) 

Listing Four
!459 do 30 i = 1,n
 addi %r11,%r7,4 !+68 Pointer to dy(i)
 addi %r10,%r5,4 !+6c Pointer to dx(i)
 mtctr %r9 !+70 
 addi %r10,%r10,-8 !+74 Array bias
 addi %r11,%r11,-8 !+78
!460 dy(i) = dy(i) + da*dx(i)
 lfs %f12,+4(%r11) !+7c Load dy(i)
 lfsu %f0,+4(%r10) !+80 Load dx(i)
 fmadds %f0,%f0,%f13,%f12 !+84 Floating multiply and add
 stfsu %f0,+4(%r11) !+88 Store result into dy(i)
!461 30 continue
 addi %r12,%r12,1 !+8c Increment loop counter
 bdnz ..LL97 !+90 test and branch non-zero
 b ..LL98 !+94


Listing Five
/459 : do 30 i = 1,n
 cmpl $2,%ecx / Check for loop execution
 movl 68(%esp),%esi / Get pointer to dx
 movl 60(%esp),%edx /
 jle .L82
 jmp .L83
.L81:
.L91:
 movl %eax,%edi
.L83:
/460 : dy(i) = dy(i) + da*dx(i)
 flds -4(%edx,%edi,4) / load dx(i)
 fmul %st(1),%st / da * dx(i)
 addl $-2,%ecx / Loop was reversed to count down
 lea 2(%edi),%eax
 fadds -4(%esi,%edi,4) / Add dy(i)
 cmpl $2,%ecx
 fstps -4(%esi,%edi,4) / Store result to dy(i)
/461 : 30 continue
 flds (%edx,%edi,4) / Next loop iteration, loop unrolling
 fmul %st(1),%st
 fadds (%esi,%edi,4)
 fstps (%esi,%edi,4)
 jg .L91
 jmp .L92

Listing Six
/459 : do 30 i = 1,n
 movl $1,.L98.I
 decl %ecx
 lea 1(%ecx),%eax
 andl %eax,%eax
 jle .L55
 jmp .L57 
 .L57:
/460 : dy(i) = dy(i) + da*dx(i)
 movl .L98.I,%ecx / I is now static, loaded from memory
 movl 24(%ebp),%edi 
 flds -4(%edi,%ecx,4)
 movl 16(%ebp),%edx / Load index
 movl 12(%ebp),%esi / Load address of array element
 flds -4(%edx,%ecx,4) / Load one array element to floating stack
 fmuls (%esi) / Multiply
 faddp %st,%st(1) / Add
 fstps -4(%edi,%ecx,4) / Store result
/461 : 30 continue
 incl %ecx / Do loop book keeping
 movl %ecx,.L98.I
 decl %eax
 andl %eax,%eax
 jg .L57
End Listings



PowerPC 601 and Alpha 21064


Second-generation RISC processors




Shlomo Weiss and James E. Smith


Shlomo is a faculty member in the Department of Electrical Engineering/Systems
at Tel Aviv University. Jim is a faculty member in the Department of
Electrical and Computer Engineering at the University of Wisconsin-Madison.
They are the authors of POWER and PowerPC: Principles, Architecture,
Implementation (Morgan Kaufmann Publishers, 1994). Shlomo and Jim can be
reached through the DDJ offices.


Just as there is more than one way to skin a cat, there is more than one way
to implement RISC concepts. The PowerPC is a good example of a
high-performance RISC implementation that is tuned to a specific architecture.
It isn't, however, the only RISC implementation style that processor designers
have used. We'll compare the PowerPC 601 to an alternative RISC architecture
and implementation--the DEC Alpha 21064.
The 601 focuses on relatively powerful instructions and great flexibility in
instruction processing. The 21064 depends on a very fast clock, with simpler
instructions and a rigid implementation structure. Both the 601 and the 21064
have load/store architectures with 32-bit, fixed-length instructions. Each has
32 integer and 32 floating-point registers, but beyond these basic properties,
they have little in common; see Table 1. 
The 601 has a relatively small die size due to IBM's aggressive 0.6-micron
CMOS technology with four levels of metal (a fifth metal layer is used for
local interconnect); see Table 2. The cache size of each chip largely accounts
for the substantial difference in the transistor count. Two striking
differences appear in clock cycle and power dissipation. The 21064 is much
faster, but also runs much hotter. It's well known that in CMOS, a faster
clock means higher power dissipation; even if a fast clock "wins" in
performance, its higher power requirements could "lose" in usefulness--in
portable PCs, for example. 


PowerPC 601 Pipelines


All instructions for the 601 are processed in the fetch and dispatch stages.
Branch and Condition Register instructions go no farther. Fixed-point and
load/store instructions are also decoded in the dispatch stage of the pipe and
are then passed to the FXU to be processed. Most fixed-point arithmetic and
logical instructions take just two clock cycles in the FXU: one to execute and
one to be written into the register file. All load/store instructions take
three cycles in the FXU: address generation, cache access, and register write.
This assumes a cache hit, of course.
The 601 design emphasizes getting the FXU instructions processed in as few
pipeline stages as possible. This low-latency design is evident in the
combining of the dispatch and decode phases of instruction processing. The
effect of an instruction pipeline's length on performance is most evident
after a branch, when the pipeline may be empty or partially empty.
The shorter the pipeline, the more quickly instruction execution can start
again. Most of the time, the first instructions following a branch are FXU
instructions (even in floating-point-intensive code), because a program
sequence following a branch typically begins by loading data from memory (or
by preparing addresses with fixed-point instructions). Obviously, a short FXU
pipeline is desirable.
In contrast, floating-point instructions are processed more slowly. FPU
decoding is not performed in the same clock cycle as dispatching. The first
floating-point instruction following a branch is likely to depend on a
preceding load, so the extra delay in the floating-point pipeline will not
affect overall performance significantly. This extra delay reduces the
interlock between a floating load and a subsequent dependent floating-point
instruction to just one clock cycle.
The buffer at the beginning of the FPU can hold up to two instructions; the
second buffer slot is the decode latch, where instructions are decoded. In the
FXU pipeline, there is a one-instruction decode buffer that can be bypassed.
The decode buffers provide a place for instructions to be held if one of the
pipelines blocks due to some interlock condition or an instruction that
consumes the execute stages for multiple cycles. By getting instructions into
the decode buffers when a pipeline is blocked, the instruction buffers are
allowed to continue dispatching instructions (especially branches) to
nonblocked units.


21064 Pipelines


The 21064 pipeline complex is composed of three parallel pipelines:
fixed-point, floating-point, and load/store. The pipelines are relatively
deep, and the integer and load/store pipes are the same length. These are the
stages that an instruction may go through:
1. F, Fetch. The instruction cache is accessed, and two instructions are
fetched. 
2. S, Swap. The two instructions are directed to either the integer or the
floating-point pipeline, sometimes swapping their positions, and branch
instructions are predicted. 
3. D, Decode. Instructions are decoded in preparation for issue--the opcode is
inspected to determine the register and resource requirements of each
instruction. Unlike IBM processors, registers are not read during the decode
stage. 
4. I, Issue. Instructions are issued and operands are read from the registers.
The register and resource dependencies determine if the instruction should
begin execution or be held back. After the issue stage, instructions are no
longer blocked in the pipelines, and can therefore be completed. 
5. A, ALU stage 1. Integer adds, logicals, and short-length shifts are
executed. Their results can be immediately bypassed back, so these appear to
be single-cycle instructions. Longer-length shifts are initiated in this
stage, and loads and stores do their effective-address add. 
6. B, ALU stage 2. Longer-length shifts complete and their results are
bypassed back to ALU 1, so these are two-cycle instructions. For loads and
stores, the data cache tags are read. Loads also read cache data. 
7. W, Write stage. Results are written into the register file. Cache hit/miss
is determined. Data from store instructions that hit is stored in a buffer. It
will then be written into the cache during a cycle with no loads. 
The 21064 integer pipeline relies on a large number of bypasses to achieve
high performance. In a deep pipeline, bypasses reduce apparent latencies.
There are a total of 38 separate bypass paths. 
Floating-point instructions pass through F, S, D, and I stages just like
integer instructions. Floating-point multiply and add instructions are
performed in stages F through K. The floating-point divide takes 31 cycles for
single precision and 61 cycles for double precision. 


Dispatch Rules


The dispatch rules in the 601 are quite simple. The architecture has three
units--Integer (or Fixed Point), Floating Point, and Branch--that can process
instructions simultaneously. Integer operate instructions and all loads and
stores go to the same pipeline (FXU), and only one instruction of this
category may issue per clock cycle. 
The 21064's swap corresponds to the 601's dispatch. Instructions issue two
stages later. In the 21064, instructions must issue in their original program
order, and dispatch (that is, the swap stage) helps to enforce this order. A
pair of instructions belonging to the same aligned doubleword (or "quadword"
in DEC parlance) can issue simultaneously. Consecutive instructions in
different doublewords may not dual-issue, and if two instructions in the same
doubleword cannot issue simultaneously, the first in the program sequence must
issue first.
The 21064 implements separate integer and load/store pipelines, and several
combinations of these instructions may be dual-issued (with the exception of
integer operate/floating store, and floating operate/integer store). The
separate load/store unit requires an extra set of ports to both the integer
and floating register files. The load/store ports are shared with the Branch
Unit, which has access to all the registers because the 21064 architecture has
no condition codes, and branches may depend on any integer or floating
register. Consequently, branches may not be issued simultaneously with load or
store instructions. 
Table 3 summarizes the dispatch rules for both chips. In the 601 table, an X
in the corresponding row/column indicates that two instructions may
simultaneously issue. For three instructions, all three pairs must have Xs. In
the 21064 table, two instructions with an X may simultaneously issue. 
The ability of the 21064 to dual-issue a load with an integer-operate
instruction is a definite advantage over the 601. Many applications (not to
mention the operating system) use very little floating point; the 21064 can
execute these apps with high efficiency, but the 601 can execute only one
integer instruction per clock cycle (while its FPU sits idle). 


Register Files



The 21064 and 601 have register files with almost the same number of ports;
see Table 4. Both start with one write and two read ports to service operate
instructions. The 21064 provides an additional pair of read/write ports for
load/store unit data. Branches share the load/store register ports, which
brings the count up to 3R/2W for both integer and floating-register files. One
additional integer read port is needed to get the address value for stores and
loads. Doing an integer store in parallel with an integer operate involves an
extra integer read port, but not allowing a register-plus-register addressing
mode saves a register-read port. 
The 601's one write and two read ports for operate instructions are fortified
by an additional integer read port for single-cycle processing of store with
index instructions, which read three registers (two for the effective address,
one for the result). An extra integer write port allows the result of an
operate instruction and data returned from the cache to be written in the same
clock cycle. The same consideration accounts for two write ports in the
floating-register file. The three floating-point read ports accommodate the
combined floating multiply/add instruction. 


Data Caches


The 21064 uses separate instruction and data caches. The data cache is a
small (8-KB), direct-mapped cache designed for very fast access times;
see Figure 1(a). The address add consumes one clock cycle. During the next
clock cycle, the Translation Lookaside Buffer (TLB) is accessed and the cache
data and tag are read. In a direct-mapped cache this is easy because only one
tag must be read, and the data, if present, can only be in one place. The TLB
address translation completes in the third cycle, and the tag is compared with
the upper address bits. A cache hit or miss is determined about halfway
through this clock cycle. The data are always delivered to the registers as an
aligned, 8-byte doubleword. Alignment, byte selecting, and the like must be
done with separate instructions. 
In the 601, the unified data/instruction cache is much larger--32 KB--and is
8-way set associative, yielding a higher hit rate than the 21064's. Figure 1(b)
shows how much more "work" the 601 does in a clock cycle. It does an address
add and the cache directory/TLB lookup in the same cycle. During the next
cycle, it accesses the 32-byte-wide data memory and selects and aligns the
data field. 
The 601 gets more done in fewer stages, but the 21064's clock cycle is about a
third to a fourth the length of the 601's. Consequently, the 601's two clock
cycles take much longer than the 21064's three cycles.


Example of Pipeline Flow


Example 1 shows a For loop in C and its corresponding 21064 assembly-language
code. Note in this and subsequent examples that the notation, bit numbering,
and assembly language do not conform to those of the Alpha; they have been
modified to be consistent with PowerPC notation. Example 2 is the 21064 pipeline flow
for the example loop. It shows in-order issue, dual-issue for aligned
instruction pairs, and the relatively long six-clock-period floating-point
latency. After the I stage, instructions never block. 
The importance of the swap stage is clear from the first two instructions,
which cannot dual-issue because both are loads. The second instruction is held
for one cycle while the first moves ahead. The first dual-issue occurs for the
first addq-mult pair. Because mult is the first instruction in the doubleword,
addq must wait, even though no dependencies hold it back. The sequence of
dependent floating-point instructions paces instruction issue for most of the
loop. Note that the floating store issues in anticipation of the
floating-point result. It waits only four--not six--clock periods for the
result so that it reaches its write stage just in time to have the
floating-point result bypassed to it. 
A bubble follows the predicted branch at the end of the loop. Because other
instructions in the pipeline are blocked, however, by the time the ldt
following the branch is ready to issue, the bubble is "squashed." That is, if
the instruction ahead of the bubble blocks and the instruction behind
proceeds, the bubble is squashed between the two and eliminated. 
Overall, the loop takes 16 clock periods per iteration in steady state. (The
first ldt passes through I at time 4; during the second iteration, it issues
at time 20.) In comparison, the 601 takes six (longer) clock periods.
Floating-point latencies are a major performance problem for the 21064 when it
executes this type of code. Also, in-order issue prevents the loops from
"telescoping" together as they would in the 601--there is very little overlap
among consecutive loop iterations, and the small amount that occurs is mostly
due to branch prediction. Each parallelogram in Figure 2 illustrates the
general shape of the pipeline flow for a single loop iteration.
In the 601, the branch processor eliminates the need for branch prediction,
and the out-of-order dispatch, along with multiple buffers placed at key
points, telescopes the loop iterations. Telescoping in the 601 is limited by
the lack of a store buffer in the FPU, which other implementations may choose to
provide. The RS/6000, for example, has register renaming, deeper buffers, and
more bypass paths; it achieves much better telescoping than the 601. 
Software pipelining or loop unrolling is likely to provide much better
performance for a deeply pipelined implementation like the 21064. The DEC
compilers unroll loops. Example 3 shows the unrolled version of Example 2. The
example loop is unrolled four times. The clock period at which instructions
pass through the I stage is shown in the right-hand column. Now, in steady
state, four iterations take 23 clock periods (about six per iteration), more
than three times better than the rolled version. Unrolling also emphasizes the
performance advantage of dual-issue. 
Loop unrolling also improves the performance of the 601, as Example 4 shows.
After dispatching in the 601, instructions may be held in a buffer or in the
decode stage if the pipeline is blocked. Hence, we show FXU and FPU decode
time, and BU execute time (which is the same cycle in which a branch is
decoded). 
Assume that the loop body is aligned in the cache sector. Eight instructions
are fetched, and instruction fetching can keep the instruction buffer full
until time 2; after that, the cache is busy with load instructions. The
instruction queue becomes empty and the pipeline is starved for instructions,
but these cannot be fetched until time 9, when the cache finally becomes
available. At this time, the six remaining instructions of the cache sector
are fetched (the first two were fetched at time 2). 
The unrolled loop (four iterations) takes 20 clock cycles (five clock cycles
per loop iteration versus six in the rolled version). 


Branch Instructions


There are significant differences in the way the PowerPC and Alpha
architectures handle branches; see Figure 3. The PowerPC has a special set of
registers designed to implement branches. Conditional branches may test fields
in the Condition Code Register and the contents of a special register, the
Count Register. A single branch instruction may implement a loop-closing
branch whose outcome depends on both the Count Register and a Condition Code
value. Comparison instructions set fields of the Condition Code Register
explicitly, and most arithmetic and logical instructions may optionally set a
condition field by using the record bit. 
In the Alpha, conditional branches test a general-purpose register relative to
zero or to odd or even. Thus, a test can be performed on the result of any
instruction. Comparison instructions leave their result in a general-purpose
register. 
Certain control-transfer instructions save the updated program counter and use
it as a subroutine return address. In the Alpha, these are special jump
instructions that save the return address in a general-purpose register. In
the PowerPC, this is done in any branch by setting the Link (LK) bit to 1, and
saving the return address in the Link Register. 
The Alpha also implements a set of conditional move instructions that move a
value from one register to another, but only if a condition, similar to the
branch condition, is satisfied. These conditional moves eliminate branches in
many simple, conditional code sequences; see Example 5. A simple If-Then-Else
sequence is given in Example 5(a). A conventional code sequence appears in
Example 5(b); the timing shown is for the best-case path, assuming a correct
prediction. Example 5(c) uses a conditional move. While the load is being
done, both shifts can essentially be performed for free. The shift 4 is
tentatively placed in register r3 to be stored to memory. If the test of a is
True, then the conditional move to c replaces the value in r3 with the shift 2
results. The total time is shorter than the branch implementation (even in the
best case) and does not depend on branch prediction. 
In general, branch target addresses are determined in the following ways:
Adding a displacement to the program counter (PC relative). Available in both
architectures. 
Absolute. Available only in the PowerPC, where the displacement is interpreted
as an absolute address if the Absolute Address (AA) bit is set to 1. 
Register indirect. Available for instructions not shown in Figure 3. These are
the XL-form conditional branches in the PowerPC and jump instructions in the
Alpha. General-purpose registers in the Alpha are used, and the Count Register
and Link Register are used in the PowerPC. 
Both processors predict branches to reduce pipeline bubbles. The 601 uses a
static branch prediction made by the compiler. Also, as a hedge against a
wrong prediction, the 601 saves the contents of the instruction buffer
following a branch-taken prediction until instructions from the taken path are
delivered from memory; thus, the instructions on the not-taken path are
available immediately if a misprediction is detected.
The 21064 implements dynamic branch prediction with a 2048-entry table; one
entry is associated with each instruction in the instruction cache. The
prediction table updates as a program runs and contains the outcome of the
most recent execution of each branch. This predictor is based on the
observation that most branches are decided the same way as on their previous
execution. This is especially true for loop-closing branches. 
This type of prediction does not always work well for subroutine returns,
however. A subroutine may be called from a number of places, so the return
jump is not necessarily the same on two consecutive executions. The 21064 has
special hardware to predict the target address for return-from-subroutine
jumps. The compiler places the lower 16 bits of the return address in a
special field of the jump-to-subroutine instruction. When this instruction is
executed, the return address is pushed on a four-entry prediction stack, so
return addresses can be held for subroutines nested four deep. The stack is
popped prior to returning from the subroutine, and the return address is used
to prefetch instructions from the cache.


Conditional-Branch Pipeline Flow


We are now ready to step through the pipeline flow for the Alpha conditional
branches; see Figure 4. 
The swap stage of the pipeline examines instructions in pairs. After the
branch instruction is detected and predicted, it takes one clock cycle to
compute the target address and begin fetching, which may lead to a one-cycle
bubble in the pipeline. The pipeline is designed to allow squashing of this
bubble. In the case of a simultaneous dispatch conflict, as in Figure 4(a),
the instruction preceding the branch must be split from it anyway, so the
branch instruction waits a cycle and fills in the bubble naturally. If the
pipeline stalls ahead of the branch, the bubble can be squashed by having an
instruction behind the branch move up in the pipe. If the bubble is squashed
and the prediction is correct, the branch effectively becomes a zero-cycle
branch. 
Figure 4(b) shows the incorrect-prediction case. The branch instruction
registers are read during issue stage. During the A stage, the register can be
tested and the correctness of the prediction determined quickly enough to
notify the instruction-fetch stage if there is a misprediction. Then, the
correct path can be fetched in the next cycle. As a result, four stages of the
pipeline must be flushed if the prediction is incorrect. For the
jump-to-subroutine instruction, the penalty for a misprediction is five
cycles. 
For branches, the biggest architectural difference between the Alpha and the
PowerPC is that the Alpha uses general-purpose registers for testing and
subroutine linkage, while the PowerPC uses special-purpose registers held in
the Branch Unit. This allows it to execute branch instructions in the Branch
Unit immediately after they are fetched. In fact, the PowerPC looks back in
the instruction buffer so that it can execute, or at least predict, branches
while they are being fetched. The Alpha implementation, in contrast, must
treat branch instructions like the other instructions. They are decoded in the
D pipeline stage, read registers in I, and executed in the A stage. 
Table 5 compares the branch penalties for integer-conditional branches (far
more common than floating-point branches). The penalties are expressed as a
function of the number of instructions (distance) separating the
condition-determining instruction (compare) from the branch, and of the
correctness of the prediction. The compare-to-branch instruction count is
significant only in the
601, however. Instruction cache hits are assumed. 
In the 21064, correctly predicted branches usually take no clock cycles. They
take one clock cycle when a bubble created in the swap stage is not later
squashed. The 601 has a zero-cycle branch whenever there is enough time to
finish the instruction that sets the condition code field prior to the branch
and to fetch new instructions. This may take two clock cycles: one to execute
the compare instruction, and one to fetch instructions from the branch target.
This second clock cycle may be saved when a branch is mispredicted but is
resolved before overwriting the instruction buffer; instructions may be
dispatched from the buffer right after determining that the branch was not
taken. With a two-instruction distance, the 601 has a zero-cycle branch even
if it was mispredicted; the 21064 always depends on a prediction, regardless
of the distance. 
The PowerPC requires fewer branch predictions in the first place; see Table 6.
In the 601, all loop-closing branches that use the CTR register do not have to
be predicted; in the Alpha these are ordinary conditional branches, although
loop-closing branches are easily predictable. A subroutine return must read an
integer register in the Alpha, so these branches are predicted via the return
stack. The PowerPC can execute return jumps immediately in the Branch Unit;
there is no need for prediction. 
Table 5 and Table 6 show that accurate branch prediction is much more critical
in the 21064. Not only does the 21064 predict more of the branches, but the
penalties also tend to be higher when it is wrong. For this reason, the 21064 has
much more hardware dedicated to the task--history bits and the subroutine
return stack. The Alpha architecture also reduces the penalty for a
misprediction by having branches that always test a register against zero;
testing one register against another would likely take an additional clock
cycle. 
Some doubt the PowerPC method of using special-purpose registers for branches
because these registers are a potential bottleneck. We think not. These registers
allow many branches to be executed quickly without prediction and are
important for supporting loop telescoping. 


Memory Architecture and Instructions



The Alpha is a 64-bit-only architecture. The PowerPC has a mode bit, and
implementations may come in either 32- or 64-bit versions; the 601 is a 32-bit
version. All 64-bit versions must also have a 32-bit mode. The mode determines
whether the condition codes are set by 32- or 64-bit operations.
The Alpha defines a flat, or linear, virtual-address space and a virtual
address whose length is implementation dependent within a specified range. The
PowerPC supports a system-wide, segmented virtual-address space in either 32-
or 64-bit mode. Differences between the two modes affect the number of
segments and their size, which also results in a difference in the
virtual-address space (52 bits versus 80 bits).
Currently, software developers and architects seem to favor flat,
virtual-address spaces, although the very large segments available in the
PowerPC shouldn't present many problems. The Alpha was defined as a 64-bit
architecture from the start, so developers can easily provide a flat
virtual-address space. The POWER architecture, however, was defined with
32-bit integer registers that were also used for addressing. This presented
the POWER architects with a dilemma: Either use a flat, 32-bit virtual-address
space (which would likely be too small in the very near future) or encode a
larger address in 32 bits. Such an encoding led to the segmented architecture
inherited by the PowerPC. Also, and perhaps more importantly, the single,
shared-address space facilitates capability-based memory-protection methods
similar to those used in IBM's AS/400 computer systems.
The Alpha architecture specification does not define a page-table format.
Because TLB misses are handled by trapping to system software, Alpha systems
using different operating systems may have different page-table formats. Two
likely alternatives are VAX/VMS and OSF/1 UNIX. A Privileged Architecture
Library (PAL) provides an operating-system-specific set of subroutines for
memory management, context switching, and interrupts. The Alpha instruction
set includes the format in Figure 5 for PAL instructions used to define
operating-system primitives. 
The Call PAL instructions are like subroutine calls to special blocks of
instructions, whose locations are determined by one of five different PAL
opcodes. A PAL routine has access to privileged instructions but employs
user-mode address translation. While in the PAL routine, interrupts are
disabled to assure the atomicity of privileged operations that take multiple
instructions. For example, if one instruction turns address mapping off, an
interrupt should not occur until another instruction can turn it back on. The
details of virtual-address translation and page-table format are a
system-software issue to be defined in the context of the particular operating
system using PAL functions. 
Figure 6 compares the format of memory instructions. The format of
instructions using the displacement-addressing mode is identical in the
PowerPC and Alpha. The effective address is calculated in the same way in both
architectures except for the register with the value 0, which is register 0 in
the PowerPC and register 31 in the Alpha. There is no indexed addressing in
the Alpha. As previously mentioned, this saves a register read port. 
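The shared displacement calculation is simple enough to model directly. The sketch below is our own illustration, not either vendor's definition; the register file is represented as a plain array, and zero_reg selects which register number reads as zero (0 for the PowerPC, 31 for the Alpha):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative model of register+displacement effective-address
   calculation. The function name and representation are ours. */
int64_t effective_address(const int64_t regs[32], int base,
                          int16_t disp, int zero_reg) {
    /* The base register reads as zero when it is the architecture's
       "zero" register (r0 in the PowerPC, r31 in the Alpha). */
    int64_t b = (base == zero_reg) ? 0 : regs[base];
    return b + disp;   /* int16_t disp is sign extended by promotion */
}
```

For example, with r5 holding 100, a load at displacement -4 from r5 addresses byte 96, while naming the zero register as base yields an absolute offset.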
Another Alpha characteristic is that load and store instructions transfer only
32- or 64-bit data between a register and memory; there are no instructions to
load or store 8-bit or 16-bit quantities. The Alpha architecture does include
a set of instructions to extract and manipulate bytes from registers. This
approach simplifies the cache interface so that it does not have to include
byte-level shift-and-mask logic in the cache access path. 
In Example 7, the core of a strcpy routine moves a sequence of bytes from one
area of memory to another; a byte of zeros terminates the string. The ldq_u is a
load-unaligned instruction that ignores the low-order three bits of the
address; in the example, it loads a word into r1, addressed by r4. The extract
byte (extbl) instruction uses the same address, r4, but only uses the three
low-order bits to select one of the eight bytes in r1. The byte is copied into
r2. To move the byte to s, the sequence begins with another load unaligned
instruction to get the word containing the destination byte. The mask byte
(maskbl) instruction uses the three low-order bits of r3 (the address of s) to
zero out a byte in the just-loaded r5. Meanwhile, the insert byte (insbl)
instruction moves the byte from t into the correct byte position, also using
the three low-order bits of the address in r3. The bis performs a logical OR
operation that merges the byte into the correct position, and the store
unaligned (stq_u) instruction stores the word back into s. The t and s
pointers are incremented, the byte is checked for zero, and the sequence
starts again if the byte is nonzero. 
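The word-wide byte shuffling in Example 7 can be modeled in C. The sketch below is our own rendering of the extract/mask/insert/merge logic (function names are ours, and byte positions follow the low-order-bits numbering described above); it is not DEC's reference code:

```c
#include <assert.h>
#include <stdint.h>

/* extbl: pull byte `pos` of a 64-bit word down into the low byte. */
static uint64_t extract_byte(uint64_t word, unsigned pos) {
    return (word >> (pos * 8)) & 0xff;
}

/* maskbl: zero byte `pos` of the destination word. */
static uint64_t mask_byte(uint64_t word, unsigned pos) {
    return word & ~((uint64_t)0xff << (pos * 8));
}

/* insbl: place a byte value at position `pos` of an otherwise-zero word. */
static uint64_t insert_byte(uint64_t byte, unsigned pos) {
    return (byte & 0xff) << (pos * 8);
}

/* Byte `src_pos` of src_word replaces byte `dst_pos` of dst_word;
   the final OR corresponds to the bis instruction. */
uint64_t merge_byte(uint64_t src_word, unsigned src_pos,
                    uint64_t dst_word, unsigned dst_pos) {
    uint64_t b = extract_byte(src_word, src_pos);
    return mask_byte(dst_word, dst_pos) | insert_byte(b, dst_pos);
}
```

A single merge_byte call performs what the extbl/maskbl/insbl/bis sequence does across four instructions, all without any byte-wide memory access.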


Operate Instructions


The basic operations performed by both architectures are rather similar. One
difference is the combined floating-point multiply-add in the PowerPC. This
instruction requires three floating-point register read ports. The 21064 has
three such ports but uses them for stores so that a floating-point operate can
be done simultaneously with a floating point store; this can't be done in the
601. 
The Alpha architecture does not have an integer-divide instruction; division
must be implemented in software. Leaving out integer divides, or doing them in
clever ways to reduce hardware, seems to be fashionable in RISC architectures.
However, iterative dividers are cheap, and one can expect that all the RISC
architectures will eventually succumb to divide instructions (some already
have). 
The Alpha architecture has scaled integer adds and subtracts that multiply one
of the operands by 4 or 8--one of the few Alpha features that seems non-RISCy.
These instructions are useful for address arithmetic in which indices of word
or doubleword arrays are held as element offsets, then automatically converted
to byte-address values for address calculation using the scaled add/subtracts.
The PowerPC has a richer set of indexing operations embedded in loads and
stores as well as the update version of memory instructions.
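Assuming the Alpha s4addq/s8addq mnemonics for the scaled adds, the operation is just a shift-and-add; a minimal C model (function names mirror the mnemonics):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative models of Alpha's scaled adds: the first operand is an
   element index, converted to a byte offset in the same operation. */
uint64_t s4addq(uint64_t index, uint64_t base) { return (index << 2) + base; }
uint64_t s8addq(uint64_t index, uint64_t base) { return (index << 3) + base; }
```

So the address of x[k] in an array of doubles at address base is s8addq(k, base), with no separate shift instruction.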


Conclusion


We have just seen that the PowerPC 601 and Alpha 21064 represent two distinct
design philosophies. The 601 implements an instruction set containing more
powerful instructions. And, it uses an implementation that provides
considerable flexibility to enhance detection and exploitation of parallelism
by the hardware. 
Of course, this results in more-complex hardware control. The Alpha 21064 uses
a very streamlined instruction set and implementation. While not appearing as
clever as the 601, the simplicity of the implementation contributes to a very
fast clock rate--much faster than any other commercial microprocessor. 
As a final note, follow-on processors from DEC and the PowerPC consortium, the
Alpha 21164 and PowerPC 604, continue the differing design philosophies. The
21164 can issue more instructions per cycle than the 21064, but its pipelines
are still relatively simple, and it has a very fast clock.
The 604, on the other hand, is even more aggressive than the 601 when it comes
to providing hardware mechanisms for increasing parallelism--although, as one
would expect, this comes at the expense of hardware control complexity.
Table 1: Architectural characteristics.
 PowerPC 601 Alpha 21064
 
Basic architecture load/store load/store
Instruction length 32-bit 32-bit
Byte/halfword load/store yes no
Condition codes yes no
Conditional moves no yes
Integer registers 32 32
Integer-register size 32/64 bit 64 bit
Floating-point registers 32 32
Floating-register size 64 bit 64 bit 
Floating-point format IEEE 32-bit, 64-bit IEEE, VAX 32-bit, 64-bit
Virtual address 52-80 bit 43-64 bit
32/64-mode bit yes no
Segmentation yes no
Page size 4 KB implementation specific
Table 2: Implementation characteristics.
 PowerPC 601 Alpha 21064
Technology 0.6-micron CMOS 0.75-micron CMOS
Levels of metal 4 3
Die size 1.09 cm square 2.33 cm square
Transistor count 2.8 million 1.68 million
Total cache
 (instructions + data) 32 KB 16 KB
Package 304-pin QFP 431-pin PGA
Clock frequency 50 MHz (initially) 150 to 200 MHz
Power dissipation 9 watts @ 50 MHz 30 watts @ 200 MHz
Table 3: Instruction dispatch rules; (a) In the 601, three mutually compatible
instructions (marked with X) may issue simultaneously; (b) in the 21064, two
compatible instructions may issue simultaneously. Integer branches depend on
an integer register, and floating branches depend on a floating register.
Table 4: Register file ports.
 Integer Registers Floating Registers
 Read Ports Write Ports Read Ports Write Ports
PowerPC 601 3 2 3 2
Alpha 21064 4 2 3 2
Table 5: Branch penalties.
 Alpha 21064 PowerPC 601
 Distance Correct Incorrect Correct Incorrect
 0 0-1 4 0 2/1
 1 0-1 4 0 1/0
 >/=2 0-1 4 0 0
Table 6: Prediction methods versus branch type.
 Conditional Branches Loop-closing Subroutine
 (non-loop-closing) Branches Returns
PowerPC 601 Static prediction Always zero-cycle Always zero-cycle
Alpha 21064 Dynamic prediction Dynamic prediction Stack prediction
Example 1: Alpha 21064 pipelined processing example. (a) C code; (b) assembly
code.
(a)
double x[512], y[512];
for (k = 0; k < 512; k++)
    x[k] = (r*x[k] + t*y[k]);
(b)
 # r1 points to x
 # r2 points to y
 # r6 points to the end of y
 # fp2 contains t
 # fp4 contains r
 # r5 contains the constant 1
LOOP: ldt fp3 = y(r2,0)   # load floating double
 ldt fp1 = x(r1,0)        # load floating double
 mult fp3 = fp3,fp2       # floating multiply double, t*y
 addq r2 = r2,8           # bump y pointer
 mult fp1 = fp1,fp4       # floating multiply double, r*x
 subq r4 = r2,r6          # subtract y end from current pointer
 addt fp1 = fp3,fp1       # floating add double, r*x+t*y
 stt x(r1,0) = fp1        # store floating double to x(k)
 addq r1 = r1,8           # bump x pointer
 bne r4,LOOP              # branch on r4 ne 0
Example 2: 21064 pipeline flow for loop example.
 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
ldt fp3=y(r2,0) F S D I A B W
ldt fp1=x(r1,0) F . S D I A B W
mult fp3=fp3,fp2 F S D . I F G H J K W
addq r2=r2,8 F S D . I A B W
mult fp1=fp1,fp4 F S . D I F G H J K W
subq r4=r2,r6 F S . D I A B W
addt fp1=fp3,fp1 F . S D . . . . . I F G H J K W
stt x(r1,0)=fp1 F . S D . . . . . . . . . I A B W
addq r1=r1,8 F S . . . . . . . . . D I A B W
bne r4,LOOP F S . . . . . . . . . D I A . .
ldt fp3=y(r2,0) F . . . . . . . . S D I A B
ldt fp1=x(r1,0) F . . . . . . . . . S D I A
Example 3: Example loop, unrolled for the Alpha 21064.
 Issue time
LOOP: ldt fp3 = y(r2,0) # load y[k] 0
 ldt fp1 = x(r1,0) # load x[k] 1
 ldt fp7 = y(r2,8) # load y[k+1] 2
 ldt fp5 = x(r1,8) # load x[k+1] 3
 mult fp3 = fp3,fp2 # t*y[k] 4
 ldt fp11 = y(r2,16) # load y[k+2] 4
 mult fp1 = fp1,fp4 # r*x[k] 5
 ldt fp9 = x(r1,16) # load x[k+2] 5
 mult fp7 = fp7,fp2 # t*y[k+1] 6
 ldt fp15 = y(r2,24) # load y[k+3] 6
 mult fp5 = fp5,fp4 # r*x[k+1] 7
 ldt fp13 = x(r1,24) # load x[k+3] 7
 mult fp11 = fp11,fp2 # t*y[k+2] 8
 addq r2 = r2,32 # bump y pointer 8
 mult fp9 = fp9,fp4 # r*x[k+2] 9
 subq r4 = r2,r6 # remaining y size 9
 mult fp15 = fp15,fp2 # t*y[k+3] 10
 mult fp13 = fp13,fp4 # r*x[k+3] 11
 addt fp1 = fp3,fp1 # r*x[k]+t*y[k] 12
 addt fp5 = fp7,fp5 # r*x[k+1]+t*y[k+1] 13
 addt fp9 = fp11,fp9 # r*x[k+2]+t*y[k+2] 15
 stt x(r1,0) = fp1 # store x[k] 16
 addt fp13= fp15,fp13 # r*x[k+3]+t*y[k+3] 17
 stt x(r1,8) = fp5 # store x[k+1] 17
 stt x(r1,16) = fp9 # store x[k+2] 19
 stt x(r1,24) = fp13 # store x[k+3] 21

 addq r1 = r1,32 # bump x pointer 22
 bne r4,LOOP # next loop 22
LOOP: ldt fp3 = y(r2,0) # next iteration 23
Example 4: Example loop, unrolled for the PowerPC 601. FXU instructions are
dispatched and decoded in the same clock cycle.
 Instr. FXU FPU BU
 fetch decode decode exec.
 time time time time
 # CTR = 128 (loop count/4) 
LOOP: lfs fp0 = y(r3,2052) # load y[k] 0 1 
 lfs fp4 = y(r3,2056) # load y[k+1] 0 2 
 lfs fp6 = y(r3,2060) # load y[k+2] 0 3 
 fmuls fp0 = fp0,fp1 # t*y[k] 0 4
 lfs fp8 = y(r3,2064) # load y[k+3] 0 4 
 fmuls fp4 = fp4,fp1 # t*y[k+1] 0 5
 lfs fp2 = x(r3,4) # load x[k] 0 5 
 fmuls fp6 = fp6,fp1 # t*y[k+2] 0 6
 lfs fp5 = x(r3,8) # load x[k+1] 2 6 
 fmuls fp8 = fp8,fp1 # t*y[k+3] 2 7
 lfs fp7 = x(r3,12) # load x[k+2] 9 10 
 fmadds fp0 = fp0,fp2,fp3 # r*x[k] + t*y[k] 9 11
 lfs fp9 = x(r3,16) # load x[k+3] 9 11 
 fmadds fp4 = fp4,fp5,fp3 # r*x[k+1] + t*y[k+1] 9 12
 fmadds fp6 = fp6,fp7,fp3 # r*x[k+2] + t*y[k+2] 9 13
 fmadds fp8 = fp8,fp9,fp3 # r*x[k+3] + t*y[k+3] 9 14
 stfs x(r3+4) = fp0 # store x[k] 10 14 15
 stfs x(r3+8) = fp4 # store x[k+1] 10 15 16
 stfs x(r3+12) = fp6 # store x[k+2] 10 16 17
 stfsu x(r3=r3+16) = fp8 # store x[k+3] 10 17 18
 bc LOOP,CTR≠0 # dec CTR, branch if
 CTR is not equal to 0 11 15
LOOP: lfs fp0 = y(r3,2052) # load y[k] 20 21
Example 5: Alpha 21064 conditional-move example. (a) C code; (b) assembly code
with conditional branch; (c) assembly code with conditional move.
(a)
if (a == 1) c = b << 2; else c = b << 4;

(b)
 # initially, assume
 # r1 contains b,
 # r7 points to a,
 # r8 points to c.
                                              Issue time
 ldl r2 = a(r7,0)        # load a from memory        0
 cmpeq r5 = r2,1         # test a                    3
 beq r5,SHFT2            # branch if a==1            4
                         # assume taken
 sll r3 = r1,4           # shift b << 4
 br STORE                # branch uncond
SHFT2: sll r3 = r1,2     # shift b << 2
STORE: stl r3 = c(r8,0)  # store c                   6

(c)
 # initially, assume
 # r1 contains b,
 # r7 points to a,
 # r8 points to c.
                                              Issue time
 ldl r2 = a(r7,0)        # load a from memory        0
 sll r3 = r1,4           # shift b << 4              1
 sll r4 = r1,2           # shift b << 2              2
 cmpeq r5 = r2,1         # test a                    3
 cmov r3 = r4,r5         # conditional move to c     4
 stl r3 = c(r8,0)        # store c                   4
Example 7: Alpha 21064 strcpy function (null-terminated strings).
 # A string is copied from t to s
 # r4 points to t
 # r3 points to s
LOOP: ldq_u r1 = t(r4,0) # load t, unaligned
 extbl r2 = r1,r4 # extract byte from r1 to r2
 ldq_u r5 = s(r3,0) # load s, unaligned
 maskbl r5 = r5,r3 # zero corresponding byte in r5
 insbl r6 = r2,r3 # insert byte into r6
 bis r5 = r5,r6 # logical OR places byte in r5
 stq_u s(r3,0) = r5 # store unaligned
 addq r4 = r4,1 # bump the t pointer
 addq r3 = r3,1 # bump the s pointer
 bne r6,LOOP # branch if nonzero byte
Figure 1: Cache access paths. (a) Alpha 21064; (b) PowerPC 601.
Figure 2: Comparison of loop overlap in (a) 21064- and (b) PowerPC 601-like
implementations.
Figure 3: Branch instructions. (a) Conditional branches; (b) unconditional
branches.
Figure 4: Timing for conditional branches in the Alpha 21064. (a) Instruction
flow for correct branch prediction; (b) instruction flow for incorrect branch
prediction. (X means instruction is flushed as a result of branch
misprediction.)
Figure 5: Format for PAL instructions used to define operating-system
primitives.
Figure 6: Memory instruction format. (a) Load- and store-instruction format
using register + displacement addressing. The displacement D is sign extended
prior to addition. In Alpha, D is multiplied by 2^16 if OPCD = LDAH. RT is the
destination register. (b) load- and store- instruction format using register +
register (indexed) addressing. RT is the destination register.




High-Performance Programming for the PowerPC


Avoid performance pitfalls when coding for Windows NT




Kip McClanahan, Mike Phillip, and Mark VandenBrink


Kip works in Motorola's RISC software group writing low-level PowerPC software
and is the author of PowerPC Programming for Intel Programmers (IDG Books,
1995). He can be contacted at kip_mcclanahan@risc.sps.mot.com. Mark has been
hacking operating-system kernels for over ten years and is currently on the
team responsible for the PowerPC port of Windows NT. He can be contacted at
markv@risc.sps.mot.com. Mike is manager of the compiler and tools development
group at Motorola in Austin, Texas. He can be contacted at
phillip@risc.sps.mot.com.


Squeezing the best performance out of a processor requires both insight and
experience. When it comes to the PowerPC microprocessor family, however,
programmers are just starting to understand the architecture and each
processor's implementation. The PowerPC architecture specification defines
both required and optional features for any processor implementation. Each
implementation of the PowerPC-architecture specification--the 601, 603, 604,
and 620--may have a very different set of features. For example, cache size
and type, bus width, power-management capabilities, and number of execution
units can vary between each part. However, compliance with the PowerPC
architecture specification ensures binary compatibility across each processor
implementation.
In this article, we'll examine methods for measuring performance and
techniques for improving the speed and efficiency of applications running
under Windows NT for PowerPC. In doing so, we'll point out some of the
pitfalls associated with Little-endian operating systems (such as Windows NT)
and present an application that demonstrates the effect of byte alignment on
performance. We'll also look at some optimization techniques that apply more
generally to the PowerPC architecture.


Measuring Performance


Perhaps the most difficult step in improving performance is simply getting
started. There are several ways to analyze performance for a particular
application, but it's almost always necessary to narrow the scope of the
analysis to factors that can be controlled by the programmer. Performance is
typically affected by:
System hardware.
System software.
Application design/algorithms.
Compiler/tools/build configuration.
System hardware issues include the size and speed of memory and disk
subsystems, the type of video cards and the amount of dedicated video memory,
and the type and speed of the microprocessor itself. For most developers, it
is important to characterize the impact of the system hardware and software on
performance, but most opportunities to improve performance lie in how the
application itself was built. System-software issues are too numerous to
list, but tend to center around the operating-system and networking
configuration of the computer.
Most applications are too large to examine in their entirety, but
performance-analysis tools such as profilers can often identify those regions
of code in which most time is spent. If profiling tools are not available, you
can gain insight into any possible performance bottlenecks by thoroughly
inspecting the code and answering the following questions:
Does the application access a lot of disk-based data?
Are the program data types integer, floating point, or both?
Does the program frequently access large blocks of memory (such as arrays of
data) or frequently allocate memory?


Improving Performance


Regardless of profiling information, a good place to start improving
performance is to turn on compiler optimizations. Most modern compilers offer
sophisticated code optimization that can be invoked by the user. Although such
options will likely slow down compilation, the resulting application should
execute much faster, often with speedups of 200 to 300 percent. Compiler
optimization controls are akin to stereo-system controls, however: Just as
increasing the volume to its maximum level may not be optimal for each piece
of music, simply setting the default optimization flag is unlikely to yield
optimal performance for every software application. Many compilers offer
specific optimization flags for fine-tuning application performance. While
such optimizations may not apply to enough applications to warrant inclusion
in the default optimization settings, they can greatly improve the performance
of a particular application. 
One such option is automatic inlining of subroutines. Although not much of an
optimization by itself, subroutine inlining exposes more code to the optimizer
in the context in which it will be used. Of course, blindly inlining code is
unwise, because replicating the subroutine body increases code size, which can
actually decrease performance through loss of code locality in the memory
system. However, when used in conjunction with profiling feedback, inlining
small, heavily called routines often improves performance significantly for
the overall application.
While selectively inlining C or C++ subroutines may improve performance,
inline assembly code should be embedded with caution. When assembly code is
inlined into a high-level program, the compiler must typically make
conservative assumptions about register and memory usage, which can throttle
many potential optimizations. But inlining critical PowerPC assembly
instructions (such as synchronization primitives or status-register accesses)
can improve performance. Some compilers, including those developed by
Motorola, provide a set of built-in intrinsic functions that provide efficient
access to low-level system instructions without incurring the performance
penalty associated with inlining seemingly "random" assembly instructions.


Language and Design Considerations


The most significant factor affecting software performance is the design and
implementation of the application itself. While algorithm design is
application specific and clearly beyond the scope of this discussion, several
general design considerations affect performance for virtually any PowerPC
software application.
Stick to the standards. Languages such as C and C++ have well-defined
standards intended to ensure the portability of source code among different
development environments. Many providers of development tools offer
nonstandard language extensions that differentiate their products. While often
alluring to the developer, these extensions can easily tie an application to a
particular tool set, and, to a lesser extent, a particular target
architecture. Many of these features can significantly degrade performance on
RISC microprocessors like the PowerPC.
Watch out for misalignment. Examples of such language extensions are the
__unaligned keyword and #pragma pack, both of which can affect data alignment.
For Little-endian implementations of an operating system such as Windows NT,
misaligned accesses can lead to significant performance losses on the PowerPC
architecture. Many modern microprocessors, including most PowerPC
implementations, are optimized to handle properly aligned memory references,
often at the expense of handling relatively infrequent misaligned references.
Compilers typically will align data on its "natural" alignment boundary, where
the address of the object is an exact multiple of the size of the object in
bytes. However, certain programming practices, including the imprudent use of
some language extensions, can create a situation where the compiler must
access memory in chunks smaller than the natural size of an object. For
example, if you use the __unaligned keyword in Microsoft C/C++ to inform a
compiler that an object is misaligned, a PowerPC compiler will typically be
required to load the corresponding object from memory one byte at a time. For
a 32-bit integer object, this requires four memory accesses instead of one,
plus three additional rotate instructions. 
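The byte-at-a-time load can be sketched in C; this is our own illustration of the shape of the generated code (four byte loads merged with shifts), not Microsoft's or Motorola's actual compiler output:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative equivalent of an __unaligned 32-bit load on a
   little-endian system: four byte accesses plus shift/OR merging,
   instead of a single word load. Function name is ours. */
uint32_t load32_bytewise(const unsigned char *p) {
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```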
A similar situation can arise when the compiler is instructed to "pack" the
elements of a structure via source-code pragmas. On older architectures,
including the Intel 80x86 family, such language assertions do not necessarily
affect performance. However, given the relatively high cost of servicing an
alignment exception, PowerPC compilers will typically opt to generate
conservative, albeit slower, code for known, potentially misaligned accesses.
Removing the __unaligned keyword or structure-packing pragmas does not
necessarily solve the problem. In fact, it may lead to incorrect code or even
worse performance. To avoid these performance pitfalls, it's best to avoid
such extensions when designing an application. The alignment example shown in
Listing One demonstrates the performance penalties associated with the three
common techniques for misalignment resolution.
Alignment exceptions can also be created through mismanagement of pointers in
C and C++ programs, or by accessing data through a reference that is not of
the same natural alignment as the original object; for example, accessing a
series of characters or half-word objects through integer variables. For
Windows NT applications, these misalignment exceptions are often hidden from
the user through programmer instructions to the operating system. As the
alignment example in this article demonstrates, it is clearly preferable to
enable exceptions as a means of locating performance losses than to "hide"
them from the user.
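When a potentially misaligned access is unavoidable, copying the bytes into an aligned object lets the compiler generate safe code without raising an exception. The sketch below is one portable approach of our own choosing, not a prescribed Windows NT idiom:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reading through a cast like *(int32_t *)(buf + 1) may fault on
   PowerPC; memcpy into an aligned local avoids the misaligned access,
   and compilers typically expand it into efficient inline code. */
int32_t read_int_safely(const void *p) {
    int32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```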


Structured Exception-Handling Issues


Structured exception handling can also adversely affect performance. While
exception handlers are an elegant and maintainable means of managing the
interaction between the application and the operating system when errors or
unexpected events occur, they can also restrict the compiler's ability to safely
optimize code. This is true even for code not directly included in the
exception handler itself. Thus, exception handlers should be carefully
designed to minimize the performance impact, as follows:
Isolate the exception-handling code as much as is practical. Since the flow of
program control to an exception handler and its corresponding effect on
optimization is often nonintuitive, exception handlers should not be placed in
the middle of large, otherwise unrelated subroutines, particularly if they are
performance critical. Since most compilers optimize a subroutine at a time,
isolating the actual exception handler within a small subroutine (that is
perhaps called by a larger enclosing subroutine) can help limit performance
degradation.
Avoid introducing pointers and global variables within exception-handling
code. The semantics of most exception-handling language constructs typically
necessitate saving and restoring "live" data across the scope of an exception
handler, since the compiler often does not know which data an exception will
affect. By limiting the exposure of global variables and pointers within an
exception handler, the compiler can often better reduce memory accesses within
the corresponding region of code.



Target-Specific Issues


Target-specific factors can often be utilized to maximize performance.
Although fine-tuning for one processor or architecture at the expense of
others should be avoided, a few PowerPC-specific issues should be considered
to maximize PowerPC performance.


Memory Subsystem


The frequency and speed of memory accesses are almost always key factors in
overall application performance. Compared to most PowerPC-processor
operations, off-chip memory references are extremely expensive (particularly
if the accesses are misaligned). You can often maximize the utilization of
on-chip and secondary off-chip caches by carefully managing large data
structures such as arrays. Since all PowerPC chip implementations utilize
associative cache designs, a slight change in the "stride" of array accesses
can significantly affect the effectiveness of a cache. A simple guideline is
to avoid array sizes that are an exact multiple of the cache size (typically
powers of 2 in the range of 8-64 KB). However, predicting cache behavior
through such simple guidelines is precarious, at best, since actual dynamic
reference patterns vary between applications. The guidelines' intent is to
avoid allocating heavily referenced variables to the same set of cache
addresses. Profiling tools can often provide feedback concerning a program's
cache-utilization efficiency.
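The conflict mechanism can be illustrated with a toy direct-mapped model. The parameters below (a 32-KB cache with 32-byte lines) are assumptions for illustration, not the 601's exact organization:

```c
#include <assert.h>
#include <stdint.h>

#define LINE_SIZE 32u     /* assumed line size, in bytes */
#define NUM_LINES 1024u   /* 32 KB / 32-byte lines */

/* Which cache line an address maps to in a direct-mapped model.
   Addresses exactly one cache size apart collide line-for-line. */
unsigned cache_line_of(uintptr_t addr) {
    return (unsigned)((addr / LINE_SIZE) % NUM_LINES);
}
```

Two arrays whose corresponding elements sit exactly 32 KB apart map to the same lines throughout; padding one array by a line or two breaks the pattern.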


A Real-World Alignment Example


To make the following alignment example more meaningful, we'll tie it to an
operating system. Because Windows NT for the PowerPC is a Little-endian
operating system, it is subject to the alignment restrictions described
earlier. Remember, a multibyte access performed at an address not aligned with
the size of the access (known as "natural alignment") will cause an alignment
exception. Table 1 shows the natural-alignment boundaries for memory accesses
of various sizes.
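The natural-alignment rule in Table 1 reduces to a single modulus test; a minimal sketch (the function name is ours):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* An access of `size` bytes at `addr` is naturally aligned when the
   address is an exact multiple of the access size. */
int is_naturally_aligned(uintptr_t addr, size_t size) {
    return size != 0 && addr % size == 0;
}
```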
When a Little-endian PowerPC application performs a memory access that is not
aligned on its natural boundary, an alignment exception will result. When
alignment issues exist on PowerPC-based systems, they can be resolved in three
ways:
If there is no support for data-alignment management, the operating system
traps on the alignment exception, usually terminating the "faulting"
application. This may seem the worst of all possible outcomes, but it can be
very helpful during the development cycle; see the discussion that follows.
The operating system can take the alignment exception, perform the necessary
fix-ups to transparently handle the memory access, and return as if nothing
had ever happened. And while abstracting the problem from both user and
programmer may seem like the best solution, it is one of the worst. The
transition through the alignment-exception mechanism is comparatively slow and
one of the worst performance killers.
You can use the __unaligned type qualifier, the #pragma pack(1) directive, or
macros on accesses with known alignment problems. Each of these techniques
breaks a single multiple-byte access into individual, byte-wise accesses in
order to eliminate alignment issues.


Trap and Terminate


Under Windows NT, it is possible (and the default for PowerPC Windows NT) not
to have any support for misaligned data. When there is no support for
misalignment and your application accesses data on an unnatural boundary, an
alignment fault is generated. The Windows NT alignment fault handler is
configured not to fix misaligned accesses and will display a pop-up message
and terminate your application. This seems like the one situation to avoid,
but in fact the operating system is doing you a favor.
The ability to trap an alignment exception and terminate the faulting
application is valuable when porting code, particularly when porting Windows
NT applications from 80x86 to PowerPC architectures. And the ability to detect
alignment exceptions is fundamental to handling misalignment efficiently. 


Operating System Fix-ups


The Windows NT kernel can be configured to perform misaligned data fix-ups
upon detection of an alignment exception. This means that the
alignment-exception handler must break the multibyte memory access that caused
the exception into individual byte accesses, which are not constrained by
alignment issues. Although this sounds like a terrific service, it is the most
inefficient solution. Even if the rest of your application is well
constructed, a few OS-based alignment fix-ups can bring performance to its
knees. As Table 2 shows, for 5 million misaligned memory references, the
difference between OS handling and programmatic handling is nearly 30 seconds!
Put another way, having the OS fix up misaligned memory accesses is 43 times
slower than the same number of aligned accesses.
If this solution is so slow, why is it around? Of course, you wouldn't want
your released and shipping applications to terminate the first time they have
an unexpected alignment fault--compatibility with 80x86 applications would be
reduced significantly. OS-based fix-ups are a reasonable first-pass solution
for PowerPC applications that have not specifically taken data alignment into
consideration. Just as the number of 32-bit applications is slowly increasing
in the 80x86 world, so will the number of PowerPC applications that make data
alignment a priority.
Processes can enable OS-based alignment fix-ups using the call
SetErrorMode(SEM_NOALIGNMENTFAULTEXCEPT). A child process inherits its
parent's error mode, so any processes created by your application after
enabling this mode will also suffer the performance penalty for misaligned
data. Under Windows NT for PowerPC and MIPS processors, OS-based misalignment
support is disabled by default. The Alpha version of NT enables OS-based
misalignment support by default; Alpha NT applications must turn this feature
off if programmatic solutions are used. Setting the SEM_NOALIGNMENTFAULTEXCEPT
flag has no effect on x86 processors.


Programmatic Solutions


The first programmatic solution to data misalignment is the __unaligned
pointer type qualifier. When the compiler sees a pointer reference (this
qualifier works only with pointers) declared using the __unaligned type
qualifier, it includes sufficient code to ensure that the memory access does
not generate an alignment exception. In particular, it breaks multiple byte
accesses into individual byte accesses. A single alignment-constrained, 32-bit
load or store instruction is replaced by seven instructions that perform the
same operation using only byte accesses. Similarly, a single 16-bit memory
reference would be replaced by three instructions. Listing Two shows a single,
32-bit, PowerPC store-word instruction. If the store-word instructions in
Listings One and Two were performed using an __unaligned pointer qualifier
reference, the instruction would be converted to the seven instructions shown
in Listing Three.
Listing Three represents compiler-generated code and, taken out of the context
of the original flow of code, may appear suboptimal. Figure 1 depicts the
operation of the seven instructions of Listing Three. The first instruction
stores the low-order byte (0x78) of r3 into the address contained in r4. The
rlwinm instruction is used to rotate the bytes within r3 such that each
subsequent store-byte instruction references the proper value. The value
contained in r3 is in Big-endian format, and r4 points to Little-endian
memory. Therefore, the bytes must be swapped into Little-endian format during
the store operation.
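The effect of the Listing Three sequence can be expressed in portable C as a byte-at-a-time store in Little-endian order. The helper name below is mine, not from the article; it is a sketch of what the compiler-generated code accomplishes:

```c
#include <stdint.h>

/* Store a 32-bit value one byte at a time in Little-endian order --
   the same net effect as the stb/rlwinm sequence in Listing Three.
   Byte accesses are never constrained by alignment, so dst may point
   to any address. */
static void store_u32_le(unsigned char *dst, uint32_t value)
{
    dst[0] = (unsigned char)(value & 0xFF);         /* low byte (0x78)  */
    dst[1] = (unsigned char)((value >> 8) & 0xFF);  /* next byte (0x56) */
    dst[2] = (unsigned char)((value >> 16) & 0xFF); /* next byte (0x34) */
    dst[3] = (unsigned char)((value >> 24) & 0xFF); /* high byte (0x12) */
}
```

Storing 0x12345678 through this routine at a deliberately odd address completes without any alignment exception.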
Another programmatic solution, the #pragma pack() directive, is particularly
appropriate for porting between the various Windows NT platforms. One
potentially recurring problem results from data structures and formats that
were never designed from a portability perspective. In particular, graphics
formats such as BMP and DIB do not address natural-boundary alignment. This is
understandable: When these file formats were created for 80x86 software (such
as Microsoft Windows), alignment issues were not a big concern--alignment was
nice, but misaligned accesses didn't kill performance. With the advent of RISC
processors, fixed-length instruction size, and the associated alignment
restrictions, data misalignment has become an important issue.
The #pragma pack() directive tells the compiler two things. First, pack
structure elements as close together as possible. In particular, avoid using
the standard (aligned) structure padding. Second, the compiler knows to
generate additional code to support misaligned accesses for elements within
the "packed" structure, much like the effect of the __unaligned qualifier. In
Listing One, a standard BMP header is packed, and the misaligned access is
performed using the 32-bit BMPDataOffset element. This addresses application
portability concerns because it allows the programmer to guarantee that the
offsets within a native PowerPC structure will exactly match those defined in
a native 80x86 structure.
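The layout guarantee can be verified with offsetof. This sketch uses fixed-width stdint types rather than the USHORT/ULONG typedefs of Listing One, so the field sizes are the same on any compiler that honors #pragma pack(1):

```c
#include <stddef.h>
#include <stdint.h>

#pragma pack(1)
/* With pack(1) the fields fall at the exact byte offsets the on-disk
   BMP format defines; without it, padding would push FileSize to
   offset 4 and BMPDataOffset to offset 12. */
struct PackedBMPHeader {
    uint16_t FileType;      /* offset 0  */
    uint32_t FileSize;      /* offset 2  */
    uint16_t reserved1;     /* offset 6  */
    uint16_t reserved2;     /* offset 8  */
    uint32_t BMPDataOffset; /* offset 10 */
};
#pragma pack()
```

The trade-off is that every access to a packed field is compiled as byte-wise code, whether or not the particular instance happens to be misaligned.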
Finally, there may be well-defined times when you know that you're about to
perform a misaligned access. Instead of permanently packing a structure or
declaring an __unaligned pointer variable to reference the memory, you can use
the macros shown in Figure 2 to break the particular access into its byte
components. To use the macros, simply bracket each misaligned memory reference
with the appropriate macro.
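Here is the rULONG macro from Figure 2 bracketing a known-misaligned read. The typedefs are spelled out so the fragment stands alone; the buffer contents are illustrative:

```c
#include <stdint.h>

typedef unsigned char UCHAR;
typedef uint32_t ULONG;

/* Reassemble a 32-bit Little-endian value from four byte-wise reads,
   so the access never generates an alignment exception. */
#define rULONG(x) (ULONG)( \
    *(UCHAR *)(&x) | \
    (*((UCHAR *)(&x)+1) << 8) | \
    (*((UCHAR *)(&x)+2) << 16) | \
    (*((UCHAR *)(&x)+3) << 24) )
```

Reading through the macro at an odd offset, such as rULONG(raw[1]) for a byte buffer raw, yields the assembled 32-bit value without ever issuing a multibyte load.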


The (Mis)alignment Demonstration Program


Three categories of misalignment resolution are demonstrated in the alignment
program (ALIGN.C) shown in Listing One. This program lets you compare the
performance impact of misaligned data by timing both aligned and misaligned
accesses.
When timing a fixed number of operations in a preemptive, multitasking
operating system, such as Windows NT, it is important to minimize noise in
your timing measurements. To do so, elevate your thread to the highest
priority possible and take multiple data samples. Listing One sets the
thread-timing priority to THREAD_PRIORITY_TIME_CRITICAL. This increases the
accuracy of the misalignment timing by reducing the number of interrupts that
can influence the overall time required to complete the set of operations. In
fact, when running ALIGN, the usefulness of your mouse will decrease
dramatically.
To obtain the timing values, the GetTimeStamp() routine simply uses
QueryPerformanceFrequency() and QueryPerformanceCounter() to sample the Win32
high-frequency counter. The time reported by ALIGN is derived by taking the
difference between the time stamp before and after each set of memory-access
operations.
ALIGN requires two parameters: an iteration count and one of the parameters in
Figure 3. For example, the staggering 28-second measurement was generated
using ALIGN 5000000 -2, which performed five million misaligned accesses using
the OS to handle the alignment fix-ups. Listing One was used to generate the
timing values shown in Table 2, which clearly demonstrates the cost of data
misalignment.


Conclusion



While the increased availability of performance analysis tools and continual
advancement of compiler optimization techniques will accelerate the tuning
process, many key performance issues will remain embedded within the design
and implementation of an application. An increased awareness among PowerPC
developers of the interactions between the operating system and the
microprocessor is critical to avoiding performance losses due to misaligned
memory references, poor utilization of structured exception handling, and
inefficient application of compiler optimizations.
For Little-endian-system implementations such as Windows NT, the means by
which alignment issues are resolved can dominate application performance, as
demonstrated in the ALIGN example of Listing One. Most importantly, many of
the concepts in this article affect performance not only for PowerPC systems,
but for other architectures, as well.


Acknowledgments


Thanks to Ray Essick and the Motorola RISC Software compiler team for a
wonderfully mature PowerPC compiler.


Bibliography


McClanahan, K. PowerPC Programming for Intel Programmers. San Mateo, CA: IDG
Books, 1995.
Win32 Programmer's Reference, MSDN CD-ROM, July 1995.
Table 1: Natural alignment boundaries.
Alignment Size                            Form of 32-bit Address
8-bit                                     xxxx xxxx
16-bit integers                           xxxx xxx0
32-bit integers and single-precision FP   xxxx xx00
64-bit integers and double-precision FP   xxxx x000
Table 2: Timing values generated by Listing One. Sample tests performed on a
100-MHz 604 running Windows NT 3.51 (build 1057), compiled with Motorola's NT
compiler and averaged over six runs of the program.
Number of    Naturally   OS Fix-ups   __unaligned       #pragma pack(1) for
Iterations   Aligned     Access       Type Qualifier    Sample BMP Structure

500,000 61 ms 2858 ms 86 ms 82 ms
1,000,000 121 ms 5720 ms 171 ms 166 ms
5,000,000 657 ms 28662 ms 852 ms 807 ms
Figure 1: Multi-byte store into Little-endian memory.
Figure 2: Macros to break an access into bytes.
#define rULONG(x) (ULONG)( \
 *(UCHAR *)(&x) | \
 (*((UCHAR *)(&x)+1) << 8) | \
 (*((UCHAR *)(&x)+2) << 16) | \
 (*((UCHAR *)(&x)+3) << 24) )
#define rUSHORT(x) (USHORT)( \
 *(UCHAR *)(&x) | \
 (*((UCHAR *)(&x)+1) << 8))
Figure 3: Options for the second parameter to the ALIGN program.
-0 Use ONLY aligned accesses.
-1 NO alignment fix ups (causes an exception).
-2 Use OS-based fix ups for misaligned accesses.
-3 Use __UNALIGNED type qualifier.
-4 Use #PRAGMA PACK(1) directive.

Listing One
/*---------------------------------------------------------------------------+
 Windows NT for PowerPC Alignment Demonstration Program 
 
 Mark VandenBrink, markv@risc.sps.mot.com 
 Kip McClanahan, kip_mcclanahan@risc.sps.mot.com -or- kip@io.com 
 Mike Phillip, phillip@risc.sps.mot.com 
 
 
+---------------------------------------------------------------------------*/
#include <stdlib.h>

#include <stdio.h>
#include <stdarg.h>
#include <windows.h>
#include <winioctl.h>
#include <string.h>
#include <ctype.h>
#include <memory.h>
// force compiler fix-ups for data accesses within the
// BMP structure by using #pragma pack(1).
#pragma pack(1)
// Standard Windows3.x BMP file header format
//
struct BMPHeader {
 USHORT FileType; // offset 0
 ULONG FileSize; // offset 2
 USHORT reserved1; // offset 6
 USHORT reserved2; // offset 8
 ULONG BMPDataOffset; // offset 10
};
struct BMPHeader bmpBuffer; // declare structure
//
// Print an error message to the screen and exit.
//
static VOID
Die(char *format, ...)
{
 va_list va;
 va_start(va, format);
 fprintf(stderr, "\n\nALIGN: ");
 vfprintf(stderr, format, va);
 ExitProcess(2);
}
//
// Return a timestamp from the high frequency performance counters (if
// one exists). Return the stamp in units of number of milliseconds
//
static UINT
GetTimeStamp(VOID)
{
 static DWORD FreqInMs = 0;
 LARGE_INTEGER Time;
 if (!FreqInMs) {
 if (QueryPerformanceFrequency(&Time) == TRUE) {
 if (Time.HighPart) {
 Die("Timer has too high a resolution\n");
 }
 //
 // convert counts per second to counts per millisecond
 //
 FreqInMs = Time.LowPart / 1000;
 } else {
 Die("Could not get frequency of performance counter\n");
 }
 }
 if (QueryPerformanceCounter(&Time) == FALSE) {
 Die("System does not support high-resolution timer\n");
 }
 return Time.LowPart / FreqInMs;
}

//
// Essentially useless function that returns a value to place at 
// IntPointer. Function used to prevent compiler from optimizing 
// away references to *IntPointer inside a loop.
//
DWORD
GetNextValue(VOID)
{
 static DWORD NextValue = 0;
 
 return NextValue++;
}
int main(int argc, char **argv)
{
 CHAR Buffer[1024];
 UINT EndTime;
 UINT Max = 0;
 UINT StartTime;
 UINT i;
 struct BMPHeader *headerPtr;
 int *IntPointer2;
 __unaligned int *IntPointer;
 switch (argc) {
 case 3:
 //
 // Note: setting thread's priority to THREAD_PRIORITY_TIME_CRITICAL
 // can effectively bring a machine to its knees, depending on the 
 // process priority class. 
 //
 SetThreadPriority(GetCurrentThread(), 
 THREAD_PRIORITY_TIME_CRITICAL
 );
 
 Max = strtoul(argv[1], 0, 0);
 //
 // The naturally aligned case
 //
 if (argv[2][0] == '-' && argv[2][1] == '0') {
 printf("ONLY aligned references\n");
 IntPointer2 = (int *)(&Buffer[4]);
 printf("Buffer at %x, IntPointer = %x\n", Buffer, IntPointer2);
 StartTime = GetTimeStamp();
 for (i = 0; i < Max; i++) {
 *IntPointer2 = GetNextValue();
 }
 EndTime = GetTimeStamp();
 break;
 }
 //
 // The no fix-ups, alignment exception causing case
 //
 if (argv[2][0] == '-' && argv[2][1] == '1') {
 printf("NO support for misaligned references\n");
 IntPointer2 = (int *)(&Buffer[3]);
 printf("Buffer at %x, IntPointer = %x\n", Buffer, IntPointer2);
 StartTime = GetTimeStamp();
 for (i = 0; i < Max; i++) {
 *IntPointer2 = GetNextValue();
 }

 EndTime = GetTimeStamp();
 break;
 }
 //
 // OS-based fix-ups
 //
 if (argv[2][0] == '-' && argv[2][1] == '2') {
 printf("OS support of misaligned references.\n");
 SetErrorMode(SEM_NOALIGNMENTFAULTEXCEPT);
 IntPointer2 = (int *)(&Buffer[3]);
 printf("Buffer at %x, IntPointer = %x\n", Buffer, IntPointer2);
 StartTime = GetTimeStamp();
 for (i = 0; i < Max; i++) {
 *IntPointer2 = GetNextValue();
 }
 EndTime = GetTimeStamp();
 break;
 }
 if (argv[2][0] == '-' && argv[2][1] == '3') {
 printf("Using __UNALIGNED qualifier.\n");
 IntPointer = (int *)(&Buffer[3]);
 printf("Buffer at %x, IntPointer = %x\n", 
 Buffer, 
 IntPointer);
 StartTime = GetTimeStamp();
 for (i = 0; i < Max; i++) {
 *IntPointer = GetNextValue();
 }
 EndTime = GetTimeStamp();
 break;
 }
 if (argv[2][0] == '-' && argv[2][1] == '4') {
 headerPtr = (struct BMPHeader *)Buffer;
 printf("Using #pragma pack(1) directive\n"); 
 printf("Access offset @%x\n", (ULONG)&(headerPtr->BMPDataOffset));
 StartTime = GetTimeStamp();
 for (i = 0; i < Max; i++) {
 headerPtr->BMPDataOffset = GetNextValue();
 }
 EndTime = GetTimeStamp();
 
 break;
 }
 //
 // fall through
 //
 default:
 fprintf(stderr, "Usage: ALIGN number-of-iterations [-option]\n");
 fprintf(stderr, "\nwhere option is one of the following:\n");
 fprintf(stderr, "\t-0 Use ONLY aligned accesses.\n");
 fprintf(stderr, "\t-1 NO alignment fix ups (causes an exception).\n");
 fprintf(stderr, "\t-2 Use OS-based fix ups for misaligned accesses.\n");
 fprintf(stderr, "\t-3 Use __UNALIGNED type qualifier.\n");
 fprintf(stderr, "\t-4 Use #PRAGMA PACK(1) directive.\n");
 ExitProcess(0);
 }
 printf("%d milliseconds\n", EndTime - StartTime);
 ExitProcess(0);
}


Listing Two
;
; Typical 32-bit store instruction
;
; Assumes:
; r3 contains word to store at address contained in r4
;
 stw r3, 0(r4) ; store word contained in r3
 ; to address contained in r4 + 0

Listing Three
;
; The equivalent 32-bit store resulting from use of the 
; __unaligned type qualifier in the pointer declaration
; for IntPointer.
; Assumes:
; r3 contains word to store at address contained in r4
; For the purposes of this example, assume that 
; r3 = 0x12345678.
;
 stb r3, 0(r4) ; store the lower byte (0x78) 
 ; of r3 into address contained
 ; in r4 + 0.
 rlwinm r5, r3,24,8,31 ; extract bits 16-23 into the 
 ; low-order position of r5
 ; How the rlwinm instruction works:
 ; Step 1: rotate contents of r3 left by 24 bits
 ; Result: 0x78123456
 ; Step 2: generate a mask with 1-bits from 
 ; bit 8 to 31 Result: 0x00ffffff 
 ; Step 3: AND the contents of r3 with mask and
 ; place the result into r5.
 ; Result: r5 = 0x00123456
 ; NOTE: the next stb instruction will store 
 ; 0x56 into the address (r4 + 1). 
 ; See Figure 1.
 stb r5, 1(r4) ; store next byte at r4 + 1
 rlwinm r5, r3,16,16,31 ; extract bits 8-15 into r5
 stb r5, 2(r4) ; store next byte at r4 + 2
 rlwinm r3, r3,8,24,31 ; extract bits 0-7 into r3
 stb r3, 3(r4) ; store final byte at r4 + 3
End Listings


Bit Operations with C Macros


And Knuth's MMIX as a bonus!




John Rogers


John is a programmer in the Seattle area. He can be contacted on CompuServe at
72634,2402.


Endian refers to a processor addressing model that defines the byte ordering
of data and instructions stored in computer memory. The most common addressing
models are Big-endian (left-to-right order) and Little-endian (right-to-left).
Intel-based processors (80x86, Pentium, and the like) are Little-endian, while
others (such as the Motorola 680x0 in the Macintosh) are Big-endian. Still
others, particularly the PowerPC, are "Bi-endian," allowing them to run in
either Big-endian or Little-endian mode (see the accompanying text box
entitled "PowerPC Bi-Endian Capabilities," by James R. Gillig).
As straightforward as this sounds, "endianness" can be confusing for
programmers--particularly when developing portable software running on a
variety of platforms. To address this confusion, I've developed an "endian
engine" which handles every byte order. This engine is presented in my article
"Your Own Endian Engine," (Dr. Dobb's Journal, November 1995). The heart of
the engine is a powerful set of C macros that perform bitwise operations,
which I'll discuss in this article. It's noteworthy that the examples I
implement to handle these C macros are designed to handle instructions for
MMIX, a hypothetical computer developed by Donald Knuth (see the accompanying
text box entitled "MMIX: Knuth's New Computer").
I invented some of the macros myself and modeled others on Fortran functions.
Together, they comprise a complete set of macros for manipulating bits. Since
ANSI C says the value of a right-shifted negative number
is "implementation-defined," I've defined all of the macros for operands with
unsigned integral types, just to be on the safe side. Listing One, bitops.h,
has all of the C macros discussed in this article. The complete source code to
accompany this article is available electronically; see "Availability," page
3.
Except for MVBITS, all of the macros in bitops.h return values rather than
updating parameters; see, for instance, the ALL_ZERO_BITS(type) macro in
Example 1. The companion macro, ALL_ONE_BITS(type), is analogous.


Bit Numbering


Many of the macros here use bit numbering. By convention, the least
significant bit (LSB) is bit 0. Some of the macros indicate one or more
contiguous bits in a value, using the convention of a start-bit number and a
length in bits. You give the bit number of the lowest bit in the range of bits
that you want. For instance, to use bits 0 through 3, give a start-bit number
of 0 and a length of 4. 
Conveniently, a variety of specifications and standards adhere to the LSB
convention of numbering as bit 0. Most Intel processors, the IEEE MUFOM
(microprocessor universal format for object modules), and the MIL-STD FORTRAN
functions all use this convention. The only major exception is IBM mainframes,
which number the most significant bit (MSB) as bit 0.
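A small helper (the name and code are mine, not part of bitops.h) shows how a (start, length) pair under the LSB-is-bit-0 convention maps to a mask:

```c
#include <stdint.h>

/* Build a mask covering `len` bits starting at bit number `start`,
   where the least significant bit is bit 0.  For instance, start=0
   and len=4 covers bits 0 through 3. */
static uint32_t bit_run_mask(unsigned start, unsigned len)
{
    uint32_t ones = (len >= 32) ? 0xFFFFFFFFu : ((1u << len) - 1u);
    return ones << start;
}
```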


Fortran-Inspired Macros


Since at least the 1970s, many versions of Fortran have included a standard
set of bit functions that include the usual operations: AND, OR, NOT, and
exclusive-OR. These Fortran functions also include routines for bit
extraction, insertion, shift, and circular shift. Rather than reinvent the
wheel, I've used the same function names and operand orders. However, since
the Fortran functions were designed for implementations with just one integer
type, and C has many sizes of integer types (ranging from char to long), I
added, where necessary, an additional parameter (at the end) to indicate the
data type to be returned. This must be some unsigned integral type.
As for the Fortran-inspired macros, I'll start with NOT(value,type) (bitwise
complement), sometimes called the "flip bits" or "invert" operation. In
Fortran, the NOT(value) function has one operand (an integer value) and
returns an integer: the inverted value. The C NOT(value,type) macro has an
additional operand, which must be an unsigned integral data type. The
NOT(value,type) macro converts the given value to the given type and returns
the converted value with all of the bits inverted. For instance, in an
implementation with 8-bit characters, NOT(0xF0, unsigned char) would be 0x0F.
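bitops.h itself is only in the electronic listings, but a plausible definition consistent with this behavior (my reconstruction, not the published source) would be:

```c
/* Convert the value to the given unsigned type, invert every bit, and
   cast back so the result is not left widened to int. */
#define NOT(value, type) ((type)(~(type)(value)))
```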
The IAND(m,n,type) ("integer and") macro simply performs a bitwise-AND of the
bits in the first two operands, which are treated as type type, and returns
the result. It supports any unsigned integer type for its operands. Kenneth
Hamilton used the Fortran version of this macro in his article "Direct Memory
Access from PC Fortrans" (Dr. Dobb's Journal, May 1993). Hamilton's code
needed to extract the low byte from some integer value. Using the C macros in
bitops.h, the equivalent would be:
unsigned int ic1;
ic1=IAND(ic,255,unsigned int);
There is an integer extract bits (IBITS) macro that extracts and
right-justifies one or more contiguous bits from a given value. IBITS is
called as IBITS(value,bitnum,len,type). For instance,
IBITS(0x5678,8,4,unsigned long) returns an unsigned-long value of 0x6. 
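A definition matching that behavior might look like the following sketch (again my reconstruction; it assumes len is smaller than the width of type, so the mask shift is defined):

```c
/* Extract `len` contiguous bits starting at bit `bitnum` (LSB = 0),
   right-justified, as the given unsigned type. */
#define IBITS(value, bitnum, len, type) \
    ((type)(((type)(value) >> (bitnum)) & (((type)1 << (len)) - 1)))
```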
Another Fortran-inspired macro is the integer shift (ISHFT) macro, a call to
which appears as ISHFT(value,shifts,type). ISHFT and its circular-shift
companion ISHFTC are unique in that they indicate the direction to shift by
positive or negative values of the shifts parameter. A positive value for
shifts causes a left shift by that many bits; a negative value causes a right
shift by that many bits; a 0 value causes no shift. Make sure the absolute
value of shifts is less than or equal to the size of type in bits; otherwise,
the result is undefined.
For a Fortran version of the ishft routine, see Ray Duncan's "16-Bit Software
Toolbox" column (Dr. Dobb's Journal, August 1985).
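Since bitops.h defines ISHFT as a type-parameterized macro, a plain function for one width is easier to read; this sketch (my own, for a 32-bit unsigned operand) shows the sign-of-shifts semantics:

```c
#include <stdint.h>

/* ISHFT semantics for a 32-bit unsigned value: a positive `shifts`
   shifts left, a negative one shifts right, and zero leaves the value
   unchanged.  |shifts| must be less than 32 or the result is
   undefined, matching the caveat in the text. */
static uint32_t ishft32(uint32_t value, int shifts)
{
    if (shifts > 0)
        return value << shifts;
    if (shifts < 0)
        return value >> -shifts;
    return value;
}
```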
Unlike the other macros in bitops.h, MVBITS updates a value (using a pointer
passed to it) rather than returning a value. The call
MVBITS(src,srcindex,len,destptr,destindex,type) updates the value at *destptr
(starting at bit destindex for len bits) with len bits extracted from src
starting at bit number srcindex. Example 3 shows MVBITS in use.
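One way to implement those semantics as a macro (a reconstruction consistent with the description, not the published bitops.h) is to clear the destination bit field and OR in the extracted source bits:

```c
/* Copy `len` bits of `src`, starting at bit `srcindex`, into the value
   at `destptr` starting at bit `destindex`.  All other destination
   bits are preserved.  `type` must be an unsigned integral type and
   `len` smaller than its width. */
#define MVBITS(src, srcindex, len, destptr, destindex, type) \
    (*(destptr) = (type)((*(destptr) \
        & ~((((type)1 << (len)) - 1) << (destindex))) \
        | ((((type)(src) >> (srcindex)) & (((type)1 << (len)) - 1)) \
           << (destindex))))
```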


BitOps Examples Using MMIX


In the examples from here on, I'll use the bitops.h C macros to handle
instructions for MMIX. The parts of an MMIX instruction are multiples of 8
bits each, but the bitops.h macros don't depend on this.
A normal MMIX instruction is 32 bits long and broken into four fields of 8
bits each; see Table 1. Three fields generally refer to registers or contain
immediate values. Knuth refers to these fields as X, Y, and Z. In some other
instructions, Knuth combines the Y and Z fields for 16 bits. In still other
instructions, he combines the X, Y, and Z fields into a 24-bit field.
For starters, I'll define a type to hold an instruction, keeping in mind that
ANSI C implicitly requires unsigned long to hold 32 bits or more. Remember
that ANSI C does not require any particular byte order when storing
larger-than-byte objects in memory. Using a trailing _T convention to indicate
a type, you can define a type (MMIX_Instr_T) to contain the object code for
one instruction, as shown in mmixcom.h (Listing Two).
Taking advantage of the implicit ANSI C requirement that unsigned char be 8
bits or larger, mmixcom.h also defines types for bytes in general and
instruction opcodes in particular. I call these types MMIX_Byte_T and
MMIX_Opcode_T, respectively.
To use the bitops.h macros to extract the opcode from an instruction, for
example, you need to specify start-bit numbers and bit lengths. Listing Two
also contains equates called MMIX_INSTR_OPCODE_START and MMIX_INSTR_OPCODE_LEN
for this.
Given those bit numbers, you can use the IBITS (integer extract bits) macro to
extract the opcode from an instruction; see Example 2. You will recall that
IBITS right-justifies its result.
MMIX also stores the X, Y, and Z fields as bytes. Example 3 shows how to
create an instruction from scratch using MVBITS. In this case, I am creating
an instruction to set r40 (register 40) to the unsigned sum of registers 41
and 42.


Shifting Bits with MMIX and bitops.h


MMIX has an SRU (shift right unsigned) instruction. SRU r3=r4>>r5 is a
shift-right unsigned instruction in Knuth's current assembler syntax that sets
register 3 to register 4 shifted right by the number of bits indicated in
register 5. If the value in register 5 is greater than or equal to the size of
a register in bits, then register 3 will be set to zero. Example 4 shows a
short routine that simulates the SRU instruction using the bitops.h
RIGHT_SHIFT_BITS macro.

You can readily emulate MMIX's XOR instruction using the bitops.h IEOR
(integer exclusive-OR) macro; see Example 5.
MMIX is perhaps unique among instruction sets in having a nor bits (NOR)
instruction. You may be familiar with NOR and NAND gates from digital
electronics. The corresponding bitops.h macro is NOR_BITS. Example 6 shows how
the NOR_BITS macro may be used to simulate MMIX's NOR instruction.
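A plausible NOR_BITS definition consistent with the described behavior (my sketch, since bitops.h appears only in the electronic listings) is simply an OR followed by an inversion in the requested type:

```c
/* Bitwise NOR: OR the two operands, then invert every bit, with the
   result narrowed to the given unsigned type. */
#define NOR_BITS(m, n, type) ((type)(~((type)(m) | (type)(n))))
```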


Conclusion


C provides some powerful bitwise operators, although you need to be careful
regarding the widening of values between different data types. The macros in
bitops.h provide a more complete set of bitwise operations than bare C, with
protection against widening problems, although you must avoid macro arguments
with side effects. You should also avoid using any of the signed data types
with the macros.


References


ANSI X3.159-1989, American National Standard for Information
Systems--Programming Language--C. New York, NY: American National Standards
Institute (ANSI), 1989.
Knuth, Donald E. MMIX. Private communication, August 20, 1992. 
MIL-STD-1753. Military Standard: FORTRAN, DOD Supplement to American National
Standard X3.9-1978. November 9, 1978. Available free from the Defense Printing
Service at 215-697-2179. 
MMIX: Knuth's New Computer
Back in the 1960s, Donald Knuth designed a hypothetical computer called "MIX"
for his Art of Computer Programming algorithms books. MIX shows its age in
various ways, so Knuth is designing a RISC-like successor called "MMIX" (short
for "Meta-MIX" or "Mega-MIX"). He started from scratch to avoid the
restrictions of the old architecture. The new computer incorporates Big-endian
byte ordering, byte addressing, two's-complement integer arithmetic, and IEEE
floating-point arithmetic.
Knuth has not yet published his description of MMIX. His latest draft is dated
August 20, 1992. He expects to make many technical changes in his next draft,
due sometime in 1995, so details given here may change as well.
Knuth plans to use MMIX for the later volumes of the Art of Computer
Programming series. I myself hope to write a cross assembler and simulator for
MMIX for publication in Dr. Dobb's Journal. This drives my exploration of
"big-integer" (64-bits or more) routines for C, as well as 64-bit, portable
object-file formats (like MUFOM and ELF).
In MMIX, Knuth has adopted the common definition of a byte as having 8 bits.
He is much more generous with registers; MMIX has 256 general-purpose
registers. Knuth has also accounted for other practical issues this time. His
description of MMIX floating point acknowledges that on some models, the
system may trap floating-point "instructions" and interpret them in software.
MMIX is supposed to have virtual memory, although the current draft doesn't
seem to have enough detail for an operating system to deal with page faults or
page tables.
The 1992 draft defines a 32-bit system, but Knuth is likely to convert to 64
bits before he publishes the final version. He has also had second thoughts
about a number of complications the draft introduced: regions, probable
branches, and delay slots, for example. 
Regions are a simple way to provide multiple address spaces, kind of like
segment registers. The delay slot avoids having to refill the prefetch buffer
because of the branch. I first saw delay slots being used on the MIPS when I
worked at Microsoft. Many of us with previous assembly-language experience
kept forgetting that the instruction in the delay slot would execute, too.
Probable branches and delay slots are, in my opinion, architectural warts to
improve pipeline performance while driving the assembly-language programmer
crazy. In a pipelined system, the instruction right after a branch has already
been prefetched, so why not execute it? 
It seems, however, that Knuth is reconsidering these additions, and they will
probably be dropped from the next draft.
--J.R.
PowerPC Bi-Endian Capabilities
Jim Gillig
James R. is a software engineer on OS/2 and IBM Workplace technologies in Boca
Raton, FL. He can be reached through the DDJ offices.
The PowerPC is a Bi-endian RISC processor that supports both Big- and
Little-endian addressing models. The Bi-endian architecture provides hardware
and software developers with the flexibility to choose either mode when
migrating operating systems and applications from their current BE or LE
platforms to the PowerPC. Program instructions are like multibyte-scalar data
and are subject to the byte-order effect of Endianness. 
Each individual PowerPC machine instruction occupies an aligned word in
storage as a 32-bit integer containing that instruction's value. In general,
the appearance of instructions in memory is of no concern to the programmer.
Program code in memory is inherently either a LE or BE sequence of
instructions even if it is an Endian-neutral implementation of an algorithm.
How does the PowerPC handle both LE and BE addressing models? The processor
calculates the effective address of data and instructions in the same manner
whether in BE mode or LE mode; when in LE mode only, the PowerPC
implementation further modifies the effective address to provide the
appearance of LE memory to the program for loads and stores.
The operating system is responsible for establishing the Endian mode in which
processes execute. Once a mode is selected, all subsequent memory loads and
stores will be affected by the memory-addressing model defined for that mode.
Byte-alignment and performance issues need to be understood before using an
endian mode for a given application. Alignment interrupts may occur in LE mode
for the following load and store instructions:
Fixed-point load instructions. 
Fixed-point store instructions. 
Load-and-store with byte-reversal instructions. 
Fixed-point load-and-store multiple instructions. 
Fixed-point move-assist instructions. 
Storage-synchronization instructions. 
Floating-point load instructions. 
Floating-point store instructions. 
For multibyte-scalar operations, when executing in LE mode, the current
PowerPC processors take an alignment interrupt whenever a load or store
instruction is issued with a misaligned effective address, regardless of
whether such an access could be handled without causing an interrupt in BE
mode. For code that is compiled to execute on the PowerPC in LE mode, the
compiler should generate as much aligned data and as many aligned instructions
as possible to minimize the alignment interrupts. Generally, more alignment
interrupts will occur in LE mode than in BE mode. When an alignment interrupt
occurs, the operating system should handle the interrupt by software emulation
of the load or store. 
A very powerful feature of the PowerPC architecture is the set of integer
load-and-store instructions with byte reversal that allow applications to
interchange or convert data from one Endian type to the other, without
performance penalty. These load-and-store instructions are lhbrx/sthbrx
(load/store halfword byte-reverse indexed) and lwbrx/stwbrx (load/store word
byte-reverse indexed). They are ideal for emulation programs that handle
LE-type instructions and data, such as the emulation of the Intel instruction
set and data. These instructions significantly improve performance in loading
and storing LE data while executing PowerPC instructions in BE mode and
emulating the Intel instruction behavior; this eliminates the byte-alignment
and data-conversion overhead found in architectures that lack byte-reversal
instructions. Currently, these instructions can be accessed only through
assembly language. Until C compilers provide support to automatically generate
the right load and store instructions for this type of data, C programs can
rely on masking and concatenating operations or embed the assembly-language
byte-reversal instructions.
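As a sketch of the masking-and-shifting fallback the text mentions, a portable C byte reversal might look like the following. The function names are my own, and I use C99 fixed-width types for clarity; none of this is part of the PowerPC toolchain itself.

```c
#include <stdint.h>

/* Hypothetical portable stand-in for lwbrx/stwbrx: reverse the byte
 * order of a 32-bit word with masks and shifts. */
uint32_t byte_reverse_word(uint32_t w)
{
    return ((w & 0x000000FFu) << 24) |
           ((w & 0x0000FF00u) <<  8) |
           ((w & 0x00FF0000u) >>  8) |
           ((w & 0xFF000000u) >> 24);
}

/* Halfword (16-bit) analogue of lhbrx/sthbrx. */
uint16_t byte_reverse_half(uint16_t h)
{
    return (uint16_t)(((h & 0x00FFu) << 8) | ((h & 0xFF00u) >> 8));
}
```

A compiler that recognizes this idiom could still emit the single byte-reversal instruction; the C version merely guarantees the behavior on any machine.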
Table 1: MMIX normal instruction layout.

    Contents                        Start-bit #   Len
    opcode                          24            8
    X (usually target register)     16            8
    Y (usually source register)      8            8
    Z (usually source register)      0            8
Example 1: Simple C macro.
unsigned short x;
x = ALL_ZERO_BITS(unsigned short);
Example 2: Extracting an opcode using the IBITS macro.
#include "mmixcom.h" /* MMIX_Opcode_T, etc. */
MMIX_Instr_T Current_Instruction;
MMIX_Opcode_T Current_Opcode;
 ...
/* Assume Current_Instruction has already been set. */
/* IBITS right-justifies result, so use it to extract opcode. */

Current_Opcode = (MMIX_Opcode_T) IBITS(
 Current_Instruction, /* value */
 MMIX_INSTR_OPCODE_START, /* start bit num */
 MMIX_INSTR_OPCODE_LEN, /* len */
 MMIX_Instr_T); /* type */
Example 3: Creating an instruction with the MVBITS macro.
MMIX_Instr_T New_Instr = ALL_ZERO_BITS(MMIX_Instr_T);
/* Set opcode. */
MVBITS(
 0xC2, /* ADDU opcode */ /* src */
 0, /* src index: src bit 0. */
 MMIX_INSTR_OPCODE_LEN, /* len */
 &New_Instr, /* dest ptr */
 MMIX_INSTR_OPCODE_START, /* dest index */
 MMIX_Instr_T); /* type */
/* Set X (target) field to say r40. */
MVBITS(
 40, /* register 40 */ /* src */
 0, /* src index: src bit 0. */
 MMIX_INSTR_X_LEN, /* len */
 &New_Instr, /* dest ptr */
 MMIX_INSTR_X_START, /* dest index */
 MMIX_Instr_T); /* type */
/* Set Y (a source field) to r41. */
MVBITS(
 41, /* register 41 */ /* src */
 0, /* src index: src bit 0. */
 MMIX_INSTR_Y_LEN, /* len */
 &New_Instr, /* dest ptr */
 MMIX_INSTR_Y_START, /* dest index */
 MMIX_Instr_T); /* type */
/* Set Z (the other source field) to r42. */
MVBITS(
 42, /* register 42 */ /* src */
 0, /* src index: src bit 0. */
 MMIX_INSTR_Z_LEN, /* len */
 &New_Instr, /* dest ptr */
 MMIX_INSTR_Z_START, /* dest index */
 MMIX_Instr_T); /* type */
Example 4: Using the RIGHT_SHIFT_BITS macro to simulate an SRU instruction.
#include "mmixcom.h" /* MMIX_Word_T, MMIX_WORD_LEN, etc. */
MMIX_Word_T
Sim_SRU( /* Simulate shift right unsigned instr. */
 MMIX_Word_T Source_Reg,
 MMIX_Word_T Shift_Count_Reg)
{
 if (Shift_Count_Reg >= MMIX_WORD_LEN)
 return (0);
 return (RIGHT_SHIFT_BITS(
 Source_Reg, /* value */
 Shift_Count_Reg, /* shifts */
 MMIX_WORD_LEN, /* len */
 MMIX_Word_T)); /* type */
}
Example 5: Using the IEOR macro to simulate an XOR instruction.
MMIX_Word_T
Sim_XOR( /* Simulate exclusive-OR bits instr. */
 MMIX_Word_T Some_Bits,
 MMIX_Word_T Other_Bits)

{
 return (IEOR(
 Some_Bits,
 Other_Bits,
 MMIX_Word_T)); /* type */
}
Example 6: Using the NOR_BITS macro to simulate a NOR instruction.
MMIX_Word_T
Sim_NOR( /* Simulate NOR bits instr. */
 MMIX_Word_T Some_Bits,
 MMIX_Word_T Other_Bits)
{
 return (NOR_BITS(
 Some_Bits,
 Other_Bits,
 MMIX_Word_T)); /* type */
}

Listing One
/* BitOps.h - bit operation macros. Copyright (c) 1987-1994 by JR 
 * (John Rogers). All rights reserved. CompuServe: 72634,2402
 * Permission is granted to use these macros in compiled code without payment 
 * of royalties or inclusion of a copyright notice. This source file
 * may not be sold without written permission from the author.
 *
 * The following macros are inspired by the FORTRAN bit operation routines in 
 * MIL-STD-1753. Except for MVBITS, all return values rather than modifying
 * parameters. MVBITS updates a parameter and does not return anything. The 
 * leading "I" in most of these names means that they return some kind of 
 * integer result.
 * BTEST(value,bitnum,type)
 * IAND(m,n,type)
 * IBCLR(value,bitnum,type)
 * IBITS(value,bitnum,len,type)
 * IBSET(value,bitnum,type)
 * IEOR(m,n,type)
 * IOR(m,n,type)
 * ISHFT(value,shifts,type)
 * ISHFTC(value,shifts,len,type)
 * MVBITS(src,srcindex,len,destptr,destindex,type)
 * NOT(value,type)
 * The following C macros were invented by me (JR) or various other C 
 * programmers; all return values rather than modifying parameters:
 * ALL_ONE_BITS(type)
 * ALL_ZERO_BITS(type)
 * BIT_NUM_AND_LEN_TO_MASK(bitnum,len,type)
 * BIT_NUM_TO_MASK(bitnum,type)
 * CLEAR_BITS_USING_MASK(value,mask,type)
 * FLIP_BITS_USING_MASK(value,mask,type)
 * LEFT_CIRCULAR_SHIFT_BITS(value,shifts,len,type)
 * LEFT_SHIFT_BITS(value,shifts,len,type)
 * NAND_BITS(m,n,type)
 * NOR_BITS(m,n,type)
 * RIGHT_CIRCULAR_SHIFT_BITS(value,shifts,len,type)
 * RIGHT_SHIFT_BITS(value,shifts,len,type)
 * SET_BITS_USING_MASK(value,mask,type)
 * TEST_BITS_USING_MASK(value,mask,type)
 * TYPE_SIZE_IN_BITS(type)
 * XNOR_BITS(m,n,type)

 * Beware of side effects: many macros in this file evaluate their arguments
 * more than once. These are marked with EVALTWICE comments.
 */
/* Gracefully allow multiple includes of this file. */
#ifndef BITOPS_H
#define BITOPS_H
/******************* I N C L U D E S *****************/
/*lint -efile(766,limits.h) */
#include <limits.h> /* CHAR_BIT. */
/*********************** M A C R O S ****************/
/* ALL_ONE_BITS(type): Generate a value of type "type" with all bits set to 
 * 1. "type" must be an unsigned integral type.
 */
#define ALL_ONE_BITS(type) ( (type) ~((type)0) )
/* ALL_ZERO_BITS(type): Generate a value of type "type" with all bits set to 
 * 0. "type" must be an unsigned integral type.
 */
#define ALL_ZERO_BITS(type) ( (type) 0 )
/* BIT_NUM_AND_LEN_TO_MASK(bitnum,len,type): Return a mask of type "type", 
 * with "len" bits on, starting at "bitnum". Bit 0 is LSB, bits start at 
 * "bitnum" and are turned on in the mask starting at "bitnum" and going to 
 * the left. "type" must be an unsigned integral type.
 */
/*EVALTWICE*/
#define BIT_NUM_AND_LEN_TO_MASK(bitnum,len,type) \
 /*CONSTCOND*/ \
 /*lint -save -e506 -e572 -e778 */ \
 ( (type) \
 ( \
 ( (ALL_ONE_BITS(type)) \
 >> ((TYPE_SIZE_IN_BITS(type)) \
 - ((bitnum)+(type)(len)) ) ) \
 & ( \
 ((bitnum)>0) \
 ? ~( ALL_ONE_BITS(type) \
 >> ( (TYPE_SIZE_IN_BITS(type)) \
 - (bitnum) ) ) \
 : ALL_ONE_BITS(type) ) \
 ) \
 ) \
 /*lint -restore */
/* BIT_NUM_TO_MASK(bitnum,type): Convert bit number "bitnum" to mask of type 
 * "type". Bits are numbered from right to left, with bit 0 being the least
 * significant bit (LSB). "type" must be an unsigned integral type.
 * This is my (JR's) modification of something posted to Usenet by Bill 
 * Shannon (shannon@sun.uucp) many years ago.
 */
#define BIT_NUM_TO_MASK(bitnum,type) \
 ( (type) ( ((type)1) << ((type)(bitnum)) ) )
/* BTEST(value,bitnum,type): Test bit numbered "bitnum" in "value", which must
 * be of type "type". If the tested bit is on, return a Boolean true value 
 * (1); otherwise, return a Boolean false (0). "type" must be an unsigned 
 * integral type.
 */
#define BTEST(value,bitnum,type) \
 ( ( (value) & (BIT_NUM_TO_MASK((bitnum),type)) ) \
 ? 1 : 0 \
 )
/* CLEAR_BITS_USING_MASK(value,mask,type): Return "value", except that any
 * bits which are turned on in "mask" will be turned off in the return value.
 * "type" must be an unsigned integral type.
 */
#define CLEAR_BITS_USING_MASK(value,mask,type) \
 ( (type) ( (value) & ~(mask) ) )
/* FLIP_BITS_USING_MASK(value,mask,type): Return "value", except that any bits
 * which are turned on in "mask" will be flipped (toggled) in the return
 * value. "type" must be an unsigned integral type.
 */
#define FLIP_BITS_USING_MASK(value,mask,type) \
 ( (type) ( (value) ^ (mask) ) )
/* IAND(m,n,type): Return the bitwise "and" of the integral values "m" and
 * "n". "type" must be an unsigned integral type.
 */
#define IAND(m,n,type) \
 ( (type) ( (m) & (n) ) )
/* IBCLR(value,bitnum,type): Return "value" with bit at "bitnum" cleared 
 * (zeroed). "type" must be an unsigned integral type.
 */
#define IBCLR(value,bitnum,type) \
 ( \
 (type) CLEAR_BITS_USING_MASK( \
 (value), \
 BIT_NUM_TO_MASK( (bitnum), type ), \
 type) \
 )
/* IBITS(value,bitnum,len,type): Extract bits from "value", starting at bit 
 * "bitnum", for "len" bits. The result will be right justified. "type" must
 * be an unsigned integral type.
 */
/*EVALTWICE*/
#define IBITS(value,bitnum,len,type) \
 /*CONSTCOND*/ \
 /*lint -save */ /* Preserve PC-LINT options. */ \
 /*lint -e572 */ /* Ignore excessive shift val */ \
 /*lint -e778 */ /* Ignore const expr eval to 0 */ \
 ( (type) \
 ( \
 ( (value) & \
 (BIT_NUM_AND_LEN_TO_MASK( \
 (bitnum), (len), type )) ) \
 >> (bitnum) \
 ) \
 ) \
 /*lint -restore */
/* IBSET(value,bitnum,type): Return "value" with bit at "bitnum" set to true.
 * "type" must be an unsigned integral type.
 */
#define IBSET(value,bitnum,type) \
 ( (type) \
 ( \
 SET_BITS_USING_MASK( \
 (value), \
 BIT_NUM_TO_MASK( (bitnum), type ), \
 type) \
 ) \
 )
/* IEOR(m,n,type): Return the bitwise exclusive-or of the integral values "m" 
 * and "n". "type" must be an unsigned integral type.
 */
#define IEOR(m,n,type) \
 ( (type) ( (m) ^ (n) ) )
/* IOR(m,n,type): Return the bitwise "or" of the integral values "m" and "n".
 * "type" must be an unsigned integral type.
 */
#define IOR(m,n,type) \
 ( (type) ( (m) | (n) ) )
/* ISHFT(value,shifts,type): Return "value" with bits logically shifted as 
 * specified by "shifts". Zeros will be shifted-in as applicable. A positive 
 * amount for "shifts" causes a left shift; a negative amount causes a right 
 * shift; a zero amount causes no shift. Note that the absolute value of 
 * "shifts" must be less than or equal to TYPE_SIZE_IN_BITS("type"). Also note
 * that "value" must be of type "type", and "type" must be an unsigned 
 * integral type.
 */
/*EVALTWICE*/
#define ISHFT(value,shifts,type) \
 /*CONSTCOND*/ \
 /*lint -save */ /* Preserve PC-LINT settings. */ \
 /*lint -e504 */ /* Ignore unusual shift value */ \
 /*lint -e778 */ /* Ignore const expr eval to 0 */ \
 ( (type) \
 ( ((shifts)>0) \
 ? ( (value) << (shifts) ) \
 : ( ( (shifts)<0 ) \
 ? ( (value) >> (-(shifts)) ) \
 : (value) \
 ) \
 ) \
 ) \
 /*lint -restore */
/* ISHFTC(value,shifts,len,type): Return "value" with bits circularly shifted
 * (as specified by "shifts") within the lower "len" bits of "value". A 
 * positive amount for "shifts" causes a left shift; a negative amount causes 
 * a right shift; a zero amount causes no shift. Note that the absolute value 
 * of "shifts" must be less than or equal to "len". Also note that "value"
 * must be of type "type", and "type" must be an unsigned integral type. "len"
 * must be greater than 0 and less than or equal to TYPE_SIZE_IN_BITS("type").
 */
/*EVALTWICE*/
#define ISHFTC(value,shifts,len,type) \
 /*lint -save -e501 */ \
 ( (type) ( \
 ( ((shifts) == 0) \
 || ((len) == (type) (shifts)) \
 || ((len) == - (type) (shifts)) ) \
 ? ((type)(value)) \
 : ( \
 ( (shifts) > 0 ) \
 ? (RIGHT_CIRCULAR_SHIFT_BITS( \
 (value), (shifts), (len), type) ) \
 : (LEFT_CIRCULAR_SHIFT_BITS( \
 (value), (type) (- (shifts)), \
 (len), type) ) ) \
 ) \
 ) /*lint -restore */
/* LEFT_CIRCULAR_SHIFT_BITS(value,shifts,len,type): Return "value" with bits 
 * circularly shifted left "shifts" bits within the lower "len" bits of

 * "value". A zero amount for "shifts" causes no shift. Note that "shifts" 
 * must be less than or equal to "len". Also note that "value" must be of type
 * "type", and "type" must be an unsigned integral type. "len" must also be 
 * greater than zero and less than or equal to TYPE_SIZE_IN_BITS("type").
 */
/*EVALTWICE*/
#define LEFT_CIRCULAR_SHIFT_BITS( \
 value,shifts,len,type) \
 /*lint -save -e504 */ \
 ( (type) ( \
 ( ((shifts)==0) || ((len)==(type) (shifts)) ) \
 ? (value) \
 : ( ( (value) & \
 ~BIT_NUM_AND_LEN_TO_MASK(0,(len),type) ) \
 | ( ( (value) & (BIT_NUM_AND_LEN_TO_MASK( \
 0, (len), type )) ) \
 >> (shifts) ) \
 | ( ( (value) & (BIT_NUM_AND_LEN_TO_MASK( \
 0, (shifts), type )) ) \
 << ((len)-(type)(shifts)) ) ) ) \
 ) /*lint -restore */
/* LEFT_SHIFT_BITS(value,shifts,len,type): Return "value" with bits logically 
 * shifted left "shifts" bits within the lower "len" bits of "value". If
 * necessary, zero bits are added on the right. A zero amount for "shifts" 
 * causes no shift. Note that "shifts" must be less than or equal to "len". 
 * Also note that "value" must be of type "type", and "type" must be an 
 * unsigned integral type. "len" must also be greater than zero and less than 
 * or equal to TYPE_SIZE_IN_BITS("type").
 */
/*EVALTWICE*/
#define LEFT_SHIFT_BITS(value,shifts,len,type) \
 /*lint -save -e504 */ \
 ( (type) ( \
 ( ((shifts)==0) || ((len)==(type) (shifts)) ) \
 ? (value) \
 : ( ( (value) & \
 ~BIT_NUM_AND_LEN_TO_MASK(0,(len),type) ) \
 | ( ( (value) << (shifts) ) \
 & (BIT_NUM_AND_LEN_TO_MASK( \
 0, (len), type )) ) ) \
 ) \
 ) /*lint -restore */
/* MVBITS(src,srcindex,len,destptr,destindex,type): Update the value that 
 * "destptr" points to, using bits extracted from "src" starting at bit 
 * "srcindex" for "len" bits. "destindex" indicates the bit number in the 
 * destination to begin updates. "type" must be an unsigned integral type.
 */
/*EVALTWICE*/
#define MVBITS( \
 src,srcindex,len,destptr,destindex,type) \
 /*CONSTCOND*/ /*lint -save -e506 */ \
 { \
 type srcbits = \
 (src) & BIT_NUM_AND_LEN_TO_MASK( \
 (srcindex), (len), type ); \
 type destmask = BIT_NUM_AND_LEN_TO_MASK( \
 (destindex), (len), type ); \
 *(destptr) &= ~destmask; \
 *(destptr) |= ISHFT( \
 srcbits, \
 (int) ((destindex)-(srcindex)), \
 type ); \
 } /*lint -restore */
/* NAND_BITS(m,n,type): Return the bitwise "nand" of the integral values 
 * "m" and "n". "type" must be an unsigned integral type.
 */
#define NAND_BITS(m,n,type) \
 ( (type) ~ ( IAND((m),(n),type) ) )
/* NOR_BITS(m,n,type): Return the bitwise "nor" of the integral values "m" and
 * "n". "type" must be an unsigned integral type.
 */
#define NOR_BITS(m,n,type) \
 ( (type) ~ ( IOR((m),(n),type) ) )
/* NOT(value,type): Return all bits of "value" flipped. Note that "value" must
 * be of type "type", which must be an unsigned integral type.
 */
#define NOT(value,type) ( (type) ~((type)(value)) )
/* RIGHT_CIRCULAR_SHIFT_BITS(value,shifts,len,type): Return "value" with bits 
 * circularly shifted right "shifts" bits within the lower "len" bits of
 * "value". A zero amount for "shifts" causes no shift. Note that "shifts" 
 * must be less than or equal to "len". Also note that "value" must be of type
 * "type", and "type" must be an unsigned integral type. "len" must also be 
 * greater than zero and less than or equal to TYPE_SIZE_IN_BITS("type").
 */
/*EVALTWICE*/
#define RIGHT_CIRCULAR_SHIFT_BITS( \
 value,shifts,len,type) \
 /*lint -save -e504 */ \
 ( (type) ( \
 ( ((shifts)==0) || ((len)==(type) (shifts)) ) \
 ? (value) \
 : ( ( (value) & \
 ~BIT_NUM_AND_LEN_TO_MASK(0,(len),type) ) \
 | ( ( (value) & (BIT_NUM_AND_LEN_TO_MASK( \
 0, ((len)-(type)(shifts)),type)) ) \
 <<(shifts)) \
 | ( ((value)&(BIT_NUM_AND_LEN_TO_MASK( \
 ((len)-(type)(shifts)),(shifts),type))) \
 >> ((len)-(type)(shifts)) ) ) ) \
 ) /*lint -restore*/
/* RIGHT_SHIFT_BITS(value,shifts,len,type): Return "value" with bits logically
 * shifted right "shifts" bits within the lower "len" bits of "value". If
 * necessary, zero bits are added on the left. A zero amount for "shifts" 
 * causes no shift. Note that "shifts" must be less than or equal to "len". 
 * Also note that "value" must be of type "type", and "type" must be an 
 * unsigned integral type. "len" must also be greater than zero and
 * less than or equal to TYPE_SIZE_IN_BITS("type").
 */
/*EVALTWICE*/
#define RIGHT_SHIFT_BITS(value,shifts,len,type) \
 /*lint -save -e504 */ \
 ( (type) ( \
 ( ((shifts)==0) || ((len)==(type) (shifts)) ) \
 ? (value) \
 : ( ( (value) & \
 ~BIT_NUM_AND_LEN_TO_MASK(0,(len),type) ) \
 | ( ( (value) & (BIT_NUM_AND_LEN_TO_MASK( \
 0, (len), type )) ) >> (shifts) ) ) \
 ) \
 ) /*lint -restore */
/* SET_BITS_USING_MASK(value,mask,type): Return "value", except that any bits 
 * which are turned on in "mask" will also be turned on in the return
 * value. "type" must be an unsigned integral type.
 */
#define SET_BITS_USING_MASK(value,mask,type) \
 ( (type) ( (value) | (mask) ) )
/* TEST_BITS_USING_MASK(value,mask,type): Return "value", except that only
 * bits which are turned on in "mask" will be returned. "type" must be an
 * unsigned integral type.
 */
#define TEST_BITS_USING_MASK(value,mask,type) \
 ( (type) ( (value) & (mask) ) )
/* TYPE_SIZE_IN_BITS(type): Return the number of bits required for type
 * "type".
 */
#define TYPE_SIZE_IN_BITS(type) \
 ( (type) ( sizeof(type) * CHAR_BIT ) )
/* XNOR_BITS(m,n,type): Return the bitwise exclusive "nor" of the integral 
 * values "m" and "n". "type" must be an unsigned integral type. 
 */
#define XNOR_BITS(m,n,type) \
 ( (type) ~ ( IEOR((m),(n),type) ) )
#endif /* BITOPS_H */

Listing Two
/* mmixcom.h-- MMIX common defns. Copyright (c) 1994 by JR (John Rogers).
 * All rights reserved. CompuServe: 72634,2402
 * FUNCTION - mmixcom.h contains types and equates used for defining MMIX 
 * instructions in object code format.
 * We take advantage of the implicit ANSI C requirement that unsigned char be
 * 8 bits or larger. Similarly, we can assume unsigned long is 32 bits or
 * larger.
 */
#ifndef MMIXCOM_H
#define MMIXCOM_H
/* Define a type for one instruction. Note that this will be at least 32 bits,
 * depending on the compiler.
 */
typedef unsigned long MMIX_Instr_T;
/* We also need to deal with single words in MMIX. These are currently 32 bits
 * wide, although Knuth is likely to change them to 64 bits soon.
 */
typedef unsigned long MMIX_Word_T;
#define MMIX_WORD_LEN 32
/* Many parts of MMIX words are in bytes. In MMIX, a byte is 8 bits long. In
 * C, this might be larger.
 */
typedef unsigned char MMIX_Byte_T;
/* Even if "char" is more than 8 bits, leave this. */
#define MMIX_BYTE_BIT_LEN 8
/* Define a type for an opcode. */
typedef MMIX_Byte_T MMIX_Opcode_T;
/* Define equates for each part of MMIX_Instr_T. Use bit numbering convention 
 * of 0=least significant bit (LSB).
 */
#define MMIX_INSTR_OPCODE_START 24
#define MMIX_INSTR_OPCODE_LEN MMIX_BYTE_BIT_LEN
#define MMIX_INSTR_X_START 16
#define MMIX_INSTR_X_LEN MMIX_BYTE_BIT_LEN

#define MMIX_INSTR_Y_START 8
#define MMIX_INSTR_Y_LEN MMIX_BYTE_BIT_LEN
#define MMIX_INSTR_Z_START 0
#define MMIX_INSTR_Z_LEN MMIX_BYTE_BIT_LEN
#endif /* MMIXCOM_H */
End Listings

RAMBLINGS IN REAL TIME


Frames of Reference




Michael Abrash


Michael is the author of Zen of Graphics Programming and Zen of Code
Optimization. He is currently pushing the envelope of real-time 3-D on Quake
at id Software. He can be reached at mikeab@idsoftware.com.


Several years ago, I opened a column in Dr. Dobb's Journal with a story about
singing my daughter to sleep with Beatles songs. Beatles songs, at least the
earlier ones, tend to be bouncy and pleasant, which makes them suitable
good-night fodder--and there are a lot of them, a useful hedge against
terminal boredom. So for many good reasons, "Can't Buy Me Love" and "Hard
Day's Night" and "Help!" and the rest were evening staples for years.
No longer, though. You see, I got my wife some Beatles tapes for Christmas.
We've all been listening to them in the car, and now that my daughter has
heard the real thing, she can barely stand to be in the same room, much less
fall asleep, when I sing those songs.
What's noteworthy is that the only variable involved in this change was my
daughter's frame of reference. My singing hasn't gotten any worse over the
last four years. (I'm not sure it's possible for my singing to get worse.) All
that changed was my daughter's frame of reference for those songs. The rest of
the universe stayed the same; the change was in her mind: lock, stock, and
barrel.
Often, the key to solving a problem or working on a problem efficiently is a
proper frame of reference. Your model of the problem often determines how
deeply you can understand it, and how flexible and innovative you can be in
solving it.
An excellent example of this, and one which I'll discuss toward the end of
this column, is that of 3-D transforms--the process of converting coordinates
from one coordinate space to another, for example from worldspace to
viewspace. The way this is traditionally explained is functional, but not
particularly intuitive, and fairly hard to visualize. Recently, I've come
across another way of looking at transforms that seems far easier to grasp.
The two approaches are technically equivalent, so the difference is purely a
matter of how we view things--but sometimes that's the most important
difference.
Before we can talk about transforming between coordinate spaces, however, we
need two building blocks: dot products and cross products.


3-D Math


In my last column I promised to present a BSP-based renderer this month, to
complement the BSP compiler we've developed over the last two columns. But the
considerable amount of mail about 3-D math that I've received over the last
two months changed my mind. In every case, the writer bemoaned his or her lack
of expertise with 3-D math, asked me to recommend books about 3-D math, and
wondered how he or she could learn more.
That's a commendable attitude, but the truth is, there's not all that much to
3-D math, at least for the sort of polygon-based, real-time 3-D done on PCs.
You really need only two basic math tools beyond simple arithmetic: dot
products and cross products; mostly, just the former. My friend Chris Hecker
points out that this is an oversimplification; math-related operations like
BSP trees, graphs, discrete math for edge stepping, and affine and perspective
texture mappings also go into a production-quality game. While that's true,
dot and cross products, together with matrix math and perspective projection,
constitute the bulk of what most people mean by "3-D math." As we'll see, dot
and cross products are key tools for a lot of useful 3-D operations.
The mail also made clear that a lot of people out there don't understand dot
or cross products, at least insofar as they apply to 3-D. Since just about
everything I'll do in this column relies to some extent on dot and cross
products (even the line-intersection formula I discussed last time is actually
a quotient of dot products), I'll devote this column to examining these basic
tools and some of their 3-D applications. If this is old hat to you, my
apologies; I'll return to BSP-based rendering next time.


A Little Background


Dot and cross products themselves are straightforward and require almost no
context to understand, but I need to define some terms I'll use when
describing their application.
I assume you have some math background, so I'll quickly define a "vector" as a
direction and a magnitude, represented as a coordinate pair (in 2-D) or
triplet (in 3-D), relative to the origin. That's a pretty sloppy definition,
but it'll do for our purposes; for the real McCoy, check out Calculus and
Analytic Geometry, Eighth Edition, by George B. Thomas, Jr. and Ross L. Finney
(Addison-Wesley, 1991, ISBN 0-201-52929-7).
So, for example, in 3-D, the vector V=[5 0 5] has a length, or magnitude, of
||V|| = 5 sqrt(2), by the Pythagorean theorem, as shown in Example 1 (vertical
double bars denote vector length), and a direction in the plane of the x and z
axes, exactly halfway between those two axes.
I'll be working in a left-handed coordinate system, whereby if you wrap the
fingers of your left hand around the z axis with your thumb pointing in the
positive z direction, the fingers will curl from the positive x axis to the
positive y axis. The positive x axis runs left to right across the screen, the
positive y axis runs bottom to top, and the positive z axis runs into the
screen.
For our purposes, projection is the process of mapping coordinates onto a line
or surface. "Perspective projection" projects 3-D coordinates onto a
viewplane, scaling coordinates according to their z distance from the
viewpoint in order to provide proper perspective. "Objectspace" is the
coordinate space in which an object is defined, independent of other objects
and the world itself. "Worldspace" is the absolute frame of reference for a
3-D world; all objects' locations and orientations are with respect to
worldspace, and this is the frame of reference around which the viewpoint and
view direction move. "Viewspace" is worldspace as seen from the viewpoint,
looking in the view direction. "Screenspace" is viewspace after perspective
projection and scaling to the screen.
Finally, "transformation" is the process of converting points from one
coordinate space into another; in our case, that'll mean rotating and
translating (moving) points from objectspace or worldspace to viewspace.
For additional information, check out Computer Graphics: Principles and
Practice, Second Edition, by James D. Foley and Andries van Dam
(Addison-Wesley, 1990. ISBN 0-201-12110-7), or my X-Sharp columns in DDJ in
1992; those columns are also collected in my book Zen of Graphics Programming
(Coriolis Group Books, 1995, ISBN 1-883577-08-X).


The Dot Product


Now for the dot product. Given two vectors U=[u1 u2 u3] and V=[v1 v2 v3],
their dot product (denoted by the symbol ·) is calculated as in Example 2(a).
The result is a scalar value (a single, real-valued number), not another
vector.
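In C, Example 2(a) is a one-liner; the Vec3 struct and function name here are my own shorthand, not part of the column.

```c
typedef struct { double x, y, z; } Vec3;   /* hypothetical vector type */

/* Dot product per Example 2(a): U . V = u1*v1 + u2*v2 + u3*v3.
 * The result is a scalar, not a vector. */
double dot(Vec3 u, Vec3 v)
{
    return u.x * v.x + u.y * v.y + u.z * v.z;
}
```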
Now that you know how to calculate a dot product, what does that get you? Not
much. The dot product isn't much use for graphics until you start thinking of
it as in Example 2(b), where θ is the angle between the two vectors and the
other two terms are the lengths of the vectors, as shown in Figure 1. Although
it's not immediately obvious, Example 2(b) has a wide variety of applications
in 3-D graphics.


Dot Products of Unit Vectors


The simplest case of the dot product is when both vectors are unit vectors;
that is, when their lengths are both one, as calculated as Example 1. In this
case, Example 2(b) simplifies to Example 3(a). In other words, the dot product
of two unit vectors is the cosine of the angle between them.
One obvious use of this is to find angles between unit vectors, in conjunction
with an inverse cosine function or lookup table. A more useful application for
3-D graphics is in lighting surfaces, where the cosine of the angle between
incident light and the normal (perpendicular vector) of a surface determines
the fraction of the light's full intensity at which the surface is
illuminated, as in Example 3(b), where Is is the intensity of illumination of
the surface, Il is the intensity of the light, and θ is the angle between -Dl
(where Dl is the light direction vector) and the surface normal. If the
inverse light vector and the surface normal are both unit vectors, then this
calculation can be performed with four multiplies and two additions--and no
explicit cosine calculations--as in Example 3(c), where Ns is the surface unit
normal and Dl is the light unit direction vector; see Figure 2.
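A sketch of the Example 3(c) calculation follows; the Vec3 struct and function name are mine, and the clamp to zero for surfaces facing away from the light is my addition, not part of the example.

```c
typedef struct { double x, y, z; } Vec3;   /* hypothetical vector type */

/* Diffuse lighting per Example 3(c): Is = Il * (Ns . -Dl), with Ns and
 * Dl assumed to be unit vectors. Four multiplies and two adds, no
 * explicit cosine; negative cosines are clamped to darkness. */
double surface_intensity(double il, Vec3 ns, Vec3 dl)
{
    double cosine = -(ns.x * dl.x + ns.y * dl.y + ns.z * dl.z);
    return (cosine > 0.0) ? il * cosine : 0.0;
}
```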


A Brief Aside on Cross Products



One question Example 3(c) begs is, Where does the surface unit normal come
from? One approach is to store the end of a surface normal as an extra data
point with each polygon (with the start being some point that's already in the
polygon), and transform it along with the rest of the points. This has the
advantage that if the normal starts out as a unit normal, it will end up that
way too, if only rotations and translations (but not scaling and shears) are
performed.
The problem with an explicit normal is that it will remain a normal--that is,
perpendicular to the surface--only through viewspace. Rotation, translation,
and scaling preserve right angles, which is why normals are still normals in
viewspace, but perspective projection does not preserve angles, so vectors
that were surface normals in viewspace are no longer normals in screenspace.
Why does this matter? Because, on average, half the polygons in any scene face
away from the viewer, and hence shouldn't be drawn. One way to identify such
polygons is to see whether they face toward or away from the viewer; that is,
whether their normals have negative z values (so they're visible) or positive
z values (so they should be culled). However, we're talking about screenspace
normals here, because the perspective projection can shift a polygon relative
to the viewpoint so that although its viewspace normal has a negative z, its
screenspace normal has a positive z, and vice versa, as in Figure 3. So we
need screenspace normals, but those can't readily be generated by
transformation from worldspace.
The solution is to use the cross product of two of the polygon's edges to
generate a normal. Example 4 is the formula for the cross product. (Note that
the cross-product operation is denoted by an X.) Unlike the dot product, the
result of the cross product is a vector. Not just any vector, either--the
vector generated by the cross product is perpendicular to both of the original
vectors. Thus, the cross product can be used to generate a normal to any
surface for which you have two vectors that lie within the surface. This means
that we can generate the screenspace normals we need by taking the cross
product of two adjacent polygon edges, as in Figure 4. In fact, we can cull
with only one-third the work needed to generate a full cross product; because
we're interested only in the sign of the z component of the normal, we can
skip calculating the x and y components entirely. The only caveat is to be
careful that neither edge you choose is zero-length and that the edges aren't
collinear, because the cross product can't produce a normal in those cases.
Perhaps the most often asked question about cross products is, Which way do
normals generated by cross products go? In a left-handed coordinate system,
curl the fingers of your left hand so the fingers curl through an angle of
less than 180 degrees from the first vector in the cross product to the second
vector. Your thumb now points in the direction of the normal.
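Here is a sketch of Example 4 and the z-only cull shortcut. The Vec3 struct and function names are mine; the sign convention in the cull test (positive z means the polygon faces away) assumes the winding order described earlier, so treat it as an illustration rather than a drop-in routine.

```c
typedef struct { double x, y, z; } Vec3;   /* hypothetical vector type */

/* Cross product per Example 4; the result is perpendicular to both
 * input vectors. */
Vec3 cross(Vec3 u, Vec3 v)
{
    Vec3 r;
    r.x = u.y * v.z - u.z * v.y;
    r.y = u.z * v.x - u.x * v.z;
    r.z = u.x * v.y - u.y * v.x;
    return r;
}

/* Screenspace cull test computing only the z component of the normal:
 * one-third the work of a full cross product. */
int cull_by_z(Vec3 edge1, Vec3 edge2)
{
    return (edge1.x * edge2.y - edge1.y * edge2.x) > 0.0;
}
```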
If you take the cross product of two orthogonal (right-angle) unit vectors,
the result will be a unit vector that's orthogonal to both of them. This means
that if you're generating a new coordinate space--such as a new viewing frame
of reference--you only need to come up with unit vectors for two of the axes
for the new coordinate space. You can then use their cross product to generate
the unit vector for the third axis. If you need unit normals and the two
vectors being crossed aren't orthogonal unit vectors, you'll have to normalize
the resulting vector; that is, divide each of the vector's components by the
length of the vector, to make it a unit long.


Using the Sign of the Dot Product


The dot product is the cosine of the angle between two vectors, scaled by the
magnitudes of the vectors. Magnitudes are always positive, so the sign of the
cosine determines the sign of the result. The dot product is positive if the
angle between the vectors is less than 90 degrees, negative if it's greater
than 90 degrees, and 0 if the angle is exactly 90 degrees. This means that
just the sign of the dot product suffices for tests involving comparisons of
angles to 90 degrees, and there are more of those than you'd think.
Consider, for example, the process of backface culling, discussed earlier in
the context of using screenspace normals to determine polygon orientation
relative to the viewer. The problem with that approach is that it requires
each polygon to be transformed into viewspace, then perspective projected into
screenspace, before the test can be performed, and that involves a lot of
time-consuming calculation. Instead, we can perform culling way back in
worldspace (or even earlier, in objectspace, if we transform the viewpoint
into that frame of reference), given only a vertex and a normal for each
polygon and a location for the viewer.
Here's the trick: Calculate the vector from the viewpoint to any vertex in the
polygon, and take its dot product with the polygon's normal, as in Figure 5.
If the polygon is facing the viewpoint, the result is negative, because the
angle between the two vectors is greater than 90 degrees. If the polygon is
facing away, the result is positive, and if the polygon is edge-on, the result
is 0. That's all there is to it--and this sort of backface culling happens
before any transformation or projection at all is performed, saving a great
deal of work for the half of all polygons, on average, that are culled.
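As a minimal sketch of the worldspace test just described (the function name is hypothetical, not from the article):

```c
#define DOT_PRODUCT(x,y) (x[0]*y[0]+x[1]*y[1]+x[2]*y[2])

/* Returns nonzero if the polygon faces away from (or is edge-on to) the
   viewpoint and can be culled, given any vertex of the polygon and the
   polygon's normal, all in the same (world or object) space. A negative
   dot product means the polygon faces the viewer. */
int BackfaceCull(const float viewpoint[3], const float vertex[3],
                 const float normal[3])
{
    float v[3];

    /* Vector from the viewpoint to the vertex. */
    v[0] = vertex[0] - viewpoint[0];
    v[1] = vertex[1] - viewpoint[1];
    v[2] = vertex[2] - viewpoint[2];
    return DOT_PRODUCT(v, normal) >= 0;
}
```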
Backface culling with the dot product is just a special case of determining
which side of a plane any point (in this case, the viewpoint) is on. The same
trick can be applied to determine whether a point is in front of or behind a
plane, where a plane is described by any point that's on the plane (which I'll
call the "plane origin"), plus a plane normal. One such application is in
clipping a line (such as a polygon edge) to a plane. Just do a dot product
between the plane normal and the vector from one line endpoint to the plane
origin, and repeat for the other line endpoint. If the signs of the dot
products are the same, no clipping is needed; if they differ, it is. And yes,
the dot product is also the way to do the actual clipping; but before we can
talk about that, we need to understand the use of the dot product for
projection.
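A rough version of that sign-comparison test, again with hypothetical names; here the vectors run from the plane origin to the endpoints, which flips both signs and so leaves the comparison unchanged:

```c
#define DOT_PRODUCT(x,y) (x[0]*y[0]+x[1]*y[1]+x[2]*y[2])

/* Returns nonzero if the segment from p0 to p1 crosses the plane defined
   by planeorigin and planenormal, meaning clipping is needed. */
int SegmentNeedsClip(const float p0[3], const float p1[3],
                     const float planeorigin[3], const float planenormal[3])
{
    float v0[3], v1[3], d0, d1;
    int i;

    for (i = 0; i < 3; i++) {
        v0[i] = p0[i] - planeorigin[i];
        v1[i] = p1[i] - planeorigin[i];
    }
    d0 = DOT_PRODUCT(v0, planenormal);
    d1 = DOT_PRODUCT(v1, planenormal);
    return (d0 > 0) != (d1 > 0);   /* differing signs: endpoints straddle */
}
```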


Using the Dot Product for Projection


Consider Example 2(b) again, but this time making one of the vectors, say V, a
unit vector. Now the equation reduces to Example 5(a). In other words, the
result is the cosine of the angle between the two vectors, scaled by the
magnitude of the nonunit vector. Now, consider that the cosine is the length
of the adjacent leg of a right triangle with a unit-length hypotenuse; think
of the nonunit vector as the hypotenuse of a similar right triangle, and
remember that corresponding sides of similar triangles scale equally. What it all works out to is that the value of the dot
product of any vector with a unit vector is the length of the first vector
projected onto the unit vector, as in Figure 6.
This unlocks all sorts of neat stuff. Want to know the distance from a point
to a plane? Just dot the vector from the point P to the plane origin Op with
the plane unit normal Np, to project the vector onto the normal, then take the
absolute value, as shown in Example 5(b) and Figure 7.
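A sketch of that distance calculation, assuming the plane normal is a unit vector (the function name is hypothetical):

```c
#include <math.h>

#define DOT_PRODUCT(x,y) (x[0]*y[0]+x[1]*y[1]+x[2]*y[2])

/* Distance from point p to the plane through planeorigin with unit normal
   unitnormal: project (p - origin) onto the normal, take absolute value. */
float PointPlaneDistance(const float p[3], const float planeorigin[3],
                         const float unitnormal[3])
{
    float v[3];

    v[0] = p[0] - planeorigin[0];
    v[1] = p[1] - planeorigin[1];
    v[2] = p[2] - planeorigin[2];
    return (float)fabs(DOT_PRODUCT(v, unitnormal));
}
```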
Want to clip a line to a plane? Calculate the distance from one endpoint to
the plane, as just described, and dot the whole line segment with the plane
normal, to get the full length of the line along the plane normal. The ratio
of the two dot products is then how far along the line from the endpoint the
intersection point is; just move along the line segment by that distance from
the endpoint, and you're at the intersection point, as shown in Example 6.


Rotation by Projection


You can use the dot product's projection capability to look at rotation in an
interesting way. Typically, rotations are represented by matrices. This is
certainly a workable representation that encapsulates all aspects of
transformation in a single object, and it is ideal for concatenations of
rotations and translations. One problem with matrices, though, is that many
people, myself included, have a hard time looking at a matrix of sines and
cosines and visualizing what's actually going on. So when two 3-D experts,
John Carmack and Billy Zelsnack, mentioned that they think of rotation
differently, in a way that seemed more intuitive to me, I thought it was worth
passing on.
Their approach is this: Think of rotation as projecting coordinates onto new
axes. That is, given that you have points in, say, worldspace, define the new
coordinate space (viewspace, for example) to which you want to rotate by a set
of three orthogonal unit vectors defining the new axes, and then project each
point onto each of the three axes to get the coordinates in the new coordinate
space, as shown for the 2-D case in Figure 8. In 3-D, this involves three dot
products per point, one to project the point onto each axis. Translation can
be done separately from rotation by simple addition.
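The three-dot-products-per-point idea can be sketched as follows (function name hypothetical):

```c
#define DOT_PRODUCT(x,y) (x[0]*y[0]+x[1]*y[1]+x[2]*y[2])

/* Rotate a point into a new coordinate space whose orthogonal unit axes
   are xaxis, yaxis, and zaxis: one dot product projects the point onto
   each new axis. Translation, if any, is a separate addition step. */
void RotateByProjection(const float point[3],
                        const float xaxis[3], const float yaxis[3],
                        const float zaxis[3], float out[3])
{
    out[0] = DOT_PRODUCT(point, xaxis);
    out[1] = DOT_PRODUCT(point, yaxis);
    out[2] = DOT_PRODUCT(point, zaxis);
}
```

The three axis vectors are exactly the rows of the equivalent rotation matrix, as the next paragraph explains.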
Rotation by projection is exactly the same as rotation via matrix
multiplication; in fact, the rows of a rotation matrix are the orthogonal unit
vectors pointing along the new axes. Rotation by projection buys us no
technical advantages, so that's not what's important here; the key is that the
concept of rotation by projection, together with a separate translation step,
gives us a new way to look at transformation that I, for one, find easier to
visualize and experiment with. A new frame of reference for how we think about
3-D frames of reference, if you will.
Three things I've learned over the years are that:
It never hurts to learn a new way of looking at things.
It helps to have a clearer, more intuitive model in your head of whatever it
is you're working on.
New tools, or new ways to use old tools, are Good Things. 
My experience has been that rotation by projection, and dot-product tricks in
general, offer those sorts of benefits for 3-D.
Next time, we'll do BSP-based rendering, and if there's room, maybe I can
sneak in a sample app that shows some smart dot tricks in action.
Figure 1: The dot product; U·V = cos(theta) |U| |V|.
Figure 2: Lighting intensity is a function of cos(theta) = Ns·Dl.
Figure 3: Viewspace normal z direction doesn't necessarily indicate front/back
visibility after perspective projection.
Figure 4: The cross product of two polygon edge vectors generates a polygon
normal; normal = E0 x E1.
Figure 5: Backface culling with the dot product. V0·N0 < 0, so polygon 0 faces
forward and is visible; V1·N1 > 0, so polygon 1 faces backward and is invisible.
Figure 6: The dot product with a unit vector performs a projection.
Figure 7: Using the dot product to get the distance from a point to a plane;
distance = |(P-Op)·Np|.
Figure 8: Rotating to a new coordinate space by projection onto the new axes.
Example 1: The Pythagorean theorem in 3-D, where the vector V=[5 0 5] has a
length, or magnitude, of 5 sqrt 2.
Example 2: (a) Calculating a dot product; (b) using dot products for 3-D
graphics.
Example 3: (a) Example 2(b) with unit vectors; (b) using dot products for
lighting surfaces; (c) performing the calculation with four multiplies and two
additions--and no explicit cosine calculations.
Example 4: Formula for the cross product. 
UxV=[u2v3-u3v2 u3v1-u1v3 u1v2-u2v1]
Example 5: (a) Using the dot product for projection; (b) using the dot product
to determine the distance to a plane.
Example 6: The intersection point on a line segment.
// Given two line endpoints, a point on a plane, and a unit normal
// for the plane, returns the point of intersection of the line
// and the plane in intersectpoint.
#define DOT_PRODUCT(x,y) (x[0]*y[0]+x[1]*y[1]+x[2]*y[2])
void LineIntersectPlane (float *linestart, float *lineend,
                         float *planeorigin, float *planenormal,
                         float *intersectpoint)
{
    float vec1[3], projectedlinelength, startdistfromplane, scale;

    vec1[0] = linestart[0] - planeorigin[0];
    vec1[1] = linestart[1] - planeorigin[1];
    vec1[2] = linestart[2] - planeorigin[2];
    startdistfromplane = DOT_PRODUCT(vec1, planenormal);
    if (startdistfromplane == 0)
    {
        // point is in plane
        intersectpoint[0] = linestart[0];
        intersectpoint[1] = linestart[1];
        intersectpoint[2] = linestart[2];
        return;
    }
    vec1[0] = linestart[0] - lineend[0];
    vec1[1] = linestart[1] - lineend[1];
    vec1[2] = linestart[2] - lineend[2];
    projectedlinelength = DOT_PRODUCT(vec1, planenormal);
    scale = startdistfromplane / projectedlinelength;
    intersectpoint[0] = linestart[0] - vec1[0] * scale;
    intersectpoint[1] = linestart[1] - vec1[1] * scale;
    intersectpoint[2] = linestart[2] - vec1[2] * scale;
}







































DTACK REVISITED


Rocket Science Made Simple




Hal W. Hardenbergh


Hal is a hardware engineer who sometimes programs. He is the former editor of
DTACK Grounded and can be contacted through the DDJ offices.


I'm going to explain some important stuff about computer architecture, stuff
that you really need to know. I'll cover the Pentium, PowerPC 601, and the P6.
We have to discuss a few basics before we come to the important stuff.
The term "computer architecture" is widely misunderstood. It has little to do
with the design of a computer system or microprocessor chip. The computer
architect is best known as the person who gets to use a clean piece of paper
to define which instructions the computer will be able to execute. But the
most important job the architect does is decide on the length(s), in bits, of
the computer instructions and assign the bit fields within that length to
perform the necessary computer operations.
If there's a large proportion of unused combinations, the architect has done a
lousy job. But a few should be set aside. When Intel designed the 8086, some
then-undefined combinations later became the basis for adding a (very) few
more registers in the 386 generation.
The 8086 was designed back when 64K was a huge memory space and Pascal seemed
to be taking over the personal-computer marketplace. So the 8086 was given
exactly enough registers to run compiled Pascal.
Because memory was then an extremely limited resource, the 8086's basic
instruction-field length was made eight bits, and some of Pascal's most common
instructions (LOOP, for example) were fitted into those eight bits; eight bits
does not provide for specifying a lot of registers.
When the 68000 was designed, larger memories were common, so the architect
selected a 16-bit basic instruction field. Two 4-bit register fields were
assigned. Eight bits, half the 16-bit instruction field, went to defining the
source and destination registers.
But the ability to place many transistors on a single die was exploding, and
soon register files of 32 32-bit registers started showing up, for instance on
David Patterson's Berkeley RISC I design. Five bits are required to select one of 32
registers. If two-address (SRC, DEST) operands were to be used, then ten bits
of the instruction bit field were needed to specify the registers. That leaves
only six bits of a 16-bit instruction field, not enough to be useful.
So computers with 32 registers moved up to a 32-bit instruction field. All the
computer architects made the decision to use three-address operands (SRC1,
SRC2, DEST) and so assigned 15 bits just for register selection--again, about
half the instruction field.
The microprocessor went from a register-starved, 8-bit instruction field in
1977 to a register-rich, 32-bit instruction field in 1982. These architectural
decisions were dictated by the then state of the chip-fabrication art. Let me
repeat--these were architectural decisions.
And architectural innovations stopped right there in 1982, because a personal
computer does not (yet) need a 64-bit instruction field. Yep. Architecture for
personal computers essentially froze in 1982.
How do you upgrade a computer to a new architecture? In other words, how do
you get your hands on more registers while continuing to run your old
software? The answer is, you don't. The only way to get more registers is to
abandon your software--all your software--and move to a new computer. I
understand the MIPS-based ACE computer systems (which run both UNIX and
Windows NT) are particularly good examples of desktop computers with
register-rich environments.
Oh? You don't have an ACE system on your desktop? You still use, and program
for, a register-starved computer architecture? Gee. It appears that computer
architecture, while fundamental, is not important.
The personal-computer marketplace doesn't care about architectural hardware
issues. The marketplace responds to fast and cheap. "Fast" means internal
caches, floating-point accelerators, superscalar techniques, and the
like--none of which has anything to do with architecture. (The presence or
absence of an internal cache is independent of the instruction field.)
"Cheap" means economy of scale. More than 50 million personal computers will
be sold this year, and to a first-order approximation, 100 percent of them
will be based on the x86 architecture. If you want a cheap computer, buy one
based on the x86.
But the marketplace still wants to run the software it acquired ten years ago.
Software compatibility is, in fact, an architectural issue, and it matters in
the marketplace.
The people who designed the Pentium and the P6 and who are currently designing
the P7 are not computer architects. But they're pretty good engineers, based
on the results I've seen. I call them "chip designers."
Back when the world was young and children were respectful of their elders,
the chip designer's job was simple: The design had to execute any instruction
as quickly as possible. Then it had to execute the next instruction as quickly
as possible. That's how the 8086, 286, 386, and 486 work.
But with the advent of the Pentium, those days are gone. The
Pentium--sometimes--executes more than one instruction in the same clock
cycle. That "sometimes" is pretty important to those of you who need to write
code that runs fast, and has afforded my colleague Michael Abrash the
opportunity to publish several articles on optimizing code for the Pentium.
The Pentium is the first x86 generation that uses a "superscalar"
implementation. Let's compare it to the PowerPC 601, which was primarily
designed by IBM, with a little bus-interface assistance from Motorola. To a
first-order approximation, the 60x architecture has 0 percent of the
personal-computer market.
The 601 is based on the latest computer architecture: the 32-bit model with 32
registers. Like the Pentium, its implementation uses superscalar techniques,
but not those used by the Pentium. The 601 can issue up to three instructions
each clock cycle, one each of integer, floating point (fp), and branch.
You are the software experts, not me, so let's pretend you just explained to
me that most application programs in the personal-computer market execute
instructions in the ratio 85 percent integer, 0 percent fp, and 15 percent
branch. This means the 601's ability to simultaneously execute fp instructions
with integer and branch instructions is useless. The only improvement the
superscalar 601 offers is the ability to simultaneously issue integer and
branch instructions. And since there are roughly six times as many integer as
branch instructions, this isn't terribly useful. In fact, the 601's
superscalar ability means that, at best, it can execute 100 instructions in 85
clocks (assuming one clock per instruction). All that superscalar design
effort provides, at best, a 17.6-percent performance improvement.
The Pentium's designers were much more crude. If either an fp or branch
instruction is issued on a given clock cycle, then no other instruction can be
issued at that time. In practice, this means that during the 15 percent of the
time that branch instructions are being issued, the Pentium ain't superscalar.
But in the 85 percent of the time that integer instructions are being issued,
the Pentium can--sometimes--issue two integer instructions on the same clock
cycle. This means the Pentium can, at best, execute a 100-instruction mix
(assuming one clock per instruction) in 85/2 + 15 = 57.5 clocks--a 73.9
percent performance improvement.
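The back-of-the-envelope arithmetic above can be captured in a tiny C sketch of the article's idealized model (the function name and parameters are illustrative only):

```c
/* Clocks to execute a 100-instruction mix of 85 integer and 15 branch
   instructions, one clock each, given how many integer instructions can
   issue per clock and whether a branch can pair with a prior instruction.
   601-style (1 integer/clock, branches pair): 85 clocks, 17.6% faster.
   Pentium-style (2 integer/clock, branches don't pair): 57.5 clocks,
   73.9% faster. */
double MixClocks(double integersPerClock, int branchesPair)
{
    double integerClocks = 85.0 / integersPerClock;
    double branchClocks  = branchesPair ? 0.0 : 15.0;
    return integerClocks + branchClocks;
}
```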
Okay, instructions sometimes need more than a single clock to execute, and the
Pentium cannot always issue two integer instructions in the same clock period,
thus Abrash's fine articles on optimization. But Intel's chip designers
focused on improving performance during the 85 percent of the time that
integer instructions are being issued, while IBM's designers concentrated
their efforts on the 15 percent of the time that branch instructions were
being issued.
Which design team best earned its paycheck?
I sent a copy of the penultimate draft of this article to some folks who used
to design microprocessor chips for a living. One of them, John Wharton, called
me back and said, "Hal, the Pentium doesn't work like that!" (The last four
digits of John's home phone number are 8051, which is one of Intel's most
popular 8-bit micros.)
So I was wrong. A Pentium can issue a branch instruction after an integer
instruction in the same clock (but not an integer instruction after a branch
instruction). And under rare circumstances the Pentium can issue two FP
instructions in the same clock--if one of them is an FXCH instruction.
In the pairing rules, a "complex" instruction is a microprogrammed
instruction, such as one of the string instructions (MOVS or SCAS, for
example). When one of the integer pipes goes into microprogrammed mode, both
pipes do. That's why only one "complex" instruction can be active at a time.
John also explained floating-point processing:
A cute trick the Pentium designers came up with was getting the result of a
64-bit FP operation back to the internal cache quickly. FP operations use the
integer pipes, each of which is 32 bits wide. So the Pentium uses both pipes
to move 64 bits in parallel. It saves one clock and at Pentium speeds, one
clock is important.
(The most interesting thing John told me was about the infighting--I call it
civil war--over Intel's upcoming P7. But that's another story.)
The Pentium design team set up two on-chip production lines, like Ford using
one line for Escorts and another for Taurii. With a budget of 5.5 million
transistors, the P6 design team was able to use more advanced techniques.
Continuing with the automotive analogy, the P6 makes intensive efforts to
build a car in the shortest time.
In the P6, we find a large crowd gathered at the input ends of several
parallel production lines (pipes), and another large crowd at the output ends.
The input crowd looks for tasks ready to proceed and issues them to one of the
production lines. It also looks for tasks that might be ready to proceed and
speculatively issues them, too. A list of 30 tasks to select from is kept.
The crowd at the output accepts and temporarily stores all the results the
several production lines deliver. Not everything that comes off the production
lines proves to be useful. Some "product" is ultimately discarded. ("We can't
use that blue trunk assembly on this red car, Fred. Throw it away!")
A scoreboard keeps track of everything that's going on. The P6 has a lot more
registers than the programmer's model asserts, and renames them for
efficiency. How did Intel's designers get so smart? They probably read Chaitin
et al.'s tutorial, "Register Allocation via Coloring," which is part of the
June 1982 SIGPLAN Proceedings on compiler construction. Yes, tutorial. In
1982. You didn't think this stuff was new, did you?
[Abstract: Register allocation may be viewed as a graph coloring problem...
Preliminary results... suggest that global register allocation approaching
that of hand-coded assembly language may be attainable.]
Now you should have a grasp of what Intel means when it says the P6 uses
scoreboarding techniques and issues instructions speculatively. Specifically,
the P6 guesses which branch paths will be taken and speculatively executes the
instructions following those branches (assuming no data dependencies). If
those branches are taken, then the instruction results are already available.
Otherwise, the results are discarded. The P6 speculatively executes
instructions past up to five (!) branches, assuming they're available in the
30-instruction queue at the front end.
The P6 is Intel's first x86 that does not always directly execute x86
instructions. If you've read Abrash's articles on Pentium optimization, you
know the performance benefits of breaking some complex instructions down into
two simpler, yet equivalent, x86 instructions. Well, the P6 takes this a step
further. The P6's instruction decoder will often break a complex x86
instruction into simpler instructions that may not be x86 instructions at
all.
Since the P6 continually looks at the next 30 instructions and begins execution of
each as soon as possible, and automatically breaks up complex instructions
when beneficial, you won't have to optimize P6 code.
The P6 self-optimizes all that shrink-wrapped code, no matter what generation
of optimizing compiler was used. Poor Michael Abrash! He'll have nothing to
write about, and the bank will foreclose on his mortgage.
The philosophical design differences underlying the 486, Pentium, and P6
generations have nothing whatever to do with computer architecture and
everything to do with chip design. The best chips are designed by persons
familiar with happenings in the mainframe and minicomputer arenas a dozen or
more years back.
Intel's Andrew Grove once publicly asserted that there wasn't any use for a
million-transistor-plus chip except for memory. If he'd known his x86 chip
designers would soon be crafting microprocessors that performed useless
instructions and wouldn't even directly execute x86 code, do you suppose he'd
have fired them?

































































SOFTWARE AND THE LAW


Patents: Best Protection for Software Today?




Marc E. Brown


Marc is a patent attorney and shareholder of the intellectual-property law
firm of Poms, Smith, Lande & Rose in Los Angeles, CA. Marc specializes in
computer law and can be contacted on CompuServe at 73414,1226.


Patents and software used to be words that were not spoken of together. Today,
that's changing--and fast. Recent court decisions have expanded the
availability of patents for software. Furthermore, the U.S. Patent Office has
just gotten in line by issuing new proposed guidelines for the examination of
"computer-implemented" inventions. The procedures for enforcing software
patents are also being trimmed down to make them less expensive and faster.
At the same time, other forms of legal protection for software are becoming
less attractive. Copyrights provide only limited protection. This is known
only too well to Lotus, which recently was told by an appellate court that the
copyrights on its famous 1-2-3 spreadsheet program do not bar Borland from
copying the entire 1-2-3 menu tree. Trade-secret protection is also often lost
inadvertently, particularly with publicly distributed software.
Unless you are planning to develop your software in a cave during the next
decade, therefore, you need to be able to determine when software can be
patented, how to patent it, and how to deal with patent-infringement
allegations. Let's begin by taking a look at three tests for patentability.


Statutory Subject Matter


I usually like to leave out the legal jargon in my column. But if you want to
be knowledgeable in this area, remember the phrase "Statutory subject matter."
This concept addresses whether the "subject matter" of the invention is listed
in the "statute" (35 U.S.C. section 101) that defines the types of inventions
entitled to a patent. It is this requirement that has developed into an
impediment to software patents. 
This statute states that a patent may be obtained on "any new and useful
process, machine, manufacture, or composition of matter, or any new and useful
improvement thereof...." Software clearly falls within at least one of these
broad areas. Nevertheless, the U.S. Supreme Court refused a patent on a
software-driven process for converting a BCD number into pure binary. In its
1972 decision of Gottschalk v. Benson, the Court said that a patent could not
be obtained on "laws of nature, physical phenomena and abstract ideas." 
Again in 1978, the Supreme Court refused to issue a software-related patent,
this time on a method of updating numerical alarm limits using a computer. In
Parker v. Flook, the Court expressed concern that such a patent consisted
merely of a new formula and the computer that implemented it.
But in 1981, the Supreme Court changed tacks. In Diamond v. Diehr, the Court
approved a patent on a process for curing rubber that implemented a well-known
mathematical equation (the Arrhenius equation) in a computer to calculate
optimum cure time. Although the Court reiterated that laws of nature, natural
phenomena, and abstract ideas are not patentable, it said that a patent could
be granted on a practical application of a concept, even if it included a
programmed, digital computer.
At about the same time, Congress created a new court to handle all appeals in
patent cases. It is called the "United States Court of Appeals for the Federal
Circuit." In a series of recent decisions, the Federal Circuit has made clear
that patents may be granted on a broad variety of inventions containing
software.
Perhaps most important is its 1994 decision in In Re Alappat. Alappat involved
a software program that implemented a series of algorithms to clarify the
picture that is displayed on an oscilloscope. In support of its conclusion
that such an invention was "statutory," the Federal Circuit stated that the
invention was "not a disembodied mathematical concept..., but rather a
specific machine to produce a concrete and tangible result." The court noted
that "a general-purpose computer in effect becomes a special purpose computer
once it is programmed to perform particular functions pursuant to instructions
from program software...."
About two weeks later in In Re Warmerdam, the Federal Circuit ruled that a
patent could be issued in connection with an algorithm that created a data
structure that controlled moving objects, such as robots, so that they would
not collide with other objects.
On July 31, 1995, the U.S. Patent and Trademark Office followed suit by
issuing proposed guidelines for "computer-implemented" inventions. "These
guidelines respond to recent changes in the law that govern the patentability
of computer-implemented inventions," the Office said. The proposed guidelines
essentially provided that all software-related inventions constitute
patentable subject matter, except when the invention falls within one of the
following categories:
A compilation or arrangement of data, independent of any physical elements.
A known, machine-readable storage medium encoded with data representing
creative or artistic expression (for example, a work of music, art, or
literature).
A "data structure" independent of any physical element (that is, not as
implemented on a physical component of a computer such as a computer-readable
memory to render that component capable of causing a computer to operate in a
particular manner).
A process that does nothing more than manipulate abstract ideas or concepts
(for example, a process consisting solely of the steps one would follow in
solving a mathematical problem).
Significantly, none of the excluded categories appear to embrace a
general-purpose computer running a software program. Does this mean that all
software can be patented by simply claiming a general-purpose computer running
the new software program? The proposed guidelines appear to address this
question in the following way:
[I]n rare situations, a claim classified as a statutory machine or article of
manufacture may define nonstatutory subject matter. Nonstatutory subject
matter (i.e., abstract ideas, laws of nature, and natural phenomena) does not
become statutory merely through a different form of claim presentation.
Such a claim will (a) define the "invention" not through characteristics of
the machine or article of manufacture claimed but exclusively in terms of a
nonstatutory process that is to be performed on or using that machine or
article of manufacture, and (b) encompass any product in the stated class
(e.g., computer, computer-readable memory) configured in any manner to perform
that process.
To avoid this exclusion, it seems as though the invention must be defined
"through characteristics of the machine or article of manufacture claimed."
But what does this mean? Is this different from the second requirement that
the invention not "encompass any product in the stated class?" I really don't
know! But I do know that the guidelines are intended to conform the practices
of the Patent Office with the more liberal views of the courts. If the
guidelines are interpreted to preclude patents on all general-purpose
computers running new software, that goal seemingly will not be reached.
These guidelines may already have been clarified by the time you read this
column. They were issued in June of this year for "public comment." Final
guidelines were promised for July 31, 1995. Check my next column for an
update.


Novelty


The second requirement for a patent is that the invention be "novel." This is
usually a very easy test to pass. It simply means that the invention is in
some way new. Any distinction whatsoever from what was done before is
sufficient.
An invention's "novelty" is usually lost in the United States if an
application for patent is not filed within one year after certain activity has
begun. (Many foreign countries have no such grace period.) 
Three major types of activity usually start the one-year clock.
When a product embodying the invention is offered for sale by anyone, even
someone other than the inventor. The offer need not result in an actual sale. 
When the invention is described in a "printed publication." There is no
requirement that the publication be widely distributed. Indeed, papers
distributed at conferences are usually sufficient.
When the invention is used for its intended purpose in a nonexperimental
environment. If you invent a superior database system and allow your spouse to
use it to manage the groceries, you had better start the one-year clock. Even
your own use of the invention can start the clock running. The clock will
usually not begin while the primary purpose of the use is to determine whether
the invention works.


Nonobviousness 


The third requirement for obtaining a patent is that the invention not be
"obvious" in view of what has been done before. This is the only qualitative
test that is applied.
"Nonobviousness" does not mean that the software has to be great or achieve a
remarkable result. It merely means that the underlying concepts of the
software are not obvious in view of what was known before.

Most software is simply combinations of previously known routines. In
determining "obviousness," therefore, the real question is whether it would
have been obvious to have combined these routines to make the software.
The determination of "obviousness" is necessarily subjective. But several
objective factors will be considered:
The art taught away from the approach that the software took. 
Widespread efforts to obtain the benefits of the software were previously
unsuccessful.
The software has received widespread recognition.
The software achieves new and unexpected results.
The software has been a commercial success.
Nothing in the prior art suggests combining the routines contained in the
software.


The Patent-Application Process


The first step in patenting your software is to document it by preparing block
diagrams, flowcharts, specifications, data-structure maps, screen layouts, and
the like. While source code is obviously the most precise formulation of the
software, it is usually not perfected until well after the software is
conceived. Also, it does not usually communicate the broad concepts
implemented by the software.
It's wise to corroborate the date on which the documentation was prepared. At
the very least, it should be signed and dated by the developer. It is also
good practice to have people not involved with the development read, date, and
sign the documentation. Above each signature, the document should state that
the witness has read and understood the information, as well as the number of
pages it contains. It is also a good idea to keep a bound invention notebook
and to date each new entry.
A patentability search is usually performed next. Although not required by
law, it is very useful, as it may reveal that the software is not sufficiently
different from previous work to justify the expense of a patent. Even when you
are sure about the distinctiveness of your new software, knowledge of the
closest prior art will help to frame the patent application in the broadest
possible way.
The next step is to prepare the application. A patent application usually
contains drawings, a written description of the invention, and a set of
"claims" (English descriptions of the invention's elements). You are not
required to build a working model of your invention before filing the
application.
Unless you have considerable experience with patents, it is unlikely that you
will be able to prepare the application yourself. Normally, applications are
prepared by a patent attorney or patent agent, both of whom must first pass a
proficiency test. But you can do certain things to assist in its preparation:
Provide the attorney or agent with copies or a description of the closest
prior art of which you are aware; you are legally bound to do so.
Identify the specific differences--steps, components, results--between your
software and the prior art.
Disclose the "best mode" you are aware of for implementing your invention. If
you withhold certain features in hopes of keeping them secret and this is
later discovered, your patent can be declared invalid. For marketed software,
such withholding is usually easy to prove. To easily satisfy the "best
mode" requirement, provide the Patent Office with a complete copy of your
source code. If this makes you feel uneasy, a patent may not be for you!
Include sufficient information to enable a person of ordinary skill in the art
to which your invention pertains to make and use the invention without undue
experimentation. Submitting the source code will often fulfill this
requirement, too. The more detail, the better. The only downside is the cost
of documenting such detail.
After your application is filed, it will be assigned to a Patent Office
"examiner"--a person knowledgeable in the field of your invention and in the
principles of patent law.
Approximately six months to a year after the application is filed, you will
receive an "office action," a written response to your application from the
Patent Office examiner. The office action will either allow your application
or explain why it is being rejected.
Rejected applications may be amended to try to overcome the grounds of
rejection. Alternatively, you can argue that the rejection is unjustified.
A second rejection is usually final. You will usually have to pay an
additional fee to submit further amendments or arguments. You can also appeal
the examiner's final rejection to the Board of Patent Appeals and, if
necessary, thereafter to the federal courts.


Enforcement


The first step in enforcing a patent is to determine whether the software of
the accused party is infringing.
Those unfamiliar with patent law usually don't make this determination
correctly. Inventors often feel that there is an infringement when the
competing software incorporates features described in the patent. Accused
infringers, on the other hand, often conclude that there is no infringement
because their software does not contain every feature described in the patent.
Neither approach is correct. The drawings and detailed descriptions in a
patent are usually merely examples of the invention, not the invention itself.
Not utilizing every feature in an example does not necessarily avoid
infringement. Conversely, using a few of the features does not necessarily
imply infringement.
The true test of infringement is whether the software in question contains
every feature documented in any single claim at the end of the patent. This is
the rule: A patent is infringed when the software in question contains every
element recited in any single patent claim.
When determining the scope of each element, the words describing it should be
given their ordinary meaning in the art, except when a contrary meaning is
expressed in the patent. The words should also be given their broadest
reasonable meaning, not restricted to the specific examples described in the
patent.
In three circumstances, an infringement will be found, even if the infringer's
software does not contain all of the claim's elements.
The software contains the equivalent of each missing element, usually a
corresponding element that performs substantially the same function in
substantially the same way, to achieve substantially the same result. The
precise reach of this "doctrine of equivalents" is expected to be the subject
of a decision by the Federal Circuit in In Re Hilton.
The missing elements are found in the computer system in which the infringer's
software is installed. If the software has no substantial use other than in a
system that infringes the patent claim, the person making or selling the
software will usually be liable as a "contributory infringer."
The accused infringer did not actually commit the infringement, but encouraged
the person who did by distributing promotional material or product manuals
that promote the software as useful in a configuration that infringes the
patent claim. This is known as "inducing infringement." An officer or employee
of a company who actively participates in infringing activity of that company
can also be held personally liable under this theory.
Charges of infringement should be made carefully, as they give the alleged
infringer the right to sue the person charging infringement for a "declaratory
judgment" that the patent is not infringed or is not enforceable. Unless
defended at typically great expense, the patent could be lost.
It is particularly risky to charge customers of a manufacturer with
infringement. If the claim turns out to be unmeritorious, the person charging
infringement can be exposed to counterclaims for libel, slander,
disparagement, interference with contract, and violation of the antitrust
laws.
Responding to a charge of patent infringement requires even greater care. All
too often, the accused infringer denies the infringement allegation without
having the allegation analyzed by a competent patent attorney. Be warned: A
company that continues infringing activity without having first received a
favorable legal opinion will often be assessed treble damages and attorney's
fees if it loses the case.
Patent litigation has traditionally been very expensive, but this may also be
changing. The Federal Circuit just held in Markman v. Westview Instruments
that disputes over the scope of a patent should be determined by a judge, not
a jury. Thus, many cases will now be resolved far short of an expensive jury
trial.
The right to a jury trial in patent cases is also now being questioned. In
American Airlines v. Lockwood, the Supreme Court has agreed to decide whether
the U.S. Constitution gives an alleged infringer a right to a jury trial.
Abolishing jury trials in patent cases entirely would result in additional
savings. 
In many cases, the alleged infringer is aware of prior art that is closer to
the invention than that which the Patent Office knew about when the patent was issued.
He may then challenge the validity of the patent, arguing that the invention
was "obvious." Although this can be done in court, it also can often be done
in a separate Petition for Reexamination in the Patent Office. Seeking
reexamination of the patent in the Patent Office is far less expensive than
court litigation. Unfortunately, the alleged infringer is usually not
permitted to participate during reexamination. Therefore, alleged infringers
who have the financial resources often opt to have their invalidity allegation
determined by a court.


Conclusion


Considerable dispute continues over the type of software entitled to a patent.
While some are arguing, others are applying for and receiving software
patents.
Don't be left behind! And remember, software can be simultaneously protected
by a patent and a copyright. Indeed, until the patent is granted (typically,
not until at least a year after the application is filed), the software can
also be protected as a trade secret.






































































PATTERNS & SOFTWARE DESIGN


Observations on Observer




Richard Helm and Erich Gamma


Richard is a consultant with DMR Group, an international
information-technology consulting firm. He can be reached at
Richard.Helm@dmr.ca. Erich is a software engineer with Taligent Inc. He can be
reached at Erich_Gamma@Taligent.com. Erich and Richard are coauthors of the
award-winning book Design Patterns: Elements of Reusable Object-Oriented
Software (Addison-Wesley, 1994).


Partitioning a system into objects is a key activity during object-oriented
design. As a result of this partitioning, we may create objects that depend on
other objects. Changes in one object must be reflected in others. There are
many different ways to ensure that these dependencies are maintained.
For example, consider the timer object in Figure 1, which keeps the current
time, and a digital-display object that shows the current time. Whenever the
timer ticks, this time-display object has to be updated. In other words, the
time-display object has to maintain the constraint to always reflect the
timer's current time. A simple solution is to connect the timer object
directly to the time-display object. Whenever the timer changes, it explicitly
tells the display object to update itself. Figure 2 shows the corresponding
class diagram for directly coupling the timer with its observer. Listing One
is one way to implement this in C++. While this direct coupling of two objects
is simple to implement, it can also introduce problems in different areas:
Reusability. It is not possible to reuse the timer independently of the time
display. The two objects are strongly coupled and must always be used
together, even when the client is only interested in the timer. 
Maintainability. The direct coupling makes maintenance more difficult. It is
not possible to test or port the timer to a different platform independently
of the time display.
Extensibility. Whenever you want to add another kind of timer display (say,
an analog display) that needs to be synchronized with the timer, you have to
modify the Timer class to also update this new kind of time display.
These problems are clearly not serious in this simple example. However, in the
context of a larger system we have to be more alert when introducing
dependencies among objects. In fact, it is a key design activity to control
and manage the dependencies between objects. Badly managed dependencies can
result in a tangled system that is hard to reuse, maintain, or extend. A
common theme of several patterns in our book Design Patterns: Elements of
Reusable Object-Oriented Software is how to break hard dependencies by
decoupling the involved objects. The Observer pattern is one of them.
The intent of the Observer pattern is to define dependency relationships
between objects so that when one changes, its dependents are notified and can
update themselves accordingly. The Observer pattern enables objects to observe
and stay synchronized with another object without coupling the observed object
with its observers. The pattern has two participants: a subject, and the
subject's dependent observers. Each time the subject changes, it is
responsible for notifying its observers that it changed. Observers must ensure
that whenever they are notified, they in turn make themselves consistent with
their subject. A subject needs an interface that allows observers to register
their interest in changes to the subject.
The subject usually maintains a list of subscribed observers. Figure 3
illustrates these class relationships in OMT notation. Notice that we
introduced two new base classes. The Subject class defines the mechanism for
registering and notifying observers and the Observer class defines the update
interface. This diagram illustrates how the Observer pattern breaks the direct
coupling between Subject and Observer. The Subject knows nothing about its
Observers except that they can be sent Update requests. This is because the
reference from Subject points to the abstract class Observer. For this reason,
we refer to this kind of coupling as "abstract." The abstract coupling between
Subject and Observer resolves the problems mentioned previously:
Reusability. The timer object can now be reused and distributed without the
time-display object. It only has to be bundled with the abstract Observer
class. Figure 4 illustrates the resulting class coupling.
Maintainability. The objects are no longer directly coupled, and the timer
object can be tested independently of the time display. For example, when you
port the Timer class to another platform, you can test it as soon as you've
ported it and Observer. You no longer have to postpone testing until the
time display and its associated graphical infrastructure are ported as well.
Extensibility. It is easy to add additional objects that need to be
synchronized with the timer. For example, an analog time display only needs to
inherit from Observer, implement the Update interface, and register itself
with the timer.
Figure 4 shows that the concrete observer knows the class of the subject it is
observing, and it can rely on this interface to query the subject's current
state. 
The Subject and Observer classes (Listings Two and Three, respectively)
illustrate how you could implement Observer in the context of the timer
example. The key point about Timer is that its Tick member function calls
Notify, which will call update on all its Observers.
Listing Four presents the classes Observer and DigitalTimeDisplay.
DigitalTimeDisplay maintains a reference to the timer. Whenever the Timer
ticks, it calls Notify, which in turn calls Update on its attached Observers.
In this case, DigitalTimeDisplay receives the Update request, reads the time
from the timer, and displays the time.
Notice how the Timer has no knowledge of how it is displayed. In fact, you
could add another display, say the AnalogTimeDisplay in Listing Five, and it
would also be updated whenever the Timer ticked.


Absorbing the Subject and Observer Classes 


One possible simplification of the canonical observer-class structure is to
absorb the Subject and Observer classes into existing classes. For example,
the Microsoft Foundation Classes (MFC) use this kind of simplification. MFC
supports multiple views observing a document (see "Adding Auxiliary Views for
Windows Apps," by Robert Rosenberg, Dr. Dobb's Sourcebook of Windows
Programming, March/April 1995). In MFC, the subject functionality is absorbed
into Document, and Observer is absorbed into the View class. This solution is
simpler since it requires fewer classes, but dependency relationships can
only be defined between instances of Document and View.
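The absorbed structure can be sketched in plain, standalone C++. This is not actual MFC code (MFC's classes are CDocument and CView); the Document and View names below are simplified stand-ins used only to show the subject role folded into the data class and the observer role folded into the display class:

```cpp
#include <vector>
#include <string>
#include <algorithm>

class View;

// The subject role is absorbed directly into Document: it owns the data
// and the observer list, and it notifies its views itself.
class Document {
public:
    void Attach(View* v) { _views.push_back(v); }
    void Detach(View* v) {
        _views.erase(std::remove(_views.begin(), _views.end(), v), _views.end());
    }
    void SetText(const std::string& text);   // changes the data, then notifies
    const std::string& Text() const { return _text; }
private:
    void UpdateAllViews();
    std::vector<View*> _views;
    std::string _text;
};

// The observer role is absorbed into View: it registers with its document
// and pulls the document's state when updated.
class View {
public:
    View(Document* doc) : _doc(doc) { doc->Attach(this); }
    void OnUpdate() { _shown = _doc->Text(); }
    const std::string& Shown() const { return _shown; }
private:
    Document* _doc;
    std::string _shown;
};

void Document::SetText(const std::string& text) {
    _text = text;
    UpdateAllViews();
}
void Document::UpdateAllViews() {
    for (View* v : _views) v->OnUpdate();
}
```

As the column notes, the price of this simplification is that only Documents can be observed and only Views can observe them.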


Inheritance Variations


There are also several variations in how inheritance is used to implement the
Observer pattern. As Figure 3 shows, the interfaces for notifying and
observing are defined by two classes. The Observer base class defines an
interface consisting of an Update operation. In a language supporting multiple
inheritance, Observer is often not a primary base class (that is, it is mixed
in as an auxiliary base class). For example, in the timer example,
AnalogTimeDisplay might need to inherit from a graphical base class like View.
In this case, AnalogTimeDisplay mixes in the Observer class as an auxiliary
class. Another variation is not to separate the notifying and observing
interfaces into two separate classes. For example, in the Smalltalk-80
implementation of Observer, these two interfaces are supported by the
universal Object class. Thus, each object in the system can act as both
subject and observer. This is particularly convenient in a language that does
not support multiple inheritance or in class libraries that don't want to rely
on it. Using separate classes for subject and observer would require you to
inherit from both subject and observer when an object needs to act as both.
In Figure 3, the Timer subclass inherits the Subject interface without any
overriding. This is not always the case. For example, it is possible that a
Subject subclass wants to customize how observers are maintained. In
Smalltalk-80 the Subject base class (Object) implements the subject interface
in a space-efficient way. Instead of storing the list of observers in each
subject, the subject/observer mapping is maintained in a central dictionary.
Only subjects that actually have observers are stored in the dictionary and
have to pay for the subject service. However, this approach trades time for
space: Accessing a subject's observers requires a dictionary look-up. For
subjects that often notify observers, you can eliminate this inefficiency by
storing the observers directly in an instance variable. The subject interface
can then be implemented by accessing this list directly. In Smalltalk-80, this kind of
Subject implementation is provided by the Object subclass Model. Consequently,
the client has the choice between subject implementations with different
trade-offs by inheriting from either Object or Model. As an aside, the Subject
interface is an example of so-called "coupled overrides." If you override one
of the subject operations, you should also override the others. 
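The Smalltalk-80 central-dictionary scheme described above can be approximated in C++. All class names below are illustrative assumptions: a single static map plays the role of Smalltalk's dictionary, so only subjects that actually have observers occupy storage, while every notification pays for a map look-up:

```cpp
#include <map>
#include <vector>

class Observer {
public:
    virtual ~Observer() {}
    virtual void Update() = 0;
};

// Subjects carry no observer list of their own; the subject/observer
// mapping lives in one central dictionary.
class DictSubject {
public:
    void Attach(Observer* o) { registry()[this].push_back(o); }
    void Notify() {
        auto it = registry().find(this);     // the time cost: a look-up
        if (it == registry().end()) return;  // no observers, no storage
        for (Observer* o : it->second) o->Update();
    }
private:
    static std::map<DictSubject*, std::vector<Observer*>>& registry() {
        static std::map<DictSubject*, std::vector<Observer*>> r;
        return r;
    }
};

// A trivial observer used to demonstrate the mechanism.
class CountingObserver : public Observer {
public:
    int count = 0;
    void Update() override { ++count; }
};
```

A Model-style alternative with an instance-variable list (as in Listing Two) avoids the look-up for frequently notifying subjects.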


Push versus Pull Update Protocols


In the timer example, the Timer makes no assumptions about what objects are
observing it. Instead it relies on the various timer displays querying it to
retrieve the current time. The observers "pull" the state of the subject to
them. An alternative is for the timer to send, or "push," the time to its
observers whenever it updates them. Pushing the time requires extending the
interface of Observers to accept the time in seconds. To do this, you replace
the Observer class with a TimerObserver class; see Listing Six. The
TimerSubject class would now have to maintain a List of TimerObservers and its
Notify function would look like Listing Seven. The observers are now more
tightly coupled to the timer, but they no longer need to query the timer for
the time. It is still possible to have arbitrary Timer observers by
subclassing from TimerObserver. However, TimerSubject and TimerObservers can
no longer be used to maintain general dependency relationships.
The decision to use push or pull update protocols depends on many trade-offs:
the amount of data being pushed and the expense of pushing it, the difficulty
of determining what changed in the subject, the cost of notification and
subsequent updates (whether subjects and observers are in the same address
space), and dependencies introduced by observers being dependent on the pushed
data. 
The push model is more appropriate when editing text. Consider an
implementation where a TextSubject stores the textual data and a TextView
acting as its observer presents the text in a window. When the user changes the
text by entering a character, the pull model requires that the TextView
completely reformat the text and refresh the window or that it somehow can
determine which range of characters really changed. Both of these operations
can be quite time consuming.
A more satisfactory approach is for the TextSubject to provide a "hint" of its
changed text. The TextView uses this hint to update itself more efficiently.
Hints can be simple, enumerated constants that provide general indications of
what changed in the Subject, or more sophisticated, specific information to
aid the TextView. TextView is interested in how the TextSubject
changed--whether characters were added or removed, and where. 
The hint can package information about the actual changes ("deleted range
12-27") and push it to the observers. The hint essentially sends the deltas
that have occurred in the subject. In practice, not all observers will be
interested in every hint; they may ignore some and act as if they had received
a simple update request.
A hint can be extended with additional information by making it a first-class
object. This enables subjects to bundle the additional information by
subclassing from a Hint base class. At the receiving end, the observer
downcasts the hint to the desired type and extracts the additional
information. This downcast should of course not be done blindly; it should
be guarded by using the C++ run-time type identification facilities
(dynamic_cast). 
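A first-class hint with a guarded downcast might look like the following sketch. The Hint and RangeDeletedHint names are assumptions for illustration, not classes from the listings:

```cpp
// Polymorphic base class so dynamic_cast can be applied to hints.
class Hint {
public:
    virtual ~Hint() {}
};

// A concrete hint bundling the deltas: which character range was deleted.
class RangeDeletedHint : public Hint {
public:
    RangeDeletedHint(int from, int to) : _from(from), _to(to) {}
    int From() const { return _from; }
    int To() const { return _to; }
private:
    int _from, _to;
};

// An observer guards the downcast with dynamic_cast rather than assuming
// the hint's concrete type; an unrecognized hint falls back to behaving
// like a plain update request.
int DeletedLength(const Hint* hint) {
    if (const RangeDeletedHint* r = dynamic_cast<const RangeDeletedHint*>(hint))
        return r->To() - r->From() + 1;   // e.g. "deleted range 12-27"
    return 0;                             // unknown hint: full update instead
}
```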


Who Sends Out Notifications



When notifications are sent, the subject must be in a consistent state. If it
is not, strange results may occur in the observers as they try to update
themselves from a nonsensical subject. 
Which object has the responsibility to actually send the notification is also
important. In our example, it is the Timer object that sends notifications in
its Tick operation. This works fine as long as the subject is simple. 
In Listing Eight, the Tick operation is overridden in a special kind of timer
that allows you to set alarms. See the problem? When subjects have the
responsibility to send notifications, overriding operations in the subject may
cause spurious and inconsistent notifications. By overriding Tick, the first
notification (sent from Timer::Tick) is sent while the AlarmedTimer is in an
inconsistent state (the _alarm variable should be set at that time but isn't
until the second notification). Some observers could set off alarms by testing
the result of AlarmSet and some could do so by testing the equality of
AlarmTime and CurrentTime. 
There are simple fixes for this problem. But for more complex subjects with
derived classes, overriding operations that send notifications in the subject
could make the subject inconsistent or cause duplicate notifications to be
sent.
One solution is to shift the responsibility for sending notifications from
the subject to its clients: whenever a client changes the subject, it must
call Notify on the subject. This solution is practical, but it places an
extra burden on the clients, and it is easy to forget to call Notify on the
subject. Another solution is to define Tick as a template method (see the
Template Method pattern from our book) that first calls the operation
DoTick and then calls Notify. The Timer class defines DoTick to increment the current
time. Subclasses can override this operation to provide their own extensions
to the Tick operation. 
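The template-method fix described above can be sketched as follows. This is a minimal, self-contained sketch: Notify is stubbed with a counter instead of a real observer list, and the constructor signatures are assumptions:

```cpp
// Tick is a template method: subclasses override the DoTick step, never
// Tick itself, so exactly one notification is sent, and only after the
// subject is fully consistent.
class Timer {
public:
    Timer() : notifications(0), _currentTime(0) {}
    virtual ~Timer() {}
    void Tick() {            // fixed skeleton, not virtual
        DoTick();            // overridable step
        Notify();            // single notification, state already consistent
    }
    long CurrentTime() const { return _currentTime; }
    int notifications;       // stand-in for notifying real observers
protected:
    virtual void DoTick() { _currentTime++; }
    void Notify() { ++notifications; }
private:
    long _currentTime;
};

class AlarmedTimer : public Timer {
public:
    AlarmedTimer(long alarmTime) : _alarmTime(alarmTime), _alarm(false) {}
    bool AlarmSet() const { return _alarm; }
protected:
    virtual void DoTick() {            // extend the step, not Tick
        Timer::DoTick();
        _alarm = (CurrentTime() == _alarmTime);
    }
private:
    long _alarmTime;
    bool _alarm;
};
```

Unlike Listing Eight, observers here can never see the AlarmedTimer between the time increment and the alarm update.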


Subscribing to Specific Aspects of Subjects


When a subject has complex internal state, observers may spend much effort to
determine exactly what changed in the subject. Along with using hints, you can
reduce this burden by relying on intrinsic properties of the subject itself.
Complex subjects may only change their state in predefined ways. Changes in
part of a subject's state may be independent of changes in other parts. 
Such properties can be exploited by having the subject define independent
aspects and having observers only subscribe to the aspects they are interested
in.
In the Timer example, suppose that the Timer class were implemented with three
distinct counters that maintained the time in seconds, minutes, and hours. Now
the various timer displays will usually be defined in terms of hours, minutes,
and seconds. Clearly, not all of these need to be updated each second. In
fact, the hour, minute, and second counters change almost independently of
each other. You can exploit this by defining aspects that represent changes in
hours, minutes, and seconds and defining our displays as consisting of three
independent parts, each subscribing to a particular part of the Timer.
In this example, assume that the aspects are simply defined as integer
constants that are passed as a parameter to Notify; see Listing Nine. The
class Timer makes its aspects available to the clients as class-scoped
constants. In Listing Ten, for example, the changed aspect is passed as a hint
to the Observer's Update operation in Listing Eleven. If there are many
different aspects, the update operation becomes a lengthy conditional
statement that maps an aspect to a piece of code. Such conditional code is not
very elegant. There are different techniques to avoid this kind of
manual-dispatching code. One technique is demonstrated in VisualWorks
Smalltalk, wherein the dispatching problem is solved with a
DependencyTransformer object that implements the Observer interface. It knows
which aspect it is interested in and keeps track of the actual receiver of the
notification and the operation to be executed by the receiver when the aspect
changes. Figure 5 shows a possible class structure for a
DependencyTransformer.
When a DependencyTransformer receives the update, it checks the aspect. If the
aspect matches, DependencyTransformer invokes the operation on the Receiver.
DependencyTransformers are created by the subject when an observer expresses
its interest in a changed aspect. This requires a way to specify the operation
to be called. In Smalltalk, the operation's selector name is specified;
#updateSeconds, for example. 
DependencyTransformers act as an intermediary between the subject and its
dependent object. They map the Observer interface to an operation of the
dependent object. A DependencyTransformer is therefore an example of the
Adapter pattern.
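In C++, a DependencyTransformer along the lines of Figure 5 might use a pointer-to-member function where Smalltalk uses a selector. This is a hedged sketch, not the VisualWorks implementation, and the SecondsDisplay class is an assumed example receiver:

```cpp
class Observer {
public:
    virtual ~Observer() {}
    virtual void Update(int aspect) = 0;
};

// An example dependent object with an operation to be invoked when the
// "seconds" aspect changes.
class SecondsDisplay {
public:
    int seconds = 0;
    void UpdateSeconds() { ++seconds; }
};

// The transformer implements the Observer interface, remembers which
// aspect it cares about, and maps a matching update onto an operation of
// the receiver (an instance of the Adapter pattern).
template <class Receiver>
class DependencyTransformer : public Observer {
public:
    DependencyTransformer(Receiver* r, void (Receiver::*op)(), int aspect)
        : _receiver(r), _op(op), _aspect(aspect) {}
    void Update(int aspect) override {
        if (aspect == _aspect)        // dispatch only on a matching aspect
            (_receiver->*_op)();
    }
private:
    Receiver* _receiver;
    void (Receiver::*_op)();
    int _aspect;
};
```

A subject would create one transformer per (receiver, operation, aspect) registration, replacing the lengthy conditional in Listing Eleven.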


Making an Arbitrary Class a Subject


Sometimes classes are not designed to be subjects, but later you realize that
instances of these classes might have dependent objects. How do you make such
classes into subjects? You could change the class by mixing in the Subject
interface, but this is not always possible. The class you wish to make a
subject may not be modifiable--it may reside in a class library over which you
have no control.
An elegant way to allow arbitrary classes to become subjects is to wrap the
object in another object that adds the Subject behaviors and interfaces. This
decorator object (an example of the Decorator pattern) intercepts and
forwards all requests to the wrapped object, and notifies observers after
operations that are likely to change the wrapped object.
Suppose the class Timer was in fact not designed to be a subject and was
defined as in Listing Twelve. You could make Timer a subject by defining a
TimerDecorator class as in Listing Thirteen. The TimerDecorator has the same
interface as the Timer, so it looks like a timer to clients. Every request
made of the TimerDecorator is forwarded to the wrapped timer, and then the
TimerDecorator calls Notify on itself to update its observers; see Listing Fourteen.
Making objects subjects by using decorators is only practical when the
decorated object's interface is relatively small, because you have to
duplicate the subject's interface in the decorator. If the subject's interface
is large, this approach can become unwieldy. 


Conclusion


In one form or another, the Observer pattern occurs in many object-oriented
systems. While it is most commonly used to decouple user interfaces from the
data they display, it is often used more generally to manage dependencies
between objects. The Observer pattern has many more possible variations than
the few we've examined. For example, we did not look at batching
notifications, concurrency and distribution, or observing more than one
subject.
Finally, a description of the Observer pattern would not be complete without
mentioning its origin in Smalltalk's Model-View-Controller (MVC)
framework. In this design, the Model encapsulates application data. The View
presents the model to the user. The controller is responsible for handling
user input. From a dependency-management view, MVC provides the idea of
decoupling the application data from the user interface. The benefit of this
decoupling is that the application data can be presented by different user
interfaces. In MVC terminology, the timer object becomes the "model" and the
time display becomes a "view." If we supported manipulation of the time
display by the user, then this behavior would be assigned to the Controller.
Figure 1: Simple timer object.
Figure 2: Class diagram for directly coupling the timer with its observer.
Figure 3: Class relationships in OMT notation.
Figure 4: Resulting class coupling. 
Figure 5: Possible class structure for a DependencyTransformer.

Listing One
class DigitalTimeDisplay;
class Timer {
public:
 Timer(DigitalTimeDisplay*);
 long CurrentTime() const;
 void Tick();
private:
 DigitalTimeDisplay* _display;
 long _currentTime;
};
class DigitalTimeDisplay {
public:
 DigitalTimeDisplay();
 void DisplayTime();
 void UpdateTime(long time);
};
void Timer::Tick()
{
 _currentTime++;

 _display->UpdateTime(_currentTime);
}

Listing Two
class Subject {
public:
 void Attach(Observer*);
 void Detach(Observer*);
 void Notify();
protected:
 Subject();
private:
 List<Observer*> *_observers;
};
void Subject::Notify () {
 ListIterator<Observer*> i(_observers);
 for (i.First(); !i.IsDone(); i.Next() ) {
 i.CurrentItem()->Update();
 }
}

Listing Three
class Timer : public Subject {
public:
 Timer();
 virtual void Tick();
 long CurrentTime() const;
private:
 long _currentTime;
};
void Timer::Tick() 
{
 _currentTime++;
 Notify();
}

Listing Four
class Observer {
public:
 virtual void Update() = 0;
protected:
 Observer();
};
class DigitalTimeDisplay : public Observer {
public:
 DigitalTimeDisplay(Timer*);
 virtual void Update();
 void DisplayTime(long time);
private:
 Timer* _timer;
};
DigitalTimeDisplay::DigitalTimeDisplay(Timer* t) : _timer(t) 
{
}
void DigitalTimeDisplay::Update() 
{
 DisplayTime( _timer->CurrentTime() );
}


Listing Five
class AnalogTimeDisplay : public Observer {
public:
 AnalogTimeDisplay(Timer*);
 virtual void Update();
 void DisplayTime(long time);
private:
 Timer* _timer;
};
AnalogTimeDisplay::AnalogTimeDisplay(Timer* t) : _timer(t) 
{
}
void AnalogTimeDisplay::Update() 
{
 DisplayTime( _timer->CurrentTime() );
}

Listing Six
class TimerObserver {
public:
 virtual void Update(long) = 0;
protected:
 TimerObserver();
};

Listing Seven
void TimerSubject::Notify (long time)
{
 ListIterator<TimerObserver*> i(_observers);
 for (i.First(); !i.IsDone(); i.Next() ) {
 i.CurrentItem()->Update(time);
 }
}

Listing Eight
class AlarmedTimer : public Timer {
public:
 AlarmedTimer();
 virtual void Tick();
 long AlarmTime();
 bool AlarmSet();
private:
 long _alarmTime;
 bool _alarm;
};
AlarmedTimer::AlarmedTimer() 
 : _alarmTime(0), _alarm(false) 
{
}
void AlarmedTimer::Tick() 
{
 Timer::Tick();
 if ( CurrentTime() == _alarmTime ) {
 _alarm = true;
 } else {
 _alarm = false;
 }
}


Listing Nine
class Subject {
 //...
 void Notify(int aspect);
 //...
};

Listing Ten
class Timer: public Subject {
public:
 //...
 static const int ASPECT_SECONDS;
 static const int ASPECT_MINUTES;
 static const int ASPECT_HOURS;
 //...
 int Seconds() const;
 int Minutes() const;
 int Hours() const;
private:
 int _seconds;
 int _minutes;
 int _hours;
};
void Timer::Tick() 
{
 _seconds = (_seconds + 1) % 60;
 Notify(ASPECT_SECONDS);
 if ( _seconds == 0 ) {
 _minutes = (_minutes + 1) % 60;
 Notify(ASPECT_MINUTES);
 }
 if ( _seconds == 0 && _minutes == 0 ) {
 _hours = (_hours + 1) % 24;
 Notify(ASPECT_HOURS);
 }
}

Listing Eleven
class AnalogTimeDisplay : public Observer {
public:
 AnalogTimeDisplay(Timer*);
 virtual void Update(int aspect);
 void DisplayTime(long time);
private:
 Timer* _timer;
};
void AnalogTimeDisplay::Update(int aspect) 
{
 if (aspect == Timer::ASPECT_SECONDS) {
 // update second hand ...
 } else if (aspect == Timer::ASPECT_MINUTES) {
 // update minute hand ...
 } else if (aspect == Timer::ASPECT_HOURS) {
 // update hour hand ...
 } else {
 // full update
 }
}


Listing Twelve
class Timer {
public:
 virtual void Tick();
 long CurrentTime() const; 
private:
 long _currentTime;
};
void Timer::Tick() 
{
 _currentTime++;
}

Listing Thirteen
class TimerDecorator : public Timer, public Subject {
public:
 TimerDecorator(Timer*);
 virtual void Tick();
private:
 Timer* _timer;
};

Listing Fourteen
void TimerDecorator::Tick () 
{
 _timer->Tick();
 Notify();
}
End Listings


































EDITORIAL


The Domain-Name Game


It's getting so that you can run, but you can't hide--and I'm not just talking
about Windows 95 advertisements, either. From car dealers
(http://www.dealernet.com) to chocolate factories (http://www.godiva.com),
World Wide Web home pages are popping up like virtual bread in video toasters.
According to some estimates, there are upwards of four million home pages on
the Web, among them my local library, corner video store, and morning
newspaper.
Granted, most of these home pages aren't running on dedicated Internet domain
sites--there are currently only 110,000 or so registered Internet domain
names. But with another 20,000 being registered every month, there's every
likelihood that the number of home pages will increase proportionally. 
For all practical purposes, however, many of these domain names are bogus in
that they are "just in case" registrations that companies or individuals may
want to use some time in the future. Procter & Gamble, for instance, recently
registered a batch of about 50 domain names, including those related to
specific P&G products (ivory.com, for its Ivory soap) as well as generic terms
(diarrhea.com, for...well, you get the idea). P&G hopes to use Internet
technology to market its products by directly reaching consumers, bypassing
traditional marketing channels such as television, newspapers, and magazines.
To that end, P&G has allocated about $50 million for developing interactive
media. (This is on top of a $120 million partnership P&G previously forged
with Paramount Television to generate network and first-run syndicated TV
shows.) Still, P&G has yet to actually go online with an Internet domain or
home page.
A recent move by the Federal government may put the brakes on rampant
domain-name registration, at least temporarily. Network Solutions, the
official registrar for the Internet, will begin levying a $50.00 registration
fee for commercial (.com) and non-profit (.org) organizations. Existing
clients will be hit with the $50.00 on the anniversary of their registration.
Of course, fifty bucks isn't even a drop in the bucket to companies that have
set aside millions of dollars for Internet development. However, it will make
a difference to individuals building personal sites--especially if the
registration fee increases in the coming years (and you can bet it will).
On the plus side, there's a benefit to slowing down the propagation of
Internet domain sites. In theory, the Internet can be extended indefinitely
by simply adding more nodes. In practice, however, the supply of valid domain
names is limited (there can be only one "bbb.com," as Marc Brown points out
later in this issue). Of greater concern is the finite availability of unique IP
addresses. When we run out of valid numeric addresses, that's it--unless new
technologies, such as IPng, come along in time. 
Like it or not, part of Network Solutions' role is to establish some order in
the wonderful chaos of the Internet. In doing so, maybe the organization
should closely examine instances of inactive domain names and IP addresses
(for all I know, the company is already doing this). In most states, for
instance, banks have to relinquish to the government accounts that have been
inactive for a period of time. Applying this model to the Internet, if Procter
& Gamble doesn't actually implement a domain named "cough.com" (yep, it's
registered) within, say, two years, then maybe the name should go back into
the available hopper.
In the meantime, Internet growth will continue at breakneck speed, and we'll
be working up a sweat just keeping pace with the introduction of new home
pages. Luckily, you'll be able to find relief at http://www.deodorant.com,
thanks again to P&G.
Jonathan Erickson
editor-in-chief














































Programming HotJava Applets


Executable content becomes a reality




John Rodley


John is an independent consultant. You can reach him at
john.rodley@channel1.com or visit his home page at
http://www.channel1.com/users/ajrodley.


From HTML 2.0 to Netscape extensions to VRML, a slew of new technologies have
popped up that promise to flesh out a Web that is still more flash than cash.
Among these new developments, none is more eagerly anticipated than executable
content--Web content that actually executes on the local computer. Java, a
programming language from Sun Microsystems, makes executable content a
reality. 
HTML is essentially a "flat" technology--static text with hyperlinks to other
lumps of static text. Current Web browsers take a stream of static text and
display it on screen. The only logic embedded in HTML text consists of text
formatting and image/sound file-loading commands. The combination of the
HotJava browser and the Java programming language changes all of this. 


HotJava is Not Java


A Java program (myprogram.java, for example) is compiled into bytecodes that
are interpreted at run time by the Java interpreter. HotJava, a Web browser
written in the Java language, supports <APP>, a new HTML tag that allows you
to load an applet located at an arbitrary URL and run it locally. With
appropriate limitations, this applet has broad access to the resources of the
local machine--screen (via the browser window), mouse, keyboard, sound, and
network cards. Applets are written in Java and compiled into Java bytecodes.
You could write a stand-alone application in Java without involving the Web or
HotJava in any way. In short, as Tim Lindholm of Sun said, HotJava is just a
novel way of delivering Java applications. For more information on Java, see
"Java and Internet Programming," by Arthur van Hoff (DDJ, August 1995); "Net
Gets a Java Buzz," by Ray Valdés (Dr. Dobb's Developer Update, August 1995);
and "Programming Paradigms," by Michael Swaine (DDJ, October 1995).
Java's raison d'être is architecture neutrality. The language itself contains
no platform dependencies. All types have a fixed size (8, 16, 32, or 64 bits)
that may or may not correspond to the norm on whatever platform you're running. But
you pay a price for that neutrality. Java and its packages attempt to supply
all the mechanisms of the native GUI APIs for systems such as X, Macintosh OS,
and Windows through a common syntax. That search for common ground sometimes
means throwing out features (the middle and right mouse buttons, for example)
not supported on all platforms. Thus, the key to Java's eventual usefulness
for developers is not so much its level of functionality (which will always
lag behind that of native GUIs), but how many platforms it runs on. 
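The fixed sizes mentioned above are easy to verify. The following is a minimal sketch in modern Java (the SIZE constants used here postdate the 1995 alpha the article describes):

```java
// Java primitive sizes are fixed by the language, not by the platform:
// byte is 8 bits, short 16, int 32, and long 64 -- everywhere.
public class TypeSizes {
    public static int intBits()  { return Integer.SIZE; }
    public static int longBits() { return Long.SIZE; }

    public static void main(String[] args) {
        System.out.println("byte:  " + Byte.SIZE    + " bits");
        System.out.println("short: " + Short.SIZE   + " bits");
        System.out.println("int:   " + Integer.SIZE + " bits");
        System.out.println("long:  " + Long.SIZE    + " bits");
    }
}
```

The same program reports the same sizes whether it runs under X, the Mac OS, or Windows; that is the whole point of the neutrality.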


The Application


Java makes it possible to quickly write big, portable, robust, graphical,
network apps. To illustrate what you can do with Java, I've developed a
WAN-based, multiuser game called "Battle of the Java Sea," a variant of the
old board game "Battleship." Players on a 40x40 grid fire at each other,
scoring points for hits on other players and losing points for being hit.
Battle of the Java Sea contains three parts: 
The HTML source, which merely provides entry to the applet via the <APP ...>
tag. 
The Java applet, which provides the interface to the game and most of its
logic. 
The game-server daemon, which is a C++ program that runs on a remote host. 
Figure 1 shows where the various pieces of the application run. Listing One
shows the HTML code for the page on which the applet appears. The applet is
placed in-line where the <APP ...> tag appears. The applet sizes itself within
the init method, and all the HTML code following the applet appears below the
applet. If the applet fails to load, a placeholder appears where the applet
would have been. The HotJava frame, title bar, menu bar, URL edit field,
vertical scroll bar, and navigation buttons remain on the screen. The applet
scrolls seamlessly with the HTML text. You can place as many applets as you
want on the page. Our HTML page has the game applet and another applet for
displaying the high scores.


Code Organization


To provide a manageable namespace, Java lumps classes into packages. In the
Java source, you can import a package or a single class within a package. The
compiler identifies classes as mypackage.myclass. My application defines a new
package, Ship, and four classes within it: GameSrv, Ship, Explosion, and
PortThread. There is still considerable debate as to the proper use of package
and class names, given the distributed nature of the Java/HotJava developer
community. The most sensible suggestion I've seen is to incorporate the
company and project name into the package name, leaving classes unique on a
per-project basis.
For the most part, my program uses only three of the packages delivered with
HotJava: awt, browser, and net. awt encompasses all the graphics and drawing
functionality, browser contains the applet wrapper class that our applet
subclasses, and net implements the socket class that provides our connection
to the game server.


Taking Out the Garbage


C/C++ programmers used to chasing down and trying to prevent memory leaks will
enjoy Java's automatic garbage collection. In this program, you'll see plenty
of news, but no deletes. There's no need to call the garbage collector in your
program, because it runs automatically in a separate thread.
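A minimal sketch of the point, in modern Java (the Explosion class here is a stand-in, not the article's): objects are created with new and simply abandoned; once unreachable, the collector reclaims them.

```java
// "Plenty of news, but no deletes": nothing here frees memory by hand.
public class GcSketch {
    static final class Explosion { int frame; }

    public static int animate(int count) {
        int lastFrame = 0;
        for (int i = 0; i < count; i++) {
            Explosion e = new Explosion();  // new, with no matching delete
            e.frame = i;
            lastFrame = e.frame;
        }   // each Explosion becomes unreachable here; the GC thread reclaims it
        return lastFrame;
    }

    public static void main(String[] args) {
        System.out.println(animate(100000));
    }
}
```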


Arrays


Java requires a novel syntax when declaring arrays. Listing Two shows the
declaration of the array of Explosions. The class variable xp is declared as
an array of Explosion objects with no dimension. xp exists as a symbol with a
type, but its actual Explosion objects are not instantiated until the new
statement in the body of the init method. No memory is allocated for an array
until it is created with new. In the Java hierarchy, arrays are objects, not
simple types, and thus embody more intelligence than C/C++ arrays. All array
references are bounds checked, and the length variable gives the size of the
array.
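The behavior described above can be sketched as follows (modern Java; Explosion is again a stand-in class): a dimensionless declaration, allocation only at new, the length field, and checked bounds.

```java
public class ArraySketch {
    static final class Explosion { }

    public static int build(int n) {
        Explosion[] xp;            // a typed symbol, but no memory yet
        xp = new Explosion[n];     // memory is allocated only here
        for (int i = 0; i < xp.length; i++)
            xp[i] = new Explosion();
        return xp.length;          // every array object knows its own size
    }

    public static boolean outOfBoundsIsCaught(int n) {
        int[] a = new int[n];
        try {
            return a[n] == 0;      // one past the end: always detected
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;           // the bad reference throws; it is never silent
        }
    }

    public static void main(String[] args) {
        System.out.println(build(5));
        System.out.println(outOfBoundsIsCaught(5));
    }
}
```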
Arrays bring up another key feature of Java: Exceptions. The Java language
package (java.lang) defines a few dozen Exceptions which have default
behaviors and can be caught and thrown as necessary. Bad array references
throw an ArrayIndexOutOfBoundsException, and bad arguments to the Integer
constructor throw a NumberFormatException. You'd do well to understand
Exceptions before proceeding with Java coding. I went to considerable trouble
in the message-parsing routines (PortThread.java) to avoid throwing a
NumberFormatException. A better approach would have been to catch the
Exception and deal with the problem then.
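A sketch of that catch-the-Exception approach (the method name and fallback parameter are illustrative choices, not the article's code): rather than pre-validating every digit, let the Integer parse throw and recover on the spot.

```java
public class ParseSketch {
    // Returns the parsed value, or fallback when the field is not a number.
    public static int parseField(String field, int fallback) {
        try {
            return Integer.parseInt(field);
        } catch (NumberFormatException e) {
            return fallback;   // bad data no longer kills the reading thread
        }
    }

    public static void main(String[] args) {
        System.out.println(parseField("0042", -1));
        System.out.println(parseField("garbage", -1));
    }
}
```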


Interfaces



In Java, interfaces allow you to define a type of object without subclassing.
Listing Three shows the PortThread class, an implementation of the Runnable
interface. Unlike a subclass, which attaches all the baggage of the superclass
(class variables, un-overridden methods, and so on), an interface only
requires the implementation to supply the methods specified by the interface.
Unlike classes, interfaces can be multiply inherited. Often the interface
system is a much more natural solution. For instance, for debugging purposes,
I'd like to read a stream of game-server messages from a file, rather than
opening a network socket. In that case, I'd define two new classes, FileStream
and SocketStream, and one new interface, MyStream, which would define four
methods: open, read, write, and close. FileStream and SocketStream would
implement both Runnable (to get their own thread) and MyStream.
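That hypothetical design might be sketched like this (the article never shows this code, and to keep the sketch self-contained the "file" is an in-memory byte array standing in for a real FileStream):

```java
// One interface, two possible implementations, no superclass baggage.
interface MyStream {
    void open();
    int read(byte[] buffer);    // returns bytes read, or -1 at end of stream
    void write(byte[] buffer);
    void close();
}

public class StreamSketch {
    // A debugging stream that replays canned game-server bytes.
    static final class FakeFileStream implements MyStream, Runnable {
        private final byte[] data;
        private int pos;
        FakeFileStream(byte[] data) { this.data = data; }
        public void open()  { pos = 0; }
        public int read(byte[] buffer) {
            if (pos >= data.length) return -1;
            int n = Math.min(buffer.length, data.length - pos);
            System.arraycopy(data, pos, buffer, 0, n);
            pos += n;
            return n;
        }
        public void write(byte[] buffer) { /* a replay stream ignores writes */ }
        public void close() { pos = data.length; }
        public void run()   { open(); }   // Runnable: could be given its own thread
    }

    public static int readAll(MyStream s) {
        s.open();
        byte[] buf = new byte[8];
        int total = 0, n;
        while ((n = s.read(buf)) != -1) total += n;
        s.close();
        return total;
    }

    public static void main(String[] args) {
        System.out.println(readAll(new FakeFileStream(new byte[25])));
    }
}
```

A SocketStream implementing the same interface could then be swapped in without touching the message-parsing code.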
The application starts with the GameSrv class which subclasses Applet and
implements the Runnable interface. Subclassing Applet allows GameSrv to be
loaded as an applet by HotJava. GameSrv's implementation of the Runnable
interface's run method and Thread classes' start and stop methods allow
GameSrv to run in its own thread. The thread is actually created by allocating
a Thread object and passing this as the sole parameter. Then, we can do
whatever we want in the run method, in this case repainting the applet window
at 100-ms intervals to reflect the changing game state.
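Stripped of the game logic, the pattern looks like this (a modern-Java sketch with illustrative names; a counter stands in for the repaint):

```java
// Implement Runnable, allocate a Thread passing this, do periodic work in run().
public class RunnableSketch implements Runnable {
    private volatile int ticks = 0;
    private volatile boolean running = true;

    public void run() {
        do {
            ticks++;                      // stands in for repainting the applet
            try { Thread.sleep(10); }     // the article repaints every 100 ms
            catch (InterruptedException e) { break; }
        } while (running);
    }

    public int runFor(long millis) throws InterruptedException {
        Thread t = new Thread(this);      // thread created by passing this
        t.start();
        Thread.sleep(millis);
        running = false;                  // ask the worker to stop
        t.join();
        return ticks;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(new RunnableSketch().runFor(200) > 0);
    }
}
```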
GameSrv overrides four Applet methods: mouseUp, mouseMove, keyDown, and
update. Of these, the most interesting is update (along with repaint),
familiar to Windows coders as the WM_PAINT case in your message-processing
switch. As with any GUI app, most of the detail work goes into the paint
routine. Applets can override either paint or update to do their painting. If
you choose paint, the window is cleared before paint gets called. That was no
good for my application, as it gave the window an annoying, flickering
appearance. update, on the other hand, leaves all window management up to the
programmer, so each time a ship is moved, we have to erase the old ship, and
each time a status string is changed, we have to erase the old one. Listing
Four shows the update method and one of the paint methods it calls.
The thorniest problem in implementing the update method was a by-product of
Java's inherent multithreadedness. In Windows 3.1 SDK programming, you can
process your WM_PAINT message without worrying that globals or statics outside
the paint routine will change unexpectedly. Not so in a multithreaded
environment. 
The paintShip method needs to erase the old ship and draw the new ship. This
requires three steps: erasing the current ship, painting the new ship, and
saving the new ship's coordinates as the current ship's. Listing Five shows
the original update method. The keyDown method changes the ship's coordinates
by calling Move. The bug is that during the several method invocations
between clearRect, which erases the current ship, and the call to
Ship.setLastLoc, the keyDown method can be invoked, setting LastXLoc and
LastYLoc to values other than those at which that ship is currently painted.
This occurs because update and keyDown can be called from separate threads.
Figure 2 illustrates the problem.
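One conventional fix is to make the erase-paint-save sequence and the move mutually exclusive. The following is a sketch of that idea, not the article's code: the names are illustrative, the drawing is reduced to bookkeeping, and it uses modern-Java lambda syntax that the 1995 alpha lacked.

```java
// Synchronizing both paths on the same object means keyDown-style moves can
// never interleave with the erase/draw/save sequence in the paint path.
public class ShipSync {
    private int x, lastX;   // current and last-painted coordinates

    public synchronized void move(int dx) {      // input-thread path
        x += dx;
    }
    public synchronized int paintOnce() {        // update-thread path
        // erase at lastX, draw at x, then save -- atomically
        int drawnAt = x;
        lastX = drawnAt;
        return drawnAt;
    }
    public synchronized boolean consistent() { return lastX == x; }

    public static boolean hammer() throws InterruptedException {
        final ShipSync ship = new ShipSync();
        Thread input   = new Thread(() -> { for (int i = 0; i < 10000; i++) ship.move(1); });
        Thread painter = new Thread(() -> { for (int i = 0; i < 10000; i++) ship.paintOnce(); });
        input.start();  painter.start();
        input.join();   painter.join();
        ship.paintOnce();               // one final paint after all moves
        return ship.consistent();       // lastX always matches where we last drew
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(hammer());
    }
}
```

The price, of course, is that a long paint can briefly delay keystroke handling; the article's cheaper alternative is to narrow the window by grabbing all four coordinates together.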
Veteran painters will also notice that update contains an important cheat--it
doesn't repaint the background. Were we to use an image background or a color
other than the default browser background, we'd have to clear all the
background sections of the window. As it is, we get a good visual effect for
very little code.
The Applet class provides start and stop methods that are called whenever the
applet becomes visible or invisible. Though this version of the game doesn't
use them, future versions will skip repainting and stop all network
communications when the applet is invisible, to minimize CPU load and network
traffic.
The PortThread class, which reads input from the game server, uses the run
method a little differently. Given an IP address and port number, the run
method creates an instance of a socket and sits in a loop doing blocking reads
of the socket's inputStream member. PortThread also illustrates that a Java
application doesn't exit until all nondaemon threads are killed. The
PortThread thread calls setDaemon so that the app can exit without explicitly
killing the PortThread. Listing Two shows the PortThread use of the Runnable
interface.
The fixed-message-size, fixed-field-length message protocol I started with was
surprisingly painful to implement, mostly due to the difficulty of creating
the necessary Java object from an array of bytes. I chose that message style
because I've implemented it a hundred times in C. Having done it once in Java,
I'll never do it again. There are no pointers in Java and you can't cast
between different types, including char (16-bit Unicode) and byte (8 bit).
Whereas in C I'd have written a couple of memmoves (with appropriate casts),
in Java I had to create various objects from copied sections of the array. The
next version will undoubtedly go with a variable-message-length,
variable-field-length, character-delimited protocol. This will complicate the
socket-reading routine a little, but will allow me to use the supplied
StringTokenizer class to parse the message. 
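A sketch of how StringTokenizer would handle such a delimited message; the exact field layout shown is an assumption modeled on the fixed-format protocol described earlier, not a published format.

```java
import java.util.StringTokenizer;

public class ProtocolSketch {
    // "0001:MOVE:+01,+00"  ->  { id, type, x-offset, y-offset }
    public static String[] parse(String message) {
        StringTokenizer t = new StringTokenizer(message, ":,");
        return new String[] {
            t.nextToken(),   // sender ID
            t.nextToken(),   // message type
            t.nextToken(),   // signed X offset
            t.nextToken()    // signed Y offset
        };
    }

    public static void main(String[] args) {
        String[] f = parse("0001:MOVE:+01,+00");
        System.out.println(f[0] + " " + f[1] + " " + f[2] + " " + f[3]);
    }
}
```

No byte-offset arithmetic, no padding, and fields can grow or shrink without breaking the parser.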
For someone who's built GUI apps before, one of the greatest temptations in
Java is to just go wild--creating windows left and right, changing the
menu bar, and so on--but you're often limited by the visibility (or lack
thereof) of variables. For instance, to change the menu bar, you need access
to mbar in browser.hotjava. Access to such components is an architectural
issue within HotJava that is not really settled yet.


Debugging


Since there's no Java debugger yet, debugging Java code is problematic--you're
left with the tried and true "print to standard out" style. Java encapsulates
some of the common system functions in a System object, so a module under
development will be sprinkled with System.out.println() calls.


Conclusion


In general, the Java-language documentation is very good. The class
documentation, on the other hand, is very frustrating. I often followed a
package-class-method documentation trail that left me at a page that contained
only the method's name. You really have to follow the mailing lists (including
the archives and the soon-to-be-created HotJava newsgroup) to get the most out
of the Java/HotJava class packages. That said, for anyone who's spent time
writing C++ code for the current crop of GUIs, awt's classes will seem like a
very intuitive and straightforward abstraction of the native GUI capabilities.
I built Ship using HotJava Alpha 2 running under Windows NT 3.51. While Java
itself is relatively stable and well mannered, HotJava and the awt package are
still moving targets. There is already an Alpha 3 running under Sun Solaris,
and ports to Windows 95 and the Macintosh should be available as you read
this. If you have HotJava, you can run Battle of the Java Sea by checking into
http://www.channel1.com/users/ajrodley.
Special thanks to Dennis Foley for his help with this article.
Figure 1: The various pieces of a Java application and where they run.
Figure 2: Calling update and keyDown from separate threads could cause an
error.

Listing One 
<!doctype html public "-//IETF//DTD HTML//EN">
<HTML>
<HEAD>
<TITLE>Battleship</TITLE>
<META NAME="GENERATOR" CONTENT="Internet Assistant for Word 1.0Z">
<META NAME="AUTHOR" CONTENT="John Rodley">
</HEAD>
<BODY>
<H1>Battleship</H1>
<P>
Naval free-for-all. You're the green square, everyone else is
red. Place the mouse over your opponent and hit &lt;space&gt;
to fire.<HR>
<P>
<APP class="GameSrv"> <HR>
<ADDRESS>
<APP class = "Scores">
</ADDRESS>
<HR>
<P>
Click <A HREF="GameSrv.java">here</A> to see the Battleship Java applet
source.
<P>
Click <A HREF="Scores.java">here</A> to see the Scoring applet source.
<P>
Click <A HREF="socket.c">here</A> to see the game server C source.<HR>
<ADDRESS>
<A href="http://www.channel1.com/ajrodley"><IMG SRC="images/jvr.gif"></A>
John Rodley - john.rodley@channel1.com 
</ADDRESS>

</BODY>
</HTML>

Listing Two
public class GameSrv extends Applet implements Runnable {
 ...
Explosion xp[] = new Explosion[0];
 ...
public void init() {
 ...
 Explosion axp[] = new Explosion[StartAmmo];
 xp = axp;
 for( i = 0; i < StartAmmo; i++ )
 xp[i] = new Explosion();
 ...

Listing Three
package ship;
import awt.*;
import java.util.*;
import java.io.*;
import net.*;
import browser.*;
import browser.audio.*;
import ship.*;
// PortThread class - implements a socket reading daemon thread class 
public class PortThread extends Thread {
public Socket s; // Our connection to the server
GameSrv g; // The game server that created us
public String MyID;
public final String ServerID = new String( "0001" );
// The maximum number of messages we can read at one time.
int maxMessages = 100;
// The fixed size of the message
int messagesize = 25;
public PortThread( GameSrv game ) {
 // ID gets set to 0000 initially, then game server sets it to > 1 
 MyID = new String( "0000" );
 // Save this so that we can do callbacks to our game
 g = game;
 // Turn this into a daemon thread so that the program 
 // doesn't hang on exit waiting for this thread to terminate
 setDaemon( true );
 }
public void run() {
 s = new Socket( "0.0.0.0", 1099 );
 sendStart( g.Mine.getXLoc(), g.Mine.getYLoc());
 int ret = 0;
 byte buffer[] = new byte[maxMessages*messagesize];
 while( ret != -1 )
 {
 ret = s.inputStream.read( buffer );
 if( ret > 0 )
 {
 parse( buffer, ret );
 }
 }
 }
// sendStart - this is the message we send to the server to tell him we're

// here. The server responds with the ID__ message that tells us what
// ID we need to prepend to all our send messages
public void sendStart( int x, int y ) {
 String Xm = new String( "00" + x );
 System.out.println( "Xm = " + Xm );
 String Ym = new String( "00" + y );
 System.out.println( "Ym = " + Ym );
 int xStart = Xm.length() - 3;
 int yStart = Ym.length() - 3;
 System.out.println( "xStart = " + xStart + " yStart = " + yStart );
 String Msg = new String( MyID + ":" + "NEW_:" + "+" +
 Xm.substring(xStart) + "," + "+" + 
 Ym.substring( yStart ) + ": " );
 System.out.println( "Msg = " + Msg );
 byte msg[] = new byte[messagesize];
 Msg.getBytes( 0, messagesize-1, msg, 0 );
 msg[messagesize-1] = 0;
 s.outputStream.write( msg );
 }
// sendMove - tell the game server that we moved x byte right and y bytes up
public void sendMove( int x, int y ) {
 byte msg[] = new byte[messagesize];
 msg[messagesize-1] = 0;
 String Xm = new String( "00" + x );
 String Ym = new String( "00" + y );
 int xStart = Xm.length() - 3;
 int yStart = Ym.length() - 3;
 String Msg = new String( MyID + ":" + "MOVE:" + "+" + 
 Xm.substring(xStart) + "," + "+" + 
 Ym.substring( yStart ) + ": " );
 Msg.getBytes( 0, messagesize-1, msg, 0 );
 s.outputStream.write( msg );
 }
// sendShot - We took a shot, tell the game server
public void sendShot( int x, int y ) {
 byte msg[] = new byte[messagesize];
 msg[messagesize-1] = 0;
 String Xm = new String( "00" + x );
 String Ym = new String( "00" + y );
 int xStart = Xm.length() - 3;
 int yStart = Ym.length() - 3;
 String Msg = new String( MyID + ":" + "FIRE:" + "+" + 
 Xm.substring( xStart ) + "," + "+" + 
 Ym.substring(yStart ) + ": " );
 Msg.getBytes( 0, messagesize-1, msg, 0 );
 s.outputStream.write( msg );
 }
// sendHitBy - tell the game server that we got hit by a shot
public void sendHitBy( int x, int y, String whoGotMe ) {
 byte msg[] = new byte[messagesize];
 msg[messagesize-1] = 0;
 String Xm = new String( "00" + x );
 String Ym = new String( "00" + y );
 int xStart = Xm.length() - 3;
 int yStart = Ym.length() - 3;
 String Msg = new String( MyID + ":" + "HIT_:" + "+" + 
 Xm.substring( xStart ) + "," + "+" + 
 Ym.substring( yStart) + ":" + whoGotMe );
 Msg.getBytes( 0, messagesize-1, msg, 0 );

 s.outputStream.write( msg );
}
// parse - message format is:
// ID 4 bytes - a string representing integer id of sender 0000 to 9999
// : 1 byte b[4]
// type 4 bytes - either MOVE, FIRE, NEW_, or ID__
// colon : 1 byte b[9]
// +/- 1 byte b[10]
// XCoord 3 bytes - 0 padded ASCII number 00 through 99
// coma , 1 byte b[14]
// +/- 1 byte b[15]
// YCoord 3 bytes - 0 padded
// terminator 0 1 byte b[19]
// Example:
// MOVE:+01,+00
// says to move enemy 1 in the x direction
// for now, this is a lossy protocol - if we get out of sync we throw
// out all data until we resync. We can afford to lose the MOVE messages,
// but losing a FIRE message that would have hit us is bad news. Oh
// well, Fog of War and all that ...
public void parse( byte b[], int numbytes ) {
 Integer modifier;
 for( int i = 0; (i+5) < numbytes; i++ )
 {
 if( i+messagesize > numbytes )
 {
 System.out.println("sync error - numbytes "+numbytes );
 break;
 }
 switch( b[i+5] )
 {
 case 'N': // this is a NEW_ message
 case 'M': // this is a MOVE message
 case 'F': // this is a FIRE message
 int ret = parsePositionMessage(b, i, numbytes);
 i += ret;
 break;
 case 'I': // This is the ID__ message
 {
 boolean bBadNum = false;
 
 char msg[] = new char[messagesize+1];
 for( int j = 0; j < messagesize && 
 ((i+j) < numbytes); j++ )
 msg[j] = (char )b[i+j];
 msg[messagesize] = 0;
 String s = new String( msg );
 System.out.println( s );
 char imsg[] = new char[4];
 // If data errors go through to the point of
 // trying to initialize a java.Integer with 
 // non-integer data, then we'll throw an
 // exception which kills the 
 // thread--so catch them here
 imsg[0] = (char )msg[0];
 imsg[1] = (char )msg[1];
 imsg[2] = (char )msg[2];
 imsg[3] = (char )msg[3];
 for( int k = 0; k < 4; k++ )

 if( imsg[k] < '0' || imsg[k] > '9' )
 bBadNum = true;
 if( !bBadNum )
 {
 MyID = new String( imsg );
 System.out.println("ID set to "+ MyID);
 }
 else
 System.out.println( "Bad ID message" );
 }
 break;
 default:
 System.out.println( "bad message: " );
 break;
 }
 }
 }
// parsePositionMessage - all messages are position messages except ID message
public int parsePositionMessage( byte b[], int i, int numbytes ) {
 Integer modifier;
 char msg[] = new char[messagesize+1];
 for( int j = 0; j < messagesize && ((i+j) < numbytes); j++ )
 msg[j] = (char )b[i+j];
 msg[messagesize] = 0;
 String s = new String( msg );
 System.out.println( s );
 char xmsg[] = new char[3];
 if( msg[10] == '+' )
 modifier = new Integer( 1 );
 else
 modifier = new Integer( -1 );
 // If we let data errors go through to the point of
 // trying to initialize a java.Integer with non-integer data then we'll
 // throw an exception which kills the thread, so catch them here
 xmsg[0] = (char )msg[11];
 xmsg[1] = (char )msg[12];
 xmsg[2] = (char )msg[13];
 if( xmsg[0] < '0' || xmsg[0] > '9' )
 {
 System.out.println( "data error " + xmsg[0] );
 return( 1 );
 }
 if( xmsg[1] < '0' || xmsg[1] > '9' )
 {
 System.out.println( "data error " + xmsg[1] );
 return( 1 );
 }
 if( xmsg[2] < '0' || xmsg[2] > '9' )
 {
 System.out.println( "data error " + xmsg[2] );
 return( 1 );
 }
 String xstr = new String(xmsg);
 System.out.println( "X = " + xstr );
 Integer xMove = new Integer( xstr );
 xMove = new Integer( xMove.intValue() * modifier.intValue());
 char ymsg[] = new char[3];
 if( msg[10] == '+' )
 modifier = new Integer( 1 );

 else
 modifier = new Integer( -1 );
 ymsg[0] = (char )msg[16];
 ymsg[1] = (char )msg[17];
 ymsg[2] = (char )msg[18];
 if( ymsg[0] < '0' || ymsg[0] > '9' )
 return( 1 );
 if( ymsg[1] < '0' || ymsg[1] > '9' )
 return( 1 );
 if( ymsg[2] < '0' || ymsg[2] > '9' )
 return( 1 );
 String ystr= new String(ymsg);
 System.out.println( "Y = " + ystr );
 if( msg[15] == '+' )
 modifier = new Integer( 1 );
 else
 modifier = new Integer( -1 );
 Integer yMove = new Integer( ystr );
 yMove = new Integer( yMove.intValue() * modifier.intValue());
 if( b[i+5] == 'M' )
 g.moveTheirs( xMove.intValue(), yMove.intValue());
 if( b[i+5] == 'F' )
 {
 char id[] = new char[4];
 for( int j = 0; j < 4; j++ )
 id[j] = (char )b[i+j];
 String FromID = new String( id );
 g.shotByLandedAt(FromID, xMove.intValue(),yMove.intValue());
 }
 if( b[i+5] == 'N' )
 g.newEnemy( xMove.intValue(),yMove.intValue());
 return( messagesize-1 );
 }
}

Listing Four
 /**
 * Paint it.
 */
 public void update(Graphics g) {
 if( bGameGoing == true )
 {
 paintBorder( g );
 paintShip( g, Theirs );
 paintShip( g, Mine );
 paintExplosions( g, txp );
 paintExplosions( g, xp );
 paintStatusStrings( g );
 }
 }
public void paintShip( Graphics g, Ship s ) {
 Image shipImage;
 Color MyColor = new Color( g.wServer, 0, 255, 0 );
 Color TheirColor = new Color( g.wServer, 255, 0, 0 );
// Do everything having to do with position. We have to get loc and
// lastloc and reset lastloc right here together because we can get a
// keypress (which changes ship.loc) while this function is executing
// which will cause orphan/ghost ships to remain on the screen. 
// Theoretically, we could still get a keypress amongst those four calls

// that would screw things up, but the chances are dramatically reduced.
 int xLoc = s.getXLoc();
 int yLoc = s.getYLoc();
 int LastXLoc = s.getLastXLoc();
 int LastYLoc = s.getLastYLoc();
 s.setLastLoc();
 g.clearRect(leftMargin + (LastXLoc*GridSize)-(ImageSize-GridSize)/2,
 topMargin+(LastYLoc*GridSize)-(ImageSize-GridSize)/2, 
 ImageSize,ImageSize);
 if( s == Mine )
 {
 g.setForeground( MyColor );
 shipImage = shipImages[Mine.direction];
 }
 else
 {
 g.setForeground( TheirColor );
 shipImage = shipImages[Theirs.direction];
 }
 g.drawImage( shipImage,leftMargin +(xLoc*GridSize)-
 (ImageSize-GridSize)/2,topMargin+(yLoc*GridSize)-
 (ImageSize-GridSize)/2 );
 if( s == Mine )
 {
 g.setForeground( MyColor );
 int sw = g.drawStringWidth( "Position: " , leftMargin + 
 width+GridSize+scoreMargin, topMargin+(height*6)/6);
 if( LastXLoc != xLoc || LastYLoc != yLoc )
 g.clearRect( leftMargin + width+GridSize+scoreMargin+
 sw, topMargin+((height*6)/6)-statusHeight, 
 statusWidth-sw-scoreMargin, statusHeight );
 g.drawStringWidth( "" + xLoc + ":" + yLoc,leftMargin +
 width+GridSize+scoreMargin+sw,topMargin+(height*6)/6);
 }
}

Listing Five
 /** Paint it. */
 public void update(Graphics g) {
 if( bGameGoing == true )
 {
 paintBorder( g );
 paintShip( g, Theirs );
 paintShip( g, Mine );
 paintExplosions( g, txp );

 paintExplosions( g, xp );
 paintStatusStrings( g );
 }
 }
public void paintShip( Graphics g, Ship s ) {
 Image shipImage;
 Color MyColor = new Color( g.wServer, 0, 255, 0 );
 Color TheirColor = new Color( g.wServer, 255, 0, 0 );
 int xLoc = s.getXLoc();
 int yLoc = s.getYLoc();
 int LastXLoc = s.getLastXLoc();
 int LastYLoc = s.getLastYLoc();
 g.clearRect(leftMargin + (LastXLoc*GridSize)-(ImageSize-GridSize)/2,
 topMargin+(LastYLoc*GridSize)-(ImageSize-GridSize)/2, 
 ImageSize,ImageSize);
 if( s == Mine )
 {
 g.setForeground( MyColor );
 shipImage = shipImages[Mine.direction];
 }
 else
 {
 g.setForeground( TheirColor );
 shipImage = shipImages[Theirs.direction];
 }
 g.drawImage( shipImage,leftMargin +(xLoc*GridSize)-
 (ImageSize-GridSize)/2,topMargin+(yLoc*GridSize)-
 (ImageSize-GridSize)/2);
 ...
 s.setLastLoc();
}
End Listings































Animation Using the Netscape Browser


Dynamic documents via server push and client pull




Andrew Davison


Andrew is a lecturer in the Department of Computer Science at the University
of Melbourne, Australia. He can be reached at ad@cs.mu.oz.au.


One way of sending animation to a Web page is to include a link to a piece of
video. This could require an expensive investment in hardware and software for
manipulating the image. Also, video files are often very large, which can be a
problem for users accessing them over a network. Aside from these concerns,
many animation effects really don't need video technology. For instance, a
great deal can be achieved by rapidly displaying a sequence of GIF files.
GIFs do not require special equipment to be displayed. Indeed, every graphical
browser can treat them as inline images. There is a plethora of software for
manipulating GIFs, and extensive libraries of clip art (see
http://www.yahoo.com/Computers/Multimedia/Pictures/Clip_Art/).
Netscape 1.1 (and later) can display sequences of GIF files. In fact, its
"client-pull" and "server-push" dynamic-document capabilities permit a variety
of animation effects.


Client Pull


Client pull makes it possible for a client (the user's Netscape browser) to
request a new page without the intervention of the user. This is achieved
through a META tag (see Example 1) in the head of the HTML document being
processed by the browser. The Content attribute specifies the delay in seconds
before the new page is requested, and the URL attribute identifies the
location of that page. The URL must be fully specified; relative addresses are
insufficient.
You can use this technique to animate the introduction to a page. The
animation effect is heightened by using the same layout in all the
introductory HTML files. Examples 2(a), 2(b), and 2(c) are three HTML files
(intro1.html, intro2.html, and intro3.html, respectively) that generate
Figures 1(a), 1(b), and 1(c). Together they form a three-stage introduction to
a questionnaire in quest2.html (see Figure 2). The relevant META tags from each
intro file are shown in Example 3. The META tag in intro1.html causes it to be
replaced after one second by intro2.html. The META tag in intro2.html causes
it to be replaced after another second by intro3.html. The META tag in
intro3.html causes it to be replaced after one more second by quest2.html. The
questionnaire does not contain a META tag, thereby terminating the rapid
change of pages. Try the animation (and the questionnaire) for yourself by
accessing http://www.cs.mu.oz.au/~ad/code/intro1.html.
If the URL attribute is left out of the META tag, then the page itself will be
reloaded after the specified delay. The file coffee.html (see Example 4 and
Figure 3, or http://www.cs.mu.oz.au/~ad/code/quest/coffee.html) is an
example of a page of this type. It is loaded every 60 seconds until the
browser terminates or is set to point to another page. The displayed GIF file
can be generated by a camera monitoring something interesting (the office
coffee maker, for example) and left running. The browser will load the current
coffeepot picture as it reloads coffee.html.
An advantage of this approach is that only one GIF file needs to be stored,
thereby reducing the memory requirements. A disadvantage is that the browser
has to continually reload the page, which means multiple connections to the
server.
Remember that the Content-attribute value is only approximate, since it does
not take into account the time to establish a server connection, retrieve the
file, and display it. Therefore, the period between reloads is typically
longer than the Content value suggests, and the extra delay depends on
variables such as network and machine usage.


Server Push


A server-push-based dynamic document is sent to the client (the user's
browser) by the server. Such a document uses an experimental, multipart MIME
type called "multipart/x-mixed-replace," which allows its data items to be
treated as separate documents by the receiving browser.
Figure 4 presents the general format of a multipart/x-mixed-replace document:
data1, data2,..., dataN are treated as separate documents by the browser. The
arrival of a new piece of data (document) will cause the previous one to be
cleared from the browser's window and the new one to be displayed. SomeString
can be any string, but it must be preceded by "--" when used as a data
separator. When used as a terminator for the whole document, it must also be
followed by "--". When a browser receives a data separator (or terminator),
any data that is still in the browser's buffers will be displayed.
The data can be any recognized MIME type, including HTML, plaintext, or GIF.
The data must begin with a Content-Type declaration, followed by a blank line
and the actual information. The Content-Type values for HTML, plaintext, and
GIF data are text/html, text/plain, and image/gif, respectively. (A complete
list of official MIME types can be found at
ftp://ftp.isi.edu/in-notes/iana/assignments/media-types/media-types.)
Typically, multipart documents are constructed by CGI scripts and utilize a
delaying mechanism between the transmission of each piece of data. This gives
the user time to read the data before it is replaced. CGI scripts can also
create multipart documents consisting of an infinite number of data items (for
instance, by outputting data from a loop). This is useful if the application
requires a continual stream of information to be sent to the client (the
periodic results of monitoring a network, for example).
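The core of such a looping script can be sketched as one small C helper that writes a single data item and flushes it to the browser. This is my own sketch; the boundary string and function name are assumptions, not from the article's listings:

```c
#include <stdio.h>

#define BOUNDARY "InfoSep"   /* arbitrary; must match the header's boundary= */

/* Write one plaintext data item of a multipart/x-mixed-replace stream,
   bracketed by the boundary separator, and flush so the item leaves the
   CGI pipe immediately instead of sitting in the stdio buffer. */
void emit_item(FILE *out, const char *text)
{
    fprintf(out, "\n--" BOUNDARY "\n");
    fprintf(out, "Content-type: text/plain\n\n%s\n", text);
    fflush(out);
}
```

A monitoring script would print the Content-type: multipart/x-mixed-replace;boundary=InfoSep header once, then call emit_item() from an endless loop with a sleep() between calls, producing an unbounded stream of data items.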
Listing One is simp-serv.c, a program that demonstrates how a simple,
multipart document can be constructed using the CGI approach. It is invoked by
the HTML document in Example 5 (also at
http://www.cs.mu.oz.au/~ad/code/quest/converse.html). When simp-serv.c is
called, it transmits a multipart document consisting of two data items
corresponding to two HTML pages. The first printf("\n--HTMLSep\n") statement
makes sure that the multipart Content-Type string has been output to the
browser. Then the call to sleep() suspends the output of the first piece of
data for one second. The second printf("\n--HTMLSep\n") statement makes sure
that all of the first data item (the first HTML page, in this case) has been
output. The second sleep() call suspends the output of the second data item
(the second HTML page) for eight seconds, allowing time for the user to read
the first page before it is replaced.
Server push differs from client pull in a number of ways. The most important
one is that the server sends a stream of data items to the client using one,
possibly long-lived, network connection. In contrast, client pull uses a
series of separate requests over the network to obtain its documents. Thus,
server push is more efficient since it only needs to set up one connection;
however, it may monopolize a TCP/IP port for a lengthy period. Server push is
easier to control than client pull: It can be terminated by the user pressing
the browser's Stop button or by the server terminating the multipart document.


Server Push with Images


You can generate animation effects by using server push to send a sequence of
GIF images to the browser. Consequently, the multipart document will consist
of data items whose Content-Type is image/gif.
If the retrieval of the multipart document is initiated from inside an IMG tag
in an HTML document, then the sequence of images will appear at that spot
without altering the rest of the document. Also, the dimensions of the first
image will be used to scale all the subsequent images to the same size.
Rather than writing a GIF-filled multipart document directly, I'll use
frames.c, a CGI script, to generate it (see Listing Two). The script must be
passed the name of a file containing a list of GIF filenames that are to be
sent to the browser.
Example 6 shows how frames.c is utilized. The important part is the IMG tag
<img src="http://www.cs.mu.oz.au/cgi-bin/frames?intro">, which causes the
compiled frames.c code to be invoked with the text after the "?" as its
command-line argument. frames.c interprets it as the file intro.pics in the
directory /home/staff/ad/www_public/code/quest/gifs.
The .pics file format is simple, consisting of lines of filenames, optionally
preceded by integer delay values. Example 7, for instance, shows the contents
of intro.pics, which is interpreted as a request to transmit hello.gif to the
browser immediately. frames.c then waits two seconds before sending blank.gif,
transmits prepare.gif, waits another two seconds before sending blank.gif
again, and finishes by transmitting question.gif. frames.c assumes that all of
these GIF files are in /home/staff/ad/www_public/code/quest/gifs.
The processing of intro.pics is similar to the example in the client-pull
section, but the images all appear in the same Web page. Using this approach,
the questionnaire file, quest2.html, could be modified to include the same IMG
tag at its start, making the animated introduction part of the questionnaire.
A second .pics file is shown as Example 8. When read by an IMG tag, it sends a
stream of different images to the browser. The new feature illustrated by this
example is the "times" line at the end, which specifies that the sequence of
images will be sent ten times. (You can find the images used in Example 7 and
Example 8 at http://www.cs.mu.oz.au/~ad/code/quest/gifs.)


Inside frames.c 


frames.c first adds the path and .pics extension to its command-line argument.
Then the file is opened, and build_pics() reads the delay and GIF-filename
information into the pics array. build_pics() checks for the presence of a
times line at the end of the pics array by calling num_times(). num_times()
sets tmsno to the value on the times line, or to 0 if there is no such line.
The multipart Content-Type is written to standard output (and thus, to the
browser), and a Do-While loop is entered that continues until tmsno is equal
to (or less than) 0. The nested For loop processes pics' elements, each of
which corresponds to a GIF file and a delay value (if no delay was given, then
this is 0). The separator string is printed first, ensuring that the previous
image is fully displayed, and then the pics element is processed. Any delay is
handled by calling sleep(), then the GIF is output using write_gif().

write_gif() and wstring() are derived from code written by Rob McCool, which
can be found at http://home.netscape.com/assist/net_sites/mozilla/doit.c.
write_gif() does not read the GIF file into a data structure prior to writing
it to standard output. Instead, the file is memory mapped to a process
address, which is more efficient, especially if the file is large. However,
this technique uses the low-level file operations open() and write(), which
utilize integer file descriptors, while the standard I/O library functions use
streams. The two mechanisms do not work together, so the rest of the output
from the program must also use write(). This explains the use of wstring() to
print a string, rather than printf(). Another drawback is that the
memory-mapping library is not included with every flavor of UNIX. For this
reason, a version that uses read() is included as write_gifp(). Change the
write_gif() call in main() to write_gifp() if memory mapping is unavailable.


Server-Push Issues


Delays specified by the server (by using sleep(), for example) only delay the
transmission of the data. They do not directly influence when the client
browser will receive, load, and display the data. For instance, the first few
pieces of data (GIF images, HTML pages, or whatever) sent to a browser often
take some time to be loaded and displayed. As a result, any server-side delays
between these data items will be less apparent to the browser user.
Also, the browser can decide to stop displaying data if too much is waiting at
the browser end of the network connection. This may happen if many pieces of
data are sent without server-side delays between them.


Summary


Animation using video can be costly and complicated, and the same effects can
often be achieved by displaying a sequence of GIF images or HTML pages.
Netscape (1.1 and later) can be used to code this type of animation, using
client-pull and server-push dynamic documents.
Client pull enables HTML pages to be automatically loaded after a given time.
This makes it straightforward to build introductory animation sequences and
pages that regularly reload themselves.
Server push can be used to send a sequence of data to a client browser. The
program I've described makes server push easier to use. Delays can be
specified between the images, and the sequence can be repeated an arbitrary
number of times. 
Further details on dynamic documents can be found at the Netscape site
(http:// www.netscape.com/assist/net_sites/dynamic_docs.html) and at the
Animationest page at http://bakmes.colorado.edu/~bicanic/altindex.html.
Invented World's test page, at http://www.enterprise.net/iw/testpage.html, has
several interesting server-push examples, including animated lightning,
blooming roses, and Jupiter in motion, all created with Invented World's
Webvid '95 program. Animate v0.9, written in Perl, can be found at
http://www.homepages.com, together with some nice examples.
For details on other Netscape extensions to HTML, see
http://www.netscape.com/assist/net_sites/html_extensions.html.
Example 1: A META tag in the head of the HTML document
<meta http-equiv="Refresh" content="5;
url=http://www.cs.mu.oz.au/~ad/code/quest/quest.html">
Example 2: The three HTML files that generate Figures 1(a), 1(b), and 1(c).(a)
intro1.html; (b) intro2.html; (c) intro3.html.
(a)
intro1.html
<html>
<head>
<meta http-equiv="Refresh" content="1;
 url=http://www.cs.mu.oz.au/~ad/code/quest/intro2.html">
<title>A Rather Silly Computing Questionnaire</title>
</head>
<body>
<h1>A Rather Silly Computing Questionnaire</h1>
<br>
<img src="gifs/hello.gif" alt="Hello... ">
</body>
</html>

(b)
intro2.html
<html>
<head>
<meta http-equiv="Refresh" content="1;
 url=http://www.cs.mu.oz.au/~ad/code/quest/intro3.html">
<title>A Rather Silly Computing Questionnaire</title>
</head>
<body>
<h1>A Rather Silly Computing Questionnaire</h1>
<br>
<img src="gifs/prepare.gif" alt="prepare to... ">
</body>
</html>

(c)
intro3.html
<html>
<head>
<meta http-equiv="Refresh" content="1;
 url=http://www.cs.mu.oz.au/~ad/code/quest/quest2.html">
<title>A Rather Silly Computing Questionnaire</title>
</head>
<body>
<h1>A Rather Silly Computing Questionnaire</h1>

<br>
<img src="gifs/question.gif" alt="fill in the questionnaire... ">
</body>
</html>
Example 3: (a) The META tag in intro1.html causes a one-second delay between
Figures 1(a) and 1(b); (b) another one-second delay is caused before Figure
1(c) appears; (c) intro3.html requests the questionnaire in Figure 2 after
another second, as defined by its META tag.
(a)
<meta http-equiv="Refresh" content="1;
 url=http://www.cs.mu.oz.au/~ad/code/quest/intro2.html">

(b)
<meta http-equiv="Refresh" content="1;
 url=http://www.cs.mu.oz.au/~ad/code/quest/intro3.html">

(c)
<meta http-equiv="Refresh" content="1;
 url=http://www.cs.mu.oz.au/~ad/code/quest/quest2.html">
Example 4: The file coffee.html.
<html>
<head>
<meta http-equiv="Refresh" content=60>
<title>How's the Coffee?</title>
</head>
<body>
<h1>How's the Coffee?</h1>
The coffee pot:<p>
<img src="gifs/coffee.gif" alt="coffee gif unavailable">
</body>
</html>
Example 5: This HTML file invokes simp-serv.c (Listing One).
<html>
<head>
<title>A Simple Conversation</title>
</head>
<body>
<h1>A Simple Conversation</h1>
<br>
Start a conversation
<a href="http://www.cs.mu.oz.au/cgi-bin/simp-serv">now</a>.<p>
</body>
</html>
Example 6: How frames.c (Listing Two) is utilized.
<html>
<head>
<title>Server Push Animation</title>
</head>
<body>
<h1>Server Push Animation</h1>
<br>
<img src="http://www.cs.mu.oz.au/cgi-bin/frames?intro">
</body>
</html>
Example 7: Contents of intro.pics.
# introductory images for the questionnaire

hello
2 blank
prepare
2 blank
question

Example 8: A typical .pics file.
# A crazy mishmash of images

2 chitz
cloudz
2 cracks
fire
galaxy4
2 marble
hello

times 10
Figure 1: (a) First image in the animation sequence; (b) second image in the
sequence; (c) final introductory image in the sequence.
Figure 2: The questionnaire.
Figure 3: Watching the coffeepot
Figure 4: General format of a multipart/x-mixed-replace document
Content-type: multipart/x-mixed-replace;boundary=SomeString
--SomeString
data1
--SomeString
data2
--SomeString
data3
 .
 .
 .
--SomeString
dataN
--SomeString--

Listing One 
/* simp-serv.c */
/* By Andrew Davison (ad@cs.mu.oz.au), August 1995 */
/* A simple server push program that sends two HTML
 pages to the browser with some delays. */
#include <stdio.h>
#include <unistd.h> /* for sleep() */
void print_html(char *title, char *body);
int main()
{
 printf("Content-type: multipart/x-mixed-replace;boundary=HTMLSep\n");
 printf("\n--HTMLSep\n");
 fflush(stdout); /* stdout is a pipe, so force the header out now */
 sleep(1);
 print_html("Hello", "Hello User!");
 printf("\n--HTMLSep\n");
 fflush(stdout); /* display the first page before the long delay */
 sleep(8); 
 print_html("Goodbye", "Goodbye User!");
 printf("\n--HTMLSep--\n");
 return 0;
}
void print_html(char *title, char *body)
/* Print out the title and body strings in HTML document format */
{
 printf("Content-type: text/html\n\n");
 printf("<html><head>\n");
 printf("<title>%s</title>\n", title);
 printf("</head><body>\n");
 printf("<h1>%s</h1>\n", title);
 printf("%s<p>\n", body);

 printf("</body></html>\n");
}

Listing Two
/* frames.c -- by Andrew Davison (ad@cs.mu.oz.au), August 1995 */
/* Read a pics filename from the command line and load
 the contents into a pics array.
 The pics array is used to send a sequence of gifs
 to standard output (with optional delays) as a 
 MIME multipart/mixed message.
 The intention is to display the gifs in 
 netscape v1.1 (or later) as an animation.
 The gifs sequence can be repeatedly sent, depending
 on the presence of "times tmsno" in the pics file
 (tmsno is the number of times the sequence should be
 sent to the browser).
*/
/* The location of the gifs and pics files is hardwired
 into the program as the PATH constant.
*/
/* write_gif() and wstring() are
 based on code by Rob McCool, available at
 http://home.netscape.com/assist/net_sites/mozilla/doit.c
*/
#include <sys/types.h>
#include <sys/mman.h> /* for mmap(), munmap() */
#include <unistd.h> /* for sleep() */
#include <fcntl.h> /* for open(), write() */
#include <sys/stat.h> /* for fstat() */
#include <string.h>
#include <ctype.h>
#include <stdlib.h>
#include <stdio.h>
#define PATH "/home/staff/ad/www_public/code/quest/gifs"
 /* path to the gifs and pics file; change to suit */
#define NAME 20 /* max chars in a number string */
#define MAXLEN 120 /* max length of a filename/line */
#define MAXPICS 50 /* max no. of pictures */
#define BUFSIZE 512 /* size of chunk to be read from gif file */
typedef struct {
 int delay; /* delay before gif is sent */
 char *name; /* gif filename */
} frame;
int build_pics(FILE *fp, frame pics[], int *pnum);
int extract_delay(char *ln, int *ppos);
int num_times(char *s);
void write_gif(int fd);
void wstring(char *s);
void write_gifp(int fd); 
int main(int argc, char *argv[])
{
 FILE *fp;
 int filedes;
 frame pics[MAXPICS]; /* array of delays & gif filenames */
 char fnm[MAXLEN]; /* full pics filename */
 char gif_name[MAXLEN]; /* full gif filename */
 int picnum, x, tmsno;
 if (argc < 2) /* no pics filename was supplied */
 exit(0);
 sprintf(fnm, "%s/%s.pics", PATH, argv[1]); /* build full pics fnm */
 if ((fp = fopen(fnm, "r")) == NULL)

 exit(0);
 else {
 tmsno = build_pics(fp, pics, &picnum);
 fclose(fp);
 }
 wstring("Content-type: multipart/x-mixed-replace;boundary=GifSeperator\n");
 do { /* always do actions at least once */
 for (x=0; x < picnum; x++) {
 wstring("\n--GifSeperator\n");
 sleep(pics[x].delay);
 wstring("Content-type: image/gif\n\n");
 
 sprintf(gif_name, "%s/%s.gif", PATH, pics[x].name);
 /* build full gif filename */
 
 if((filedes = open(gif_name, O_RDONLY)) != -1) {
 write_gif(filedes); /* or write_gifp(filedes); */
 close(filedes);
 }
 }
 tmsno--;
 } while (tmsno > 0);
 wstring("\n--GifSeperator--\n"); 
 return 0;
}
int build_pics(FILE *fp, frame pics[], int *pnum)
/* Read in lines from the pics file using the fp file pointer
 and store the delays and gif filenames in the pics array.
 Lines starting with a '#' are comments and are ignored.
 Lines beginning with a new line are also ignored.
 The last pics entry is checked by num_times() to see if
 it contains "times tmsno". If it does then this entry
 is ignored but tmsno is recorded.
*/
{
 char line[MAXLEN];
 int num, len, letpos, tmsno;
 num = 0;
 while ((fgets(line, MAXLEN, fp) != NULL) && (num < MAXPICS))
 if ((line[0] != '\n') && (line[0] != '#'))
 { /* not a blank line or comment*/
 len = strlen(line);
 if (line[len-1] == '\n')
 line[--len] = '\0'; /* overwrite '\n'; decr len */
 if (isdigit(line[0]) != 0) {
 pics[num].delay = extract_delay(line, &letpos);
 pics[num].name = (char *)malloc(sizeof(char)*(len-letpos+1));
 strcpy(pics[num].name, &line[letpos]);
 }
 else {
 pics[num].delay = 0;
 pics[num].name = (char *)malloc(sizeof(char)*(len+1));
 strcpy(pics[num].name, line);
 }
 num++;
 }
 tmsno = 0;
 if ((num > 0) && ((tmsno = num_times(pics[num-1].name)) > 0))
 num--; /* ignore the last pics array entry */
 *pnum = num;

 return tmsno;
}
int extract_delay(char *ln, int *ppos)
/* Attempt to extract a number from the start of the ln line.
 Also find the position of the first non-white space 
 character and store it in ppos. The gif filename begins
 at that position.
*/
{
 int i = 0;
 char num[NAME];
 while (isdigit(ln[i]) != 0) {
 num[i] = ln[i];
 i++;
 }
 num[i] = '\0';
 while (isspace(ln[i]) != 0)
 i++;
 *ppos = i;
 return atoi(num);
}
int num_times(char *s)
/* If the s line begins with "times" then extract
 the number following it, skipping any white space
 in between.
 If there isn't a "times" string then return 0, so allowing
 build_pics() to detect the string's absence. 
 If there is a "times" string but no number then return 1.
*/
{
 int pos = 5; /* length of "times" */
 int tmsno = 0;
 if (strncmp(s,"times",pos) == 0) {
 while((s[pos] != '\0') &&
 (isspace(s[pos]) != 0))
 pos++;
 if (s[pos] == '\0')
 tmsno = 1; /* a "times" string with no number */
 else
 tmsno = abs(atoi(&s[pos])); /* avoid -ve */
 }
 return tmsno;
}
/* Functions based on code by Rob McCool */
void write_gif(int fd)
/* Use memory mapping instead of read() to access the
 file with the fd file descriptor.
*/
{
 struct stat fi; /* file information */
 caddr_t pa; /* process address */
 fstat(fd, &fi);
 pa = mmap(NULL, fi.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
 if(pa == (caddr_t) -1)
 exit(0);
 if(write(STDOUT_FILENO, (void *) pa, fi.st_size) == -1)
 exit(0);
 munmap(pa, fi.st_size);
}

void wstring(char *s)
/* Write the s string to stdout */
{
 if (write(STDOUT_FILENO, s, strlen(s)) == -1)
 exit(0);
}
/* A more portable version of write_gif() */
void write_gifp(int fd)
/* Use read() to access the file with the fd file descriptor. */
{
 char buffer[BUFSIZE];
 int nread;
 while ((nread = read(fd, buffer, BUFSIZE)) > 0)
 if (write(STDOUT_FILENO, buffer, nread) == -1)
 exit(0);
}
End Listings













































Programming CGI in C


Sometimes Perl isn't the best tool for CGI programs




Eugene Eric Kim


Eugene is a history and science concentrator at Harvard University. He can be
contacted at eekim@fas.harvard.edu.


The World Wide Web has rapidly become the most popular application on the
Internet due to its simplicity and visual appeal. Perhaps its most important
feature, however, is its interactivity. The ability to communicate both ways
over the Web allows site maintainers to develop sophisticated programs that
provide information based on user feedback.
Data is transmitted between the user and the server using a protocol called
"Common Gateway Interface" (CGI). Although the CGI protocol is not difficult
to understand, it can be intimidating at first, especially for the Web
designer spoiled by the simplicity of HyperText Markup Language (HTML), the
language used to format content on the Web. Additionally, programmers who are
well-versed in CGI prefer to focus on developing the actual application rather
than dealing with the internals of the protocol. Consequently, there is a
genuine need for useful, efficient, and simple CGI-processing tools.
Such libraries have been developed in several languages, including Perl, C,
and C++. In this article, I'll introduce cgihtml, a public-domain C library I
wrote that can simplify CGI programming on a UNIX platform. I'll present
examples of CGI programs here; the complete code for the library is available
electronically; see "Availability," page 3.


CGI Specification


CGI describes the "gateway" for information between the Web application, the
server, and the browser; see Figure 1. Every time you select a CGI program
from your browser, either by filling out a form and pushing the Submit button
or by selecting a link to a CGI program, data is sent to the server. The
server then invokes the program and gives it your data in an encoded form.
(For a description of how the data is encoded, see Example 1.) The data is
either delivered to the standard input (the POST method) or stored in an
environment variable (the GET method).
Once you have this data in some usable form, you can manipulate it.
Additionally, the server sets several environment variables for the CGI
program to use. These environment variables usually contain useful information
about the client, the server, and the current request. For example, the
environment variable REMOTE_ADDR provides the IP address of the client
connected to the server, while the variable HTTP_USER_AGENT identifies the
type of browser on the client machine. For a complete list of these
environment variables, see http://hoohoo.ncsa.uiuc.edu/cgi/.
Although a CGI program does not need to receive any input from the browser
(for example, when the program is invoked from within a <a href="..."> tag),
it is required to send output to the browser. Data sent to standard out is
received, interpreted, and displayed on the browser. Two sets of data are sent
to the output: HyperText Transfer Protocol (HTTP) headers, and the actual
information you wish to transmit.
There are several different HTTP headers that give the browser important
information about the data it is about to receive. The most important header,
known as "Content-Type:", tells the browser what type of information the
server is sending, so the browser can interpret it accordingly. This header is
of the form Content-Type: type/subtype, where type/subtype is a valid
Multipurpose Internet Mail Extensions (MIME) type. The header is followed by a
blank line. 
The most common of these headers is Content-Type: text/html, the MIME type for
HTML documents. Note that a blank line is required after the Content-Type,
even if you are returning a blank page.
Another important header is Location:, which tells the browser to look for a
file at a specified location. This is useful for writing scripts to redirect
the browser to a different location. For example, to tell the browser to
access the file file.html when it calls a CGI program, you need a CGI program
that returns Location: file.html followed by a blank line.
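Both headers are one-liners to generate. Here is a sketch of two helpers of my own (cgihtml's actual routines have different names and signatures):

```c
#include <stdio.h>

/* The blank line after each header is mandatory,
   even when no body follows. */

/* Announce the MIME type of the data about to be sent. */
void content_header(FILE *out, const char *mime)
{
    fprintf(out, "Content-Type: %s\n\n", mime);
}

/* Redirect the browser to another document. */
void redirect_header(FILE *out, const char *url)
{
    fprintf(out, "Location: %s\n\n", url);
}
```

A CGI program would normally pass stdout as the FILE * argument.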
One other HTTP header that is often useful is the no parse header (nph), which
tells the Web browser that the server does not want to return any information.
Nph headers are commonly used in imagemaps. To have a designated portion of an
imagemap ignore the user's click, you have that portion pointing to a CGI
script that returns an nph header. Nph headers are of the form shown in
Example 2, followed by a blank line. You must include a Content-Type header
even if you are not planning on returning any data.
To use the CGI protocol, you need a language that can read from the standard
input, get environment variables, and write to the standard output. Since
practically every computer language in existence can do all three, CGI
programs may be written in whichever language suits you best.


C versus Perl


CGI programs must invariably parse plain text; Perl's high-level syntax,
flexibility, and text-manipulation routines make it an ideal language in which
to program CGI.
However, Perl and other very high-level scripting languages have limitations.
One downside is their size. The Perl executable can be as much as ten times
larger than CGI C binaries. While some of the CGI libraries for Perl greatly
simplify programming, some do so at a cost in server performance. Since most
servers fork a separate process every time a CGI program is invoked, overhead
can grow rapidly on a high-traffic site with lots of CGI access.
Some Web servers (most notably Netscape and Apache) have their own APIs. These
allow you to code your CGI programs as extensions to the server, thus avoiding
the overhead created by forking new processes. Communicating with these APIs
generally means coding your CGI programs in C.
Many specialized applications come with only C libraries. Additionally, you
sometimes may require a high level of control over your program's actions.
Only a lower-level language such as C can provide this control for all types
of applications.


A CGI Library for C Programs


A properly implemented CGI library in C needs to strike a careful balance
between usability and flexibility. My cgihtml library focuses almost entirely
on providing routines for the most mundane CGI-specific tasks, such as
decoding CGI input. When using cgihtml to code CGI programs that have
specialized needs, such as type-conversion or advanced string-parsing
routines, you must decide how best to implement these functions. Rather than
attempt to completely hide the intricacies of CGI programming, cgihtml tries
to complement and assist the programmer's skill. 
The parsing routine read_cgi_input() determines whether the input is sent via
the POST or GET method and interprets it accordingly, as in Figure 2. Parsed
data is placed in a linked list of structures, each holding two elements: the
name of a form item and its value.
cgihtml provides the cgi_val() routine to easily obtain the value of an
element of the linked list, given the name. If you would rather search for
elements with a given value or look for elements with a given combination of
name and value, you can easily parse the list using one of the provided
linked-list routines to return the value for any desired key.
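The shape of that list, and a cgi_val()-style lookup over it, can be sketched as follows. This is my guess at the idea, not cgihtml's actual declarations, which live in cgi-lib.h:

```c
#include <stdio.h>
#include <string.h>

/* One parsed form item; the list holds one node per name=value pair. */
typedef struct entry {
    char *name;           /* form field name        */
    char *value;          /* what the user entered  */
    struct entry *next;
} entry;

/* Walk the list for a field name, in the style of cgi_val(). */
char *lookup(entry *list, const char *name)
{
    for (; list != NULL; list = list->next)
        if (strcmp(list->name, name) == 0)
            return list->value;
    return NULL;          /* no such field in the submitted form */
}
```

Searching by value instead of by name is the same loop with the strcmp() applied to list->value.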
All CGI programs must return a header, and most usually return some HTML as
well. The routines in cgihtml are simply wrapper functions for the appropriate
printf() call. These routines simplify outputting HTTP headers and HTML tags,
which should encourage you to use proper, well-structured HTML rather than a
hacked-together string of tags. Your code's readability will improve as well,
since the function names express their purpose better than a printf() call
would. For example, the code in Example 3(a) uses the cgihtml library, which
makes far more sense and is far more readable than the equivalent C code
without the library, shown in Example 3(b).
There are also a number of more specialized routines that come with cgihtml to
provide additional functionality and security. 


Returning Query Results


To demonstrate use of this library, I'll build a rudimentary application: a
generic query-results program that returns the names and values of everything
entered in any form, such as the one in Example 4. This example form uses the
POST method; however, you can use either GET or POST with query results. 
The example assumes that the compiled query-results program is in your Web
server's cgi-bin directory. The query-results program simply shows the name of
each form item and the value entered by the user. First, the code needs to
include the appropriate header files, namely cgi-lib.h and html-lib.h. Next,
it instantiates an automatic variable called "item," a linked list that will
store the names and values; see Example 5.

Next, the program needs to read the input. Remember that read_cgi_input()
understands both the GET and POST methods automatically; you do not need a
separate function to handle either case. The next statement in the program
uses read_cgi_input() to read the item data into the linked list.
You now have a linked list with all the items entered from the form. You could
iterate through the list yourself and print each entry using some of the
linked-list routines provided by cgihtml. However, it's simpler to use the
print_entries() function, which outputs each name and value using an HTML
definition list (the <dl> tag).
Before you output your data, you must tell the browser what type of
information you are sending by outputting a Content-Type header. This is
accomplished here by the html_header() function. Finally, the list is cleared
and the memory is freed up before exiting.
You now have a fully functioning CGI program in only a few lines of code. This
code can be extended to do many things because each item in the linked list is
easily accessible.


Programming Strategies


One of the most important issues related to CGI programming is security. A
badly written CGI program can open up your system to anyone smart enough to
manipulate it. In general, you should run your Web server as a nonexistent
user (usually "nobody") to limit the damage someone could do if he or she were
to break in via a CGI script.
Although running the program as a nonexistent user reduces the risk, it does
not eliminate it. In CGI C programs, C functions that fork a Bourne shell
process (system() or popen(), for example) present a serious potential
security hole. If you allow user input into any of these functions without
first "escaping" the input (adding a backslash before offending characters),
someone malicious can take advantage of your system using special,
shell-reserved "metacharacters." For instance, Example 6(a) may seem perfectly
safe; it simply opens up a pipe to sendmail. However, since popen() forks a
shell, invoking the CGI script with the response in Example 6(b) as a value
for "to:" will create the file I_HAVE_ACCESS in the server's /tmp directory.
Although this is a relatively harmless example, there are more serious
possibilities.
In order to prevent malicious input into system() and related routines,
cgihtml comes with an escape_input() function, which merely precedes every
shell metacharacter in a string with a backslash. Example 6(c) is a modified,
safe version of the code. Now if the user enters the response in Example 6(b),
the semicolon will be preceded with a backslash before it is appended to the
string command. In Example 6(d), the sanitized popen() command string will
simply send mail to three bad addresses rather than allow a user to create
files on the server machine from an unauthorized client.
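cgihtml's escape_input() source isn't reproduced in the article. A minimal sketch of such a routine might look like the following; the function name and the exact metacharacter list are assumptions, not cgihtml's own:

```c
#include <stdlib.h>
#include <string.h>

/* Sketch of an escape_input()-style routine (illustrative, not cgihtml's).
   Returns a newly allocated copy of s with a backslash inserted before
   each shell metacharacter; the caller frees the result. */
char *escape_shell_chars(const char *s)
{
    /* assumed metacharacter list; a production list may differ */
    const char *meta = ";&|<>*?()[]{}$`'\"\\!#~";
    char *out = malloc(2 * strlen(s) + 1);  /* worst case: all escaped */
    char *p = out;

    if (out == NULL)
        return NULL;
    for (; *s != '\0'; s++) {
        if (strchr(meta, *s) != NULL)
            *p++ = '\\';
        *p++ = *s;
    }
    *p = '\0';
    return out;
}
```

With a routine like this applied before strcat(), the popen() command string carries the sanitized text shown in Example 6(d) rather than a live shell command.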
Another common programming challenge arises when the user presses the Stop
button on the browser while a CGI script is still running. Although most
servers receive a signal indicating that the client has closed its connection,
they rarely bother passing the message on to the CGI program. Additionally, if
your experimental CGI program has a bug and goes into an infinite loop,
pressing the Stop button will not break you out of it.
The solution is to set an alarm for well past the time the program should
need to run. If the alarm goes off, the program probably has a bug, so trap
that signal and call an appropriate exit function that quits cleanly.
cgihtml comes with its own primitive die() function, which sends an error
message to the Web browser, but you are encouraged to write your own die() to
fit your needs. In C, this looks like Example 7. If this program is still
running 30 seconds after launch, then it will automatically print an error
message to the Web browser and quit.
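cgihtml's die() itself isn't listed in the article. A custom replacement along the lines the text suggests could look like this sketch; the function names are mine, not cgihtml's:

```c
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical replacement for cgihtml's die(): report the timeout
   to the browser (Content-Type header first) and exit. */
void die_with_timeout(int sig)
{
    (void)sig;
    printf("Content-Type: text/html\r\n\r\n");
    printf("<h1>Error</h1><p>The script exceeded its time limit.\r\n");
    exit(1);
}

/* Install the handler and start an n-second watchdog; call alarm(0)
   before a normal exit to cancel the pending alarm. */
int install_watchdog(unsigned seconds)
{
    if (signal(SIGALRM, die_with_timeout) == SIG_ERR)
        return -1;
    alarm(seconds);
    return 0;
}
```

A program that finishes normally should cancel the watchdog with alarm(0) so the handler never fires.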
One other task CGI programmers often face is content negotiation. The large
number of existing browsers, each supporting different features, often
frustrates the Web-page designer, who must design pages that look good on any
browser. For example, an imagemap may look fantastic on a graphical browser
with a T1 connection but is utterly useless to those with text browsers or
slow Internet connections.
One way to deal with this dilemma is to use CGI scripts to determine what the
browser is capable of displaying, then send the appropriate HTML file. There
are several variables that you can check for different kinds of content
negotiation. cgihtml comes with the function accept_image(), which checks the
HTTP_ACCEPT environment variable to see whether the browser can view inline
images. Other functions could check other environment variables; for example,
one that examines HTTP_USER_AGENT could send pages that use the special
features of particular browsers.
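cgihtml's actual accept_image() isn't shown in the article; a minimal check in the same spirit might look like the following, where the function name and matching rules are illustrative guesses:

```c
#include <stdlib.h>
#include <string.h>

/* Sketch of an accept_image()-style test: return 1 if the Accept
   header (normally fetched with getenv("HTTP_ACCEPT")) suggests the
   browser can display inline GIF images. The matching rules here are
   an assumption about what a real implementation might do. */
int accepts_inline_images(const char *accept_header)
{
    if (accept_header == NULL)
        return 0;
    return strstr(accept_header, "image/gif") != NULL ||
           strstr(accept_header, "image/*") != NULL ||
           strstr(accept_header, "*/*") != NULL;
}
```

A CGI program would call accepts_inline_images(getenv("HTTP_ACCEPT")) and branch to the graphical or text-only page accordingly.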
Example 8 is a CGI program that provides content negotiation. It assumes you
have previously designed two HTML files: a very graphical one (index-img.html)
and another that is text only (index-txt.html). When accessed, this program
sends a graphical page to graphical browsers and a text-only page to text
browsers.


Conclusion


Although Perl is currently the language of choice among Web programmers,
increased server strain will provide a greater incentive for Web maintainers
to write their code in lower-level, more-efficient languages such as C. As
cgihtml shows, well-written libraries can simplify CGI programming in C
without restricting C's flexibility and power.
Example 1: The encoding scheme for CGI input. Names and values are separated
by =, records are separated by &, spaces are replaced by +, and special
characters are encoded as a percent sign followed by two hexadecimal digits.
name1=value1a+value1b&name2=value2&name3=value3 ...
Example 2: A no-parse-header response that might be returned by an imagemap
script to ignore the user's mouse click.
HTTP/1.0 204 No Response
Content-Type: text/plain
Example 3: (a) Code that uses the cgihtml library; (b) equivalent code without
the cgihtml library.
(a)
html_header();
html_begin("HTML Page");
h1("HTML Page");
printf("<p>This is a sample HTML page.\r\n");
html_end();

(b)
printf("Content-Type: text/html\r\n\r\n");
printf("<html> <head>\r\n");
printf("<title>HTML Page</title>\r\n");
printf("</head>\r\n<body>\r\n");
printf("<h1>HTML Page</h1>\r\n");
printf("<p>This is a sample HTML page.\r\n");
printf("</body></html>\r\n");
Example 4: A sample HTML form that invokes the CGI program query-results.
<form method=POST action="/cgi-bin/query-results">
<p>Name: <input type=text name="name">
<p>Age: <input type=text name="age">
<p>E-mail: <input type=text name="email">
<p><input type=submit>
</form>
Example 5: The query-results program, which reads data from a form and returns
a page of data values.
#include <stdio.h>
#include "cgi-lib.h"
#include "html-lib.h"
int main()
{
 llist items;
 read_cgi_input(&items);

 html_header();
 html_begin("Query Results");
 h1("Query Results");
 print_entries(items);
 html_end();
 list_clear(&items);
 exit(0);
}
Example 6: (a) Code that has a security hole; (b) a user can breach security
via shell metacharacters; (c) modified program that escapes the input string;
(d) the popen() call rendered harmless.
(a) llist items;
 char *command;
 read_cgi_input(&items);
 strcpy(command,"/usr/lib/sendmail ");
 strcat(command,cgi_val(items,"to"));
 popen(command);

(b) ; touch /tmp/I_HAVE_ACCESS

(c) llist items;
 char *command;
 read_cgi_input(&items);
 strcpy(command,"/usr/lib/sendmail ");
 strcat(command,escape_input( cgi_val(items,"to") ));
 popen(command);

(d) /usr/lib/sendmail \; touch /tmp/I_HAVE_ACCESS
Example 7: Program with a built-in watchdog timer.
#include <signal.h>
#include <unistd.h>
#include "cgi-lib.h"
int main()
{
 signal(SIGALRM,die);
 alarm(30);
 while (1) ;
}
Example 8: Program that sends an HTML page tailored to the type of browser.
#include "cgi-lib.h"
#include "html-lib.h"
int main()
{
 if (accept_image())
 show_html_page("/index-img.html");
 else
 show_html_page("/index-txt.html");
}
Figure 1: Data flow between the browser, server, and CGI program.
Figure 2: The read_cgi_input() function parses the raw data and places the
entries in a linked list.




Tracking Home Page Hits


Reporting on user access




Ann Lynnworth


Ann, a long-time Paradox developer, is cofounder of the Delphi Northbay SIG in
Petaluma, CA. She can be contacted at ann@sonic.net.


When it comes to measuring traffic on the Internet, the numbers are sometimes
mind-boggling. Last year, traffic on the World Wide Web reportedly increased
1800 percent. By 1998, some forecasters are predicting 11.8 million Web users,
while others estimate the Internet market will grow tenfold between 1994 and
1998. Clearly, the Internet can't be ignored.
But after you've come up with a business plan, studied books on HTML, rented
disk space on an Internet-access provider's system, and launched your own
World Wide Web home page, how do you determine if the system is effective?
Currently, the predominant means of measuring home-page traffic is the number
of "hits" you get over a given period of time. The graphics-intensive Web site
for Playboy magazine, for instance, is racking up about 800,000 hits per day.
Likewise, Wired magazine's HotWired site reportedly gets nearly half a million
hits per day. Hits, however, can be a misleading metric because they depend on
the graphical content of the page--each graphic counts as a hit. To get the
number of actual users visiting a site, you need to divide the hits by a
factor, say between 3 and 6, that depends on the characteristics of the site.
Furthermore, the average Internet provider is too busy to give you customized
statistical information. You might be able to find out how the whole server is
doing, or once a week, you might find out how your page is doing. Or, you
could learn to analyze the server-log files to get an in-depth understanding
of your traffic.
But how about a straightforward method that lets you (and surfers) see how
popular your page is? You've probably seen pages that say "You are guest #nnn
on this page." A more accurate report, like that in Figure 1, might tell you
that "there have been nnn requests for this page since mm/dd/yy hh:mm."
Webmasters at these pages have installed a "traffic-counter" program that logs
the number of individual accesses to a particular page. In this article, I'll
present a minimal traffic counter that tracks and reports on user access; see
Figure 2. This counter was built using components for Borland's Delphi. The
resulting traffic.exe program runs on either of Bob Denny's web servers:
WebSite (32-bit, for Win '95 or NT 3.5x, published by O'Reilly & Associates),
or Win-HTTPD (16-bit, for Win 3.1x, shareware). Since it also uses the Borland
Database Engine, IDAPI must be installed on the server. (If you are working on
a UNIX server or have a different Web server, you might take a look at
http://www.stars.com for other approaches in the public domain.) 


A CGI Backgrounder


Before launching into specifics about my traffic counter, an introduction to
CGI is in order. In this discussion, I'll refer to Robert Denny's win-cgi
implementation (for WebSite) for illustrative purposes.
When a Web browser requests a static page from a WebSite server, that request
is sent to the server using HTTP protocol, and the server responds with a
document from its disk. To request a dynamic page, however, the browser sends
a request and, based on the URL, the WebSite server realizes that it is a
request for a CGI program. The server then runs that CGI program (WinExec() of
an .EXE) and waits for it to finish. (Note that WebSite is multithreaded; this
has obvious benefits.) The CGI program executes its instructions and feeds a
response document back to the server. When the CGI program exits, WebSite
sends the document back to the browser.
More specifically, under win-cgi, all the CGI environment data (browser name,
surfer's IP address, referring page, authentication data, and the like) plus
any HTML-form data (data requested on the referring page) is parsed by the
server and stored in an .INI file in a temporary directory. When WebSite calls
your CGI application, it passes the name of that temporary .INI file as the
first command-line argument. Thus, your CGI application can orient itself to
the current session by reading values from the .INI file.
In addition to all the standard CGI information, the WebSite server also adds
a key=value statement in the .INI file, defining the name of the output file
that your program should build. This will also be in the temporary directory.
WebSite takes care of unique naming issues and file cleanup after the file is
sent to the browser.
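The article's components handle this .INI parsing internally; on Windows a program would normally just call the GetPrivateProfileString() API. Purely for illustration, here is a portable sketch of the same key=value lookup; the file contents in the test are invented examples, not the full win-cgi layout:

```c
#include <stdio.h>
#include <string.h>

/* Minimal .INI lookup sketch: find "key=value" inside [section] and
   copy the value into out. Illustrative only; a real win-cgi program
   on Windows would use GetPrivateProfileString(). */
int ini_lookup(const char *path, const char *section, const char *key,
               char *out, size_t outlen)
{
    char line[512], want[128];
    int in_section = 0;
    size_t keylen = strlen(key);
    FILE *fp = fopen(path, "r");

    if (fp == NULL)
        return -1;
    snprintf(want, sizeof want, "[%s]", section);
    while (fgets(line, sizeof line, fp) != NULL) {
        line[strcspn(line, "\r\n")] = '\0';      /* strip the newline */
        if (line[0] == '[')
            in_section = (strcmp(line, want) == 0);
        else if (in_section && strncmp(line, key, keylen) == 0 &&
                 line[keylen] == '=') {
            snprintf(out, outlen, "%s", line + keylen + 1);
            fclose(fp);
            return 0;
        }
    }
    fclose(fp);
    return -1;
}
```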
In short, the CGI program executes, sending its output to a prenamed,
temporary output file. Generally, the output is a standard HTTP prologue
followed by a blank line and some valid HTML syntax; see Example 1. The
dynamic HTML page at my online classified-ads site
(http://super.sonic.net/ann/forsale/) illustrates this. When you look up
"Bicycles," my CGI application searches a Paradox table and displays an HTML
3.0 table displaying the "answer" set.


Returning to the Traffic Counter


When building an application like a traffic counter, you're faced with a
number of challenges. For instance, you need to:
Launch a CGI program in the middle of an otherwise static page. 
Output a graphical image instead of plain text/html.
Track the hits.
Render the longInt counter value as a graphical image.
Do file I/O that works with Windows NT and Windows 3.x.
After studying the source code on a variety of Web pages, I realized you can
launch CGI programs from within IMG SRC commands. For instance, with standard
HTML 1.0 you can say
<IMG SRC="http://yourprovider.com/cgi-win/yourprogram.exe"> to launch a CGI
program. This lets you build an otherwise static page using a standard HTML
editor, then have the image filled in at run time. (The output of the CGI
program must be an image file--not text--because the browser is expecting an
IMG.)
Outputting a graphical image instead of plain text/html was simply a matter of
changing the "prologue" portion of the output file, to use Content-Type:
image/gif and to send the actual .GIF data in the place of plain text.
(Remember the blank line between prologue and content!)
How do you know where the hits are coming from? The CGI specification provides
the environment variable Referer, which refers to the name of the calling page
(unless the surfer typed in the URL to your CGI program directly, using no
form at all). The only difficulty with tracking hits by Referer is that there
can be many ways to address the same page. My server, for example, is named
both "super.sonic.net" and "www2.sonic.net." My Web site is running at port
80, so a Referer could say "super.sonic.net:80/." Also, there are
capitalization differences, and it's not necessary to enter the home-page name
of index.html in order to land on that page (for example, super.sonic.net/ann/
is equivalent to super.sonic.net/ann/index.html). To deal with these issues, I
decided to strip off the server name, append index.html if the Referer ends in
"/", and change everything to lower case.
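The article's implementation of this normalization is the Delphi getCount routine in Listing One; purely for illustration, the same three steps can be sketched in C:

```c
#include <ctype.h>
#include <string.h>

/* Illustrative C version of the Referer normalization described in the
   text: drop the "http://host[:port]" prefix, lowercase the path, and
   append "index.html" when the path ends in "/". */
void normalize_referer(const char *referer, char *out, size_t outlen)
{
    const char *p = strstr(referer, "//");
    size_t i, n;

    if (p != NULL) {                     /* skip "//" and the host name */
        p += 2;
        const char *slash = strchr(p, '/');
        p = (slash != NULL) ? slash : "/";
    } else {
        p = referer;
    }
    n = strlen(p);
    if (n >= outlen)
        n = outlen - 1;
    for (i = 0; i < n; i++)
        out[i] = (char)tolower((unsigned char)p[i]);
    out[n] = '\0';
    if (n > 0 && out[n - 1] == '/' && n + strlen("index.html") < outlen)
        strcat(out, "index.html");
}
```

After this, "http://Super.Sonic.Net:80/Ann/" and "http://super.sonic.net/ann/index.html" both map to the same key, "/ann/index.html".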
With Delphi, rendering the longInt counter value as a graphic is
relatively straightforward once you know the rules. Delphi provides a method,
TextOut, which can "draw" text into a bitmap at any coordinate. It cannot
"draw" a numeric value, so we convert the longInt count to a string. See
Listing One (beginning on page 30) to see exactly how a counter bitmap is
created.
Since Delphi doesn't have complete support for .GIF files, I converted .BMP
data to .GIF format using Dan Dumbrill's BmpToGifStream shareware program
(available at my site), which does graphical-file-format conversion, in
memory, within Delphi. Once the .GIF image is in memory, it is fairly easy to
append that data to the prologue in the output file.
Finding a way to do file I/O that works with Windows NT as well as Windows 3.x
was the most time-consuming portion of building this application. Under
Windows NT (not Windows 3.1), it is necessary to open the stdout file via
stdout := TFileStream.create(stdoutname, fmOpenWrite);. (Under Windows 3.1, I
used the fmCreate flag instead of fmOpenWrite.)
The complete source code to the traffic-counter application is available
electronically from DDJ (see "Availability," page 3) and at my home page
(http://super.sonic.net/ann/delphi/cgicomp/code.html).
Figure 1: Page with a sophisticated traffic counter.
Figure 2: Low-volume traffic counter.
Example 1: Standard prologue and HTML code
HTTP/1.0 200 OK
Server: _server name here_
Date: _date/time in GMT format_
Content-Type: text/html
<HTML><HEAD><TITLE>A Sample
Page</TITLE></HEAD><BODY>
<h1>Welcome</h1>
<hr><address>Goodbye</address>

</BODY></HTML>

Listing One
unit Trafform;
{ TRAFFIC.EXE : Track web traffic counts on a page-by-page basis. This is a 
16-bit application that will run under Windows 3.1x, Win 95 and Windows NT. 
It has been tested extensively under Windows NT AS 3.5 and 3.51.
The program requires three components which can be downloaded at 
http://super.sonic.net/ann/delphi/cgicomp/detail.html
 
TCGIEnvData (free, written by Ann Lynnworth )
TCGIDB ($39 shareware from Ann )
BMPGIF.dcu ($15, shareware written by Dan Dumbrill )
Usage: <img src="/cgi-win/traffic.exe">
See cgicomp/index.html for sample usage.
traffic.exe requires one DLL on the PATH of the server, to make the .gif 
image: BIVBX11.DLL which ships with Delphi. It requires IDAPI installed 
on the server (on the Delphi CD). It also requires a BDE Alias named 
WebTrafficCounter, pointing to a directory which contains hit.db and hit.px. 
This Paradox table (TableHit object) should be included in the .zip file you 
found this source code in!
This program is Copyright c 1995 Ann Lynnworth. Permission is hereby granted 
for any registered user of TCGIDB and BMPGIF to freely copy and/or modify 
this program provided that these original credits are kept intact. 
Suggestions should be mailed to ann@sonic.net -- thank you.
}
interface
uses
 SysUtils, WinTypes, WinProcs, Messages, Classes, Graphics, Controls, 
 Forms, StdCtrls, ExtCtrls, Cgidb, Cgi, DB, DBTables,
 gifbmp; {written by Dan Dumbrill}
 { use TDebugControl from TPack if you want to trace the code }
const
 wm_Traffic = wm_User;
type
 TForm1 = class(TForm)
 DataSource1: TDataSource;
 TableHit: TTable;
 CGIEnvData1: TCGIEnvData;
 CGIDB: TCGIDB;
 Image: TImage;
 procedure FormCreate(Sender: TObject);
 function makegif : TMemoryStream;
 function getCount : string;
 private
 { Private declarations }
 procedure wmTraffic(var Msg: TMessage);
message wm_Traffic;
 public
 { Public declarations }
 end;
var
 Form1: TForm1;
implementation
{$R *.DFM}
procedure TForm1.FormCreate(Sender: TObject);
begin
 with CGIEnvData1 do
 begin

 websiteINIfilename := paramstr(1);
 application.onException := cgiErrorHandler;
 application.processMessages;
 end;
 PostMessage(Handle, wm_Traffic, 0, 0); { this takes us to wmTraffic below }
 { postMessage of a custom message only works if
 application.run; is left in the .dpr file
}
end;
function getFileSize( filename : string ) : longint;
var
 tmpFile : file of byte;
begin
 try
 assignfile( tmpFile, filename );
 except
 raise exception.create( 'error assigning FILE ' + filename );
 end;
 { these 2 lines might not be necessary }
 filemode := 0;
 reset( tmpFile );
 result := filesize( tmpFile );
 closeFile( tmpFile );
end;
procedure TForm1.wmTraffic(var Msg: TMessage);
var
 gifFile : integer;
 stdoutname : string;
 stdout : TFileStream;
 gifFilename : string;
 bufSize : word;
 buf : pchar;
 gifbuf : TMemoryStream; { this is created & filled in makegif }
 count : longint;
begin
 buf := nil;
 stdoutname := CGIEnvData1.SystemOutputFile^;
 { This is where we actually create the .gif image based on the count... }
 gifBuf := makegif; { makegif looks for CGIReferer, which could be 
 cginotfound and cause writing to stdout! Therefore 
 this call must be before any other use of stdout. }
 if gifBuf = nil then
 begin
 CGIEnvData1.sendNoOp;
 closeApp( application );
 end;
{ A Note about File I/O techniques ...
 * The fileOpen, fileWrite series of commands worked ok under Win HTTPD, but 
 not WebSite with Win NT.
 * TFileStream.open with fmOpen parameter worked in the same situations.
 * Only TFileStream.open with fmCreate OR fmOpen worked with WebSite under NT.
 * If anyone would like to tell me *why*, please do. I'm just glad
 I got it working. -ann
}
 try
 { adding fmCreate seems to have done the trick for NT }
 stdout := TFileStream.create( stdoutname, fmCreate OR fmOpenWrite );
 except
 raise exception.create( 'failed to open stdout: ' + stdoutname );

 exit;
 end;
 try
 bufSize := 1024; { fixed size for the header built below }
 getmem( buf, bufsize ); { buf is used to hold header below }
 strpcopy( buf, 'HTTP/1.0 200 OK' + #13#10 + 'Server: ' +
 CGIEnvData1.CGIServerSoftware^ + #13#10 +
 'Date: ' + CGIEnvData1.webDate(now ) + #13#10 +
 'Expires: ' + CGIEnvData1.webDate( now + 
 (1/(24*120)) ) + #13#10 + {in 30 seconds}
 'Content-type: image/gif' + #13#10 +
 #13#10 ); { blank line after prologue }
 try
 { send header info defined above }
 stdout.write( buf[0], strlen(buf) ); { from CWG.HLP }
 except
 freemem( buf, bufsize );
 raise exception.create( 'write of buf failed' );
 end;
 gifBuf.saveToStream( stdout );
 finally
 gifbuf.free;
 if buf <> nil then
 freeMem( buf, bufsize );
 stdout.free;
 end;
 application.processMessages;
 closeApp( application ); { see cgihelp.hlp file }
end;
{ getCount figures out the count and returns it as a string }
function TForm1.getCount : string;
var
 n : double;
 refer : string;
 x : byte;
const
 URLFld = 0;
 countFld = 1;
begin
 result := '???'; { hopefully we'll have something better to say shortly }
 refer := CGIEnvData1.CGIReferer^; { get URL of page that launched us }
 if refer = cginotfound then
 begin
 result := 'N/A';
 CGIEnvData1.closeStdout; { don't want error message keeping file open! }
 exit;
 end;
 refer := lowercase( refer );
 x := pos( '//', refer );
 if x > 0 then
 begin
 { strip off http://super.sonic.net portion of referer }
 refer := copy( refer, x + 2, 60 );
 x := pos( '/', refer );
 if x > 0 then
 refer := copy( refer, x, 60 );
 end;
 { if URL ends in /, append index.html as document name }
 if refer[ length(refer) ] = '/' then
 refer := refer + 'index.html';

 with tableHit do
 begin
 open;
 edit;
 if NOT findKey( [ refer ] ) then
 begin
 insert;
 fields[ URLFld ].asString := refer; { primary key field }
 fields[ CountFld ].asFloat := 2.0; { # for next surfer }
 n := 1;
 end
 else
 begin
 edit;
 n := fields[ CountFld ].asFloat;
 fields[ CountFld ].asFloat := n + 1;
 end;
 post;
 close;
 end;
 result := floatToStr( n );
end;
{ this function generates a .bmp first, and then converts that to a .gif }
function TForm1.makegif : TMemoryStream;
var
 pict : TPicture;
 Bitmap: TBitmap;
 theGifImage : TMemoryStream;
begin
 theGIFImage := nil;
 try
 pict := TPicture.create;
 image.picture := pict; { image is a TImage on the form }
 bitmap := TBitmap.create;
 bitmap.height := 20;
 bitmap.width := 80;
 bitmap.monochrome := true; { added to fix >256 color problem on 
 Ann's server }
 image.picture.bitmap := bitmap;
 { here's the magic -- use textOut to create a bitmap with count value !! }
 image.picture.bitmap.canvas.textout( 2, 2, getCount );
 theGIFImage := TMemoryStream.create;
 if BMPToGifStream( image.picture.bitmap, theGIFImage ) <> CVROK then begin
 { error during conversion bmp to gif }
 theGifImage.free;
 theGifImage := nil;
 end;
 finally
 bitmap.free;
 pict.free;
 end;
 result := theGIFImage;
end;
End Listing




Client/Server Development and the World Wide Web


Writing interactive programs for the Web




Jim Lawless


Jim is a lead programmer/analyst for a financial institution, and specializes
in Windows software development. He teaches C and Visual Basic, and can be
contacted at 74217.531@compuserve.com.


The World Wide Web is a collection of documents and programs spread across a
multitude of computers connected to the Internet. Web documents on one
computer can cross-reference Web documents on any other computer on the Web.
These documents are constructed in HyperText Markup Language (HTML). Embedded
in each document are special indicators called "tags" that direct the Web
browser or server to perform a special action. In addition to documents, HTML
contains tags that allow the browser to gather data on formatted, GUI input
screens ("forms").
The location of each document or program is specified by a Uniform Resource
Locator (URL), a construct that determines the server on which the documents
reside. To include a link to another document in a Web page, you simply build
the appropriate URL reference coupled with the appropriate HTML anchor tags in
the text.
A URL reference to a program causes the server to activate the program and
route the console output to the browser client. The output must be specially
formatted to include the Multipurpose Internet Mail Extensions (MIME) type. If
your program is sending an HTML document to the client, the first line that
the client usually sees consists of Content-Type: text/html, followed by a
blank line to indicate that no further MIME information will follow. The
Content-Type description is enough to indicate that an HTML script will follow
in the data stream.


Designing a Game for the Web


To add some fun to my Web home page, I recently wrote a trivia game that's
typical of interactive client/server Web applications. You can find it at
http://www.gonix.com/cjbr/wtriv.html. I wanted to keep the game simple.
Consequently, it's designed to do the following: 
1. Ask the user a question.
2. Get the answer.
3. If the user wants to quit, stop the game.
4. Issue a message to the user indicating whether or not an answer is correct.
5. If the user answers all questions, stop the game.
6. Otherwise, repeat the process.
What could be simpler, right? While this concept is easily implemented in
familiar programming environments, the Web introduces a new dimension. On the
Web, the game actually runs on an external machine. The local computer simply
runs a Web browser that provides a graphical interface to the game server.
Due to the client/server nature of Web interaction, each of these steps would
need to be broken down into a completely autonomous process, which would come
to life on the server and terminate immediately after performing its specific
function.
As I worked on my page, the roles of the client browser program and the server
program became more evident. The server delivers HTML files and graphics files
to the browser, which displays them. When the user activates a URL anchor, the
browser requests a connection to another document (possibly on another
server). The browser is doing a lot of work and does not try to contact the
server until the user requests a document change or similar action.
Between each of the steps outlined in the rough pseudocode, the browser
temporarily stops talking to the server. Each time it contacts the server to
invoke the program, the program must be able to invoke any of the outlined
processes separately. Thus, I would have to pass information to the server
each time the trivia program was to be invoked so that the appropriate
function would be executed. The server program would function using
finite-state machine logic, controlled by state variables received from the
client.


Building Input and Response Forms


To understand exactly what states my game would be required to handle, I first
composed my game's input and response forms. The first form asks the client a
question; see Figure 1. The second form indicates the correctness of the
user's choice (Figure 2). The final form smoothly navigates the user back to
the Web page after either the questions have been exhausted or the user
chooses to exit the game; see Figure 3.
Figure 1, the question form, contains the following HTML elements:
1. A heading and title. This is created by enclosing text within the <HEAD>
<TITLE> and </TITLE></HEAD> tags.
2. A secondary heading in "strong" heading type number two. The text is
enclosed within the <H2><STRONG> and </STRONG></H2> tags.
3. A form definition beginning with the <FORM ACTION="/cgi-bin/webtriv"> tag.
This indicates that the form is to run the program "webtriv" from a special
area on the server.
4. Text for the question and all three choices.
5. Three radio buttons to select an answer.
6. One radio button to indicate that the user is done playing.
7. A push button to trigger the form's action.
The ACTION portion of the form tag refers to a Common Gateway Interface (CGI)
script--a special file on the server that is either a binary executable
application or a script that can be interpreted by language interpreters such
as Perl. The server must be able to distinguish a CGI script from an ordinary
document. Otherwise, the server would simply transmit the contents of the CGI
script file to the client.
When the server recognizes that the CGI script is an executable file, it
invokes the script and routes the output to the client. Using this mechanism,
a program can dynamically construct HTML scripts and return them to the
client.


Writing the Game


Many options exist for developing CGI scripts. Although many CGI programs are
constructed in Perl, I chose C (K&R style) because it is the language I'm most
familiar with, and I wanted to ensure that my game program was extremely
efficient.
As Listing One illustrates, I saved time by storing my questions as an array
internal to the program. To create a second trivia game, you would have to
change the array elements and the control variable _maxq that defines the
maximum number of questions.
I needed to generate a random number to determine the initial question. Rather
than muck about with the random-number functions in the standard library, I
simply called the ctime() function to generate a time/date string and used the
number of seconds as a pseudorandom number.
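The relevant part of Listing One falls outside the excerpt printed here, but the trick can be sketched as follows; the function name is mine:

```c
#include <stdio.h>
#include <time.h>

/* Sketch: pull the seconds field out of ctime()'s fixed-format
   "Www Mmm dd hh:mm:ss yyyy" string and fold it into a question
   index. Crude, but adequate for picking a starting question. */
int seconds_as_random(int max_questions)
{
    time_t now = time(NULL);
    char *stamp = ctime(&now);           /* e.g. "Fri Sep 13 12:34:56 1995\n" */
    int seconds = 0;

    sscanf(stamp + 17, "%2d", &seconds); /* seconds start at offset 17 */
    return seconds % max_questions;
}
```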

Establishing a random question order was easy. The only difficulty was that my
trivia game would have to know which questions had been asked so that it did
not repeat them.
My first thought was to create a shared file on the server that would house
information for each client to access; however, a client could quit at any
time by exiting their browser without the server knowing. Using this
technique, the shared file could grow enormously and would require regular
manual cleaning by a special program to prune old information. The ideal
methodology would be to keep the game-context information on the client's
side.
After poring over my HTML reference material, I discovered that HTML forms
support a special kind of element called a "hidden field." The purpose of this
element type is to enable data to be stored on the client browser. I would
simply have to encode my game-context information into a text format that
could be stored in hidden-field elements.
I used a simple bitmap to indicate the questions (by ordinal value) that had
already been asked. I encoded this bitmap as a series of hexadecimal digits.
Tools such as UUENCODE/UUDECODE use a larger, base-64 alphabet to encode
binary data, but had I used one, future revisions to the HTML specification
might invalidate characters that I relied on to denote data. Simple
hexadecimal notation would suffice. The function encode_flags() uses
sprintf() to encode a fixed-length series of bytes into a hexadecimal string.
The complementary function decode_flags() uses sscanf() to translate the
string of hexadecimal digits into a set of binary data. 
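The encode_flags() and decode_flags() bodies fall outside the listing excerpt printed here; a minimal sketch of the described technique, with an illustrative array size, might be:

```c
#include <stdio.h>
#include <string.h>

#define NFLAGS 8   /* illustrative size; the game would size this from _maxq */

/* Sketch of encode_flags(): render a fixed-length byte array as
   hexadecimal digits via sprintf(), as the text describes. */
void encode_flags(const unsigned char *flags, char *out)
{
    int i;
    for (i = 0; i < NFLAGS; i++)
        sprintf(out + 2 * i, "%02x", flags[i]);
}

/* Sketch of decode_flags(): translate the hexadecimal string back
   into bytes via sscanf(). */
void decode_flags(const char *in, unsigned char *flags)
{
    int i;
    unsigned v;
    for (i = 0; i < NFLAGS; i++) {
        sscanf(in + 2 * i, "%2x", &v);
        flags[i] = (unsigned char)v;
    }
}
```

Round-tripping a bitmap through these two functions yields the original bytes, which is what lets the game park its state in a hidden form field between invocations.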
In addition to the bitmap of questions asked, I needed to track the number of
questions asked and the number answered correctly. Again, I chose to use
hidden HTML elements for this task. The main() function of the WEBTRIV.C file
serves as a finite-state function processor.
The first task that WEBTRIV performs is to analyze the current game context.
The game context is delivered to the program during each invocation via the
environment variable QUERY_STRING. The values for each form element are
encoded into the single string, using the ampersand character to separate each
element's string-data value. The function parse_query_string() utilizes the
strtok() function to separate the input data and appropriately sets a series
of global variables with the information.
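parse_query_string() itself isn't visible in the listing excerpt; the strtok()-based splitting it performs can be sketched like this (the real function sets global variables instead of filling arrays):

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Sketch of strtok()-based QUERY_STRING parsing: split "a=1&b=2" on
   '&', then on '=', copying up to max name/value pairs. strtok()
   modifies its argument, so the work is done on a heap copy. */
int split_query(const char *query, char names[][32], char values[][64], int max)
{
    char *copy = malloc(strlen(query) + 1);
    char *pair;
    int n = 0;

    if (copy == NULL)
        return 0;
    strcpy(copy, query);
    for (pair = strtok(copy, "&"); pair != NULL && n < max;
         pair = strtok(NULL, "&")) {
        char *eq = strchr(pair, '=');
        if (eq == NULL)
            continue;
        *eq = '\0';
        snprintf(names[n], 32, "%s", pair);
        snprintf(values[n], 64, "%s", eq + 1);
        n++;
    }
    free(copy);
    return n;
}
```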
During the initial invocation of the game program, the QUERY_STRING
environment variable is empty. This simply causes all internal data items to
be left in their default empty state. The main() function then checks to see
if the global flag _mode has the value of an uppercase letter "A." If so, the
user has answered the question, and WEBTRIV needs to assess the validity
of the answer and issue a form to the user via the function process_answer().
If the _mode flag was not an uppercase A, WEBTRIV checks to see if the global
variable _answer has a value of 100. I used this magic number to indicate that
users selected the No option on the form that indicates the correctness of
their answer. Choosing No indicates that they no longer wish to continue
playing. In this case, WEBTRIV invokes the no_more() function to terminate the
game.
If processing has not yet terminated, WEBTRIV issues a form asking a question.
It then terminates. When the user transmits an answer, the whole process
occurs again.
The function ask_question() gets a pseudorandom question number that has not
already been used. The function new_num() handles the chore of coming up with
a unique pseudorandom number.
Before attempting to derive a new number, ask_question() determines if all
questions have been answered. If so, it displays the final form via the
function no_more() and supplies a means of navigating back to a known Web
page.
If the pseudorandom number has not yet been used, ask_question() sets the
appropriate bit in the bitmap and encodes the bitmap into a string of
hexadecimal digits. The function ask_question() then builds a form based on
the nature of the random question and sends it to the user. The function
process_answer() processes the user's input and dynamically builds an HTML
form indicating whether the user was correct. It then transmits this form to
the user.
If the user requests to terminate the game, process_answer() calls the
function no_more() to terminate the game and return to the regular Web page.


Conclusion


Client browser programs will play an increasingly important role in future
Web-oriented applications. Although I'd like to create my own programming-language
translator that would abstract the finite-state nature of programming tasks
similar to those mentioned, Web programming tools have already evolved that
are superior to my unrealized utility.
Programming systems such as Java (see "Java and Internet Programming," by
Arthur van Hoff, DDJ, August 1995, and "Net Gets a Java Buzz," by Ray Valdés,
Dr. Dobb's Developer Update, August 1995) will allow the client to perform the
bulk of the processing chores while the server distributes programs and data.
This is the foundation upon which more complex programs will be created.
As you have seen, even simple tasks from familiar environments can be a little
more difficult to implement in a client/server environment. However, after
getting your feet wet, I'm certain you'll find that the effort isn't terribly
painful.
Figure 1: Input form.
Figure 2: Form that indicates the correctness of user's input.
Figure 3: Form that takes the user back to the Web page at end of the game.

Listing One
/* WEBTRIV.C
 * by Jim Lawless
 * cjbr@gonix.com
 * http://www.gonix.com/cjbr/wtriv.html
 * A simple trivia game for a World Wide Web page.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <ctype.h>
/* Information structure for array of questions */
struct info_t {
 /* Question */
 char *quest;
 /* Three possible answer strings */
 char *ans[3];
 /* Which answer is correct (1-based) */
 int answer;
} ;
/* A populated question array */
struct info_t _info[]={
 /* 1 */
 {"Who played Spock in Star Trek?",
 "Leonard Nimoy",
 "Lenny Bruce",
 "Adam West",
 1},
 /* 2 */
 {"Who invented the Forth language?",
 "Niklaus Wirth",
 "Bjarne Stroustrup",
 "Charles Moore",
 3},

 /* 3 */
 {"Who is buried in Grant's tomb?",
 "Lincoln",
 "Grant",
 "Washington",
 2},
} ;
/* Maximum number of questions */
int _maxq=3;
/* This array contains the bitmap information for questions asked. */
unsigned short _flags[5]; /* 80 bits */
/* Work string for bitmap functions */
char _flag_str[21];
/* The following globals will be populated with data from an HTML form. */
char _mode;
int _answer;
int _qnum;
int _num_asked;
int _num_correct;
/* Constants for bit-manipulation */
unsigned short pow[]={32768,16384,8192,4096,2048,1024,
 512,256,128,64,32,16,8,4,2,1};
/* Environment variable used to retrieve form info */
char *_qs="QUERY_STRING";
/* Get the seconds value from the ctime() function */
short getsec()
{
 time_t t;
 char time_str[3];
 char *time_wrk;
 time(&t);
 time_wrk=ctime(&t);
 time_str[0]=time_wrk[17];
 time_str[1]=time_wrk[18];
 time_str[2]=0;
 return(atoi(time_str));
}
/* Encode _flags into _flag_str */
void encode_flags()
{
 sprintf(_flag_str,"%04x%04x%04x%04x%04x",
 _flags[0],_flags[1],_flags[2],
 _flags[3],_flags[4]);
}
/* Decode _flag_str into _flags */
void decode_flags()
{
 sscanf(_flag_str,"%04x%04x%04x%04x%04x",
 _flags,_flags+1,_flags+2,
 _flags+3,_flags+4);
}
/* Set a bit in _flags */
void bit_set( bitnum )
unsigned short bitnum;
{
 unsigned short arr,offs;
 arr=bitnum/16;
 offs=pow[bitnum%16];
 _flags[arr] |= offs;
}
/* Read a bit from _flags */
unsigned short bit_get( bitnum )
unsigned short bitnum;
{
 unsigned short arr,offs;
 arr=bitnum/16;
 offs=pow[bitnum%16];
 return( ( _flags[arr] & offs) == offs );
}
/* Get a pseudo-random number based on the value of getsec(). If the number
 * has already been used, search for the next open spot in the bitmap,
 * wrapping around to the beginning of the array as necessary.
 */
short new_num()
{
 short num;
 num=getsec()%_maxq;
 while( bit_get( num )) {
 num=(num+1)%_maxq;
 }
 return(num);
}
/* Parse the QUERY_STRING environment variable based on the ampersand 
 * character. Fill global variables based on data from the input form.
 */
void parse_query_string(s)
char *s;
{
 char *p;
 p=strtok(s,"&");
 while(p!=NULL) {
 if(!memcmp(p,"Data=",5)) {
 strcpy(_flag_str,p+5);
 decode_flags();
 }
 else
 if(!memcmp(p,"Mode=",5)) {
 _mode=*(p+5);
 }
 else
 if(!memcmp(p,"Answer=",7)) {
 _answer=atoi(p+7);
 }
 else
 if(!memcmp(p,"QNum=",5)) {
 _qnum=atoi(p+5);
 }
 else
 if(!memcmp(p,"NAsk=",5)) {
 _num_asked=atoi(p+5);
 }
 else
 if(!memcmp(p,"NCor=",5)) {
 _num_correct=atoi(p+5);
 }
 p=strtok(NULL,"&");
 }
}
/* Issue a form that will resume a known Web page. */
void no_more()
{

 /* Send HTML MIME-type */
 printf("Content-type: text/html\n\n");
 /* HTML document header */
 printf("<HTML><HEAD><TITLE>Web Trivia</TITLE></HEAD><BODY>\n");
 printf("<H2><STRONG>Web Trivia</STRONG></H2>\n");
 printf("<H2>");
 printf("Thanks for playing!!!</H2>");
 if(_num_asked>=_maxq) {
 printf("<P>We're all out of questions!");
 }
 printf("<P>You answered %d of %d questions correctly.",_num_correct,
 _num_asked);
 /* Create an anchor to get back to the first page */
 printf("<P><A HREF=\"http://www.gonix.com/cjbr/wtriv.html\">");
 printf("Go back to where you started...</A>");
 printf("</BODY></HTML>\n");
}
/* Ask a pseudo-random question */
void ask_question()
{
 unsigned short i;
 _num_asked++;
 if(_num_asked>_maxq) {
 _num_asked--;
 no_more();
 return;
 }
 i=new_num();
 bit_set( i );
 encode_flags();
 /* Send HTML MIME-type */
 printf("Content-type: text/html\n\n");
 /* HTML document header */
 printf("<HTML><HEAD><TITLE>Web Trivia</TITLE></HEAD><BODY>\n");
 printf("<H2><STRONG>Web Trivia</STRONG></H2>\n");
 /* Form definition */
 printf("<FORM ACTION=\"/cgi-bin/webtriv\">\n");
 /* Ask question*/
 printf("<P>%s\n",_info[i].quest);
 /* Provide radio-buttons as choices */
 printf("<P><INPUT NAME=\"Answer\" TYPE=\"radio\" VALUE=\"1\">\n");
 printf("%s\n",_info[i].ans[0]);
 printf("<P><INPUT NAME=\"Answer\" TYPE=\"radio\" VALUE=\"2\">\n");
 printf("%s\n",_info[i].ans[1]);
 printf("<P><INPUT NAME=\"Answer\" TYPE=\"radio\" VALUE=\"3\">\n");
 printf("%s ",_info[i].ans[2]);
 printf(
 "<P><INPUT NAME=\"Answer\" TYPE=\"radio\" VALUE=\"4\"><B>Quit </B>\n");
 printf("<INPUT NAME=\"Go\" TYPE=\"submit\" VALUE=\"Send\">\n");
 printf("<P><INPUT NAME=\"Data\" TYPE=\"hidden\" VALUE=\"%s\">\n",_flag_str);
 printf("<INPUT NAME=\"Mode\" TYPE=\"hidden\" VALUE=\"A\">\n");
 printf("<INPUT NAME=\"QNum\" TYPE=\"hidden\" VALUE=\"%d\">\n",i);
 printf("<INPUT NAME=\"NAsk\" TYPE=\"hidden\" VALUE=\"%d\">\n",_num_asked);
 printf("<INPUT NAME=\"NCor\" TYPE=\"hidden\" VALUE=\"%d\">\n",_num_correct);
 /* Closing FORM and HTML tags */
 printf("</FORM></BODY></HTML>\n");
}
/* Process the response from an "ask_question" form. */
void process_answer()

{
 /* Did the user select Quit? */
 if(_answer==4) {
 no_more();
 return;
 }
 /* Send HTML MIME-type */
 printf("Content-type: text/html\n\n");
 /* HTML document header */
 printf("<HTML><HEAD><TITLE>Web Trivia</TITLE></HEAD><BODY>\n");
 printf("<H2><STRONG>Web Trivia</STRONG></H2>\n");
 printf("<H2>");
 if(_answer==_info[_qnum].answer) {
 printf("<P>You are correct!</H2>");
 _num_correct++;
 }
 else {
 printf("<P>Wrong!</H2>\n");
 printf("<P>The correct answer was %d <B>%s</B>",_info[_qnum].answer,
 _info[_qnum].ans[_info[_qnum].answer-1]);
 }
 printf("<P>You have answered %d of %d questions correctly.",_num_correct,
 _num_asked);
 /* Form definition */
 printf("<FORM ACTION=\"/cgi-bin/webtriv\">\n");
 printf("<P>Play again?");
 printf("<P><INPUT NAME=\"Answer\" TYPE=\"radio\" VALUE=\"0\">\n");
 printf("Yes");
 /* Magic value 100 used here to indicate "NO" */
 printf("<INPUT NAME=\"Answer\" TYPE=\"radio\" VALUE=\"100\">\n");
 printf("No");
 printf("<P><INPUT NAME=\"Go\" TYPE=\"submit\" VALUE=\"Send\">\n");
 printf("<P><INPUT NAME=\"Data\" TYPE=\"hidden\" VALUE=\"%s\">\n",_flag_str);
 printf("<INPUT NAME=\"Mode\" TYPE=\"hidden\" VALUE=\"R\">\n");
 printf("<INPUT NAME=\"QNum\" TYPE=\"hidden\" VALUE=\"%d\">\n",_qnum);
 printf("<INPUT NAME=\"NAsk\" TYPE=\"hidden\" VALUE=\"%d\">\n",_num_asked);
 printf("<INPUT NAME=\"NCor\" TYPE=\"hidden\" VALUE=\"%d\">\n",_num_correct);
 /* Closing FORM and HTML tags */
 printf("</FORM></BODY></HTML>\n");
}
/* Main flow */
int main()
{
 char *p;
 p=getenv(_qs);
 if(p!=NULL) {
 parse_query_string(p);
 }
 /* Mode will contain A only if an answer is present. */
 if(_mode=='A')
 process_answer();
 else {
 /* Check to see if the user selected "No"
 * when asked if they want to play again.
 */
 if( _answer==100) {
 no_more();
 }
 else {

 /* ask a question */
 ask_question();
 }
 }
 return(0);
}
End Listing






CGI and AppleScript


The Macintosh as an Internet platform




Cal Simone


Cal is founder of Main Event Software, publisher of the Scripter development
tool. He can be reached at mainevent@his.com.


One of the best-kept secrets about the World Wide Web is the extent to which
the Macintosh is used as a development platform. A survey of 13,000 Web users
conducted earlier this year by the Graphics, Visualization, and Usability
(GVU) Center at the Georgia Institute of Technology found that the MacHTTP
server software is the second-most popular server package. MacHTTP accounts
for 20.8 percent of installations, trailing NCSA (38.6 percent) but ahead of
CERN (18.5 percent) and significantly ahead of most other servers. (The study was conducted in
April and May of this year, and focused on a number of topics, not just
servers.) 
In this article, I begin by discussing the principal aspects of the Macintosh
as an Internet platform, then describe how you can use the AppleScript
language to write CGI applications that run on Macintosh servers, and end
with a quick look at alternatives to AppleScript.


Macintosh Servers


The current situation is highly subject to change, of course; competition
among desktop server packages has escalated several notches recently, with the
release of a slew of Windows-based server products over the summer.
Nevertheless, it is worth reminding those who discount the Mac as an Internet
platform that MacHTTP was the third Web server ever written, after NCSA and
CERN (both of which run on UNIX only).
MacHTTP is shareware and available at a number of ftp repositories on the net.
Recently, Chuck Shotton, the author of MacHTTP, revamped the program and
turned it into a commercial product called "WebStar," marketed by StarNine
Technologies (Berkeley, CA). The author claims that WebStar is about four
times faster than MacHTTP, and can handle 500,000 hits per day on a PowerMac
7100. Two sites at Apple currently running WebStar are www.apple.com and
quicktime.apple.com. According to Shotton, MacHTTP and WebStar together
comprise 66 percent of the "commercial Web-server market."
Whether you use MacHTTP or WebStar, you can create CGI-style applications that
don't rely on the cumbersome UNIX-style CGI interface, as defined by the NCSA
and CERN servers. The advantage of traditional CGI is that it is supported in
many servers on many platforms, not just NCSA and CERN. One disadvantage, at
least on the Mac platform, is that it is not as effective a mechanism for
interapplication communication (IAC) as using native facilities--namely, Apple
Events.
Apple incorporated Apple Events into System 7 as an efficient and flexible
means for applications to communicate with each other. MacHTTP and WebStar use
Apple Events for a CGI-style interface between server and back-end processing.
WebStar also uses Apple Events to implement remote-administration facilities,
which allow the site administrator to monitor multiple server machines
simultaneously from a single workstation: examining outputs, memory usage,
connection status, and so on. The Apple Event-based CGI facility lets you
write CGI applications using any scripting language that is compliant with
Apple's Open Scripting Architecture (OSA). OSA defines a standard on the
Macintosh that lets different languages access the same system-level
facilities. This includes Apple's own AppleScript language, as well as
languages from third parties such as UserTalk, or simple macro packages such
as QuicKeys.
Of course, you can still write CGI applications using the traditional CGI
facility. With MacPerl, the Macintosh implementation of the Perl scripting
language, it is possible to port scripts originally written for UNIX servers,
as long as they do not rely on UNIX-specific operating system calls or call
UNIX-specific utilities. But in many cases, you are better off rewriting the
program using an OSA-compliant language. The increased flexibility and
performance make it worth your while.


About AppleScript


The AppleScript language in many ways resembles HyperTalk, the language used
in HyperCard, except that it is not designed just for user-interface-intensive
single applications. Instead, it is an object-oriented language that
integrates multiple applications and interacts with system-level facilities in
the Macintosh operating system. The AppleScript package from Apple consists of
a language, a system-software extension, a simple scripting editor, and
language additions. AppleScript is now a standard component of System 7.5, and
therefore available to all users.
Using AppleScript, you can design work-flow applications that govern data flow
from one program to another. If a program has been designed to be scriptable,
AppleScript (and other OSA languages) has access, at a fine-grain level, to a
range of object types within the application. For example, in the QuarkXPress
page-layout program your scripts have access to specific paragraphs or
graphics; in the FileMaker Pro database program, you can manipulate individual
database records and fields. There are over a hundred more scriptable
applications, including WebStar, Eudora, Netscape Navigator, and the MacOS
Finder itself. As a result, power users and programmers can quickly put
together end-user scenarios, be they simple or complex.
An Apple Event is a message (corresponding to an action) sent between two
applications or between a scripting system and an application. The code in the
target application is known as the "event handler." In a CGI communication
between server and back-end application, the CGI arguments are represented by
parameters to the Apple Event, and are identified by 4-byte IDs (see Table 1).
The CGI application returns data by providing a "reply" to the Apple Event.
Within an AppleScript CGI script, you can process data or instructions entered
into a form. You can initiate a database search, assemble text, and produce
charts or graphics--all driven from user choices. In addition, you can send
e-mail, assemble HTML pages on the fly, and deal with run-time errors.
Integrating multiple applications is accomplished through application-specific
vocabularies housed within scriptable applications. A vocabulary extends
AppleScript to include new terms representing actions and objects specific to
the particular application. Together with AppleScript's built-in terms, you
write scripts by putting together sentences that often resemble grammatically
correct, English-language sentences rather than traditional C code.
A script consists of properties (the data) and handlers (the methods).
AppleScript variables and properties are dynamically typed--their types are
not declared and can change from statement to statement. Properties are
initialized when the script is first run, and are updated each time a script
completes. Example 1(a) shows a script that increments its counter every time
it is run. Handlers (which either respond to an Apple-Event message or are
analogous to subroutines in other languages) use one of two methods of
specifying parameters: positional and keyword. For CGI use, the keyword form
is employed.


An AppleScript CGI


In CGI scripts, the message sent by the Web server is handled by a "raw event
handler," for which there are no terminology equivalents for the verb and
parameter keywords. AppleScript provides a mechanism for specifying these
terms using their 4-byte code values. In a declaration for a keyword handler,
raw event and parameter codes are enclosed in "chevron" or "French quote" <<
>> characters. You don't need to specify all the CGI parameters in your
handler declaration, only the ones you need. Example 1(b) shows the
counter-incrementing script as a CGI program.
Moving to a more interesting example, suppose you maintain a regularly updated
database of information that includes a table of numeric values, and you want
to provide, on demand to your Web users, the table in the form of a chart. For
this example, I'll use three familiar applications, FileMaker Pro (to hold the
data), DeltaGraph Pro (to make a chart of the data), and Clip2Gif (to store
the chart in a GIF file). You can use redirection ("Location" in the reply
header) to point the Web server to the GIF file. The resulting CGI program is
shown in Example 2.
AppleScript provides an interesting mechanism for handling errors known as a
"try block," which resembles the TRY...CATCH construct in C and some other
languages. Your regular application code goes in the first part of the try
block; if an error occurs, the error-handling code in the second part of the
try block is invoked. Example 3 shows a simple error handler that generates an
HTML page by concatenating a bunch of strings.


Other Tools and Alternative Languages


There are a variety of small tools to simplify processing the data passed to
an AppleScript CGI program by MacHTTP. These small tools are AppleScript
scripting additions, colloquially referred to as "osax" in the singular, and
"osaxen" in the plural. An osax extends the AppleScript command set by
providing additional verb and parameter keywords to the vocabulary. Scripting
additions work not just with AppleScript but with any OSA-compliant language. 
Most CGI programs use HTML forms to pass in fields of data using particular
formats. Various osaxen are used to process this data. For example, there is
an osax for decoding URLs, and one that splits up the arguments into
individual chunks (the Tokenize osax). There is also the "parse CGI" osax,
which rolls up many of the tasks common to CGI programs: it decodes, parses,
and provides access to the HTML form information passed in to an AppleScript
CGI, combining the functionality of the "DePlus," "Decode URL," and "Tokenize"
osaxen to handle incoming HTML forms and field data.
How do you create, edit, and debug AppleScript code? As part of System 7.5,
Apple provides a very simple Script Editor, offering basic editing functions.
This editor allows you to write and edit text, and turn the text into scripts.
There are also third-party tools, such as Scripter and FaceSpan. 
Scripter, from my company, Main Event Software (Washington, DC), is an
authoring and development environment that includes point-and-click access to
the vocabularies in dictionaries of scriptable applications, an enhanced
scripting editor, and an integrated debugger of considerable depth, specifically
designed for AppleScript. ScriptBase, also from Main Event, provides an object
database for system-wide storage of values and objects that may need to be
accessed from scripts.
FaceSpan, from Software Designs Unlimited (Chapel Hill, NC), is a design
tool for visual-interface-intensive applications, containing UI elements such
as buttons, text boxes, lists, popups, menus, and gauges. FaceSpan
applications can be built using AppleScript or another OSA-compliant language,
such as UserTalk or QuicKeys.
UserTalk is the language used in the Aretha scripting environment from
UserLand Software (Palo Alto, CA). Aretha, formerly called "Frontier,"
actually predates AppleScript and was designed with similar goals: to provide
a system-level facility for integrating applications. UserTalk is now
OSA-compliant, so AppleScript and UserTalk code can be used interchangeably in
the same script. UserTalk is considered by some to be more difficult to learn,
but it offers a multithreaded environment with an object database that can
store both application data and scripts. The Aretha package used to be a
commercial application, but its author, Dave Winer, recently made it free;
support is now handled by volunteers.



Conclusion


This whirlwind tour of scripting on the Macintosh platform gives you some idea
of the facilities available for developing internetworked applications. As you
explore further, you'll find that the tools have a flexibility and power that
cannot be emulated on other platforms.


References


Apple Computer. Inside Macintosh: Interapplication Communication. Reading, MA:
Addison-Wesley, 1994. 
--------. AppleScript Language Reference Guide. Reading, MA: Addison-Wesley,
1994. 
Goodman, Danny. Danny Goodman's Complete AppleScript Handbook. New York, NY:
Random House, 1994.
Two internet mailing lists, available via listproc@abs.apple.com, are
applescript-users (for writers of scripts), and applescript-implementors (for
developers of scriptable apps).
Table 1: The parameters for the Apple Event WWWdoc, a custom event sent by
the Web server.
Code Suggested name Description 
---- path_args Data in URL following "$"
kfor http_search_args Data in URL passed using GET method, or
 following "?"
post post_args Data in URL passed using POST method
meth method Which argument to use: GET (search args)
 or POST (post args)
addr client_address The client's IP address (or DNS)
user username The client's user name (if using security)
pass password The client's password (if using security)
frmu from_user Additional user information (often an
 e-mail address)
svnm server_name The server's name (MacHTTP or WebSTAR)
svpt server_port The server's port
scnm script_name The URL sent to the server
refr referer The URL the client was viewing before
 the current one
Agnt user_agent The client software's name
ctyp content_type MIME content type of post args
Kact action Method of calling CGI: PREPROCESSOR,
 POSTPROCESSOR, CGI, or ACGI
Kapt action_path If present, path to file from disk root
 (used for ACTIONS), defaults to
 script_name
Kcip client_ip The client's IP address (present even if
 NO_DNS is false)
Kfrq full_request Full text of client's request
Example 1: (a) A simple script that increments a counter; (b) the equivalent
script as a CGI program.
(a)
property counter : 0
on run
 -- The run handler is executed each time
 -- a script is run or launched.
 set counter to counter + 1
end run

(b)
property counter : 0
-- The run handler is executed when the CGI is
-- launched. But if the CGI is already running when
-- the special Apple event fires, the run handler isn't run.
on run 
 -- This handler is optional in a CGI
 -- If you have any initialization code, it goes here

end run
-- This is the CGI Handler receiving the arguments from the Web server.
-- This handler has the same form as any other handler.
-- In this example we only handle five of the 18 CGI parameters,
-- the direct parameter plus four others.
on <<event WWWdoc>> path_args 
 given <<class post>>:post_args, <<class meth>>:method, <<class
addr>>:client_address
 -- CGI body goes here
 set counter to counter + 1
end <<event WWWdoc>>
Example 2: A CGI program that generates a chart on the fly from a database.
property crlf : (ASCII character 13) & (ASCII character 10)
property reply_header : "HTTP/1.0 302 FOUND" & crlf & 
 "Server: WebSTAR/1.0 ID/ACGI" & crlf & 
 "Location: http://www.your.site/home.html" & crlf & 
 "URI: http://www.your.site/home.html" & crlf & 
 crlf
-- This is the CGI Handler
on <<event WWWdoc>> path_args 
 given <<class post>>:post_args, 
 <<class meth>>:method, 
 <<class addr>>:client_address
 -- Interpret CGI arguments. Here you would obtain the
 -- user's choices from the arguments.
 -- After this, the next step is to retrieve data from database
 tell application "FileMaker Pro"
 Open alias "Macintosh HD:Databases folder:Web DataBase"
 set numRecs to Count of Record in Document 1
 -- Make tab-delimited data for Deltagraph
 set dataString to "Month" & tab & "Amount" & return
 repeat with i from 1 to numRecs
 copy dataString & Cell "month" of Record i 
 & tab & Cell "amount" of Record i 
 & return to dataString
 end repeat
 end tell
 -- Create the desired chart
 tell application "Deltagraph Pro"
 Data dataString
 Plot Options Text Font "Palatino" Text Size 12 
 Colorstyle "blue"
 Set Axis Lengths for X 100 for Y 100
 Output PICT
 set dataChart to Plot chart chartType
 end tell
 -- Make the GIF file for the redirect
 tell application "clip2gif"
 save dataChart as GIF in file "Macintosh HD:home.html"
 end tell
 -- Return the reply data to the server and exit the CGI handler
 return reply_header -- reply and exit
end <<event WWWdoc>>
Example 3: A simple error handler.
property crlf : (ASCII character 13) & (ASCII character 10)
property http_header : "HTTP/1.0 200 OK" & crlf 
 & "Server: WebSTAR/1.0 ID/ACGI" & crlf 
 & "MIME-Version: 1.0" & crlf & "Content-type: text/html" 
 & crlf & crlf
-- The CGI Handler

on <<event WWWdoc>> path_args 
 given <<class post>>:post_args, 
 <<class meth>>:method, 
 <<class addr>>:client_address
 -- Enclose your CGI code in a "try/on error/end try" construct
 try
 -- Put the body of your CGI here.
 on error errNum number errMsg
 -- If there's an error, the error-handling
 -- mechanism will drop you in here:
 -- Create a page of HTML text to return.
 set return_page to http_header 
 & "<html><head><title>Error Page</title></head>" 
 & "<body><h1>Error Encountered!</h1>" & return 
 & "An error was encountered while trying " 
 & "to run this script." 
 & return
 set return_page to return_page 
 & "<H3>Error Message</H3>" & return 
 & errMsg & return 
 & "<H3>Error Number</H3>" & return 
 & errNum & return 
 & "<H3>Date</H3>" & return 
 & (current date) 
 & return
 set return_page to return_page 
 & "<hr>Please notify the webmaster at " 
 & "<a href=\"mailto:webmaster@your.site.com\">" 
 & "mailto:webmaster@your.site.com</a> " 
 & "of this error." 
 & "</body></html>" 
 & return
 -- Return the error page created and exit the handler.
 return return_page
 end try
end <<event WWWdoc>>







RAMBLINGS IN REAL TIME


One Story, Two Rules, and a BSP Renderer




Michael Abrash


Michael is the author of Zen of Graphics Programming and Zen of Code
Optimization. He is currently pushing the envelope of real-time 3-D on Quake
at id Software. He can be reached at mikeab@idsoftware.com.


As I've noted before, I'm part of the team that's working on Quake, the
follow-up to DOOM. A month or so back, we added page flipping to Quake, and
made the startling discovery that the program ran nearly twice as fast with
page flipping as it did with the alternative method of drawing the whole frame
to system memory, then copying it to the screen. We were delighted by this,
but baffled. I did a few tests and came up with several possible explanations,
including slow writes through the external cache, poor main-memory
performance, and cache misses when copying the frame from system memory to
video memory. Although each of these can indeed affect performance, none
seemed to account for the magnitude of the speedup, so I assumed some hidden
hardware interaction was at work. Anyway, "why" was secondary; what really
mattered was that we had a way to double performance, which meant I had a lot
of work to do to support page flipping as widely as possible.
A few days ago, I was using the Pentium's built-in performance counters to
seek out areas for improvement in Quake, and, for no particular reason,
checked the number of writes performed while copying the frame to the screen
in non-page-flipped mode. The answer was 64,000. That seemed odd, since there
were 64,000 byte-sized pixels to copy, and I was calling memcpy(), which, of
course, copies a dword whenever possible. Maybe the Pentium counters report
the number of bytes written rather than the number of writes performed, I
thought, but fortunately, this time I tested my assumptions by writing an ASM
routine to copy the frame a dword at a time, without the help of memcpy().
This time the Pentium counters reported 16,000 writes.
Oops.
As it turns out, the memcpy() routine in the DOS version of our compiler (gcc)
inexplicably copies memory a byte at a time. With my new routine, the
non-page-flipped approach suddenly became slightly faster than page flipping.
The first relevant rule is pretty obvious: Assume nothing. Measure early and
often. Know what's really going on when your program runs, if you catch my
drift. To do otherwise is to risk looking mighty foolish.
The second rule: When you do look foolish (and trust me, it will happen if you
do challenging work), have a good laugh at yourself, and use it as a reminder
of Rule #1. I hadn't done any extra page-flipping work yet, so I didn't waste
any time due to my faulty assumption that memcpy() performed a maximum-speed
copy, but that was just luck. I should have experimented until I was sure I
knew what was going on before drawing conclusions and acting on them.
In general, make it a point not to fall into a tightly focused rut; stay
loose, think of alternative possibilities and new approaches, and always,
always, always keep asking questions. It'll pay off big in the long run. If I
hadn't indulged my curiosity by running the Pentium counter test on the copy
to the screen--even though there was no specific reason to do so--I would
never have discovered the memcpy() problem. By so doing, I doubled the
performance of the entire program in five minutes, a rare accomplishment,
indeed.
By the way, I have found the Pentium's performance counters to be very useful
in figuring out what my code really does and where the cycles are going. One
useful source of information on the performance counters and other aspects of
the Pentium is Mike Schmit's Pentium Processor Optimization Tools (AP
Professional, 1994, ISBN 0-12-627230-1).
Onward to rendering from a BSP tree.


BSP-Based Rendering


In my previous two columns, I discussed the nature of binary space
partitioning (BSP) trees and presented a compiler for 2-D BSP trees. Now it's
time to use those compiled BSP trees to do real-time rendering.
As you'll recall, the BSP compiler took a list of vertical walls and built a
2-D BSP tree from the walls, as viewed from above. Figure 1 is the result: The
world is split into two pieces by the line of the root wall, and each half of
the world is then split again by the root's children, and so on, until the
world is carved into subspaces along the lines of all the walls.
The objective now is to draw the world so that whenever walls overlap, we see
the nearer wall at each overlapped pixel. The simplest way to do this is with
the painter's algorithm, drawing the walls in back-to-front order, assuming no
polygons interpenetrate or form cycles. BSP trees guarantee that no polygons
interpenetrate (such polygons automatically get split) and make it easy to
walk the polygons in back-to-front (or front-to-back) order.
To render a view of that BSP tree, simply descend the tree, deciding at each
node whether you're seeing the front or back of the wall at that node from the
current viewpoint. You use this knowledge to first recursively descend and
draw the farther subtree of that node, then draw that node, and finally draw
the nearer subtree of that node. Applied recursively from the root of the BSP
tree, this approach guarantees that overlapping polygons will always be drawn
in back-to-front order. Listing One (beginning on page 51) draws a BSP-based
world in this fashion. Because of space constraints, Listing One is only the
core of the BSP renderer, without the program framework, some math routines,
and the polygon rasterizer. The entire program is available both from DDJ (see
"Availability," page 3) and as ddjbsp2.zip from ftp.idsoftware.com/mikeab.
Listing One is in a compressed format, with little whitespace, again due to
space constraints; the full version is formatted normally.
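The recursive drawing order described above can be sketched as straightforward code recursion (Listing One's data-recursive walk is equivalent). The node layout and function names here are illustrative only, not taken from Listing One, and a single flag stands in for the per-node facing test that WallFacingViewer() performs:

```c
#include <stddef.h>

/* Minimal stand-in node: a wall plus front/back subtrees. */
typedef struct Node {
    struct Node *fronttree, *backtree;
    int id;                          /* stand-in for the wall's data */
} Node;

static int order[8], drawn = 0;
static void draw_wall(Node *n) { order[drawn++] = n->id; }

/* Draw the farther subtree, then this wall, then the nearer subtree.
   A real renderer evaluates the facing test at every node; a single
   flag keeps this sketch short. */
static void draw_back_to_front(Node *n, int viewer_on_front_side) {
    if (n == NULL)
        return;
    draw_back_to_front(viewer_on_front_side ? n->backtree : n->fronttree,
                       viewer_on_front_side);
    draw_wall(n);
    draw_back_to_front(viewer_on_front_side ? n->fronttree : n->backtree,
                       viewer_on_front_side);
}

/* Three-node tree: wall 0 behind the root wall 1, wall 2 in front;
   with the viewer on the front side, walls must come out 0, 1, 2. */
static int demo_back_to_front(void) {
    Node back = { NULL, NULL, 0 }, front = { NULL, NULL, 2 };
    Node root = { &front, &back, 1 };
    drawn = 0;
    draw_back_to_front(&root, 1);
    return drawn == 3 && order[0] == 0 && order[1] == 1 && order[2] == 2;
}
```

Applied from the root, the recursion visits every wall exactly once, so the cost of the ordering is linear in the number of walls.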
Rendering from a BSP tree really is that simple conceptually, but the
implementation is a bit more complicated. The full rendering pipeline, as
coordinated by UpdateWorld(), is:
1. Update the current location.
2. Transform all wall endpoints into viewspace (the world as seen from the
current location with the current viewing angle).
3. Clip all walls to the view pyramid.
4. Project wall vertices to screen coordinates.
5. Walk the walls back to front, and for each wall that lies at least
partially in the view pyramid, perform backface culling (skip walls facing
away from the viewer), and draw the wall if it's not culled.
Next, let's look at each part of the pipeline more closely. The pipeline is
too complex for me to discuss each part in complete detail; sources for
further reading include Computer Graphics: Principles and Practice, Second
Edition, by James D. Foley and Andries van Dam (Addison-Wesley, 1990, ISBN
0-201-12110-7), and
Dr. Dobb's Essential Books on Graphics Programming CD-ROM.


Moving the Viewer


The sample BSP program performs first-person rendering; that is, it renders
the world as seen from your eyes as you move about. The rate of movement is
controlled by key-handling code not shown in Listing One; however, the
variables set by the key-handling code are used in UpdateViewPos() to bring
the current location up to date.
The view position can change not only in x and z (movement around the plane
upon which the walls are set), but also in y (vertically). However, the view
direction is always horizontal; that is, the code in Listing One supports
moving to any 3-D point, but viewing horizontally only. Although the BSP tree
is only 2-D, it is quite possible to support looking up and down to some
extent, particularly if the world data set is restricted so that, for example,
there are never two rooms stacked on top of each other, or any tilted walls.
For simplicity, I have chosen not to implement this in Listing One, but you
may find it educational to add it to the program yourself.


Transformation into Viewspace


The viewing angle (which controls direction of movement as well) can sweep
through the full 360 degrees around the viewpoint, so long as it remains
horizontal. The viewing angle is controlled by the key handler, and is used to
define a unit vector stored in currentorientation that explicitly defines the
view direction (the z-axis of viewspace) and implicitly defines the x-axis of
viewspace, because that axis is at right angles to the z-axis, where x
increases to the right of the viewer.
As discussed in my last column, rotation to a new coordinate system can be
performed by using the dot product to project points onto the axes of the new
coordinate system. TransformVertices() does this, after first translating
(moving) the coordinate system to have its origin at the viewpoint. (It's
necessary to perform the translation first so that the viewing rotation is
around the viewpoint.) This operation can equivalently be viewed as a
matrix-math operation, the more-common way to handle transformations.
At the same time, the points are scaled in x according to PROJECTION_RATIO to
provide the desired field of view. Larger scale values result in narrower
fields of view.
When this is done, the walls are in viewspace, ready to be clipped.
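In floating point, the translate-then-rotate step that TransformVertices() performs reduces to two dot products against the view axes. This sketch uses doubles for clarity (the listing does the same work in 16.16 fixed point) and omits the PROJECTION_RATIO scaling of x; the function name is mine:

```c
#include <math.h>

/* (dirx, dirz) is the unit view-direction vector (currentorientation),
   i.e., the viewspace z-axis; the viewspace x-axis is then (dirz, -dirx),
   at right angles to it, with x increasing to the viewer's right. */
static void world_to_view(double wx, double wz,       /* world point */
                          double eyex, double eyez,   /* viewpoint */
                          double dirx, double dirz,   /* unit direction */
                          double *viewx, double *viewz) {
    double tx = wx - eyex, tz = wz - eyez;  /* translate first... */
    *viewx = tx * dirz - tz * dirx;         /* ...then dot with x-axis */
    *viewz = tx * dirx + tz * dirz;         /* dot with z-axis */
}

/* Viewer at the origin looking down +z leaves points unchanged. */
static int demo_identity(void) {
    double x, z;
    world_to_view(3.0, 5.0, 0.0, 0.0, 0.0, 1.0, &x, &z);
    return fabs(x - 3.0) < 1e-12 && fabs(z - 5.0) < 1e-12;
}

/* Looking down +x, a point at world (3, 5) ends up 5 to the left
   (viewx == -5) and 3 units ahead (viewz == 3). */
static int demo_rotated(void) {
    double x, z;
    world_to_view(3.0, 5.0, 0.0, 0.0, 1.0, 0.0, &x, &z);
    return fabs(x + 5.0) < 1e-12 && fabs(z - 3.0) < 1e-12;
}
```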



Clipping


In viewspace, the walls may be anywhere relative to the viewpoint: in front,
behind, or off to the side. You want to draw only those parts of walls that
properly belong on the screen--the parts that lie in the view pyramid (view
frustum), as in Figure 2. Unclipped walls (which lie entirely in the frustum)
should be drawn in their entirety, fully clipped walls should not be drawn,
and partially clipped walls must be trimmed before being drawn.
In Listing One, ClipWalls() does this in three steps for each wall, in turn.
First, the z-coordinates of the two ends of the wall are calculated.
(Remember, walls are vertical and their ends go straight up and down, so the
top and bottom of each end have the same x- and z-coordinates.) If both ends
are on the near side of the front clip plane, then the polygon is fully
clipped, and you're done with it. If both ends are on the far side, then the
polygon isn't z-clipped, and you leave it unchanged. If the polygon straddles
the near clip plane, then the wall is trimmed to stop at the near clip plane
by appropriately adjusting the t value of the nearest endpoint; this
calculation is a simple matter of scaling by z, because the near clip plane is
at a constant z distance. (The use of t values for parametric lines was
discussed in the May/June 1995 column.) The process is further simplified
because the walls can be treated as lines viewed from above, so you can
perform 2-D clipping in z; this would not be the case if walls sloped or had
sloping edges.
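Because the near plane lies at constant z, the t calculation is a one-dimensional scale. Here is a sketch in portable C, with 16.16 fixed-point helpers of the kind Listing One declares but omits for space; the names FX, fx_mul, and fx_div are mine, and a 64-bit intermediate type is assumed:

```c
#include <stdint.h>

typedef int32_t fixed16;                  /* 16.16 fixed point */
#define FX(x) ((fixed16)((x) * 65536.0))  /* build a 16.16 constant */

static fixed16 fx_mul(fixed16 a, fixed16 b) {
    return (fixed16)(((int64_t)a * b) >> 16);
}
static fixed16 fx_div(fixed16 a, fixed16 b) {
    return (fixed16)(((int64_t)a << 16) / b);
}

/* Parametric t at which the segment z0->z1 crosses the near clip
   plane: t = (znear - z0) / (z1 - z0). Because the plane lies at
   constant z, no x or y terms enter into it. */
static fixed16 near_clip_t(fixed16 z0, fixed16 z1, fixed16 znear) {
    return fx_div(znear - z0, z1 - z0);
}
```

For a wall end at z == 5 and the other at z == 25, a near plane at z == 10 gives t == 0.25, a quarter of the way along the wall.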
After clipping in z, you clip by viewspace x-coordinate, to ensure that you
draw only wall portions that lie between the left and right edges of the
screen. Like z-clipping, x-clipping can be done as a 2-D clip, because the
walls and the left and right sides of the frustum are all vertical. You
compare both the start and endpoint of each wall to the left and right sides
of the frustum, and reject, accept, or clip each wall's t values, accordingly.
The test for x clipping is very simple, because the edges of the frustum are
defined as the planes where x==z and -x==z.
The final clip stage is clipping by y-coordinate. This is the most
complicated, because vertical walls can be clipped at an angle in y, so true
3-D clipping of all four wall vertices is involved; see Figure 3. You handle
this in ClipWalls() by detecting trivial rejection in y, using y==z and -y==z
as the y boundaries of the frustum. However, you leave partial clipping to be
handled as a 2-D clipping problem; you are able to do this only because the
earlier z-clip to the near clip plane guarantees that no remaining polygon
point can have z<=0, ensuring that when you project, you'll always pass valid,
y-clippable screenspace vertices to the polygon filler.


Projection to Screenspace


At this point, you have viewspace vertices for each wall that's at least
partially visible. You now project these vertices according to z distance
(that is, perform perspective projection), scale the results to the width of
the screen, and you're ready to draw. Although this step is logically separate
from clipping, it is performed as the last step for visible walls in
ClipWalls().
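The projection itself is just a divide by z followed by recentering. In floating point (the listing works in fixed point and adds half-pixel rounding), it is simply:

```c
/* Perspective-project a viewspace x onto a screen whose center is
   half_width pixels from the left edge; the same formula, with the
   screen half-height, projects y. */
static double project_x(double viewx, double viewz, double half_width) {
    return viewx * half_width / viewz + half_width;
}
```

A point dead ahead (viewx == 0) lands at the screen center, and a point on the right frustum edge (viewx == viewz) lands at the right screen edge, which is exactly why the frustum sides were defined as x==z and -x==z.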


Walking the Tree, Backface Culling, and Drawing


Now that you have all the walls clipped to the frustum, with vertices
projected into screen coordinates, all you have to do is draw them back to
front. That's the job of DrawWallsBackToFront(), which walks the BSP tree,
descending recursively from each node to draw the farther children of each
node first, then the wall at the node, then the nearer children. In the
interests of efficiency, this particular implementation performs a
data-recursive walk of the tree, rather than the more familiar code recursion.
Interestingly, the performance speedup from data recursion turned out to be
more modest than I expected, based on past experience; see my "Pushing the
Envelope" column in the October 1994 issue of PC Techniques magazine for
further details, as well as an in-depth discussion of both types of BSP tree
walking.
As it comes to each wall, DrawWallsBackToFront() first descends to draw the
farther subtree. Next, if the wall is both visible and pointing toward the
viewer, it is drawn as a solid polygon. The polygon filler (available
electronically) is a modification of the polygon filler I presented in my
"Graphics Programming" column (DDJ, February and March, 1991). A compilation
of the "Graphics Programming" columns is included on the aforementioned Dr.
Dobb's Essential Books on Graphics Programming CD-ROM.
It's worth noting how backface culling and front/back wall-orientation testing
are performed. (Walls are always one-sided, visible only from the front.) I
discussed backface culling in general in the last column, and mentioned two
possible approaches: generating a screenspace normal (perpendicular vector) to
the polygon and seeing which way it points, or taking the world or viewspace
dot product between the vector from the viewpoint to any polygon point and the
polygon's normal and checking the sign. Listing One does both, but because the
BSP tree is 2-D, you can save some work.
Consider this: Walls are stored so that the left end, as viewed from the front
side of the wall, is the start vertex, and the right end is the end vertex. A
wall can be positioned in screenspace in one of two ways: viewed from the
front, in which case the start vertex is to the left of the end vertex; or
viewed from the back, in which case the start vertex is to the right of the
end vertex, as in Figure 4. So you can tell which side of a wall you're
seeing, and thus backface cull, simply by comparing the screenspace
x-coordinates of the start and end vertices, a simple 2-D version of checking
the direction of the screenspace normal.
The wall-orientation test used for walking the BSP tree, performed in
WallFacingViewer(), takes the other approach, and checks the viewspace sign of
the dot product of the wall's normal with a vector from the viewpoint to the
wall. Again, this code takes advantage of the 2-D nature of the tree to
generate the wall normal by swapping x and z and altering signs. You can't use
the quicker screenspace x test here that you used for backface culling,
because not all walls can be projected into screenspace; for example, trying
to project a wall at z==0 would result in division by zero.
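Both tests can be sketched in a few lines of C (floating point for clarity; the second mirrors the commented-out C equivalent inside WallFacingViewer(), and the function names are mine):

```c
/* Screenspace cull: walls run left-to-right as seen from the front,
   so the front is visible only when start x is left of end x. */
static int front_visible_onscreen(double screenxstart, double screenxend) {
    return screenxstart < screenxend;
}

/* Viewspace facing test: sign of the dot product between the wall's
   2-D normal (the direction vector with x and z swapped and one
   component negated) and the vector from the viewpoint (the viewspace
   origin) to the wall's start vertex, folded into cross-product form. */
static int wall_facing_viewer(double x0, double z0, double x1, double z1) {
    return x0 * (z1 - z0) + z0 * (x0 - x1) < 0.0;
}
```

A wall running left to right at z == 5 faces a viewer at the origin; the same wall with its endpoints swapped shows the viewer its back.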
All the visible, front-facing walls are drawn into a buffer by
DrawWallsBackToFront(); then, UpdateWorld() calls Win32 to copy the new frame
to the screen, and the frame of animation is complete.


Notes on the BSP Renderer


Listing One is far from complete or optimal. There is no such thing as a tiny
BSP rendering demo, because 3-D rendering, even when based on a 2-D BSP tree,
requires a substantial amount of code and complexity. Listing One is
reasonably close to a minimum rendering engine, and is specifically intended
to illuminate basic BSP principles, given the space limitations of this
column. Think of Listing One as a learning tool and a starting point.
The most obvious lack in Listing One is that there is no support for floors
and ceilings; the walls float in space, unsupported. Is it necessary to go to
3-D BSP trees to get a normal-looking world?
No. Three-dimensional BSP trees offer many advantages, in that they allow
arbitrary data sets with viewing in any arbitrary direction. They aren't
really much more complicated than 2-D BSP trees for back-to-front drawing, but
they do tend to be larger and more difficult to debug, and they aren't
necessary for floors and ceilings. One way to get floors and ceilings out of a
2-D BSP tree is to change the nature of the BSP tree so that polygons are no
longer stored in the splitting nodes. Instead, each leaf of the tree--that is,
each subspace carved out by the tree--would store the polygons for the walls,
floors, and ceilings that lie on the boundaries of that space, facing into it.
The subspace would be convex, because all BSP subspaces are automatically
convex, so the polygons in that subspace can be drawn in any order. Thus, the
subspaces in the BSP tree would each be drawn, in turn, as convex sets, back
to front, just as Listing One draws polygons back to front.
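In C, such a volume-oriented tree might look like the following sketch; all of these names are hypothetical, and nothing like this appears in Listing One:

```c
#include <stddef.h>

/* Interior nodes store only the splitting line; convex leaves store
   the polygons that bound them, facing inward, drawable in any order. */
typedef struct LeafPolygon {
    int color;                      /* stand-in for real polygon data */
    struct LeafPolygon *next;
} LeafPolygon;

typedef struct VolumeNode {
    struct VolumeNode *fronttree, *backtree;  /* both NULL in a leaf */
    double splitx, splitz, splitdx, splitdz;  /* splitter (interior only) */
    LeafPolygon *contents;                    /* leaf only */
} VolumeNode;

static int is_leaf(const VolumeNode *n) {
    return n->fronttree == NULL && n->backtree == NULL;
}

/* Build a one-leaf "tree" holding a single polygon and check it. */
static int demo_leaf(void) {
    LeafPolygon p = { 7, NULL };
    VolumeNode leaf = { NULL, NULL, 0.0, 0.0, 0.0, 0.0, &p };
    return is_leaf(&leaf) && leaf.contents->color == 7;
}
```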
This sort of BSP tree, organized around volumes rather than polygons, has some
additional interesting advantages in simulating physics, detecting collisions,
doing line-of-sight determination, and performing volume-based operations such
as dynamic illumination and event triggering. However, that discussion will
have to wait until another day.


WWW BSP Site


Some months back, I mentioned a WWW site on BSP trees that was under
construction at Cornell. The site is now up and running:
http://www.graphics.cornell.edu/bspfaq/, or send e-mail to
bsp-faq@graphics.cornell.edu with a subject line of "SEND BSP TREE TEXT". It's
worth a look if you're interested in BSP trees.
Figure 1: A BSP tree.
Figure 2: Clipping to the view pyramid (view frustum). Solid lines are visible
(unclipped) parts of walls.
Figure 3: Y-clipping is more complex than x- or z-clipping because walls can
be clipped at an angle in 3-D.
Figure 4: Fast backface-culling test in screenspace.

Listing One 
/* Core renderer for Win32 program to demonstrate drawing from a 2D
 BSP tree; illustrate the use of BSP trees for surface visibility.
 UpdateWorld() is the top-level function in this module.
 Full source code for the BSP-based renderer, and for the
 accompanying BSP compiler, may be downloaded from
 ftp.idsoftware.com/mikeab, in the file ddjbsp2.zip.
 Tested with VC++ 2.0 running on Windows NT 3.5. */
#define FIXEDPOINT(x) ((FIXEDPOINT)((x)*65536.0)) // handles fractional args such as FIXEDPOINT(0.5)
#define FIXTOINT(x) ((int)(x >> 16))
#define ANGLE(x) ((long)x)
#define STANDARD_SPEED (FIXEDPOINT(20))
#define STANDARD_ROTATION (ANGLE(4))
#define MAX_NUM_NODES 2000
#define MAX_NUM_EXTRA_VERTICES 2000

#define WORLD_MIN_X (FIXEDPOINT(-16000))
#define WORLD_MAX_X (FIXEDPOINT(16000))
#define WORLD_MIN_Y (FIXEDPOINT(-16000))
#define WORLD_MAX_Y (FIXEDPOINT(16000))
#define WORLD_MIN_Z (FIXEDPOINT(-16000))
#define WORLD_MAX_Z (FIXEDPOINT(16000))
#define PROJECTION_RATIO (2.0/1.0) // controls field of view; the
 // bigger this is, the narrower the field of view
typedef long FIXEDPOINT;
typedef struct _VERTEX {
 FIXEDPOINT x, z, viewx, viewz;
} VERTEX, *PVERTEX;
typedef struct _POINT2 { FIXEDPOINT x, z; } POINT2, *PPOINT2;
typedef struct _POINT2INT { int x; int y; } POINT2INT, *PPOINT2INT;
typedef long ANGLE; // angles are stored in degrees
typedef struct _NODE {
 VERTEX *pstartvertex, *pendvertex;
 FIXEDPOINT walltop, wallbottom, tstart, tend;
 FIXEDPOINT clippedtstart, clippedtend;
 struct _NODE *fronttree, *backtree;
 int color, isVisible;
 FIXEDPOINT screenxstart, screenxend;
 FIXEDPOINT screenytopstart, screenybottomstart;
 FIXEDPOINT screenytopend, screenybottomend;
} NODE, *PNODE;
char * pDIB; // pointer to DIB section we'll draw into
HBITMAP hDIBSection; // handle of DIB section
HPALETTE hpalDIB;
int iteration = 0, WorldIsRunning = 1;
HWND hwndOutput;
int DIBWidth, DIBHeight, DIBPitch, numvertices, numnodes;
FIXEDPOINT fxHalfDIBWidth, fxHalfDIBHeight;
VERTEX *pvertexlist, *pextravertexlist;
NODE *pnodelist;
POINT2 currentlocation, currentdirection, currentorientation;
ANGLE currentangle;
FIXEDPOINT currentspeed, fxViewerY, currentYSpeed;
FIXEDPOINT FrontClipPlane = FIXEDPOINT(10);
FIXEDPOINT FixedMul(FIXEDPOINT x, FIXEDPOINT y);
FIXEDPOINT FixedDiv(FIXEDPOINT x, FIXEDPOINT y);
FIXEDPOINT FixedMulDiv(FIXEDPOINT x, FIXEDPOINT y, FIXEDPOINT z);
FIXEDPOINT FixedSin(ANGLE angle), FixedCos(ANGLE angle);
extern int FillConvexPolygon(POINT2INT * VertexPtr, int Color);
// Returns nonzero if a wall is facing the viewer, 0 else.
int WallFacingViewer(NODE * pwall)
{
 FIXEDPOINT viewxstart = pwall->pstartvertex->viewx;
 FIXEDPOINT viewzstart = pwall->pstartvertex->viewz;
 FIXEDPOINT viewxend = pwall->pendvertex->viewx;
 FIXEDPOINT viewzend = pwall->pendvertex->viewz;
 int Temp;
/* // equivalent C code
 if (( ((pwall->pstartvertex->viewx >> 16) *
 ((pwall->pendvertex->viewz -
 pwall->pstartvertex->viewz) >> 16)) +
 ((pwall->pstartvertex->viewz >> 16) *
 ((pwall->pstartvertex->viewx -
 pwall->pendvertex->viewx) >> 16)) )
 < 0)
 return(1);

 else
 return(0);
*/
 _asm {
 mov eax,viewzend
 sub eax,viewzstart
 imul viewxstart
 mov ecx,edx
 mov ebx,eax
 mov eax,viewxstart
 sub eax,viewxend
 imul viewzstart
 add eax,ebx
 adc edx,ecx
 mov eax,0
 jns short WFVDone
 inc eax
WFVDone:
 mov Temp,eax
 }
 return(Temp);
}
// Update the viewpoint position as needed.
void UpdateViewPos()
{
 if (currentspeed != 0) {
 currentlocation.x += FixedMul(currentdirection.x,
 currentspeed);
 if (currentlocation.x <= WORLD_MIN_X)
 currentlocation.x = WORLD_MIN_X;
 if (currentlocation.x >= WORLD_MAX_X)
 currentlocation.x = WORLD_MAX_X - 1;
 currentlocation.z += FixedMul(currentdirection.z,
 currentspeed);
 if (currentlocation.z <= WORLD_MIN_Z)
 currentlocation.z = WORLD_MIN_Z;
 if (currentlocation.z >= WORLD_MAX_Z)
 currentlocation.z = WORLD_MAX_Z - 1;
 }
 if (currentYSpeed != 0) {
 fxViewerY += currentYSpeed;
 if (fxViewerY <= WORLD_MIN_Y)
 fxViewerY = WORLD_MIN_Y;
 if (fxViewerY >= WORLD_MAX_Y)
 fxViewerY = WORLD_MAX_Y - 1;
 }
}
// Transform all vertices into viewspace.
void TransformVertices()
{
 VERTEX *pvertex;
 FIXEDPOINT tempx, tempz;
 int vertex;
 pvertex = pvertexlist;
 for (vertex = 0; vertex < numvertices; vertex++) {
 // Translate the vertex according to the viewpoint
 tempx = pvertex->x - currentlocation.x;
 tempz = pvertex->z - currentlocation.z;
 // Rotate the vertex so viewpoint is looking down z axis

 pvertex->viewx = FixedMul(FixedMul(tempx,
 currentorientation.z) +
 FixedMul(tempz, -currentorientation.x),
 FIXEDPOINT(PROJECTION_RATIO));
 pvertex->viewz = FixedMul(tempx, currentorientation.x) +
 FixedMul(tempz, currentorientation.z);
 pvertex++;
 }
}
// 3D clip all walls. If any part of each wall is still visible,
// transform to perspective viewspace.
void ClipWalls()
{
 NODE *pwall;
 int wall;
 FIXEDPOINT tempstartx, tempendx, tempstartz, tempendz;
 FIXEDPOINT tempstartwalltop, tempstartwallbottom;
 FIXEDPOINT tempendwalltop, tempendwallbottom;
 VERTEX *pstartvertex, *pendvertex;
 VERTEX *pextravertex = pextravertexlist;
 pwall = pnodelist;
 for (wall = 0; wall < numnodes; wall++) {
 // Assume the wall won't be visible
 pwall->isVisible = 0;
 // Generate the wall endpoints, accounting for t values and
 // clipping
 // Calculate the viewspace coordinates for this wall
 pstartvertex = pwall->pstartvertex;
 pendvertex = pwall->pendvertex;
 // Look for z clipping first
 // Calculate start and end z coordinates for this wall
 if (pwall->tstart == FIXEDPOINT(0))
 tempstartz = pstartvertex->viewz;
 else
 tempstartz = pstartvertex->viewz +
 FixedMul((pendvertex->viewz-pstartvertex->viewz),
 pwall->tstart);
 if (pwall->tend == FIXEDPOINT(1))
 tempendz = pendvertex->viewz;
 else
 tempendz = pstartvertex->viewz +
 FixedMul((pendvertex->viewz-pstartvertex->viewz),
 pwall->tend);
 // Clip to the front plane
 if (tempendz < FrontClipPlane) {
 if (tempstartz < FrontClipPlane) {
 // Fully front-clipped
 goto NextWall;
 } else {
 pwall->clippedtstart = pwall->tstart;
 // Clip the end point to the front clip plane
 pwall->clippedtend =
 FixedDiv(pstartvertex->viewz - FrontClipPlane,
 pstartvertex->viewz-pendvertex->viewz);
 tempendz = pstartvertex->viewz +
 FixedMul((pendvertex->viewz-pstartvertex->viewz),
 pwall->clippedtend);
 }
 } else {

 pwall->clippedtend = pwall->tend;
 if (tempstartz < FrontClipPlane) {
 // Clip the start point to the front clip plane
 pwall->clippedtstart =
 FixedDiv(FrontClipPlane - pstartvertex->viewz,
 pendvertex->viewz-pstartvertex->viewz);
 tempstartz = pstartvertex->viewz +
 FixedMul((pendvertex->viewz-pstartvertex->viewz),
 pwall->clippedtstart);
 } else {
 pwall->clippedtstart = pwall->tstart;
 }
 }
 // Calculate x coordinates
 if (pwall->clippedtstart == FIXEDPOINT(0))
 tempstartx = pstartvertex->viewx;
 else
 tempstartx = pstartvertex->viewx +
 FixedMul((pendvertex->viewx-pstartvertex->viewx),
 pwall->clippedtstart);
 if (pwall->clippedtend == FIXEDPOINT(1))
 tempendx = pendvertex->viewx;
 else
 tempendx = pstartvertex->viewx +
 FixedMul((pendvertex->viewx-pstartvertex->viewx),
 pwall->clippedtend);
 // Clip in x as needed
 if ((tempstartx > tempstartz) || (tempstartx < -tempstartz)) {
 // The start point is outside the view triangle in x;
 // perform a quick test for trivial rejection by seeing if
 // the end point is outside the view triangle on the same
 // side as the start point
 if (((tempstartx>tempstartz) && (tempendx>tempendz)) ||
 ((tempstartx<-tempstartz) && (tempendx<-tempendz)))
 // Fully clipped--trivially reject
 goto NextWall;
 // Clip the start point
 if (tempstartx > tempstartz) {
 // Clip the start point on the right side
 pwall->clippedtstart =
 FixedDiv(pstartvertex->viewx-pstartvertex->viewz,
 pendvertex->viewz-pstartvertex->viewz -
 pendvertex->viewx+pstartvertex->viewx);
 tempstartx = pstartvertex->viewx +
 FixedMul((pendvertex->viewx-pstartvertex->viewx),
 pwall->clippedtstart);
 tempstartz = tempstartx;
 } else {
 // Clip the start point on the left side
 pwall->clippedtstart =
 FixedDiv(-pstartvertex->viewx-pstartvertex->viewz,
 pendvertex->viewx+pendvertex->viewz -
 pstartvertex->viewz-pstartvertex->viewx);
 tempstartx = pstartvertex->viewx +
 FixedMul((pendvertex->viewx-pstartvertex->viewx),
 pwall->clippedtstart);
 tempstartz = -tempstartx;
 }
 }

 // See if the end point needs clipping
 if ((tempendx > tempendz) || (tempendx < -tempendz)) {
 // Clip the end point
 if (tempendx > tempendz) {
 // Clip the end point on the right side
 pwall->clippedtend =
 FixedDiv(pstartvertex->viewx-pstartvertex->viewz,
 pendvertex->viewz-pstartvertex->viewz -
 pendvertex->viewx+pstartvertex->viewx);
 tempendx = pstartvertex->viewx +
 FixedMul((pendvertex->viewx-pstartvertex->viewx),
 pwall->clippedtend);
 tempendz = tempendx;
 } else {
 // Clip the end point on the left side
 pwall->clippedtend =
 FixedDiv(-pstartvertex->viewx-pstartvertex->viewz,
 pendvertex->viewx+pendvertex->viewz -
 pstartvertex->viewz-pstartvertex->viewx);
 tempendx = pstartvertex->viewx +
 FixedMul((pendvertex->viewx-pstartvertex->viewx),
 pwall->clippedtend);
 tempendz = -tempendx;
 }
 }
 tempstartwalltop = FixedMul((pwall->walltop - fxViewerY),
 FIXEDPOINT(PROJECTION_RATIO));
 tempendwalltop = tempstartwalltop;
 tempstartwallbottom = FixedMul((pwall->wallbottom-fxViewerY),
 FIXEDPOINT(PROJECTION_RATIO));
 tempendwallbottom = tempstartwallbottom;
 // Partially clip in y (the rest is done later in 2D)
 // Check for trivial accept
 if ((tempstartwalltop > tempstartz) ||
 (tempstartwallbottom < -tempstartz) ||
 (tempendwalltop > tempendz) ||
 (tempendwallbottom < -tempendz)) {
 // Not trivially unclipped; check for fully clipped
 if ((tempstartwallbottom > tempstartz) &&
 (tempstartwalltop < -tempstartz) &&
 (tempendwallbottom > tempendz) &&
 (tempendwalltop < -tempendz)) {
 // Outside view triangle, trivially clipped
 goto NextWall;
 }
 // Partially clipped in Y; we'll do Y clipping at
 // drawing time
 }
 // The wall is visible; mark it as such and project it.
 // +1 on scaling because of bottom/right exclusive polygon
 // filling
 pwall->isVisible = 1;
 pwall->screenxstart =
 (FixedMulDiv(tempstartx, fxHalfDIBWidth+FIXEDPOINT(0.5),
 tempstartz) + fxHalfDIBWidth + FIXEDPOINT(0.5));
 pwall->screenytopstart =
 (FixedMulDiv(tempstartwalltop,
 fxHalfDIBHeight + FIXEDPOINT(0.5), tempstartz) +
 fxHalfDIBHeight + FIXEDPOINT(0.5));

 pwall->screenybottomstart =
 (FixedMulDiv(tempstartwallbottom,
 fxHalfDIBHeight + FIXEDPOINT(0.5), tempstartz) +
 fxHalfDIBHeight + FIXEDPOINT(0.5));
 pwall->screenxend =
 (FixedMulDiv(tempendx, fxHalfDIBWidth+FIXEDPOINT(0.5),
 tempendz) + fxHalfDIBWidth + FIXEDPOINT(0.5));
 pwall->screenytopend =
 (FixedMulDiv(tempendwalltop,
 fxHalfDIBHeight + FIXEDPOINT(0.5), tempendz) +
 fxHalfDIBHeight + FIXEDPOINT(0.5));
 pwall->screenybottomend =
 (FixedMulDiv(tempendwallbottom,
 fxHalfDIBHeight + FIXEDPOINT(0.5), tempendz) +
 fxHalfDIBHeight + FIXEDPOINT(0.5));
NextWall:
 pwall++;
 }
}
// Walk the tree back to front; backface cull whenever possible,
// and draw front-facing walls in back-to-front order.
void DrawWallsBackToFront()
{
 NODE *pFarChildren, *pNearChildren, *pwall;
 NODE *pendingnodes[MAX_NUM_NODES];
 NODE **pendingstackptr;
 POINT2INT apoint[4];
 pwall = pnodelist;
 pendingnodes[0] = (NODE *)NULL;
 pendingstackptr = pendingnodes + 1;
 for (;;) {
 for (;;) {
 // Descend as far as possible toward the back,
 // remembering the nodes we pass through on the way.
 // Figure whether this wall is facing frontward or
 // backward; do in viewspace because non-visible walls
 // aren't projected into screenspace, and we need to
 // traverse all walls in the BSP tree, visible or not,
 // in order to find all the visible walls
 if (WallFacingViewer(pwall)) {
 // We're on the forward side of this wall, do the back
 // children first
 pFarChildren = pwall->backtree;
 } else {
 // We're on the back side of this wall, do the front
 // children first
 pFarChildren = pwall->fronttree;
 }
 if (pFarChildren == NULL)
 break;
 *pendingstackptr = pwall;
 pendingstackptr++;
 pwall = pFarChildren;
 }
 for (;;) {
 // See if the wall is even visible
 if (pwall->isVisible) {
 // See if we can backface cull this wall
 if (pwall->screenxstart < pwall->screenxend) {

 // Draw the wall
 apoint[0].x = FIXTOINT(pwall->screenxstart);
 apoint[1].x = FIXTOINT(pwall->screenxstart);
 apoint[2].x = FIXTOINT(pwall->screenxend);
 apoint[3].x = FIXTOINT(pwall->screenxend);
 apoint[0].y = FIXTOINT(pwall->screenytopstart);
 apoint[1].y = FIXTOINT(pwall->screenybottomstart);
 apoint[2].y = FIXTOINT(pwall->screenybottomend);
 apoint[3].y = FIXTOINT(pwall->screenytopend);
 FillConvexPolygon(apoint, pwall->color);
 }
 }
 // If there's a near tree from this node, draw it;
 // otherwise, work back up to the last-pushed parent
 // node of the branch we just finished; we're done if
 // there are no pending parent nodes.
 // Figure whether this wall is facing frontward or
 // backward; do in viewspace because non-visible walls
 // aren't projected into screenspace, and we need to
 // traverse all walls in the BSP tree, visible or not,
 // in order to find all the visible walls
 if (WallFacingViewer(pwall)) {
 // We're on the forward side of this wall, do the
 // front children now
 pNearChildren = pwall->fronttree;
 } else {
 // We're on the back side of this wall, do the back
 // children now
 pNearChildren = pwall->backtree;
 }
 // Walk the near subtree of this wall
 if (pNearChildren != NULL)
 goto WalkNearTree;
 // Pop the last-pushed wall
 pendingstackptr--;
 pwall = *pendingstackptr;
 if (pwall == NULL)
 goto NodesDone;
 }
WalkNearTree:
 pwall = pNearChildren;
 }
NodesDone:
;
}
// Render the current state of the world to the screen.
void UpdateWorld()
{
 HPALETTE holdpal;
 HDC hdcScreen, hdcDIBSection;
 HBITMAP holdbitmap;
 // Draw the frame
 UpdateViewPos();
 memset(pDIB, 0, DIBPitch*DIBHeight); // clear frame
 TransformVertices();
 ClipWalls();
 DrawWallsBackToFront();
 // We've drawn the frame; copy it to the screen
 hdcScreen = GetDC(hwndOutput);

 holdpal = SelectPalette(hdcScreen, hpalDIB, FALSE);
 RealizePalette(hdcScreen);
 hdcDIBSection = CreateCompatibleDC(hdcScreen);
 holdbitmap = SelectObject(hdcDIBSection, hDIBSection);
 BitBlt(hdcScreen, 0, 0, DIBWidth, DIBHeight, hdcDIBSection,
 0, 0, SRCCOPY);
 SelectPalette(hdcScreen, holdpal, FALSE);
 ReleaseDC(hwndOutput, hdcScreen);
 SelectObject(hdcDIBSection, holdbitmap);
 DeleteDC(hdcDIBSection); // memory DC from CreateCompatibleDC is deleted, not released
 iteration++;
}
End Listing




DTACK REVISITED


Pushy? Me?? 




Hal W. Hardenbergh


Hal is a hardware engineer who sometimes programs. He is the former editor of
DTACK Grounded, and can be contacted through the DDJ offices.


Once upon a time, a young boy helped out at the family-owned neighborhood
grocery store. This was back when people would buy coffee beans at the store
and grind them at home. It was a working-class neighborhood, and the goods the
store carried reflected that fact. Most folks bought their coffee beans from
the mid-priced bin. But the grocery also had a higher-priced bin for the
professionals who were customers, and a lower-priced bin for widows, retired
persons, and others who were financially distressed.
In the morning, before the store opened for business, the grocer would fill
all three bins from the same large bag of coffee beans.
When WW II broke out and all the men my father's age went off to war, I was
five years old and began to "help" my grandfather run his grocery store in
extremely rural Alabama. So I could claim that it was my grandfather who
filled those bean bins. It might even be true. But I wasn't all that observant
when I was five; I read this story in the L.A. Times op-ed section a couple of
decades ago.


Summary


For those of you who are short on time, here's a summary of this issue's
column: Intel's 75-, 90-, 100-, and 120-MHz gold-top Pentiums all come out of
the same bag. That's a fact. And there's a rumor of black-top 75-MHz parts
that come out of the same bag as 133-MHz Pentiums.
What this means is that if you happen to own a 75-MHz Pentium system with
conventional fan/heat sink cooling, it will almost certainly run very nicely
at 90 MHz, and it might run as fast as 120 MHz. And if you have a black-top
part (assuming these actually exist), it may run at 133 MHz--or even faster!
Honest.
This column, then, is about clock pushing. I'm going to explain the
manufacturing, testing, and marketing practices used by all microprocessor
makers (not just Intel) that make clock pushing possible and even desirable.
And ethics are involved; there's white-hat and black-hat clock pushing.


A Quick Pentium Review


The original P5 used 0.8-micron design rules, dissipated a lot of heat, and
initially had a poor yield at 66 MHz, so 60 MHz was what was generally
available. Early last year, Intel's Leixlip, Ireland 0.6-micron fab plant began
producing the P54C. This fab originally had a poor yield above 90 MHz, so
100-MHz Pentiums were in short supply. In fact, Intel at first only made
enough 100-MHz parts for board and system manufacturers to qualify their
designs. As with any semiconductor production line, the center of the
statistical yield kept moving up, until 100-MHz parts became commonly
available. This year, Intel's Rio Rancho, New Mexico 0.45-micron fab came on line,
and by June, you could buy 133-MHz P54CS Pentiums in local clone shops, not
just from the major brand-name vendors.
Meanwhile, the 0.6-micron fabs are producing faster parts. In recent months,
Intel has been shipping 120-MHz parts produced at both its 0.6- and 0.45-micron fabs. This
brings us up to date as of September 1.
Let's look at a 0.6-micron fab producing 75- to 120-MHz parts. What's the difference
in the production process? There is no difference. In fact, on a given wafer,
one die may turn out to be a 120-MHz part, and the die next to it, a 75-MHz
part.


Testing and Grading CPUs


Here's how the parts are tested and graded: Individual dies are first given a
functional test using a special test head with needle-like contacts. This is
done at a very low frequency to keep power consumption down, since a bare die
can't dissipate much heat. This initial screening eliminates nonfunctional
dies before they're mounted, since the ceramic package used for Pentiums is
expensive.
After the die is mounted in the ceramic package, speed grading begins.
Qualifying a Pentium takes several minutes--the exact time is proprietary
information. (Andy Grove recently pointed out that the depreciation on one
Pentium test station runs $1.40 per minute!)


Why Do CPU Speeds Vary?


There are three reasons why microprocessors that come off the same production
line don't all run at the same speed: 
Random defects in the silicon wafer may render the die inoperative. If a
defect is very small, it may cripple a transistor so that it still functions,
but more slowly.
If a mote of dust blocks a mask exposure (casts a shadow), the effect is the
same as that of a defect. This is why fabs use such expensive clean rooms;
dust is deadly.
There are about 20 wafer-processing steps in making a Pentium-class CPU. The
alignment of successive stages isn't perfect, and the variation from
perfection fluctuates. The fastest CPUs are the ones whose successive stages
were best aligned and which have few or no partial defects.
This stuff all acts statistically. There's no way to predict in advance which
die will run at all, much less at which speed. Expensive test equipment must
be used to determine which CPU will run at which speed. As time passes,
there's a "learning curve," and the fab's production equipment is
improved/upgraded from time to time. This means that the center of the "yield"
moves steadily upward. Originally, the 0.6-micron P54C fabs could not
economically produce 100-MHz parts. Now the same fabs ship 120-MHz parts.


Marketing: The Blue Suede Shoe Dept.


Next we turn to marketing. Because 60- or 66-MHz Pentiums are no longer
mainstream, Intel has "positioned" (a marketing term) the P54C/75 against
AMD's DX4 parts. That means the 75 must be sold at a competitive price, in
this case, just under $200. And Intel wants to get a good price for its faster
CPUs, so it sells the P54CS/133 for about $800 (leaving room under $1000 for
the upcoming 150-MHz Pentium).
So the production process produces a variety of speed mixes, and Intel needs
to be able to sell Pentiums at different price points. If you aren't familiar
with the semiconductor industry, you might logically assume a one-to-one
correspondence between production and marketing. This is not the case.

Try a thought experiment: The yield at each Pentium fab turns out to be 100
percent at the maximum frequency. Does Intel send letters to customers
rejecting their purchase orders for 75-, 90-, and 100-MHz parts? Of course
not! Intel would--and does--simply mark the faster parts as lower-speed
grades and sell them at the lower prices.


Testing the Speed Mix


Here's how the process works: Intel assigns each fab its weekly (monthly?)
production quotas, based on what the customers want to buy. (If the customers
want to buy the "wrong" speed mix, the price list is adjusted for the
different speed grades.) So we start at the test stations with large stacks of
freshly ceramic-packaged Pentiums. These raw parts are tested at the top
frequency (120 MHz at the 0.6-micron fabs) until that production quota is met.
This
leaves a significant number of functional parts that fail to pass the 120-MHz
test, along with a lot of untested parts.
Next, testing is performed at 100 MHz. A lot of the parts that failed to make
the 120-MHz grade pass this time. When the 100-MHz quota is met, some
functional parts have still failed the speed test, but not nearly as many as
before. The yield at 100 MHz is probably about 90 percent, assuming the chip
is functional at all.
After 90-MHz testing, there are practically no functional rejects, since the
yield at 90 MHz is probably about 99 percent.
Almost all of the Pentiums that are finally tested at 75 MHz are raw parts not
previously tested at a higher speed. If they pass at 75 MHz, as they certainly
will if they are at all functional, they are stamped "75 MHz" and shipped to
the customers.
But we know that statistically, those 75-MHz parts have a 99 percent
probability of being perfectly good 90-MHz parts, a 90 percent probability of
being good 100-MHz parts, and a much smaller, but real, chance of being
120-MHz parts.
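The quota-driven binning described above is easy to model. Here is a toy
simulation--not Intel's actual procedure--in which the 90- and 99-percent
yields quoted earlier are used as given, while the 40-percent yield at 120
MHz and all the quota figures are invented for illustration. It shows the
article's punch line: the parts stamped "75 MHz" are overwhelmingly capable
of higher grades.

```python
import random

random.seed(42)

# Probability that a functional die's true maximum grade is at least the
# given speed. The 90- and 99-percent figures come from the article; the
# 40 percent at 120 MHz is an assumed number for illustration only.
P_AT_LEAST = [(120, 0.40), (100, 0.90), (90, 0.99), (75, 1.00)]

def true_max_grade():
    """Sample one die's true maximum reliable speed grade, in MHz."""
    u = random.random()
    for mhz, p in P_AT_LEAST:
        if u < p:
            return mhz

def bin_parts(n_parts, quotas):
    """Fill speed-grade quotas top-down, the way the article describes:
    test parts at the highest frequency until that quota is met (failures
    fall back into the pool), then move down to the next grade."""
    pool = [true_max_grade() for _ in range(n_parts)]
    bins = {mhz: [] for mhz in quotas}
    for mhz in sorted(quotas, reverse=True):
        remaining = []
        for grade in pool:
            if len(bins[mhz]) < quotas[mhz] and grade >= mhz:
                bins[mhz].append(grade)  # passed; stamped and shipped
            else:
                remaining.append(grade)  # failed, or never tested this round
        pool = remaining
    return bins, pool

# Invented quotas: 10,000 raw parts, demand skewed toward cheaper grades.
bins, _ = bin_parts(10_000, {120: 1500, 100: 3000, 90: 2000, 75: 3000})
frac = sum(1 for g in bins[75] if g >= 100) / len(bins[75])
print(f"Parts stamped 75 MHz that would also pass at 100 MHz: {frac:.0%}")
```

Because the 75-MHz bin is filled mostly with raw, never-tested parts, most of
them sample from the full yield distribution--which is why the fraction
printed comes out high.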


This Stuff is a Secret


Intel doesn't want you to know this. Because if you're like me, you'll change
the jumpers on your motherboard to a higher clock frequency to see if the
system still runs. Since the motherboard and its chip set will have been
designed to accommodate the full range of Pentium clock frequencies (if you
purchased your motherboard separately, which is why you should do that), there
is a really good chance that your 75-MHz Pentium system will run at 100 MHz
or so, without costing you any more money.
All CPU makers want to discourage this. So they'll tell you that overclocking
the CPU will irrevocably damage it. Oh, yeah? Then why do parts that fail the
initial 120-MHz test later get retested at 100 MHz, and even re-retested at 90
MHz if necessary? Huh? Huh?


What Color Hat?


Now for the ethics of clock pushing. I insist that if I pay for a 75-MHz P5
system, it's my property. If I want to experiment with jumper settings on my
Pentium motherboard, I'm wearing a white hat. There may be some black-hat folk
in Taiwan. Lots of Pentiums, of all speed grades, are sold to Taiwanese PC
makers. It is alleged that some of those folk are doing their own speed-grade
testing of "75-MHz" Pentiums, grinding off Intel's markings, and relabeling
the chips as faster parts. Never mind that the parts are probably fully
functional as relabeled; these vendors would be misrepresenting the chips as
being Intel tested and Intel guaranteed.
About those black-top P5/75s: If there's anything to this rumor, Intel may be
selling Pentiums from its 0.45-micron fab to meet its sales requirements for
75-MHz parts. You see, it's cheaper to make a part on a 0.45-micron line than
on a 0.6-micron line because the die is smaller. Thus, there are more
dies--and fewer silicon
defects per die--on that wafer, and the center of the yield is at a higher
frequency. In fact, I'm looking for a black-top 75-MHz part myself, to
replace the CPU I'm using right now.


Countermeasures


CPU vendors will sometimes try to make sure their cheaper CPUs aren't pushed
to higher frequencies. One way is to use a plastic package, as IBM/Motorola
did on the PowerPC 601. Power dissipation goes up with frequency, so a
plastic-packaged CPU can't be "pushed" very far. Another is to glue a small
heat sink to the top of a ceramic package so that one of those $7 fan/heat
sinks can't be mounted. Like the plastic package, this limits the heat the
package can dissipate. This is not a good way to go except in a portable
system; if a CPU in the crippled package is reliable at (say) 75 MHz in a room
that's not air-conditioned, then it will be acceptably reliable in an
air-conditioned room at a considerably higher frequency. 


Practical Advice


For two years, my #1 system used a CPU that Intel sold as a DX2/66 part. I
pushed it to 80 MHz and it ran all that time with never a hiccup. I recently
switched to an AMD DX4/100 motherboard which, naturally, is running at 120
MHz. You have to be willing to experiment on your (not AMD or Intel's)
property.
When experimenting, don't forget to use the advanced CMOS setup to adjust the
L2 cache and DRAM wait states, and so on, to their slowest timings. If the
"pushed" CPU works with these settings, advance them one at a time, over
several days, to determine which timings work at the new CPU clock speed.
Don't worry about this; an upgraded CPU clock is far more important than a
slightly degraded timing parameter.
I've participated in pushing four AMD DX4/100s to 120 MHz. Two would not run
reliably, one needs to boot with a slow clock but otherwise seems fine, and
one--the one in my #1 system--runs fine, period. This jibes with the fact that
AMD is shipping a few DX4/120s, but they seem to be scarce, indicating a low
yield (for now).


Tokenism


Back to my grandfather's grocery store: The smallest unit of U.S. currency
then was the "token," worth one-tenth of a cent. This aluminum coin was
switched to plastic as the war progressed. At the war's start, you could get a
cold bottle of cola from a vending machine for a nickel, equivalent to 50
tokens. Right now you can't (in my neighborhood) get a cold cola for the
equivalent of 50 pennies, which means the penny is now worth less than a token
was in the early 1940s. The token disappeared after the war when the price of
a cold cola went up to a dime. I expect the penny to disappear soon.


















SOFTWARE AND THE LAW


Trademark Wars in Cyberspace




Marc E. Brown


Marc is a patent attorney and shareholder of the intellectual-property law
firm of Poms, Smith, Lande, & Rose in Los Angeles, CA. Marc specializes in
computer law and can be contacted at 73414.1226@compuserve.com.


With domain names up for grabs, some interesting choices have been made. A
writer for Wired magazine fancied "ronald@mcdonalds.com." A businessman
decided to better his business with "bbb.com." A former MTV veejay latched on
to "mtv.com." A Sprint employee sped away with "mci.net." 
The companies whose names were associated with these enterprising escapades
weren't amused. Each turned to what is now more common in America than apple
pie--a lawsuit. The fun-filled cybermice weren't the only ones snared in this
nasty net. Network Solutions, the organization blessed with dishing out
Internet domain names, was sometimes named, too, along with site providers to
whom all bytes seem the same.
Almost every month you read about one of these trademark-related cybertangles.
But can big companies with legions of lawyers really put a stop to it? So far,
no court has decided.


What is a Trademark?


Before there can be trademark infringement, there must be trademark use.
Consider the software whose packaging states "for use with all IBM-compatible
computers." Intuitively, this does not seem like an act of trademark
infringement. But why not? The software vendor was using IBM's name without
its permission, in a manner which the public would immediately recognize as a
reference to Big Blue.
The answer lies in an understanding of what a trademark is and what it is not.
A word, by itself, is not a trademark. A word (or phrase) is a trademark only
when it functions to distinguish the source or sponsorship of one company's
goods or services from another's. (Technically, "trademark" is used in
reference to goods, while "service mark" is used in reference to services. For
simplicity, I am using "trademark" as a reference to both.)
This definition helps explain the intuitive answer to the IBM example. The
name IBM was not used in a manner which suggested that the software was
manufactured or sponsored by IBM. Thus, IBM was not being used as a trademark
for this software. Its use in this example, therefore, could not be a
trademark infringement.
Now take a look at Internet domain names. Do they really indicate sponsorship
or source of origin? How does the net surfer learn about a particular name?
Does he just think to himself that he would like to see whether Michael
Jackson's newest video will be on MTV and then take a shot at mtv.com? It's
possible, but not likely. Far more likely is that the domain name came from a
promotional piece or a company-name search. In these cases, the user knows who
is associated with the domain name and who is not. Whatever uncertainty may
have existed is also usually dispelled upon connection to the site. It is
usually then readily apparent that the site is not sponsored by the
complaining company.
Here lies one of the fundamental problems with the allegation that an
unauthorized domain name constitutes trademark infringement. Net surfers
simply are not likely to associate the domain name with the complaining
company.
One issue you may soon be hearing more about is whether the "look and feel" of
software can be protected as a trademark. There has not been a great deal of
litigation on this issue. But that may change in view of the recent decision
in Lotus Development Corp. v. Borland International, where the court held that
the entire menu tree of the famous 1-2-3 spreadsheet program was not
protectable under copyright law. As the scope of protection under copyright
law shrinks, more emphasis may be placed upon trademark theories.
Remember, also, that a design can function as a trademark. The checkered, wavy
window used on Microsoft Windows is one such example. In some circumstances,
color can also function as a trademark.


Not All Trademarks are Created Equal


The relationship between a trademark and its goods or services can have a
substantial impact upon the degree of protection provided. 
Some marks, such as "Paradox" for Borland's database manager, have no
relationship to the product. These "arbitrary" marks are given the greatest
degree of protection.
A "suggestive" relationship to the software, however, is often important to
achieving market penetration. Examples include "Access" for Microsoft's
database manager and "WordPerfect" for Novell's word processor. These
trademarks suggest a feature of the software. A suggestive trademark is still
entitled to legal protection, but not as much as an arbitrary trademark.
When the trademark simply describes the software, the trademark may be found
"merely descriptive." Examples include "Fast" for a communication program or
"High Resolution" for a CAD program. Merely descriptive trademarks will not be
protected until the purchasing public perceives them as indicating source, not
simply a description of the product. This usually requires long and widespread
use. This ensures that other companies can continue to describe features of a
competing product.
One type of relationship that will never be protected is when the trademark is
the generic name for the software. "Word Processor" for a word-processing
program is a good example. No matter how long, prominent, or widespread the
use, the law will never protect "Word Processor" as a trademark for this
software.


Search to Determine Availability


After selecting a trademark, it is prudent to have a search conducted to
determine its availability.
There are different levels of searches. For about $300, a search can be
performed in databases containing all federal and state trademark
registrations and many major telephone books and trade directories. One of the
more experienced trademark-search companies is Thomson & Thomson (800-692-8833
or http://www.thomson.com/thomthom.html).
When the initial investment in promoting the trademark will be substantial,
far more extensive searches can and should be made. 
Why should you search in telephone books and trade directories? If the
trademark has not been registered, is there anything you have to worry about?
Yes. Unlike an invention, a trademark will normally be protected after it is
first used, even if it is not registered. Therefore searching only trademark
registrations is not sufficient.


When is it Yours?


To obtain rights in a trademark, use of the trademark is essential.
For software, the trademark should be affixed to the package containing the
software and to the label on the disk. It should also be displayed during use
of the software. For a software-related service, the service mark should
appear in promotional material for the service, as well as in all other
material associated with the service.
The manner in which the trademark is used is also very important. A trademark
should usually be used as an adjective, not a noun. For example, the carton
for Microsoft Access reads "Access...Database Management System." The
trademark should also be displayed more prominently than other words (or
symbols) on the packaging. 
To help teach the public that you regard the word or phrase as a trademark,
the symbol TM should be placed near the trademark; for software services, an
SM should be used, instead. If the trademark has been registered with the
United States Patent and Trademark Office, the symbol (R) should be used.

If registration is not a requirement for trademark protection, should you
bother to register? Yes. Registration provides many additional benefits, one
of the most important of which is the right to expand the use of your
trademark into new geographic areas. While you were busy developing a market
in northern California, another company might start using the same trademark
on the same type of software in New York. Once you became successful and began
expanding into a national market, that New York company would have the legal
right to stop you from using your trademark in New York! The result would be
exactly the opposite if you first procured a federal registration.
Another important benefit of federal registration is that it solidifies rights
in a trademark on the day the application for registration is filed, even if
the trademark is not used until many months later. This is often of vital
importance. The first to obtain rights in a trademark is the one entitled to
enforce it against another. Without federal registration, rights can only be
procured through actual use in the marketplace. Usually, months pass between
the time a trademark is selected and the time it is actually used in the
marketplace. If an application to federally register is promptly filed, the
filing date of the application can be relied upon as the first-use date,
instead. This several-month difference can be a determining factor in a legal
dispute. 


Likelihood of Confusion: The Test of Infringement


The second major problem in asserting a trademark-infringement claim against a
domain name is that a trademark does not provide the right to prevent every
other company from using the trademark. Instead, it merely provides the right
to stop another company from using the trademark in a manner likely to cause
confusion as to the origin or sponsorship of goods or services. Likelihood of
confusion as to source or sponsorship is the test, not mere similarity in the
trademarks.
The similarity between the trademarks, of course, is still important. The
accused trademark is examined for similarities in appearance, pronunciation,
and meaning (the so-called sight, sound, and meaning test).
But equally important are similarities in the goods or services. Mead Data
Central was reminded of this fact a few years ago when it was unable to obtain
an injunction against use of the name "Lexus" for automobiles, even though
Mead Data had used "Lexis" for computer-assisted research services for over a
decade. There was simply no likelihood that consumers would believe that the
automobile was manufactured or sponsored by the information-service provider;
that is, no "likelihood of confusion" would arise as to source or sponsorship.
The use of one company's name as part of another's domain name may similarly
not result in any "likelihood of confusion." The pathways through which a
domain name comes to one's attention usually are not likely to suggest any
association between the domain name and the company whose name appears in it,
unless, of course, it is the domain name of that company's site. If the goods
or services that are ultimately associated with that domain name after
connection to the site are markedly different from those of the company whose
name is a part of that domain name, any uncertainty which might have existed
will also probably be extinguished quickly. 
Thus, simply incorporating another company's name as part of a domain name
does not necessarily create a likelihood of confusion with that other company.
If the goods or services associated with that domain name are similar to the
company's, however, such a likelihood may well exist. 
Likelihood of confusion is also determined by similarity between marketing
channels, the strength of the asserted mark (including such considerations as
distinctiveness and fame), and whether the alleged infringer adopted his mark
with the intention of causing confusion. Although not required, proof of
actual confusion is considered highly probative of a likelihood of confusion.


Alternate Pathways


Other legal theories may be used to support an attack against an unauthorized
domain name. These theories are also often used in more-traditional business
settings.
"Dilution" is a theory under which a court will enjoin an unauthorized use of
a company's trademark. The unauthorized use of a trademark can often dilute
its distinctiveness, even though the consumer is not likely to mistake the
unauthorized use for the authorized one. Dilution is only recognized in 26
states, including California and New York. In these states, it may become the
weapon of choice in the cyberspace trademark wars. 
Even in these states, however, most courts have been unwilling to protect a
mark against dilution unless it is very famous. In the past, these courts also
required the unauthorized use to have been disparaging. Recent decisions,
however, are moving away from this last requirement. Other legal theories that
might be asserted include unfair competition and intentional interference with
business relations. Many unauthorized domain names, however, are not being
used in competition with the trademark owner. There are also broad privileges
arising out of the rights to freedom of speech and to do business. These might
also serve as defenses to such claims. However, it is unlikely that a court
would find any privilege when another company's name is incorporated into a
domain name simply to coerce the payment of money for its relinquishment or to
be mischievous. 


Other Pointers


Here are some other useful pointers:
You cannot assign or license a trademark by itself. You must transfer the
goodwill associated with a trademark when you assign it. For licenses, quality
controls should be specified in the license and maintained after it is signed.
Do not accuse another company of infringing your trademark until you have
first assured yourself that this other company did not use or apply to
federally register its mark before you. Otherwise, you might be digging your
own grave.
Rights to a trademark can be lost in a variety of ways. Don't use it as a noun
(for example, "Xerox" for a photocopy machine). Don't stop using it for a long
time. And don't selectively enforce it--all infringers must be pursued. 
Trademark-infringement claims are usually covered under the standard
Commercial General Liability (CGL) insurance policy held by most companies.
Notify the insurance company as soon as the infringement charge is made, and
don't be discouraged by an initial refusal to provide coverage. Most insurance
companies provide it if diligently pursued.


So What's the Answer?


So you still want to know whether McDonald's, the Better Business Bureau, MTV,
and MCI will be victorious? Here's my best guess (or what we lawyers call "the
answer"): If the domain name was selected in hopes of coercing a payment for
its relinquishment, to see the owner's name in print, or because the owner
actually wanted to confuse people into believing that his goods or services
were associated with the company whose name he was so fond of, he had better
start packing his bytes and looking for a new pad. He may have some fanciful
arguments, but I doubt the judge will be listening. Network Solutions has also
recently issued a new set of regulations to pull the rug out from under him
before he ever gets to the courthouse.
But what if the owner truly adopted the domain name in good faith and without
the intent to cause any of this havoc? He might be okay if he did not
associate the domain name with any goods or services similar to those of the
other company.
Now for your final-exam question: Could another lawyer use the name of this
column, "Software and the Law," as the name of his column in another magazine?
And for extra credit (and please don't send your answers to DDJ): If "Software
and the Law" is an enforceable service mark, who owns this enforceable
right--its creator and author (me) or its publisher?














